Skip to content

Add blockSerializedSize() function (size on disk without compression)#8952

Merged
alexey-milovidov merged 2 commits intoClickHouse:masterfrom
azat:getSerializedSize
Feb 3, 2020
Merged

Add blockSerializedSize() function (size on disk without compression)#8952
alexey-milovidov merged 2 commits intoClickHouse:masterfrom
azat:getSerializedSize

Conversation

@azat
Copy link
Member

@azat azat commented Feb 1, 2020

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add blockSerializedSize() function (size on disk without compression)

Detailed description / Documentation draft:
Sometimes it is useful to know how much does this data will take on
disk, with blockSerializedSize() you can know this (although without
compression).

This can be a major knowledge for various aggregation functions that
tracking some state (i.e. uniqCombined).

P.S. not sure that this is the best naming though, hence any suggestions are welcome! At first I was thinking about getSizeOf(), but this can be confused with size in memory, while they are differs

@azat azat force-pushed the getSerializedSize branch from f0a04e8 to 2dd6c35 Compare February 1, 2020 19:46
@alexey-milovidov
Copy link
Member

This function is very specific. It works on blocks rather than individual values that is very confusing.

Maybe name it blockSerializedSize that will resemble blockSize? And we can allow multiple agruments.

@azat azat force-pushed the getSerializedSize branch from 2dd6c35 to e1a07ab Compare February 2, 2020 04:41
@azat
Copy link
Member Author

azat commented Feb 2, 2020

This function is very specific. It works on blocks rather than individual values that is very confusing.

Indeed, nice catch, thanks!

Maybe name it blockSerializedSize that will resemble blockSize? And we can allow multiple agruments.

Done

@azat azat changed the title Add getSerializedSize() function (size on disk without compression) Add blockSerializedSize() function (size on disk without compression) Feb 2, 2020
@alexey-milovidov
Copy link
Member

alexey-milovidov commented Feb 2, 2020

Ok. Almost ready.
The usage of MemoryWriteBuffer is (1) overkill, (2) incorrect.

  1. We should invent NullWriteBuffer that will do nothing on nextImpl method (next will just move cursor to the front and overwrite data). The size of buffer can be smaller, example: 16K for better cache locality.
  2. You use the offset method that gives the offset inside current buffer. But what we need is count method that returns total number of written bytes.

@azat azat force-pushed the getSerializedSize branch 2 times, most recently from c5bddeb to de5766b Compare February 3, 2020 08:25
azat added 2 commits February 3, 2020 11:26
Sometimes it is useful to know how much does this data will take on
disk, with blockSerializedSize() you can know this (although without
compression).

This can be a major knowledge for various aggregation functions that
tracking some state (i.e. uniqCombined).
@azat azat force-pushed the getSerializedSize branch from de5766b to e89ceae Compare February 3, 2020 08:26
@azat
Copy link
Member Author

azat commented Feb 3, 2020

The usage of MemoryWriteBuffer is (1) overkill, (2) incorrect.

Agree, done

@alexey-milovidov alexey-milovidov merged commit e24926d into ClickHouse:master Feb 3, 2020
@azat azat deleted the getSerializedSize branch February 3, 2020 15:02
@tavplubix tavplubix added the pr-feature Pull request with new product feature label Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants