bloomfilter: use libdivide to compute the bit location#79800
bloomfilter: use libdivide to compute the bit location#79800alexey-milovidov merged 4 commits intoClickHouse:masterfrom
Conversation
|
Thanks, looks good! Can we provide an isolated performance test? See |
|
@alexey-milovidov what would you like to see in a new test? There's already |
|
Performance tests didn't detect a statistically significant change. But in one run, there was 5.9% speed-up for |
7b47a42
bloomfilter: use libdivide to compute the bit location
bloomfilter: use libdivide to compute the bit location
This PR improves bloom filter performance.
The existing code computes the word and bit index using a straight-forward
div. This is a problem because 1. the word location is a necessary dependency for the rest of the loop body, so this stalls the pipeline and 2.divs are very slow, obviously.Thankfully, ClickHouse already has
libdivide, so converting this to amulusing the reciprocal value is easy. While doing this, also removed onemulin the body and turned it into anadd.Lastly, ensured that word_size is marked as a constant, so the compiler can always emit shifts and masks for the word_size divisions and modulo ops.
All of this together makes the primary bottleneck in the BloomFilter functions CityHash, as it should be.
We have inspected this as carefully as possible to ensure mathematical equivalence and have tested it on existing data with no issues.
In our very large production cluster, we saw index time per byte in merges drop by ~45%:
Query:
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):