bloomfilter: use libdivide to compute the bit location by dkratunov · Pull Request #79800 · ClickHouse/ClickHouse

dkratunov · 2025-05-02T19:24:31Z

This PR improves bloom filter performance.

The existing code computes the word and bit index with a straightforward integer division (div). This is a problem for two reasons: (1) the word location is a dependency for the rest of the loop body, so the division stalls the pipeline, and (2) hardware integer division is slow compared with multiplies, shifts, and adds.
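
For illustration, the pattern described above can be sketched roughly as follows. This is a simplified, hypothetical filter, not the actual BloomFilter.cpp code: the `NaiveBloom` name, the double-hashing formula, and passing `h1`/`h2` in directly are all assumptions made for the example. It only shows how a runtime modulo sits at the head of the dependency chain for every probe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified bloom filter used only to illustrate the problem.
struct NaiveBloom
{
    using Word = uint64_t;
    static constexpr size_t word_bits = 8 * sizeof(Word);

    std::vector<Word> words;
    size_t num_bits;    // only known at runtime, so `% num_bits` is a real div
    size_t num_hashes;

    NaiveBloom(size_t num_words, size_t num_hashes_)
        : words(num_words, 0), num_bits(num_words * word_bits), num_hashes(num_hashes_)
    {
        assert(num_words > 0);
    }

    // h1 and h2 stand in for two hash values of the key (assumed inputs).
    void add(uint64_t h1, uint64_t h2)
    {
        for (size_t i = 0; i < num_hashes; ++i)
        {
            size_t pos = (h1 + i * h2) % num_bits;  // runtime modulo
            size_t word = pos / word_bits;          // needs pos first
            size_t bit = pos % word_bits;
            words[word] |= Word(1) << bit;          // needs word and bit
        }
    }

    bool contains(uint64_t h1, uint64_t h2) const
    {
        for (size_t i = 0; i < num_hashes; ++i)
        {
            size_t pos = (h1 + i * h2) % num_bits;
            if (!(words[pos / word_bits] & (Word(1) << (pos % word_bits))))
                return false;
        }
        return true;
    }
};
```

Every memory access in the loop waits on `pos`, which is why the latency of the modulo matters more than its throughput here.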

Thankfully, ClickHouse already bundles libdivide, so converting the division into a multiplication by a precomputed reciprocal is easy. While doing this, the PR also replaces one multiplication in the loop body with an addition.
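
A minimal sketch of the idea, under stated assumptions: this does not use libdivide and is not its API. `FastMod32` and `positions` are hypothetical helpers that apply the well-known direct-remainder trick (precompute M = ceil(2^64 / d), then take the high half of a 128-bit product), restricted to 32-bit operands to keep the arithmetic short. The running `acc += h2` shows the multiplication-to-addition change in the same spirit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: remainder by a fixed 32-bit divisor without a div.
// Relies on unsigned __int128, a GCC/Clang extension.
struct FastMod32
{
    uint32_t d;
    uint64_t M;  // ceil(2^64 / d), computed once per divisor

    explicit FastMod32(uint32_t d_) : d(d_), M(0)
    {
        assert(d_ != 0);
        // Wraps to 0 when d == 1, which still gives the right answer (a % 1 == 0).
        M = UINT64_MAX / d_ + 1;
    }

    uint32_t mod(uint32_t a) const
    {
        uint64_t low = M * a;  // fractional part of a / d, scaled by 2^64
        return static_cast<uint32_t>((static_cast<unsigned __int128>(low) * d) >> 64);
    }
};

// Bit positions for one key: pos_i = (h1 + i * h2) mod d in 32-bit wrapping
// arithmetic, with the per-iteration multiply replaced by a running sum.
std::vector<uint32_t> positions(uint32_t h1, uint32_t h2, uint32_t num_hashes, const FastMod32 & m)
{
    std::vector<uint32_t> out;
    out.reserve(num_hashes);
    uint32_t acc = h1;
    for (uint32_t i = 0; i < num_hashes; ++i)
    {
        out.push_back(m.mod(acc));
        acc += h2;  // same value as h1 + (i + 1) * h2, without the multiply
    }
    return out;
}
```

The reciprocal is computed once when the filter size is known, so the per-probe cost is two multiplies and a shift instead of a division. libdivide also supports 64-bit operands, which a real filter working on 64-bit hashes would need.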

Lastly, word_size is now marked as a constant, so the compiler can always emit shifts and masks for the division and modulo by word_size.
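
As a rough illustration of why the constant matters (names such as `word_bits`, `locate`, and `set_bit` are hypothetical, and a 64-bit word is assumed): when the divisor is a compile-time power of two, `/` and `%` are equivalent to a shift and a mask, and the compiler can make that substitution itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = uint64_t;

// Compile-time constant: the compiler can see the divisor is 64.
constexpr size_t word_bits = 8 * sizeof(Word);
static_assert(word_bits == 64, "this sketch assumes 64-bit words");
static_assert((word_bits & (word_bits - 1)) == 0, "word_bits must be a power of two");

struct BitLocation
{
    size_t word;
    size_t bit;
};

// Written with / and %; with a constant divisor no div needs to be emitted.
constexpr BitLocation locate(size_t pos)
{
    return {pos / word_bits, pos % word_bits};
}

// The hand-written equivalent, shown only to make the equivalence explicit.
constexpr BitLocation locate_shift_mask(size_t pos)
{
    return {pos >> 6, pos & 63};
}

inline void set_bit(Word * words, size_t num_words, size_t pos)
{
    const BitLocation loc = locate(pos);
    assert(loc.word < num_words);
    words[loc.word] |= Word(1) << loc.bit;
}
```

If `word_bits` were an ordinary runtime variable, the compiler could not assume it is a power of two and would have to fall back to a real division.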

Together, these changes make CityHash the primary bottleneck in the BloomFilter functions, as it should be.
We have inspected this as carefully as possible to ensure mathematical equivalence and have tested it on existing data with no issues.

In our very large production cluster, we saw index time per byte in merges drop by ~45%:

Query:

SELECT
  toStartOfInterval(event_time, INTERVAL 5 MINUTE) AS time,
  sum(
    ProfileEvents[
      'MergeTreeDataWriterSkipIndicesCalculationMicroseconds'
    ]
  ) / sum(size_in_bytes) AS index_us_per_byte
FROM
  clusterAllReplicas('{cluster}', system.part_log)
WHERE
  table = 'main'
  AND event_date > now() - INTERVAL 14 DAY
  AND $__timeFilter(event_time)  -- Grafana time-range macro
  AND event_type = 'MergeParts'
  AND size_in_bytes > 0
GROUP BY
  1
ORDER BY
  1 ASC

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Performance improvements to all bloom filter types.

CLAassistant · 2025-05-02T19:24:38Z

All committers have signed the CLA.

clickhouse-gh · 2025-05-03T00:06:16Z

Workflow [PR], commit [2b1cd06]

alexey-milovidov · 2025-05-03T17:48:30Z

Thanks, looks good! Can we provide an isolated performance test? See tests/performance.

dkratunov · 2025-05-05T18:12:34Z

@alexey-milovidov what would you like to see in a new test? There are already tests/performance/bloom_filter_{insert,select}.xml, which I think should show the effect.

alexey-milovidov · 2025-05-16T12:43:31Z

Performance tests didn't detect a statistically significant change. However, one run showed a 5.9% speed-up for bloom_filter_insert: https://clickhouse-builds.s3.amazonaws.com/PRs/79800/2b1cd06bf623af194426fe42bb6c5a4d90d59bad/performance_comparison_amd_release_master_head_3_3/report.html and another showed a 2.6% slow-down for bloom_filter_select: https://clickhouse-builds.s3.amazonaws.com/PRs/79800/2b1cd06bf623af194426fe42bb6c5a4d90d59bad/performance_comparison_amd_release_master_head_2_3/report.html

bloomfilter: use libdivide to compute the bit location

575ca49

alexey-milovidov added the can be tested Allows running workflows for external contributors label May 3, 2025

clickhouse-gh bot added the pr-performance Pull request with some performance improvements label May 3, 2025

alexey-milovidov added 2 commits May 3, 2025 19:44

Update BloomFilter.cpp

f698a0d

Update BloomFilter.cpp

06872fb

alexey-milovidov self-assigned this May 3, 2025

Merge branch 'master' into faster-blooms

2b1cd06

alexey-milovidov added this pull request to the merge queue May 16, 2025

Merged via the queue into ClickHouse:master with commit 7b47a42 May 16, 2025
118 of 120 checks passed

robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 16, 2025

KochetovNicolai added the pr-must-backport-cloud label Jun 2, 2025

robot-ch-test-poll added the pr-backports-created-cloud deprecated label, NOOP label Jun 4, 2025

robot-clickhouse added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Jul 2, 2025

zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Oct 3, 2025

Merge pull request ClickHouse#79800 from dkratunov/faster-blooms

b0e0633

bloomfilter: use libdivide to compute the bit location

zvonand mentioned this pull request Oct 3, 2025

24.8 Backport of 79800: bloomfilter: use libdivide to compute the bit location Altinity/ClickHouse#1062

Merged

13 tasks

zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Oct 6, 2025

Merge pull request ClickHouse#79800 from dkratunov/faster-blooms

90dd658

bloomfilter: use libdivide to compute the bit location

zvonand mentioned this pull request Oct 16, 2025

24.8.14 Stable Pre-release PR Altinity/ClickHouse#1085

Merged

joelynch added a commit to aiven/ClickHouse that referenced this pull request Oct 23, 2025

bloomfilter: use libdivide to compute the bit location

49f3fbe

Backports ClickHouse#79800

joelynch added a commit to aiven/ClickHouse that referenced this pull request Oct 24, 2025

bloomfilter: use libdivide to compute the bit location

b953380

Backports ClickHouse#79800

Khatskevich pushed a commit to aiven/ClickHouse that referenced this pull request Nov 12, 2025

bloomfilter: use libdivide to compute the bit location

5f36fce

Backports ClickHouse#79800

Khatskevich pushed a commit to aiven/ClickHouse that referenced this pull request Nov 30, 2025

bloomfilter: use libdivide to compute the bit location

ae9a333

Backports ClickHouse#79800

Khatskevich pushed a commit to aiven/ClickHouse that referenced this pull request Mar 3, 2026

bloomfilter: use libdivide to compute the bit location

9f8f1bb

Backports ClickHouse#79800

alexey-milovidov merged 4 commits into ClickHouse:master from dkratunov:faster-blooms

7 participants
