Skip to content

NumericIndexedVector: compressed aggregations & point-wise operations on sparse vectors#74193

Merged
rienath merged 22 commits intoClickHouse:masterfrom
FriendLey:numeric_index_vector
Jun 17, 2025
Merged

NumericIndexedVector: compressed aggregations & point-wise operations on sparse vectors#74193
rienath merged 22 commits intoClickHouse:masterfrom
FriendLey:numeric_index_vector

Conversation

@FriendLey
Copy link
Contributor

@FriendLey FriendLey commented Jan 5, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

NumericIndexedVector: new vector data-structure backed by bit-sliced, Roaring-bitmap compression, together with more than 20 functions for building, analysing and point-wise arithmetic. Can cut storage and speed up joins, filters and aggregations on sparse data. Implements #70582 and “Large-Scale Metric Computation in Online Controlled Experiment Platform” paper by T. Xiong and Y. Wang from VLDB 2024

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@CLAassistant
Copy link

CLAassistant commented Jan 5, 2025

CLA assistant check
All committers have signed the CLA.

@hanfei1991 hanfei1991 added pr-feature Pull request with new product feature can be tested Allows running workflows for external contributors labels Jan 5, 2025
@robot-ch-test-poll2
Copy link
Contributor

robot-ch-test-poll2 commented Jan 5, 2025

This is an automated comment for commit 44daf19 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check nameDescriptionStatus
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here❌ failure
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report❌ failure
Successful checks
Check nameDescriptionStatus
Docs checkBuilds and tests the documentation✅ success

@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Feb 7, 2025

Workflow [PR], commit [285a187]

@FriendLey FriendLey force-pushed the numeric_index_vector branch from 57838ae to 8cfe289 Compare February 7, 2025 09:56
@FriendLey FriendLey marked this pull request as ready for review February 7, 2025 09:57
@FriendLey FriendLey force-pushed the numeric_index_vector branch 4 times, most recently from 52c4f88 to dfbe3e9 Compare February 11, 2025 07:43
@ucasfl ucasfl requested a review from rschu1ze February 11, 2025 12:22
@rschu1ze rschu1ze assigned rschu1ze and rienath and unassigned rschu1ze Feb 11, 2025
@rschu1ze
Copy link
Member

Asked @rienath for a review.

@FriendLey FriendLey force-pushed the numeric_index_vector branch from 45de950 to 9ba9178 Compare February 17, 2025 07:18
@rienath
Copy link
Member

rienath commented Feb 26, 2025

@FriendLey thanks for the PR! I am currently reading the paper that you linked in the issue. It might take some time to review this, but be sure – I'm on it!

@FriendLey
Copy link
Contributor Author

@FriendLey thanks for the PR! I am currently reading the paper that you linked in the issue. It might take some time to review this, but be sure – I'm on it!

Thanks for the update! I appreciate you taking the time to review it.

@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Apr 1, 2025

Dear @rienath, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@rienath rienath self-assigned this Apr 1, 2025
Copy link
Member

@rienath rienath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great PR! Impressive work overall, and the performance improvements highlighted in the paper are especially exciting!

I’ve reviewed the code thoroughly. Most suggestions focus on improving consistency, clarity, maintainability, and alignment with existing conventions in the codebase. The core logic looks solid, and it's clear a lot of thought went into the implementation.

A few higher-level points to consider:

  • Performance tests: Since your paper includes benchmarks, it would be fantastic to have those integrated under tests/performance as well. This would help us to quantify the speed ups and track the speed going forward. You can refer to this guide for details.

  • Documentation: The headers could benefit from additional comments or brief design notes. The PR is quite complicated and the logic isn't always immediately obvious. That will make it easier for others to understand and build on this work.

  • CI: Looks like a few checks are failing. Could you take a look when you have a moment?

@rienath
Copy link
Member

rienath commented Apr 16, 2025

Hi @FriendLey @ucasfl! Just a quick check-in, this PR is still on our radar. Are you planning to continue working on it? Let us know if you need any help!

@ucasfl
Copy link
Collaborator

ucasfl commented Apr 16, 2025

Hi @FriendLey @ucasfl! Just a quick check-in, this PR is still on our radar. Are you planning to continue working on it? Let us know if you need any help!

Yes, @FriendLey is working on it.

@FriendLey FriendLey force-pushed the numeric_index_vector branch 3 times, most recently from 6509696 to a19deda Compare May 17, 2025 03:26
@FriendLey FriendLey force-pushed the numeric_index_vector branch from 15d6cd5 to 09f4e93 Compare May 19, 2025 11:52
@rienath
Copy link
Member

rienath commented May 19, 2025

Thanks for all the updates so far! Could you please avoid rebasing or force-pushing this branch once review has started? When you rewrite commits mid-review, my comments go stale and I have to review everything again. I really appreciate your help in keeping the review efficient @FriendLey

@FriendLey
Copy link
Contributor Author

@rienath I have processed all the new comments. Please review them again when you are free.

@FriendLey FriendLey requested a review from rienath June 10, 2025 12:54
@FriendLey FriendLey requested a review from rienath June 12, 2025 02:30
@rienath
Copy link
Member

rienath commented Jun 13, 2025

@FriendLey could you have a look at this converstation?

@FriendLey
Copy link
Contributor Author

@FriendLey could you have a look at this converstation?

@rienath I have corrected the description in the document, please review it when you are free.

@rienath
Copy link
Member

rienath commented Jun 16, 2025

@FriendLey there is a failing test in CI (03458_numeric_indexed_vector_operations_bit_promote). It looks like the result of numericIndexedVectorToMap(numericIndexedVectorPointwiseDivide(vec_1, 2)) may be undeterministic as the test produces correct result in other runs

@FriendLey
Copy link
Contributor Author

@FriendLey there is a failing test in CI (03458_numeric_indexed_vector_operations_bit_promote). It looks like the result of numericIndexedVectorToMap(numericIndexedVectorPointwiseDivide(vec_1, 2)) may be undeterministic as the test produces correct result in other runs

I am working on this. I guess it is because fraction_bit_num is lost when transferring data from the replica node to the coordinator node in cloud scenario.

@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Jun 17, 2025

Workflow [PR], commit [1cb079c]

Summary:

job_name test_name status info comment
Stateless tests (amd_tsan, 3/3) failure
01507_clickhouse_server_start_with_embedded_config FAIL
Stateless tests (amd_tsan, s3 storage, 1/3) failure
02922_server_exit_code FAIL

@FriendLey
Copy link
Contributor Author

@rienath The failed test has been fixed, please review this pr when you are free.

@rienath
Copy link
Member

rienath commented Jun 17, 2025

@FriendLey thanks. Indeed, the test is green now. For my interest: how did rhs.isEmpty() check in pointwiseAddInplace(...) fix the issue with the numericIndexedVectorPointwiseDivide(...) function, which does not call the former

@FriendLey
Copy link
Contributor Author

@FriendLey thanks. Indeed, the test is green now. For my interest: how did rhs.isEmpty() check in pointwiseAddInplace(...) fix the issue with the numericIndexedVectorPointwiseDivide(...) function, which does not call the former

@rienath The reason for the test failure is not the division operation, but the process of reading data in the ParallelReplicas scenario. In ParallelReplicas, the original data will be read by multiple replicas and then merged at the coordinator node.

In the default constructor of BSINumericIndexedVector, integer_bit_num and fraction_bit_num are set to 40 and 24 respectively when ValueType is Float64.

When read with ParallelReplicas, assume there are 2 replicas. If the result of one replica is empty, the data transmitted to the coordinator node is:

d1: empty with integer_bit_num = 40 and fraction_bit_num = 24
d2: non-empty, integer_bit_num = 32, fraction_bit_num = 0 (set via parameters)

The problem lies in the order of merge. Because the bit position has promote operation (promoteBitPrecisionInplace
) before merge(actually call pointwiseAddInplace), the result vector fraction_bit_num of d1.merge(d2) and d2.merge(d1) will be different:

d1.merge(d2): fraction_bit_num = 0
d2.merge(d1): fraction_bit_num = 24 (old version)

Different fraction_bit_num results in different results of subsequent operations.

@rienath
Copy link
Member

rienath commented Jun 17, 2025

Stateless tests (amd_tsan, 3/3) — Failed: 1, Passed: 2627, Skipped: 56

  • 01507_clickhouse_server_start_with_embedded_config

Stateless tests (amd_tsan, s3 storage, 1/3) — Failed: 1, Passed: 2590, Skipped: 123

  • 02922_server_exit_code

Both fail for reasons not related to this PR

@FriendLey thank you for all the work!

@rienath rienath enabled auto-merge June 17, 2025 13:41
@rienath rienath added this pull request to the merge queue Jun 17, 2025
Merged via the queue into ClickHouse:master with commit 8b78e12 Jun 17, 2025
118 of 123 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 17, 2025
@rienath rienath changed the title add logic of numeric index vector, implemented by bsi and bitmap. --issue=#70582 NumericIndexedVector: compressed aggregations & point-wise operations on sparse vectors Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants