Distributed execution: better split tasks #87508
Merged: scanhex12 merged 42 commits into ClickHouse:master on Nov 14, 2025
Contributor: Workflow [PR], commit [e0d5e8d] Summary: ❌
Member: It should be implemented as follows, independently of row groups or data lakes:
The coordinator knows the file sizes and therefore how many buckets each file needs. It then distributes the work for each bucket of each file. We can call them "virtual buckets" to make the concept easier to grasp, because how a file is actually split, and where, is abstracted away.
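A minimal sketch of the "virtual bucket" idea, assuming a fixed target bucket size in bytes. `FileInfo`, `VirtualBucket`, `bucketsForFile` and `makeVirtualBuckets` are hypothetical names for illustration, not the API merged in this PR. The coordinator only needs each file's size; how a reader maps a bucket's byte range onto the real file layout is left abstract, as the comment suggests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one object-storage file as seen by the coordinator.
struct FileInfo
{
    std::string path;
    std::uint64_t size_bytes;
};

// Hypothetical "virtual bucket": a byte range of one file. How this range is
// turned into actual reads (row groups, stripes, ...) is left to the reader.
struct VirtualBucket
{
    std::string path;
    std::uint64_t bucket_index;
    std::uint64_t offset;
    std::uint64_t length;
};

// Number of buckets a file needs: ceil(size / bucket_size), with at least one
// bucket so that an empty file still yields a task.
std::uint64_t bucketsForFile(std::uint64_t size_bytes, std::uint64_t bucket_size)
{
    assert(bucket_size > 0);
    if (size_bytes == 0)
        return 1;
    return (size_bytes + bucket_size - 1) / bucket_size;
}

// Expand every file into its virtual buckets using only file sizes.
std::vector<VirtualBucket> makeVirtualBuckets(const std::vector<FileInfo> & files, std::uint64_t bucket_size)
{
    std::vector<VirtualBucket> buckets;
    for (const auto & file : files)
    {
        const std::uint64_t count = bucketsForFile(file.size_bytes, bucket_size);
        for (std::uint64_t i = 0; i < count; ++i)
        {
            const std::uint64_t offset = i * bucket_size;
            // The last bucket of a file may be shorter than bucket_size.
            const std::uint64_t length
                = file.size_bytes > offset ? std::min(bucket_size, file.size_bytes - offset) : 0;
            buckets.push_back({file.path, i, offset, length});
        }
    }
    return buckets;
}
```

With a 32-byte bucket size, a 100-byte file becomes four buckets (the last one 4 bytes long), so a single large file can be spread across several workers instead of being read by one.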
Branch updated from d200cb8 to f5f4240.
Avogar requested changes on Sep 30, 2025.
Branch updated from a3d7980 to fe3b556.
divanik reviewed on Oct 9, 2025:
tests/integration/test_storage_iceberg_with_spark/test_cluster_table_function.py (resolved)
src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp (two threads, outdated and resolved)
kssenii reviewed on Oct 10, 2025.
Avogar reviewed on Oct 29, 2025 (two reviews).
Avogar (Member) left a comment: Just a few small comments and I am ready to approve.
Avogar approved these changes on Oct 29, 2025.
Avogar (Member) left a comment: Great work! Just some very minor final suggestions.
Member, Author: Latest time measurements from the integration test with Parquet v3 (the first time is with bucket-level splitting):
Removed unused ErrorCodes namespace from IInputFormat.cpp
Merged via the queue into ClickHouse:master with commit 4bed2ad on Nov 14, 2025. 123 of 130 checks passed.
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request on Dec 16, 2025: …ion_better_spread Distributed execution: better split tasks
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request on Dec 17, 2025: …ion_better_spread Distributed execution: better split tasks
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request on Dec 19, 2025: …ion_better_spread Distributed execution: better split tasks
zvonand added a commit to Altinity/ClickHouse that referenced this pull request on Dec 22, 2025: Antalya 25.8.12 Backport of ClickHouse#87508: Distributed execution: better split tasks
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Distributed execution: split tasks by row group IDs rather than by whole files.
Documentation entry for user-facing changes
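To illustrate what splitting by row group IDs rather than by whole files means, here is a minimal sketch assuming the coordinator already knows each Parquet file's row group count. `ParquetFile`, `RowGroupTask` and `splitByRowGroups` are hypothetical names, and the round-robin assignment is a simplification chosen for the example; it is not necessarily how `StorageObjectStorageStableTaskDistributor` assigns tasks to replicas.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-file metadata: the number of row groups in a Parquet file.
struct ParquetFile
{
    std::string path;
    std::size_t row_group_count;
};

// Hypothetical unit of work: one row group of one file, identified by its ID.
struct RowGroupTask
{
    std::string path;
    std::size_t row_group_id;
};

// Expand files into per-row-group tasks and deal them out round-robin across
// replicas, so a file with many row groups is read by several replicas
// instead of a single one.
std::vector<std::vector<RowGroupTask>> splitByRowGroups(const std::vector<ParquetFile> & files, std::size_t replica_count)
{
    assert(replica_count > 0);
    std::vector<std::vector<RowGroupTask>> per_replica(replica_count);
    std::size_t next = 0;
    for (const auto & file : files)
    {
        for (std::size_t id = 0; id < file.row_group_count; ++id)
        {
            per_replica[next].push_back({file.path, id});
            next = (next + 1) % replica_count;
        }
    }
    return per_replica;
}
```

With one file of five row groups and one file of a single row group on three replicas, each replica gets two tasks. Splitting per file would instead leave one replica reading the whole large file.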