Support for Iceberg partition pruning bucket transform #79262

Merged: divanik merged 25 commits into master from divanik/addBucketPartitionTransform on May 12, 2025

divanik (Member) commented Apr 16, 2025

Changelog category:

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add `icebergHash` and `icebergBucketTransform` functions. Support data file pruning in Iceberg tables partitioned with the bucket transform.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@divanik divanik requested a review from Copilot April 16, 2025 13:40

clickhouse-gh bot commented Apr 16, 2025

Workflow [PR], commit [33b24dd]

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Apr 16, 2025

Copilot AI left a comment

Copilot reviewed 3 out of 7 changed files in this pull request and generated 1 comment.

Files not reviewed (4)
  • src/Functions/icebergBucketTransform.cpp: Language not supported
  • src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.cpp: Language not supported
  • tests/queries/0_stateless/03411_iceberg_bucket.reference: Language not supported
  • tests/queries/0_stateless/03411_iceberg_bucket.sql: Language not supported

divanik commented Apr 16, 2025

This PR was partially inspired by the Iceberg reference implementation:
https://github.com/apache/iceberg/blob/6e8718113c08aebf76d8e79a9e2534c89c73407a/api/src/test/java/org/apache/iceberg/transforms/TestBucketing.java
https://github.com/apache/iceberg/blob/6e8718113c08aebf76d8e79a9e2534c89c73407a/api/src/main/java/org/apache/iceberg/util/BucketUtil.java

It was difficult to understand how the functions should be implemented without the reference implementation, because the specification does not make every aspect clear.

Please pay attention to both the reference implementation and the Iceberg specification. Feel free to ask questions if you notice any inconsistencies between our implementation and the reference one.
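
For reviewers comparing against the specification, here is a minimal Python sketch of the bucket transform as the Iceberg spec describes it: a 32-bit Murmur3 hash (x86 variant, seed 0) of the value's byte representation, followed by `(hash & Integer.MAX_VALUE) % N`. This is not the code from this PR. All function names below are hypothetical, only the `long` and `string` cases are covered, and the pruning check at the end is a simplified assumption for a single equality predicate.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as a signed 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    full = len(data) & ~3  # number of bytes covered by whole 4-byte blocks

    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    tail = data[full:]  # 0 to 3 leftover bytes
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16

    # Reinterpret as a signed 32-bit integer, like a Java int.
    return h - (1 << 32) if h & 0x80000000 else h


def iceberg_hash_long(value: int) -> int:
    """Hypothetical helper: int/long values are hashed as 8 little-endian bytes."""
    return murmur3_x86_32(struct.pack("<q", value))


def iceberg_hash_string(value: str) -> int:
    """Hypothetical helper: strings are hashed as their UTF-8 bytes."""
    return murmur3_x86_32(value.encode("utf-8"))


def bucket(hash_value: int, num_buckets: int) -> int:
    """Map a signed 32-bit hash to a bucket in [0, num_buckets)."""
    return (hash_value & 0x7FFFFFFF) % num_buckets


def may_contain(file_bucket: int, predicate_value: int, num_buckets: int) -> bool:
    """Simplified pruning check for `column = predicate_value` on a long column:
    a data file can be skipped when its bucket differs from the value's bucket."""
    return bucket(iceberg_hash_long(predicate_value), num_buckets) == file_bucket
```

With a table bucketed into 16 partitions, `may_contain(b, 34, 16)` is true for exactly one `b` in `range(16)`, so an equality filter on the bucketed column leaves a single bucket's data files to read.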

@hanfei1991 hanfei1991 self-assigned this Apr 22, 2025
@divanik divanik self-assigned this May 7, 2025
@divanik divanik removed their assignment May 7, 2025
@divanik divanik left a comment

Addressed comments

divanik commented May 12, 2025

The performance comparison check can't be restarted because of some issues, but they seem unrelated to my changes (and the check is not supposed to block MC, AFAIK).

@divanik divanik added this pull request to the merge queue May 12, 2025
Merged via the queue into master with commit 498a84a May 12, 2025
117 of 122 checks passed
@divanik divanik deleted the divanik/addBucketPartitionTransform branch May 12, 2025 15:36
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label May 12, 2025
ianton-ru pushed a commit to Altinity/ClickHouse that referenced this pull request May 19, 2025
…PartitionTransform

Support for Iceberg partition pruning bucket transform
ianton-ru pushed a commit to Altinity/ClickHouse that referenced this pull request Jun 4, 2025
…PartitionTransform

Support for Iceberg partition pruning bucket transform
Enmk added a commit to Altinity/ClickHouse that referenced this pull request Jun 5, 2025
25.3 Antalya - Iceberg: Backport of ClickHouse#79262 - Support for Iceberg partition pruning bucket transform