Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform #88090

Merged
shankar-iyer merged 6 commits into ClickHouse:master from shankar-iyer:fix_is_deleted_with_filter
Oct 9, 2025

@shankar-iyer shankar-iyer commented Oct 4, 2025

Resolves #65180 and #69164.

Currently, every range read by a FINAL query on a ReplacingMergeTree table with an is_deleted column has to pass through the FINAL merge transform.

Add a FilterTransform corresponding to is_deleted = 0 for qualifying parts and ranges, so that they can be processed in parallel and skip the FINAL merge transform. The parts and ranges that qualify are: 1) partitions with only a single part, when do_not_merge_across_partitions_select_final is enabled; 2) non-intersecting ranges identified by the existing optimization that splits ranges into intersecting and non-intersecting.
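To illustrate why a plain filter can stand in for the FINAL merge on qualifying ranges, here is a minimal Python sketch. It is a toy model, not ClickHouse code: `Row`, `final_merge`, and `filter_only` are hypothetical names, and it assumes that each key appears at most once within a qualifying (non-intersecting) range.

```python
from typing import NamedTuple


class Row(NamedTuple):
    key: int
    version: int
    is_deleted: int
    value: str


def final_merge(rows):
    """Toy model of FINAL: keep the highest-version row per key, then drop deleted rows."""
    latest = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.version >= current.version:
            latest[row.key] = row
    return sorted(r for r in latest.values() if r.is_deleted == 0)


def filter_only(rows):
    """Toy model of the added filter: keep rows where is_deleted = 0, no merging."""
    return sorted(r for r in rows if r.is_deleted == 0)


# Assumption: keys are unique within a non-intersecting range,
# so there is nothing to collapse and the filter alone is enough.
non_intersecting = [Row(1, 1, 0, "a"), Row(2, 3, 1, "b"), Row(3, 2, 0, "c")]
assert final_merge(non_intersecting) == filter_only(non_intersecting)

# When the same key occurs in several rows, the filter alone is wrong:
# it would resurrect the old version of a deleted row.
intersecting = [Row(1, 1, 0, "old"), Row(1, 2, 1, "deleted")]
assert final_merge(intersecting) == []
assert filter_only(intersecting) == [Row(1, 1, 0, "old")]
```

The second case shows why intersecting ranges still have to go through the merge: only there can an older live row and a newer deleted row for the same key coexist.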

Performance example : Link

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a [user-readable short description]

  • A SELECT query with the FINAL clause on a ReplacingMergeTree table with the is_deleted column now executes faster because of improved parallelization from two existing optimizations: 1) the do_not_merge_across_partitions_select_final optimization, for partitions of the table that have only a single part; 2) splitting the other selected ranges of the table into intersecting and non-intersecting, so that only intersecting ranges have to pass through the FINAL merging transform.
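The intersecting / non-intersecting split mentioned in the changelog entry can be pictured with a simplified Python sketch. `KeyRange` and `split_ranges` are hypothetical names, and the model compares whole ranges by their minimum and maximum primary key; the real optimization may split at a finer granularity.

```python
from typing import NamedTuple


class KeyRange(NamedTuple):
    part: str
    min_key: int
    max_key: int


def split_ranges(ranges):
    """Split ranges into those that overlap another range and those that do not."""
    intersecting, non_intersecting = [], []
    for i, r in enumerate(ranges):
        overlaps = any(
            j != i and r.min_key <= other.max_key and other.min_key <= r.max_key
            for j, other in enumerate(ranges)
        )
        (intersecting if overlaps else non_intersecting).append(r)
    return intersecting, non_intersecting


ranges = [
    KeyRange("part_a", 0, 10),
    KeyRange("part_b", 5, 15),   # overlaps part_a
    KeyRange("part_c", 20, 30),  # overlaps nothing
]
intersecting, non_intersecting = split_ranges(ranges)

# In this model, intersecting ranges go through the FINAL merge,
# while non-intersecting ranges only need the is_deleted = 0 filter.
print([r.part for r in intersecting])
print([r.part for r in non_intersecting])
```

The pairwise comparison is quadratic and only meant to make the idea concrete; it says nothing about how the server actually computes the split.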

@shankar-iyer shankar-iyer marked this pull request as draft October 4, 2025 02:51

clickhouse-gh bot commented Oct 4, 2025

Workflow [PR], commit [de19bf5]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Integration tests (amd_asan, old analyzer, 3/6) | | failure | | |
| | test_ytsaurus/test_tables.py::test_ytsaurus_select_subset_of_columns | FAIL | | |
| Integration tests (amd_asan, old analyzer, 4/6) | | failure | | |
| | test_insert_into_distributed/test.py::test_inserts_single_replica_local_internal_replication | FAIL | | |
| | test_insert_into_distributed/test.py::test_inserts_single_replica_internal_replication | FAIL | | |
| | test_insert_into_distributed/test.py::test_inserts_single_replica_no_internal_replication | FAIL | | |
| | test_insert_into_distributed/test.py::test_prefer_localhost_replica | FAIL | | |
| | test_insert_into_distributed/test.py::test_table_function | FAIL | | |
| Integration tests (amd_binary, 4/5) | | failure | | |
| | test_ytsaurus/test_tables.py::test_ytsaurus_select_subset_of_columns | FAIL | | |
| Integration tests (amd_tsan, 3/6) | | failure | | |
| | test_ytsaurus/test_tables.py::test_ytsaurus_select_subset_of_columns | FAIL | | |

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Oct 4, 2025
@shankar-iyer shankar-iyer marked this pull request as ready for review October 5, 2025 23:33
@SmitaRKulkarni SmitaRKulkarni self-assigned this Oct 6, 2025
@@ -0,0 +1,72 @@
-- Test for https://github.com/ClickHouse/ClickHouse/pull/76978

@shankar-iyer Maybe it makes sense to add a performance test? What do you think?

Ack, will do in a follow-up PR. The test replacing_final_non_intersecting.xml looks like a good reference.

@SmitaRKulkarni SmitaRKulkarni left a comment

Rest all LGTM

@shankar-iyer

The previous CI run was 100% green, but the latest CI run has some integration test failures. I verified that they are unrelated to this PR (test_ytsaurus and test_insert_into_distributed): #88259

@shankar-iyer shankar-iyer enabled auto-merge October 9, 2025 05:04
@shankar-iyer shankar-iyer added this pull request to the merge queue Oct 9, 2025
Merged via the queue into ClickHouse:master with commit b58d5e0 Oct 9, 2025
118 of 123 checks passed
@shankar-iyer shankar-iyer deleted the fix_is_deleted_with_filter branch October 9, 2025 06:35
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 9, 2025
mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request Nov 6, 2025
…with_filter

Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Nov 7, 2025
25.3.8 Backport of ClickHouse#88090 - Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Dec 8, 2025
…with_filter

Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Dec 15, 2025
…with_filter

Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Dec 16, 2025
25.8.12 Backport of ClickHouse#88090: Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Dec 23, 2025
25.8.12 Backport of ClickHouse#88090: Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Dec 24, 2025
25.8.13 Backport of ClickHouse#88090: Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Jan 15, 2026
…with_filter

Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Jan 22, 2026
…with_filter

Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Jan 28, 2026
25.8.15 Backport of ClickHouse#88090: Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Feb 5, 2026
…with_filter

Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Feb 10, 2026
24.8.14 Backport of ClickHouse#88090: Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Mar 17, 2026
…with_filter

Optimize ReplacingMergeTree is_deleted FINAL queries by adding a filter expression transform

Labels

pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

FINAL on table with is_deleted works 10-30 times slower than expected

5 participants