Trivial count optimization for iceberg #78090

Merged
alesapin merged 6 commits into master from trivial_count_for_iceberg
Mar 24, 2025

@alesapin
Member

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implement trivial count optimization for Iceberg. Now queries with count() and without any filters should be faster. Closes #77639.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)
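
For readers unfamiliar with the idea, the changelog entry above can be pictured with a minimal sketch. Iceberg manifests record a row count per data file, so an unfiltered count() can in principle be answered by summing those counts instead of reading the data. The code below is not the code from this PR: `DataFileEntry`, `has_deletes`, and `trivialCount` are hypothetical names, and the fallback rule for files with deletes is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of one data-file entry in an Iceberg manifest.
// Only the fields needed for this sketch are modelled.
struct DataFileEntry
{
    int64_t record_count = 0;  // row count recorded in the manifest for this file
    bool has_deletes = false;  // assumption: true if delete files may apply to it
};

// Returns the total row count from metadata alone, or std::nullopt when the
// metadata cannot be trusted and the caller has to fall back to a full scan.
std::optional<uint64_t> trivialCount(const std::vector<DataFileEntry> & files)
{
    uint64_t total = 0;
    for (const auto & file : files)
    {
        // With deletes in play the recorded count may overstate the live rows.
        if (file.has_deletes)
            return std::nullopt;

        assert(file.record_count >= 0);
        total += static_cast<uint64_t>(file.record_count);
    }
    return total;
}
```

A caller would use the returned value directly when it is present and run the ordinary scan otherwise, which is why the sketch returns an optional rather than a bare number.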

@alesapin
Member Author

Example with my debug build in eu while Iceberg table is stored in us-west-2:

alesapin-workstation :) set optimize_trivial_count_query=0

SET optimize_trivial_count_query = 0

Query id: 0a170f59-66ef-4805-9a1e-ad5241de2bd4

Ok.

0 rows in set. Elapsed: 0.001 sec.

alesapin-workstation :) select count() from test.`iceberg-benchmark.hitsiceberg`

SELECT count()
FROM test.`iceberg-benchmark.hitsiceberg`

Query id: ac327a15-953f-4d79-b1a9-d42b8f3c9e12

   ┌──count()─┐
1. │ 99997497 │ -- 100.00 million
   └──────────┘

1 row in set. Elapsed: 7.964 sec. Processed 100.00 million rows, 10.01 GB (12.56 million rows/s., 1.26 GB/s.)
Peak memory usage: 38.78 MiB.

alesapin-workstation :) set optimize_trivial_count_query=1      

SET optimize_trivial_count_query = 1

Query id: 9b079198-188d-47a1-98f7-edaa71c5a25d

Ok.

0 rows in set. Elapsed: 0.001 sec. 

alesapin-workstation :) select count() from test.`iceberg-benchmark.hitsiceberg`

SELECT count()
FROM test.`iceberg-benchmark.hitsiceberg`

Query id: 4a5e6911-ed94-48bb-8b00-829614ce22df

   ┌──count()─┐
1. │ 99997497 │ -- 100.00 million
   └──────────┘

1 row in set. Elapsed: 1.839 sec. 

I still think something is not right and the query should be faster (7.964 sec. down to 1.839 sec. here), but the remaining slowness is not caused by the trivial count optimization.

@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 21, 2025

Workflow [PR], commit [3932a27]

clickhouse-gh bot added the pr-performance label (Pull request with some performance improvements) on Mar 21, 2025
nikitamikhaylov self-assigned this on Mar 22, 2025
alesapin added this pull request to the merge queue on Mar 24, 2025
Merged via the queue into master with commit 1fc46c1 on Mar 24, 2025
120 of 123 checks passed
alesapin deleted the trivial_count_for_iceberg branch on March 24, 2025 16:18
robot-clickhouse-ci-2 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) on Mar 24, 2025
ianton-ru pushed a commit to Altinity/ClickHouse that referenced this pull request Apr 11, 2025
…_iceberg

Trivial count optimization for iceberg
ianton-ru pushed a commit to Altinity/ClickHouse that referenced this pull request May 23, 2025
…_iceberg

Trivial count optimization for iceberg
{
result += *column_info.bytes_size;
found = true;
break;
Member

Is it correct to obtain the bytes_size only from one column here?
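
To make the question concrete: the quoted fragment does not show its enclosing loops, so the following is only an illustration of the concern, not a reconstruction of the PR's code. It contrasts stopping at the first column with a known size against summing every column. `ColumnInfo` as defined here, `firstKnownColumnBytes`, and `totalBytes` are hypothetical names introduced for this sketch.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-column metadata; bytes_size may be missing.
struct ColumnInfo
{
    std::string name;
    std::optional<uint64_t> bytes_size;
};

// Variant A: take the size of the first column that has one and stop.
// This resembles a loop that ends with `found = true; break;`.
std::optional<uint64_t> firstKnownColumnBytes(const std::vector<ColumnInfo> & columns)
{
    for (const auto & column_info : columns)
    {
        if (column_info.bytes_size)
            return *column_info.bytes_size;
    }
    return std::nullopt;
}

// Variant B: sum the sizes of all columns, and report "unknown" if any
// column lacks a size, so a partial total is never returned.
std::optional<uint64_t> totalBytes(const std::vector<ColumnInfo> & columns)
{
    uint64_t result = 0;
    for (const auto & column_info : columns)
    {
        if (!column_info.bytes_size)
            return std::nullopt;
        result += *column_info.bytes_size;
    }
    return result;
}
```

If the `break` in the PR only leaves an inner lookup loop (for example, finding one column by name inside an outer loop over columns or files), then variant A's behaviour would not apply and the accumulation would already cover every column; the fragment alone does not settle which case holds.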

Labels

pr-performance: Pull request with some performance improvements
pr-synced-to-cloud: The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Trivial Count Optimization for Iceberg

5 participants