Skip to content

fix threads count for inserts#88339

Merged
CheSema merged 3 commits intomasterfrom
chesema-fix-inserts-threads
Oct 13, 2025
Merged

fix threads count for inserts#88339
CheSema merged 3 commits intomasterfrom
chesema-fix-inserts-threads

Conversation

@CheSema
Copy link
Member

@CheSema CheSema commented Oct 10, 2025

It is bad for performance. We should use max_threads for all inserts.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Oct 10, 2025

Workflow [PR], commit [b5ca921]

Summary:

job_name test_name status info comment
Performance Comparison (amd_release, master_head, 2/3) error

@CheSema CheSema changed the title fix set threads fix threads count for inserts Oct 10, 2025
@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 10, 2025
QueryPipeline pipeline = QueryPipeline(std::move(chain));

pipeline.setNumThreads(max_insert_threads);
pipeline.setNumThreads(max_threads);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this change make the max_insert_threads setting not work as expected? Or is the reasoning that this is not a INSERT SELECT so we should not use the max_insert_threads setting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets define what max_insert_threads does first.

max_insert_threads makes pipeline more concurrent, that it, no more. It has really bad naming. The threads count should be defined by max_threads.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line has been introduced by me and by mistake. The existed tests cover only insert-select queries, so I missed it. In short future I will remove/rename max_insert_threads setting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I see that it is not in the insert-select code path. Thanks for explaining.

create table testX (A Int64) engine=MergeTree order by tuple();

create materialized view testXA engine=MergeTree order by tuple() as select sleep(0.1) from testX;
create materialized view testXB engine=MergeTree order by tuple() as select sleep(0.2), throwIf(A=1) from testX;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of throwIf(A=1)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would fail with exception when A = 1.
Not strictly necessary here, But I prefer when different mv behave differently in tests cases.

@george-larionov george-larionov self-assigned this Oct 10, 2025
Copy link
Member

@george-larionov george-larionov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CheSema CheSema added this pull request to the merge queue Oct 13, 2025
@CheSema CheSema added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Oct 13, 2025
Merged via the queue into master with commit b072887 Oct 13, 2025
121 of 123 checks passed
@CheSema CheSema deleted the chesema-fix-inserts-threads branch October 13, 2025 10:47
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 13, 2025
robot-ch-test-poll4 added a commit that referenced this pull request Oct 13, 2025
Cherry pick #88339 to 25.7: fix threads count for inserts
robot-ch-test-poll4 added a commit that referenced this pull request Oct 13, 2025
Cherry pick #88339 to 25.8: fix threads count for inserts
robot-ch-test-poll4 added a commit that referenced this pull request Oct 13, 2025
Cherry pick #88339 to 25.9: fix threads count for inserts
@CheSema CheSema added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Oct 13, 2025
clickhouse-gh bot added a commit that referenced this pull request Oct 13, 2025
Backport #88339 to 25.7: fix threads count for inserts
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Oct 13, 2025
CheSema added a commit that referenced this pull request Oct 13, 2025
Backport #88339 to 25.8: fix threads count for inserts
CheSema added a commit that referenced this pull request Oct 13, 2025
Backport #88339 to 25.9: fix threads count for inserts
@azat
Copy link
Member

azat commented Oct 15, 2025

@CheSema is it possible that after this change pushing to MVs can be reordered? Looks like this is why test_async_load_databases/test.py::test_materialized_views_cascaded_multiple started to fail (#88572)

QueryPipeline pipeline = QueryPipeline(std::move(chain));

pipeline.setNumThreads(max_insert_threads);
pipeline.setNumThreads(max_threads);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CheSema Should we respect parallel_view_processing as well and use max_threads only if parallel_view_processing=1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should not, parallel_view_processing just turning max_insert_threads into 1 when materilized views are involved. parallel_view_processing was added to avoid some bugs with deduplication which are fixed now.

@CheSema
Copy link
Member Author

CheSema commented Oct 17, 2025

Sema Checherinda is it possible that after this change pushing to MVs can be reordered? Looks like this is why test_async_load_databases/test.py::test_materialized_views_cascaded_multiple started to fail (#88572)

It would be more likely, but in general there is no guaranty about order even with 1 thread.

antaljanosbenjamin added a commit that referenced this pull request Nov 7, 2025
- decrease max threads from 10 to 7
- increase the number of rows from 50 to 200

Both of these changes should increase the chance that the query can utilize all threads.

Furthermore, we also accept if the query utilizes one less thread. It is still more than `max_insert_threads`, so it still effectively checks that `max_insert_threads` doesn't constrain the number of threads. See #88339 (comment) for more information.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants