Rework parallel replicas settings #63151
Currently, there are four modes for parallel replicas:
Confusingly, the "sample key" mode is available even without
We should introduce a new setting "parallel_replicas_mode" which will decide between these modes.
@alexey-milovidov Parallel replicas modes 1, 3, and 4 currently work on top of a Distributed table, while task-based parallel replicas work on top of MergeTree. We should unify this as well.
nikitamikhaylov left a comment:
Let's make the parallel replicas with custom key work on top of MergeTree first and introduce the setting parallel_replicas_mode: #63521
Unifying the settings and implementing custom key on top of MergeTree are two different things and can be done independently.
Useful finding:
    {"join_to_sort_minimum_perkey_rows", 0, 40, "The lower limit of per-key average rows in the right table to determine whether to rerange the right table by key in left or inner join. This setting ensures that the optimization is not applied for sparse table keys"},
    {"join_to_sort_maximum_table_rows", 0, 10000, "The maximum number of rows in the right table to determine whether to rerange the right table by key in left or inner join"},
    {"allow_experimental_join_right_table_sorting", false, false, "If it is set to true, and the conditions of `join_to_sort_minimum_perkey_rows` and `join_to_sort_maximum_table_rows` are met, rerange the right table by key to improve the performance in left or inner hash join"},
    {"mongodb_throw_on_unsupported_query", false, true, "New setting."},
Why do we have different values for a new setting?
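To make the question concrete, here is a minimal sketch of a check that flags such entries. It assumes each history entry reads as {name, previous value, new value, reason}; that field order is inferred from the diff above and is not confirmed here. `SettingChange` and `newSettingsWithDifferingValues` are hypothetical names, not part of the ClickHouse code base.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one settings-history entry. The field order
// {name, previous value, new value, reason} is an assumption read off
// the diff above. Values are kept as strings to keep the sketch simple.
struct SettingChange
{
    std::string name;
    std::string previous_value;
    std::string new_value;
    std::string reason;
};

// Returns the names of entries whose reason is exactly "New setting."
// but whose previous and new values differ.
std::vector<std::string> newSettingsWithDifferingValues(const std::vector<SettingChange> & changes)
{
    std::vector<std::string> result;
    for (const auto & change : changes)
    {
        if (change.reason == "New setting." && change.previous_value != change.new_value)
            result.push_back(change.name);
    }
    return result;
}
```

Applied to the four entries quoted above, only `mongodb_throw_on_unsupported_query` would be flagged, since it is the only one with the reason "New setting." and differing values.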
      bool is_plain_merge_tree = storage && storage->isMergeTree() && !storage->supportsReplication();
      if (is_plain_merge_tree && settings[Setting::allow_experimental_parallel_reading_from_replicas] > 0
    -     && !settings[Setting::parallel_replicas_for_non_replicated_merge_tree])
    +     && !settings[Setting::allow_experimental_parallel_reading_from_replicas])
Is this a typo? This AND condition will always be false. Or is this fine?
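The reviewer's point can be checked in isolation. The sketch below reduces the two variants of the condition to plain functions. It assumes the setting behaves like an unsigned integer (it is compared with `> 0` in the diff), and the function names are hypothetical.

```cpp
#include <cstdint>

// Variant that tests the same setting twice: it must be both > 0 and
// equal to zero, so the result is false for every input.
bool conditionWithRepeatedSetting(bool is_plain_merge_tree, std::uint64_t allow_parallel_reading)
{
    return is_plain_merge_tree && allow_parallel_reading > 0 && !allow_parallel_reading;
}

// Variant that negates a second, independent setting
// (parallel_replicas_for_non_replicated_merge_tree in the diff),
// which can evaluate to true.
bool conditionWithSeparateSetting(
    bool is_plain_merge_tree, std::uint64_t allow_parallel_reading, bool for_non_replicated_merge_tree)
{
    return is_plain_merge_tree && allow_parallel_reading > 0 && !for_non_replicated_merge_tree;
}
```

With the repeated setting, the body of the `if` is unreachable. Whether that was intended is exactly what the review comment asks.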
…-beta Rework parallel replicas settings
Fix if condition in #63151
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Reworked settings that control the behavior of parallel replicas algorithms. A quick recap: ClickHouse has four different algorithms for parallel reading involving multiple replicas, which is reflected in the setting `parallel_replicas_mode`; its default value is `read_tasks`. Additionally, the toggle-switch setting `enable_parallel_replicas` has been added.
This PR is backward-incompatible for the following parallel replicas mode:
Which means that this feature cannot be used correctly in a mixed-version cluster.
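As a rough illustration of how the toggle and the mode setting relate, the sketch below models them as a small struct. It is not the ClickHouse implementation: the struct, the function name, the use of a plain `bool` for the toggle, and the toggle's default of `false` are all assumptions made for the example. Only the setting names and the `read_tasks` default come from the entry above.

```cpp
#include <string>

// Hypothetical, simplified model of the two settings named in the changelog entry.
struct ParallelReplicasSettings
{
    bool enable_parallel_replicas = false;              // toggle; type and default are assumptions
    std::string parallel_replicas_mode = "read_tasks";  // default named in the changelog entry
};

// Returns the selected mode when the toggle is on,
// or an empty string when parallel replicas are disabled.
std::string effectiveParallelReplicasMode(const ParallelReplicasSettings & settings)
{
    if (!settings.enable_parallel_replicas)
        return "";
    return settings.parallel_replicas_mode;
}
```

In this model, the mode setting has no effect until the toggle is switched on, which is the separation of concerns the entry describes.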
This closes: #63521
Documentation entry for user-facing changes