Adaptive mark_segment_size for parallel replicas #68424
/// Here we take max of two numbers:
/// * (min_marks_per_task * threads) = the number of marks we request from the coordinator each time - there is no point to have segments smaller than one unit of work for a replica
/// * (sum_marks / number_of_replicas^2) - we use consistent hashing for work distribution (including work stealing). If we have a really slow replica
/// everything up to (1/number_of_replicas) portion of its work will be stolen by other replicas. And it owns (1/number_of_replicas) share of total number of marks.
Shouldn't it be like this?
- /// everything up to (1/number_of_replicas) portion of its work will be stolen by other replicas. And it owns (1/number_of_replicas) share of total number of marks.
+ /// everything up to ((number_of_replicas-1)/number_of_replicas) portion of its work will be stolen by other replicas. And it owns (1/number_of_replicas) share of total number of marks.
Yeah, I should probably say 'except' instead of 'up to'.
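For reference, the two quantities named in the code comment under review can be sketched as follows. This is only an illustration of the arithmetic in that comment: the function name `chooseMarkSegmentSize` and its signature are hypothetical, and the real code in the PR may apply extra rounding or limits that are not shown here.

```cpp
#include <algorithm>
#include <cstddef>

/// Hypothetical helper (not the PR's actual function): returns the larger of
/// the two quantities described in the reviewed comment.
size_t chooseMarkSegmentSize(size_t min_marks_per_task, size_t threads, size_t sum_marks, size_t number_of_replicas)
{
    /// Assumption: treat zero replicas as one to avoid division by zero.
    if (number_of_replicas == 0)
        number_of_replicas = 1;

    /// Number of marks a replica requests from the coordinator each time.
    const size_t unit_of_work = min_marks_per_task * threads;

    /// sum_marks / number_of_replicas^2 (integer division).
    const size_t per_replica_squared = sum_marks / (number_of_replicas * number_of_replicas);

    return std::max(unit_of_work, per_replica_squared);
}
```

With 3 replicas, 4 threads, min_marks_per_task = 8 and 9000 marks in total, the second term (9000 / 9 = 1000) wins over the first (8 * 4 = 32). With only 90 marks, the first term wins, so a segment is never smaller than one request to the coordinator.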
Waiting for #69602 to merge with master.
static constexpr auto DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION = 3;
static constexpr auto DBMS_MIN_SUPPORTED_PARALLEL_REPLICAS_PROTOCOL_VERSION = 3;
static constexpr auto DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION = 4;
It looks like the previous approach, using only the CH client protocol version, was the right one. Every time we need to change the PR protocol, we have to change the client protocol version anyway.
This is because of the case where the first request is sent by a replica with a higher protocol version than the recipient's.
Consider two replicas communicating with PR protocol versions v2 and v1 (v2 > v1, v1 >= DBMS_MIN_SUPPORTED_PARALLEL_REPLICAS_PROTOCOL_VERSION).
In this case, the v2 replica needs to talk to the v1 replica using the v1 protocol. But the v2 replica can learn the peer's version only from the v1 replica's client protocol version. This means that every time we change the PR protocol, we have to bump the CH client protocol version to cover this case, which in turn makes a separate PR protocol version unnecessary.
With a purely PR-level protocol version, we'd have to introduce a handshake phase during which peers agree on which protocol version to use, which costs an additional RTT, i.e. wasted time and network traffic.
Sorry, I missed that the parallel replicas protocol version in this PR is now part of the client protocol. So we get it during the client protocol handshake, and the remote peer's PR protocol version is already known by the time the first PR request is sent to that peer.
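To make the version discussion in this thread concrete, here is a minimal sketch of how a replica might pick the PR protocol version to use with a peer once the peer's version is known from the client protocol handshake. The two constants and their values are taken from the diff above. The function `choosePeerProtocolVersion` and the "use the lower of the two versions" rule are assumptions for illustration, not code from the PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

/// Values as they appear in the diff under review.
static constexpr uint64_t DBMS_MIN_SUPPORTED_PARALLEL_REPLICAS_PROTOCOL_VERSION = 3;
static constexpr uint64_t DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION = 4;

/// Hypothetical helper: given the PR protocol version the remote peer
/// advertised during the handshake, return the version to speak with it,
/// or std::nullopt if the peer is older than the minimum supported version.
std::optional<uint64_t> choosePeerProtocolVersion(uint64_t remote_version)
{
    /// Assumption: both sides can speak any version up to their own,
    /// so the common version is the lower of the two.
    const uint64_t common = std::min(DBMS_PARALLEL_REPLICAS_PROTOCOL_VERSION, remote_version);

    if (common < DBMS_MIN_SUPPORTED_PARALLEL_REPLICAS_PROTOCOL_VERSION)
        return std::nullopt;

    return common;
}
```

Because the remote version arrives with the existing handshake, no extra round trip is needed before the first PR request, which is the point made in the reply above.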
Fix parallel replicas protocol after #68424
…_replicas Adaptive mark_segment_size for parallel replicas
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
A new algorithm determines the unit of mark distribution between replicas under consistent hashing. Different numbers of marks are chosen for different read patterns to improve performance.