Allow to attach partition from table with different partition expression when destination partition expression doesn't re-partition #39507
Conversation
The current implementation doesn't cover attaching to a non-partitioned table. Attaching to a non-partitioned table was requested in this issue comment. UPDATE: This feature will not be included in this PR. UPDATE: This feature is now supported :).
```cpp
if (part_desc->src_table_part->info.partition_id != part_desc->new_part_info.partition_id)
{
    auto src_part = part_desc->src_table_part;
    const auto & src_storage = src_part->storage;

    auto metadata_manager = std::make_shared<PartMetadataManagerOrdinary>(src_part.get());
    IMergeTreeDataPart::MinMaxIndex min_max_index;

    min_max_index.load(src_part->storage, metadata_manager);

    MergeTreePartition new_partition;

    new_partition.create(metadata_snapshot, min_max_index.getBlock(src_storage), 0u, getContext());

    /// This will generate unique name in scope of current server process.
    Int64 temp_index = insert_increment.get();

    auto partition_id = new_partition.getID(*this);

    MergeTreePartInfo dst_part_info(partition_id, temp_index, temp_index, src_part->info.level);

    auto [res_part, temporary_part_lock] = cloneAndLoadPartOnSameDiskWithDifferentPartitionKey(
        src_part,
        TMP_PREFIX + "clone_",
        part_desc->new_part_info,
        metadata_snapshot,
        new_partition,
        min_max_index,
        clone_params,
        getContext()->getReadSettings(),
        getContext()->getWriteSettings());

    part_desc->res_part = std::move(res_part);
    part_desc->temporary_part_lock = std::move(temporary_part_lock);
}
else
{
    auto [res_part, temporary_part_lock] = cloneAndLoadDataPartOnSameDisk(
        part_desc->src_table_part,
        TMP_PREFIX + "clone_",
        part_desc->new_part_info,
        metadata_snapshot,
        clone_params,
        getContext()->getReadSettings(),
        getContext()->getWriteSettings());

    part_desc->res_part = std::move(res_part);
    part_desc->temporary_part_lock = std::move(temporary_part_lock);
}
```
It would be nice to avoid copy-paste (looks like it's exactly the same for MergeTree and ReplicatedMergeTree)
Where do you suggest I put this code? Two new overloads for MergeTreeData::cloneAndLoadDataPartOnSameDisk and MergeTreeData::cloneAndLoadPartOnSameDiskWithDifferentPartitionKey, or inside MergeTreeDataPartCloner? I think I would go with the former.
I don't have a strong opinion on this
I have introduced a method in the base class that encapsulates the logic that precedes cloneAndLoadPartOnSameDiskWithDifferentPartitionKey. This change removed some duplication, but not all of it.
The decision of which method to call, cloneAndLoadDataPartOnSameDisk or cloneAndLoadPartOnSameDiskWithDifferentPartitionKey, still happens at the call site. The main reason for this was to avoid having a boolean parameter that would control internal behavior.
If you consider it a must, I can implement it.
```cpp
if (is_partition_exp_different)
{
    auto metadata_manager = std::make_shared<PartMetadataManagerOrdinary>(src_part.get());
    IMergeTreeDataPart::MinMaxIndex min_max_index;

    min_max_index.load(src_data, metadata_manager);

    MergeTreePartition new_partition;

    new_partition.create(metadata_snapshot, min_max_index.getBlock(src_data), 0u, getContext());

    partition_id = new_partition.getID(*this);

    MergeTreePartInfo dst_part_info(partition_id, index, index, src_part->info.level);

    auto [dst_part, part_lock] = cloneAndLoadPartOnSameDiskWithDifferentPartitionKey(
        src_part,
        TMP_PREFIX,
        dst_part_info,
        metadata_snapshot,
        new_partition,
        min_max_index,
        clone_params,
        query_context->getReadSettings(),
        query_context->getWriteSettings());

    dst_parts.emplace_back(dst_part);
    dst_parts_locks.emplace_back(std::move(part_lock));
}
else
{
    MergeTreePartInfo dst_part_info(partition_id, index, index, src_part->info.level);

    auto [dst_part, part_lock] = cloneAndLoadDataPartOnSameDisk(
        src_part,
        TMP_PREFIX,
        dst_part_info,
        metadata_snapshot,
        clone_params,
        query_context->getReadSettings(),
        query_context->getWriteSettings());

    dst_parts.emplace_back(dst_part);
    dst_parts_locks.emplace_back(std::move(part_lock));
}
```
```python
def create_table(node, table_name, replicated):
    replica = node.name
    engine = (
        f"ReplicatedMergeTree('/clickhouse/tables/1/{table_name}', '{replica}')"
        if replicated
        else "MergeTree()"
    )
```
JFYI it's easy to write functional tests with multiple replicas; you can create them like:

```sql
create table t1 ... engine=ReplicatedMergeTree('/clickhouse/tables/{database}/{table_name}', '1');
create table t2 ... engine=ReplicatedMergeTree('/clickhouse/tables/{database}/{table_name}', '2');
```
I did introduce some stateless tests for replicated merge tree: https://github.com/ClickHouse/ClickHouse/pull/39507/files#diff-2285549b3c6a98c7292c0dd01646ea60e35a6af06683f52ee9a57144ca0af100R313.
I don't quite remember the reasoning behind these integration tests, but I would guess at the time I was trying to validate it also worked on different servers / clusters?
@tavplubix I have updated it. In order to avoid performing this task twice, I refactored the code.
Stateless tests (release, analyzer) - the new test has failed
@tavplubix hi. Can you merge it?
Integration tests (tsan) [4/6] - #59082
@arthurpassos congrats!
The algorithm is divided into 2 steps:

Partition compatibility check

In order for two partition expressions to be considered compatible, they must meet one of the following:

The monotonicity check does not work for functions with more than one variable argument. Therefore, it is required that every expression in the partition expression contains at most one variable argument.
Clone
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Allow partitions from tables with different partition expressions to be attached when the destination table partition expression doesn't re-partition / split the part.
Closes #13826