S3Queue ordered mode with hive partitioning #81040

kssenii merged 55 commits into ClickHouse:master
Conversation
Force-pushed from db2db26 to 2f7d37e
Integration tests for aarch64 interrupted by timeout, looks like some infrastructure-related issue. |
Workflow [PR], commit [92edb60] Summary: ❌
Force-pushed from 9471804 to bdce2c4
Failed fast test

Yes, it looks unrelated, but it blocks the other tests from starting, so at the moment no other tests have actually run (and will not run). Could you please push an empty commit to restart the checks?
```python
@pytest.mark.parametrize("hosts", [1, 2])
@pytest.mark.parametrize("processing_threads_num", [1, 16])
@pytest.mark.parametrize("buckets", [1, 4])
@pytest.mark.parametrize("engine_name", ["S3Queue", "AzureQueue"])
```
Could you please split this test into separate ones? The flaky check fails because a single test runs for too long: https://s3.amazonaws.com/clickhouse-test-reports/PRs/81040/27bf90da21b8557c55b73a9ae98cdfc01707b626//integration_tests_amd_asan_flaky/job.log
Also, please avoid force-pushing from now on; I've fixed private synchronization conflicts, and a force-push would reset my fixes.
This reverts commit 8a7eb4d.
Need to wait after each data portion.

Found a bug in the test; it should be faster now.

The failed tests look unrelated.
fefb9e1
```cpp
if (code.value() == Coordination::Error::ZBADVERSION && retry_count < retry_limit)
{
    ++retry_count;
    LOG_INFO(log, "Keeper Bad Version error, other node wrote something, retry {}", retry_count);
```
@ianton-ru, hi, could you please clarify why we need this check? Since we use persistent nodes for the processing/buckets nodes, this error should not happen; if it did, it would mean duplicated data, which must not happen, and this retry could just hide bugs.
Scenario with multiple nodes:

- two hive partitions, `part1` with file `file11` and `part2` with file `file21`; records in Keeper have version 1
- node1 processes files `file11` and `file21`
- meanwhile a new file `file12` is added in `part1`
- node2 processes file `file12`
- node1 completes processing and starts to create the `records` list; it extracts versions for the parts, both have version 1
- node2 completes processing and starts to create its `records` list, with version 1 too
- node2 commits its `records` without errors; the record in Keeper for `part1` gets version 2
- node1 tries to commit its `records` as a single transaction and gets error `Bad Version` for `part1`

Without hive partitions node1 would not need to write anything after that, because node2 wrote more up-to-date information about the last processed file. But with hive partitions node1 still needs to write the last processed file for `part2`.
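The conflict above can be simulated with a toy versioned store standing in for Keeper (a sketch only; the class and names are hypothetical, not ClickHouse code). Each key holds a value and a version, and a multi-write succeeds only if every expected version still matches, like a Keeper multi-op transaction:

```python
class BadVersion(Exception):
    pass

class VersionedStore:
    """Toy stand-in for Keeper znodes: key -> (value, version)."""
    def __init__(self):
        self.nodes = {}

    def read(self, key):
        return self.nodes.get(key, (None, 0))

    def multi_write(self, writes):
        # writes: list of (key, new_value, expected_version); all-or-nothing.
        for key, _, expected in writes:
            if self.read(key)[1] != expected:
                raise BadVersion(key)
        for key, value, expected in writes:
            self.nodes[key] = (value, expected + 1)

store = VersionedStore()
store.nodes = {"part1": ("file10", 1), "part2": ("file20", 1)}

# node1 reads versions for both partitions (both are 1)...
n1_writes = [("part1", "file11", 1), ("part2", "file21", 1)]
# ...but node2 commits part1 first, bumping its version to 2.
store.multi_write([("part1", "file12", 1)])

try:
    store.multi_write(n1_writes)  # fails: part1 version is now 2
except BadVersion:
    # On retry, node1 leaves part1 alone (node2's record is newer) but
    # must still write the last processed file for part2.
    _, v2 = store.read("part2")
    store.multi_write([("part2", "file21", v2)])

assert store.read("part1") == ("file12", 2)
assert store.read("part2") == ("file21", 2)
```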
So you mean node1 was processing file `/part1/file11` concurrently with node2 processing file `/part1/file12`?
But if `file11` and `file12` share the same `processed_node_path`, then they are in the same bucket, and a single bucket can only be processed by a single processor, which does not allow any concurrent processing (the persistent bucket lock must maintain this invariant). Then the situation you describe should not happen.
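The invariant described here can be sketched as follows. This is a simplified, hypothetical stand-in for the Keeper-based persistent bucket locks (not ClickHouse code): files hash deterministically to buckets, and a bucket can be held by at most one processor at a time, so two nodes never process files from the same bucket concurrently.

```python
import zlib

BUCKETS_NUM = 4

def bucket_for(file_path: str) -> int:
    # Deterministic file -> bucket mapping.
    return zlib.crc32(file_path.encode()) % BUCKETS_NUM

class BucketLocks:
    def __init__(self):
        self.holders = {}  # bucket -> processor id

    def try_acquire(self, bucket: int, processor: str) -> bool:
        if self.holders.get(bucket, processor) != processor:
            return False  # held by someone else: no concurrent processing
        self.holders[bucket] = processor
        return True

locks = BucketLocks()
b = bucket_for("part1/file11")
assert locks.try_acquire(b, "node1")      # node1 takes the bucket
assert not locks.try_acquire(b, "node2")  # node2 is rejected
assert locks.try_acquire(b, "node1")      # re-acquire by the holder is fine
```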
Hmm. It looks like ClickHouse locks buckets only when the number of buckets is greater than 1 (when `useBucketsForProcessing` returns true).
Is a configuration with multiple nodes and a single bucket valid?
> Is a configuration with multiple nodes and a single bucket valid?

Not valid, as it can lead to inconsistent processing (in some failure scenarios we can end up with files that will never be processed) and retries will not work properly. Currently, if `processing_threads_num` is > 1 (it almost always will be, since we take the default from the number of CPU cores) and `buckets` is disabled, then buckets are enforced automatically anyway. Moreover, in the cloud we do not allow creating tables without bucket-based processing:
ClickHouse/src/Storages/ObjectStorageQueue/ObjectStorageQueueMetadata.cpp
Lines 509 to 515 in 07a29cf
Oh. I use `buckets=1` in tests; `getBucketsNum` returns 1 because `buckets` is not zero.
It's a gray zone. `useBucketsForProcessing` checks that `buckets_num > 1`.
> `useBucketsForProcessing` checks that `buckets_num > 1`

Yes, but `ObjectStorageQueueMetadata` sets `buckets_num` as `table_metadata->getBucketsNum()`, which returns what I sent in the message above: https://github.com/ClickHouse/clickhouse-private/blob/d442af4b207906e5c3600e35eb82742a1d0f2d97/src/Storages/ObjectStorageQueue/ObjectStorageQueueTableMetadata.h#L102-L106

> I use `buckets=1` in tests,

In this case, yes, it will not work; this is an omission which needs to be fixed. But by default `buckets = 0`, so if the user does not touch this setting and only changes `processing_threads_num`, then `buckets` will be correctly set to the value of `processing_threads_num`.
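The behaviour discussed in this thread can be summarized with a small sketch (hypothetical Python functions mirroring the described `getBucketsNum` / `useBucketsForProcessing` logic; simplified, not the actual ClickHouse code):

```python
def effective_buckets_num(buckets: int, processing_threads_num: int) -> int:
    # An explicit non-zero `buckets` setting wins; otherwise fall back
    # to the number of processing threads.
    if buckets != 0:
        return buckets
    return processing_threads_num

def use_buckets_for_processing(buckets_num: int) -> bool:
    # Bucket locking only kicks in for more than one bucket.
    return buckets_num > 1

# Default: buckets = 0, so threads drive the bucket count and locking is on.
assert use_buckets_for_processing(effective_buckets_num(0, 16))

# The gray zone from the discussion: an explicit buckets = 1 disables
# bucket locking even with many threads -- the omission noted above.
assert not use_buckets_for_processing(effective_buckets_num(1, 16))
```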
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Tracking hive partitioning for ordered mode in S3Queue. Resolves #71161
Documentation entry for user-facing changes
Creates a record in Keeper with the last successfully processed file for every hive partition. The bucket mechanism is not modified; the logic changed only inside a bucket, so separate synchronization is not required.