Skip to content

Fix error message for a broken distributed batches ("While sending batch")#44907

Merged
serxa merged 1 commit intoClickHouse:masterfrom
azat:dist/async-INSERT-error-message-fix
Feb 7, 2023
Merged

Fix error message for a broken distributed batches ("While sending batch")#44907
serxa merged 1 commit intoClickHouse:masterfrom
azat:dist/async-INSERT-error-message-fix

Conversation

@azat
Copy link
Member

@azat azat commented Jan 4, 2023

There was an error from the begginning that does not respect file_indices, and iterate only over file_index_to_path, while this is not correct, since there can be less files then in file_index_to_path, and this is what file_indices for.

Note, that only an error message was wrong, logic was fine. You can verify this by the logs:

2022.12.07 11:55:50.951976 [ 39217 ] {} <Debug> default.dist.DirectoryMonitor: Sending a batch of 10 files to localhost:9000 (128.42 thousand rows, 36.32 MiB bytes).
2022.12.07 11:55:50.953762 [ 39217 ] {} <Error> default.dist.DirectoryMonitor: Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: Interserver authentication failed. Stack trace:
...
: While sending batch, nums: 62, files: /work6/clickhouse/data/default/dist/shard1_replica1/66827258.bin

As you can see "Sending a batch of 10 files" but "nums: 62"

Fixes: #23856
Refs: #41813

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@azat
Copy link
Member Author

azat commented Jan 4, 2023

Closed in favor of #44922

@azat azat closed this Jan 4, 2023
azat added a commit to azat/ClickHouse that referenced this pull request Jan 5, 2023
In ClickHouse#43406 metrics was broken for a clean start, since they where not
initialized from disk, but metrics for broken files was never
initialized from disk.

Fix this and rework how DirectoryMonitor works with file system:
- do not iterate over directory before each send, do this only once on
  init, after the map of files will be updated from the INSERT
- call fs::create_directories() from the ctor for "broken" folder to
  avoid excessive calls
- cache "broken" paths

This patch also fixes possible issue when current_batch can be processed
multiple times (second time will be an exception), since if there is
existing current_batch.txt after processing it you should remove it
instantly.

Plus this patch implicitly fixes issues with logging, that logs
incorrect number of files in case of error (see ClickHouse#44907 for details).

Signed-off-by: Azat Khuzhin <[email protected]>
@azat azat deleted the dist/async-INSERT-error-message-fix branch February 3, 2023 12:53
@azat azat restored the dist/async-INSERT-error-message-fix branch February 3, 2023 12:53
@azat azat reopened this Feb 3, 2023
…tch")

There was an error from the begginning that does not respect
file_indices, and iterate only over file_index_to_path, while this is
not correct, since there can be less files then in file_index_to_path,
and this is what file_indices for.

Note, that only an error message was wrong, logic was fine. You can
verify this by the logs:

    2022.12.07 11:55:50.951976 [ 39217 ] {} <Debug> default.dist.DirectoryMonitor: Sending a batch of 10 files to localhost:9000 (128.42 thousand rows, 36.32 MiB bytes).
    2022.12.07 11:55:50.953762 [ 39217 ] {} <Error> default.dist.DirectoryMonitor: Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: Interserver authentication failed. Stack trace:
    ...
    : While sending batch, nums: 62, files: /work6/clickhouse/data/default/dist/shard1_replica1/66827258.bin

As you can see "Sending a batch of 10 files" but "nums: 62"

Fixes: ClickHouse#23856
Refs: ClickHouse#41813
Signed-off-by: Azat Khuzhin <[email protected]>
@azat azat force-pushed the dist/async-INSERT-error-message-fix branch from 80eccb0 to a196f99 Compare February 3, 2023 12:54
@azat
Copy link
Member Author

azat commented Feb 3, 2023

Since #44922 had been reverted, reopening this.

@azat azat marked this pull request as ready for review February 3, 2023 12:55
@serxa serxa self-assigned this Feb 6, 2023
@serxa
Copy link
Member

serxa commented Feb 7, 2023

All failed tests are unrelated

@serxa serxa merged commit 950c04c into ClickHouse:master Feb 7, 2023
@azat azat deleted the dist/async-INSERT-error-message-fix branch February 7, 2023 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants