
Fix "While sending batch" error message is too long (on Distributed async send) #41813

Merged: vdimir merged 1 commit into ClickHouse:master from zhongyuankai:fix_dist_msg on Oct 6, 2022


zhongyuankai (Contributor) commented Sep 27, 2022

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

When batch sending fails for some reason, it cannot recover automatically. If the failure is not handled in time, files accumulate and the printed error message grows longer and longer, which blocks the HTTP thread.

zhongyuankai (Contributor Author) commented

[image]

robot-ch-test-poll added the pr-bugfix label (Pull request with bugfix, not backported by default) Sep 27, 2022
vdimir added the can be tested label (Allows running workflows for external contributors) Sep 27, 2022
vdimir self-assigned this Sep 27, 2022
azat (Member) commented Jan 4, 2023

The real problem was #23856, which does not respect file_indices; see #44907.

azat added a commit to azat/ClickHouse that referenced this pull request Feb 3, 2023
…tch")

There was an error from the beginning: the code does not respect
file_indices and iterates only over file_index_to_path. This is not
correct, since the batch can contain fewer files than file_index_to_path,
and this is what file_indices is for.

Note that only the error message was wrong; the logic was fine. You can
verify this from the logs:

    2022.12.07 11:55:50.951976 [ 39217 ] {} <Debug> default.dist.DirectoryMonitor: Sending a batch of 10 files to localhost:9000 (128.42 thousand rows, 36.32 MiB bytes).
    2022.12.07 11:55:50.953762 [ 39217 ] {} <Error> default.dist.DirectoryMonitor: Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: Interserver authentication failed. Stack trace:
    ...
    : While sending batch, nums: 62, files: /work6/clickhouse/data/default/dist/shard1_replica1/66827258.bin

As you can see, the log says "Sending a batch of 10 files" but the error reports "nums: 62".

Fixes: ClickHouse#23856
Refs: ClickHouse#41813
Signed-off-by: Azat Khuzhin <[email protected]>
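
The mismatch described in the commit message above can be illustrated with a small standalone sketch. This is not ClickHouse's actual code: only the names `file_indices` and `file_index_to_path` come from the commit message, while the functions `describeAllPending` and `describeBatch`, the `PathsByIndex` alias, and the exact message layout (modeled loosely on the log line) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical alias: every pending file in the directory, keyed by index.
using PathsByIndex = std::map<std::size_t, std::string>;

// Hypothetical helper: builds a message shaped like the one in the log above.
static std::string formatMessage(const std::vector<std::string> & paths)
{
    std::string out = "While sending batch, nums: " + std::to_string(paths.size()) + ", files: ";
    for (std::size_t i = 0; i < paths.size(); ++i)
    {
        if (i != 0)
            out += ", ";
        out += paths[i];
    }
    return out;
}

// Problematic variant: walks the whole index-to-path map, so the message
// covers every pending file and grows as files accumulate.
std::string describeAllPending(const PathsByIndex & file_index_to_path)
{
    std::vector<std::string> paths;
    for (const auto & entry : file_index_to_path)
        paths.push_back(entry.second);
    return formatMessage(paths);
}

// Corrected variant: walks only the indices that belong to the current batch.
std::string describeBatch(const std::vector<std::size_t> & file_indices,
                          const PathsByIndex & file_index_to_path)
{
    std::vector<std::string> paths;
    for (std::size_t index : file_indices)
    {
        auto it = file_index_to_path.find(index);
        if (it != file_index_to_path.end()) // skip indices with no known path
            paths.push_back(it->second);
    }
    return formatMessage(paths);
}
```

Under these assumptions, a directory with 62 pending files and a batch of 10 would make the first function report "nums: 62" and the second "nums: 10", which matches the discrepancy visible in the log.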
zhongyuankai deleted the fix_dist_msg branch December 22, 2023 05:17