Skip to content

Fix leaving possibly broken distributed connections in the pool#93171

Closed
azat wants to merge 1 commit intoClickHouse:masterfrom
azat:fix-broken-dist-connections-v3
Closed

Fix leaving possibly broken distributed connections in the pool#93171
azat wants to merge 1 commit intoClickHouse:masterfrom
azat:fix-broken-dist-connections-v3

Conversation

@azat
Copy link
Member

@azat azat commented Dec 29, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix leaving broken distributed connections in the pool

Previous fix (#92807) introduce another issue, due to which the connection for distributed queries can be preserved in the pool even after some exception (usually it is socket timeout), which leads to a reconnect in the next query, or even query failure.

The problem was that RemoteQueryExecutor uses finished flag to decide should it call finish() again in RemoteSource, but this finished flag is also used in dtor of RemoteQueryExecutor, and if the query is not finished it is aborted, which does not work with this new logic.

Let's use got_exception_from_replica instead, which does not affect dtor. And it is OK to call finish() again in RemoteSource, since subsequent will be no-op.

Fixes: #92807

Previous fix (ClickHouse#92807) introduce another issue, due to which the connection for
distributed queries can be preserved in the pool even after some
exception (usually it is socket timeout), which leads to a reconnect in
the next query, or even query failure.

The problem was that `RemoteQueryExecutor` uses `finished` flag to
decide should it call `finish()` again in `RemoteSource`, but this
`finished` flag is also used in dtor of `RemoteQueryExecutor`, and if
the query is not finished it is aborted, which does not work with this
new logic.

Let's use `got_exception_from_replica` instead, which does not affect
dtor. And it is OK to call `finish()` again in `RemoteSource`, since
subsequent will be no-op.
@azat azat force-pushed the fix-broken-dist-connections-v3 branch from 210bdfb to 2a7aa41 Compare December 29, 2025 17:31
@azat azat changed the title Fix leaving broken distributed connections in the pool Fix leaving possibly broken distributed connections in the pool Dec 29, 2025
@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Dec 29, 2025

Workflow [PR], commit [2a7aa41]

Summary:

job_name test_name status info comment
Fast test failure
02001_dist_on_dist_WithMergeableStateAfterAggregation FAIL cidb
03713_replicated_columns_in_external_data_bug FAIL cidb
01814_distributed_push_down_limit FAIL cidb
02350_views_max_insert_threads FAIL cidb
Some queries hung FAIL cidb
Finish Workflow failure
python3 ./ci/jobs/scripts/workflow_hooks/new_tests_check.py failure
Build (amd_debug) dropped
Build (amd_asan) dropped
Build (amd_tsan) dropped
Build (amd_msan) dropped
Build (amd_ubsan) dropped
Build (amd_binary) dropped
Build (arm_asan) dropped
Build (arm_binary) dropped

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Dec 29, 2025
@azat
Copy link
Member Author

azat commented Jan 30, 2026

This has been fixed differently in #95466 (via was_cancelled from tryCancel)

@azat azat closed this Jan 30, 2026
@azat azat deleted the fix-broken-dist-connections-v3 branch January 30, 2026 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants