Skip to content

Fix contention in CancellationChecker thread#95563

Merged
serxa merged 4 commits intomasterfrom
fix-contention-on-canclellation-checker
Feb 2, 2026
Merged

Fix contention in CancellationChecker thread#95563
serxa merged 4 commits intomasterfrom
fix-contention-on-canclellation-checker

Conversation

@serxa
Copy link
Copy Markdown
Member

@serxa serxa commented Jan 29, 2026

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

There is no need to wake-up cancelling worker thread on most of the changes to the set of queries with timeouts

@serxa serxa requested a review from Copilot January 29, 2026 18:38
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the CancellationChecker thread by reducing unnecessary wake-ups and mutex contention. The key insight is that the worker thread doesn't need to be notified on every query removal since it will naturally wake up and recalibrate its timeout.

Changes:

  • Removed unused removeQueryFromSet method declaration and implementation
  • Modified appendTask to only notify the worker thread when a newly added task becomes the earliest one
  • Inlined query removal logic in appendDoneTasks and removed the notification call with a comment explaining why it's safe

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/Interpreters/CancellationChecker.h Removed unused removeQueryFromSet method declaration
src/Interpreters/CancellationChecker.cpp Inlined query removal logic, removed unnecessary notifications, and added conditional notification for new earliest tasks

Comment thread src/Interpreters/CancellationChecker.cpp
Comment thread src/Interpreters/CancellationChecker.cpp
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jan 29, 2026

Workflow [PR], commit [f43a395]

Summary:

job_name test_name status info comment
Stateless tests (amd_binary, ParallelReplicas, s3 storage, parallel) failure
03812_join_order_ubsan_overflow FAIL cidb, issue ISSUE CREATED
Stress test (amd_tsan) failure
Logical error: Unexpected return type from A. Expected B. Got C. Action: (STID: 1611-3d19) FAIL cidb, issue ISSUE EXISTS
Upgrade check (amd_asan) failure
Error message in clickhouse-server.log (see upgrade_error_messages.txt) FAIL cidb IGNORED
Upgrade check (amd_msan) failure
Error message in clickhouse-server.log (see upgrade_error_messages.txt) FAIL cidb IGNORED

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jan 29, 2026
@antonio2368 antonio2368 self-assigned this Jan 29, 2026
@serxa
Copy link
Copy Markdown
Member Author

serxa commented Feb 2, 2026

There is crash and test failures, but I dont think it is related to the PR. Lets recheck

Error:
Received signal 6
---

Stack trace:
? @ 0x0000000000091118
pthread_cond_wait @ 0x0000000000093a41
./base/poco/Foundation/src/Event_POSIX.cpp:94: Poco::EventImpl::waitImpl() @ 0x0000000021e3b482
inlined from ./base/poco/Foundation/include/Poco/Event.h:92: Poco::Event::wait()
inlined from ./src/Common/ThreadPool.h:348: ThreadFromGlobalPoolImpl<true, true>::join()
./ci/tmp/build/./src/Interpreters/DDLWorker.cpp:161: DB::DDLWorker::shutdown() @ 0x00000000191a03e6
./ci/tmp/build/./src/Databases/DatabaseReplicatedWorker.cpp:142: DB::DatabaseReplicatedDDLWorker::shutdown() @ 0x00000000184ab789
./ci/tmp/build/./src/Databases/DatabaseReplicated.cpp:2065: DB::DatabaseReplicated::stopReplication() @ 0x000000001849aa33
./ci/tmp/build/./src/Databases/DatabaseReplicated.cpp:2070: DB::DatabaseReplicated::shutdown() @ 0x000000001849ab50
./ci/tmp/build/./src/Interpreters/DatabaseCatalog.cpp:299: DB::DatabaseCatalog::shutdownImpl(std::function<void ()>) @ 0x00000000191c602c
./ci/tmp/build/./src/Interpreters/DatabaseCatalog.cpp:995: DB::DatabaseCatalog::shutdown(std::function<void ()>) @ 0x00000000191cd9cd
./ci/tmp/build/./src/Interpreters/Context.cpp:876: DB::ContextSharedPart::shutdown() @ 0x00000000190d4314
./ci/tmp/build/./programs/server/Server.cpp:1467: DB::Server::main(std::vector<String, std::allocator<String>> const&)::$_16::operator()() const @ 0x000000001388e348
inlined from ./base/base/../base/scope_guard.h:101: BasicScopeGuard<DB::Server::main(std::vector<String, std::allocator<String>> const&)::$_16>::invoke()
inlined from ./base/base/../base/scope_guard.h:50: ~BasicScopeGuard
./ci/tmp/build/./programs/server/Server.cpp:3088: DB::Server::main(std::vector<String, std::allocator<String>> const&) @ 0x0000000013872c7a
./ci/tmp/build/./base/poco/Util/src/Application.cpp:315: Poco::Util::Application::run() @ 0x0000000021f286b1
./ci/tmp/build/./programs/server/Server.cpp:704: DB::Server::run() @ 0x0000000013857272
./ci/tmp/build/./programs/server/Server.cpp:493: mainEntryClickHouseServer(int, char**) @ 0x0000000013854895
./ci/tmp/build/./programs/main.cpp:380: main @ 0x000000000b086633
__pow_finite @ 0x0000000000029d90
__remainder_finite @ 0x0000000000029e40
_start @ 0x000000000b07076e

@serxa serxa added this pull request to the merge queue Feb 2, 2026
@serxa serxa added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Feb 2, 2026
Merged via the queue into master with commit 13742c2 Feb 2, 2026
129 of 134 checks passed
@serxa serxa deleted the fix-contention-on-canclellation-checker branch February 2, 2026 14:10
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 2, 2026
@robot-clickhouse robot-clickhouse added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Feb 2, 2026
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Feb 2, 2026
clickhouse-gh bot added a commit that referenced this pull request Feb 3, 2026
Backport #95563 to 26.1: Fix contention in CancellationChecker thread
clickhouse-gh bot added a commit that referenced this pull request Feb 3, 2026
Backport #95563 to 25.11: Fix contention in CancellationChecker thread
serxa added a commit that referenced this pull request Feb 3, 2026
Backport #95563 to 25.8: Fix contention in CancellationChecker thread
serxa added a commit that referenced this pull request Feb 3, 2026
Backport #95563 to 25.12: Fix contention in CancellationChecker thread
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants