Skip to content

Fix lock inversion in ProcessList#96182

Merged
alexey-milovidov merged 2 commits intomasterfrom
fix-lock-inversion
Feb 7, 2026
Merged

Fix lock inversion in ProcessList#96182
alexey-milovidov merged 2 commits intomasterfrom
fix-lock-inversion

Conversation

@antonio2368
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix possible deadlock in ProcessList. It can happen because of possible lock inversion if memory overcommit tracker triggers when we are adding task to cancellation checker.

The bug was introduced here #95535
We need to backport the fix also.

Another option is to avoid double locking in other place also, but it requires more thoughtful approach as we can easily introduce nasty bugs if we are not careful.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 6, 2026

Workflow [PR], commit [048247a]

Summary:

job_name test_name status info comment
Stress test (amd_tsan) failure
Logical error: Block structure mismatch in A stream: different number of columns: (STID: 0993-2da1) FAIL cidb, issue
Stress test (arm_asan) failure
Logical error: Block structure mismatch in A stream: different number of columns: (STID: 0993-38e6) FAIL cidb, issue
Stress test (amd_msan) failure
Logical error: Block structure mismatch in A stream: different number of columns: (STID: 0993-38e6) FAIL cidb, issue
Finish Workflow failure
python3 ./ci/jobs/scripts/workflow_hooks/new_tests_check.py failure

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Feb 6, 2026
@yariks5s yariks5s added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Feb 6, 2026
@yariks5s yariks5s requested a review from Copilot February 6, 2026 17:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a potential deadlock in ProcessList teardown by preventing lock inversion between ProcessList::mutex, CancellationChecker, and the overcommit tracker.

Changes:

  • Add OvercommitTracker include to enable thread-local blocking.
  • Block OvercommitTracker while appending done tasks to CancellationChecker during ProcessListEntry destruction.

Comment thread src/Interpreters/ProcessList.cpp Outdated
Comment thread src/Interpreters/ProcessList.cpp Outdated
Comment on lines +402 to +403
/// We need to block the overcommit tracker here to avoid lock inversion because OvercommitTracker takes a lock on the ProcessList::mutex.
/// When task is added, we lock the ProcessList::mutex, and then the CancellatioCheker mutex.
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation is a bit confusing/mismatched with the destructor path (this code runs on destruction, not on task add). Consider rewording to describe the actual inversion: appendDoneTasks() locks CancellationChecker, and if overcommit triggers inside that path it may try to lock ProcessList::mutex, which can invert with the code path that holds ProcessList::mutex and then takes CancellationChecker.

Suggested change
/// We need to block the overcommit tracker here to avoid lock inversion because OvercommitTracker takes a lock on the ProcessList::mutex.
/// When task is added, we lock the ProcessList::mutex, and then the CancellatioCheker mutex.
/// We need to block the overcommit tracker here to avoid lock inversion between CancellationChecker and ProcessList::mutex.
/// appendDoneTasks() acquires the CancellationChecker mutex; if overcommit is triggered inside that path it may try to lock ProcessList::mutex,
/// which would invert the order with other code paths that hold ProcessList::mutex first and then access CancellationChecker.

Copilot uses AI. Check for mistakes.
@alexey-milovidov alexey-milovidov merged commit 66c75f7 into master Feb 7, 2026
126 of 133 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-lock-inversion branch February 7, 2026 03:27
@robot-ch-test-poll robot-ch-test-poll added pr-synced-to-cloud The PR is synced to the cloud repo pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Feb 7, 2026
clickhouse-gh bot added a commit that referenced this pull request Feb 7, 2026
Backport #96182 to 25.12: Fix lock inversion in ProcessList
antonio2368 added a commit that referenced this pull request Feb 9, 2026
Backport #96182 to 25.8: Fix lock inversion in ProcessList
antonio2368 added a commit that referenced this pull request Feb 9, 2026
Backport #96182 to 25.11: Fix lock inversion in ProcessList
antonio2368 added a commit that referenced this pull request Feb 9, 2026
Backport #96182 to 26.1: Fix lock inversion in ProcessList
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Feb 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants