Skip to content

Fix LOGICAL_ERROR due to patch parts column order mismatch#99164

Merged
CurtizJ merged 5 commits intoClickHouse:masterfrom
pamarcos:fix-patch-parts-column-order-mismatch
Mar 18, 2026
Merged

Fix LOGICAL_ERROR due to patch parts column order mismatch#99164
CurtizJ merged 5 commits intoClickHouse:masterfrom
pamarcos:fix-patch-parts-column-order-mismatch

Conversation

@pamarcos
Copy link
Copy Markdown
Member

@pamarcos pamarcos commented Mar 10, 2026

Linked issues

Issue

In addPatchPartsColumns, patch column names are collected into a NameSet (std::unordered_set) and then converted to a Names vector. The iteration order of std::unordered_set is non-deterministic and depends on the standard library's hash table implementation. When two patch parts have different system column sets (Merge mode uses {_part, _part_offset, _data_version}, Join mode uses {_block_number, _block_offset, _data_version}), the different hash table layouts can cause user columns to iterate in different orders. getUpdatedHeader then compares patch headers positionally via assertCompatibleHeader, which throws LOGICAL_ERROR on column name mismatch.

The fix

Sort columns via header.sortColumns() in getUpdatedHeader before the cross-patch assertCompatibleHeader comparison — a defensive second layer, since the downstream consumers (createPatchForColumn, applyPatchesToBlockRaw) use name-based lookups and are unaffected by column order.

The test

Add a patch_parts_reverse_column_order failpoint that deliberately reverses column order for odd-indexed patches, making the bug reproducible regardless of the standard library's hash table behavior. The regression test creates a scenario with both Merge and Join mode patches and verifies correct query results.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix LOGICAL_ERROR due to patch parts column order mismatch

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Touches MergeTree patch-part read/apply paths; while intended to be order-only, it affects how patch blocks are constructed/compared and could impact lightweight update/query behavior if assumptions change.

Overview
Fixes a LOGICAL_ERROR when applying multiple patch parts by making patch-block column ordering deterministic.

Patch-part column lists are now sorted before reading, and getUpdatedHeader defensively sorts filtered patch headers prior to cross-patch assertCompatibleHeader checks. A new patch_parts_reverse_column_order failpoint and stateless regression test reproduce the prior mismatch between Merge- and Join-mode patches and verify stable results.

Written by Cursor Bugbot for commit 232c29b. This will update automatically on new commits. Configure here.

…patch parts

When multiple patch parts (Merge and Join mode) update the same columns,
`addPatchPartsColumns` collects column names into a `NameSet`
(`std::unordered_set`) and then converts them to a `Names` vector.
The iteration order of `std::unordered_set` is non-deterministic, so
different patch parts reading the same user columns can produce blocks
with different column orderings. This causes a LOGICAL_ERROR exception
in `getUpdatedHeader` -> `assertCompatibleHeader` which compares
patch headers positionally:

  Code: 49. DB::Exception: Block structure mismatch in patch parts
  stream: different names of columns

Fix: sort the column names after extracting them from the `NameSet` to
ensure all patch parts produce blocks with the same deterministic column
order.
…lify test

- Sort columns in getUpdatedHeader via header.sortColumns() before
  cross-patch assertCompatibleHeader comparison. This is a defensive
  second layer: even if addPatchPartsColumns produces different column
  orderings, getUpdatedHeader normalizes them before comparing.

- In debug/sanitizer builds, deliberately reverse column order for
  odd-indexed patches in addPatchPartsColumns. This deterministically
  triggers the LOGICAL_ERROR without the getUpdatedHeader sort fix,
  ensuring the regression test catches the bug.

- Simplify the test to use plain MergeTree (no ZooKeeper dependency)
  and remove unnecessary tags/settings.
Replace the #ifdef DEBUG_OR_SANITIZER_BUILD block that reversed column
order for odd-indexed patches with a proper failpoint
`patch_parts_reverse_column_order`. This follows the established pattern
for fault injection in the codebase (libfiu + SYSTEM ENABLE FAILPOINT).

The test now explicitly enables/disables the failpoint via SQL, making it
work in any build configuration where libfiu is available.
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 10, 2026

Workflow [PR], commit [7d595b7]

Summary:

job_name test_name status info comment
Stateless tests (amd_tsan, s3 storage, parallel, 1/2) failure
04004_alter_modify_column_ttl_without_type FAIL cidb IGNORED

AI Review

Summary

This PR fixes a LOGICAL_ERROR in patch-part reads by normalizing header column order before cross-patch compatibility checks in getUpdatedHeader, and adds a deterministic regression test using a dedicated failpoint. I did not find any high-confidence blocker or major issue introduced by the changes.

Missing context

  • ⚠️ No completed Praktika/CI run artifacts were available at review time (checks are still pending), so I could not corroborate runtime behavior from logs.

ClickHouse Rules

Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsHistory.cpp
Safe rollout
Compilation time

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 10, 2026
@pamarcos pamarcos requested a review from Copilot March 10, 2026 13:08
@pamarcos pamarcos changed the title Fix patch parts column order mismatch Fix LOGICAL_ERROR due to patch parts column order mismatch Mar 10, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a LOGICAL_ERROR triggered when applying multiple patch parts that have the same updated columns but arrive with different column orders. It makes patch block/header ordering deterministic (normalized by column name) and adds a failpoint-driven regression test to ensure the mismatch is caught.

Changes:

  • Normalize patch headers in getUpdatedHeader() by sorting columns by name before cross-patch compatibility checks.
  • Make patch-part column reads deterministic by sorting requested column names (and add a failpoint to reverse order for selected patches in tests).
  • Add a stateless regression test reproducing issue #1021 and validating correct behavior after merges/updates.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/queries/0_stateless/04034_patch_parts_column_order_mismatch.sql New regression test that enables a failpoint to force differing patch column orders and validates patch application across Merge/Join modes.
tests/queries/0_stateless/04034_patch_parts_column_order_mismatch.reference Expected output for the new regression test.
src/Storages/MergeTree/PatchParts/applyPatches.cpp Sorts the updated header’s columns by name to avoid positional header mismatches across patches.
src/Storages/MergeTree/MergeTreeBlockReadUtils.cpp Sorts patch column name list to ensure deterministic patch block column order; adds failpoint hook to reverse order for odd-indexed patches.
src/Common/FailPoint.cpp Registers the new patch_parts_reverse_column_order failpoint.

@CurtizJ CurtizJ self-assigned this Mar 10, 2026
@pamarcos pamarcos requested a review from CurtizJ March 10, 2026 16:17
@pamarcos pamarcos marked this pull request as ready for review March 10, 2026 16:17
Comment on lines +406 to +412
fiu_do_on(FailPoints::patch_parts_reverse_column_order,
{
/// Reverse the order for odd-indexed patches to expose any code
/// that incorrectly relies on positional column matching between different patches.
if (i % 2 == 1)
std::reverse(patch_columns_to_read_names.begin(), patch_columns_to_read_names.end());
});
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense. If we fixed the incompatible headers by using sort in the previous line, then after this point we cannot reshuffle the columns. If the code works correctly after this failpoint, the sort is useless, and the bug is elsewhere.

Copy link
Copy Markdown
Member Author

@pamarcos pamarcos Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual fix is in getUpdatedHeader. The sort prior to this failpoint was just meant for defensive programming as an extra check.

I'll remove the sort in L404 and keep the one in applyPatches.cpp.

@@ -0,0 +1,50 @@
-- Regression test for https://github.com/ClickHouse/clickhouse-core-incidents/issues/1021
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test doesn't reproduce the bug on the head version.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried reproducing it in master when I created the PR and have tried again in be3032f.

I've added the failpoint and the test and I can easily reproduce the issue 100% of the times:

2026.03.16 17:49:35.092769 [ 191387 ] {4e711fd4-9995-45f3-90da-a9cdcbfdbc03} <Fatal> : Logical error: 'Block structure mismatch in patch parts stream: different names of columns:
a_col String String(size = 0)
c_col Float64 Float64(size = 0)'.
2026.03.16 17:49:35.092848 [ 191387 ] {4e711fd4-9995-45f3-90da-a9cdcbfdbc03} <Fatal> : Format string: 'Block structure mismatch in {} stream: different names of columns:
{}
{}'.
2026.03.16 17:49:35.097154 [ 191126 ] {} <Trace> system.metric_log (9ec522a1-432f-474e-94d6-d6a3799c6063): Trying to reserve 1.00 MiB using storage policy from min volume index 0
2026.03.16 17:49:35.097182 [ 191126 ] {} <Trace> DiskLocal: Reserved 1.00 MiB on local disk `default`, having unreserved 132.52 GiB.
2026.03.16 17:49:35.097218 [ 191126 ] {} <Debug> FakeDiskTransaction: Creating FakeDiskTransaction for disk default
2026.03.16 17:49:35.111446 [ 191387 ] {4e711fd4-9995-45f3-90da-a9cdcbfdbc03} <Fatal> : Stack trace (when copying this message, always include the lines below):

0. ../contrib/llvm-project/libcxx/include/__exception/exception.h:113:14: Poco::Exception::Exception(String&&, int) @ 0x0000000026830f33
1. ./build/../src/Common/Exception.cpp:139:7: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000001638cae9
2. ../src/Common/Exception.h:171:100: DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000e3fa594
3. ../src/Common/Exception.h:57:54: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000e3fa091
4. ../src/Common/Exception.h:189:77: DB::Exception::Exception<std::basic_string_view<char, std::char_traits<char>>&, String, String>(int, FormatStringHelperImpl<std::type_identity<std::basic_string_view<char, std::char_traits<char>>&>::type, std::type_identity<String>::type, std::type_identity<String>::type>, std::basic_string_view<char, std::char_traits<char>>&, String&&, String&&) @ 0x000000001b0b4989
5. ./build/../src/Core/Block.cpp:44:15: void DB::checkColumnStructure<void>(DB::ColumnWithTypeAndName const&, DB::ColumnWithTypeAndName const&, std::basic_string_view<char, std::char_traits<char>>, bool, int) @ 0x000000001b0a8dbd
6. ./build/../src/Core/Block.cpp:155:13: void DB::checkBlockStructure<void>(DB::Block const&, DB::Block const&, std::basic_string_view<char, std::char_traits<char>>, bool) @ 0x000000001b0af97a
7. ./build/../src/Storages/MergeTree/PatchParts/applyPatches.cpp:282:9: DB::applyPatchesToBlock(DB::Block&, DB::Block&, std::vector<std::shared_ptr<DB::PatchToApply const>, std::allocator<std::shared_ptr<DB::PatchToApply const>>> const&, std::vector<String, std::allocator<String>> const&, unsigned long) @ 0x000000001fc83251
8. ./build/../src/Storages/MergeTree/MergeTreeReadersChain.cpp:472:9: DB::MergeTreeReadersChain::applyPatches(DB::Block const&, std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&, DB::Block&, std::optional<unsigned long>, std::optional<unsigned long>, std::vector<std::vector<DB::ColumnForPatch, std::allocator<DB::ColumnForPatch>>, std::allocator<std::vector<DB::ColumnForPatch, std::allocator<DB::ColumnForPatch>>>> const&, std::set<DB::ColumnForPatch::Order, std::less<DB::ColumnForPatch::Order>, std::allocator<DB::ColumnForPatch::Order>> const&, DB::Block const&) const @ 0x000000001fa78620
9. ./build/../src/Storages/MergeTree/MergeTreeReadersChain.cpp:218:9: DB::MergeTreeReadersChain::executeActionsBeforePrewhere(DB::MergeTreeRangeReader::ReadResult&, std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&, DB::MergeTreeRangeReader&, DB::Block const&, unsigned long) const @ 0x000000001fa76b79
10. ./build/../src/Storages/MergeTree/MergeTreeReadersChain.cpp:120:9: DB::MergeTreeReadersChain::read(unsigned long, DB::MarkRanges&, std::vector<DB::MarkRanges, std::allocator<DB::MarkRanges>>&, std::function<void (std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, unsigned long, std::optional<bool>&)> const&) @ 0x000000001fa75a4e
11. ./build/../src/Storages/MergeTree/MergeTreeReadTask.cpp:381:38: DB::MergeTreeReadTask::read() @ 0x000000001fa7261e
12. ../src/Storages/MergeTree/MergeTreeSelectAlgorithms.h:53:103: DB::MergeTreeInOrderSelectAlgorithm::readFromTask(DB::MergeTreeReadTask&) @ 0x000000001fa8a8cc
13. ./build/../src/Storages/MergeTree/MergeTreeSelectProcessor.cpp:232:31: DB::MergeTreeSelectProcessor::readCurrentTask(DB::MergeTreeReadTask&, DB::IMergeTreeSelectAlgorithm&) const @ 0x000000001fa81252
14. ./build/../src/Storages/MergeTree/MergeTreeSelectProcessor.cpp:323:16: DB::MergeTreeSelectProcessor::read() @ 0x000000001fa82298
15. ./build/../src/Storages/MergeTree/MergeTreeSource.cpp:233:41: DB::MergeTreeSource::tryGenerate() @ 0x00000000206f43e1
16. ./build/../src/Processors/ISource.cpp:110:26: DB::ISource::work() @ 0x00000000200e3881
17. ./build/../src/Processors/Executors/ExecutionThreadContext.cpp:53:26: DB::ExecutionThreadContext::executeTask() @ 0x00000000200ffe70
18. ./build/../src/Processors/Executors/PipelineExecutor.cpp:351:26: DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::atomic<bool>*) @ 0x00000000200f1ff2
19. ./build/../src/Processors/Executors/PipelineExecutor.cpp:279:5: DB::PipelineExecutor::executeSingleThread(unsigned long, DB::IAcquiredSlot*) @ 0x00000000200f262c
20. ./build/../src/Processors/Executors/PipelineExecutor.cpp:602:13: DB::PipelineExecutor::executeImpl(unsigned long, bool) @ 0x00000000200f0fa1
21. ./build/../src/Processors/Executors/PipelineExecutor.cpp:136:9: DB::PipelineExecutor::execute(unsigned long, bool) @ 0x00000000200f0681
22. ./build/../src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:77:24: void std::__function::__policy_func<void ()>::__call_func[abi:se210105]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'()>(std::__function::__policy_storage const*) @ 0x00000000201081d7
23. ../contrib/llvm-project/libcxx/include/__functional/function.h:508:12: ? @ 0x000000001651d983
24. ../contrib/llvm-project/libcxx/include/__type_traits/invoke.h:0: void* std::__thread_proxy[abi:se210105]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001652484e
25. start_thread @ 0x0000000000094ac3
26. __GI___clone3 @ 0x00000000001268d0

2026.03.16 17:49:35.111671 [ 190438 ] {} <Trace> BaseDaemon: Received signal 6
2026.03.16 17:49:35.111754 [ 190438 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2026.03.16 17:49:35.111781 [ 190438 ] {} <Fatal> BaseDaemon: (version 26.3.1.1, build id: 85B263BBADD8525F21CC9D7F6F6D979104BB45A5, git hash: be3032fb3e3b35dbff0114041fed7f2e0bac507f, architecture: x86_64) (from thread 191387) Received signal 6 (internal)
2026.03.16 17:49:35.111793 [ 190438 ] {} <Fatal> BaseDaemon: Signal description: Aborted
2026.03.16 17:49:35.111797 [ 190438 ] {} <Fatal> BaseDaemon: Sent by tkill.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification.

The sort in addPatchPartsColumns is unnecessary: the actual fix is
sortColumns() in getUpdatedHeader which normalizes column order before
the positional assertCompatibleHeader comparison. The failpoint in
addPatchPartsColumns simulates the non-deterministic NameSet iteration
that causes the bug, and should not be preceded by a sort that masks it.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/Storages/MergeTree/MergeTreeBlockReadUtils.cpp
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 17, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.70% -0.10%
Functions 23.90% 23.90% +0.00%
Branches 76.30% 76.30% +0.00%

PR changed lines: PR changed-lines coverage: 100.00% (22/22, 0 noise lines excluded)
Diff coverage report
Uncovered code

Copy link
Copy Markdown
Member

@CurtizJ CurtizJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -0,0 +1,50 @@
-- Regression test for https://github.com/ClickHouse/clickhouse-core-incidents/issues/1021
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Mar 18, 2026

FAIL | 04004_alter_modify_column_ttl_without_type - Fixed in master.

@CurtizJ CurtizJ added pr-must-backport Pull request should be backported intentionally. Use this label with great care! v25.12-must-backport labels Mar 18, 2026
@CurtizJ CurtizJ added this pull request to the merge queue Mar 18, 2026
Merged via the queue into ClickHouse:master with commit ff560a2 Mar 18, 2026
165 of 167 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Mar 18, 2026
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 18, 2026
robot-clickhouse added a commit that referenced this pull request Mar 18, 2026
robot-clickhouse added a commit that referenced this pull request Mar 18, 2026
robot-clickhouse added a commit that referenced this pull request Mar 18, 2026
robot-clickhouse added a commit that referenced this pull request Mar 18, 2026
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 19, 2026
clickhouse-gh bot added a commit that referenced this pull request Mar 19, 2026
Backport #99164 to 26.2: Fix LOGICAL_ERROR due to patch parts column order mismatch
CurtizJ added a commit that referenced this pull request Mar 19, 2026
Backport #99164 to 26.1: Fix LOGICAL_ERROR due to patch parts column order mismatch
CurtizJ added a commit that referenced this pull request Mar 19, 2026
Backport #99164 to 25.12: Fix LOGICAL_ERROR due to patch parts column order mismatch
CurtizJ added a commit that referenced this pull request Mar 19, 2026
Backport #99164 to 25.8: Fix LOGICAL_ERROR due to patch parts column order mismatch
@pamarcos pamarcos added v25.6-must-backport v25.8-must-backport and removed pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Mar 30, 2026
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Mar 30, 2026
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo v25.6-must-backport v25.8-must-backport v25.12-must-backport

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants