Fix LOGICAL_ERROR due to patch parts column order mismatch#99164
Fix LOGICAL_ERROR due to patch parts column order mismatch#99164CurtizJ merged 5 commits intoClickHouse:masterfrom
Conversation
…patch parts When multiple patch parts (Merge and Join mode) update the same columns, `addPatchPartsColumns` collects column names into a `NameSet` (`std::unordered_set`) and then converts them to a `Names` vector. The iteration order of `std::unordered_set` is non-deterministic, so different patch parts reading the same user columns can produce blocks with different column orderings. This causes a LOGICAL_ERROR exception in `getUpdatedHeader` -> `assertCompatibleHeader` which compares patch headers positionally: Code: 49. DB::Exception: Block structure mismatch in patch parts stream: different names of columns Fix: sort the column names after extracting them from the `NameSet` to ensure all patch parts produce blocks with the same deterministic column order.
…lify test - Sort columns in getUpdatedHeader via header.sortColumns() before cross-patch assertCompatibleHeader comparison. This is a defensive second layer: even if addPatchPartsColumns produces different column orderings, getUpdatedHeader normalizes them before comparing. - In debug/sanitizer builds, deliberately reverse column order for odd-indexed patches in addPatchPartsColumns. This deterministically triggers the LOGICAL_ERROR without the getUpdatedHeader sort fix, ensuring the regression test catches the bug. - Simplify the test to use plain MergeTree (no ZooKeeper dependency) and remove unnecessary tags/settings.
Replace the #ifdef DEBUG_OR_SANITIZER_BUILD block that reversed column order for odd-indexed patches with a proper failpoint `patch_parts_reverse_column_order`. This follows the established pattern for fault injection in the codebase (libfiu + SYSTEM ENABLE FAILPOINT). The test now explicitly enables/disables the failpoint via SQL, making it work in any build configuration where libfiu is available.
|
Workflow [PR], commit [7d595b7] Summary: ❌
AI ReviewSummaryThis PR fixes a Missing context
ClickHouse Rules
Final Verdict
|
There was a problem hiding this comment.
Pull request overview
This PR fixes a LOGICAL_ERROR triggered when applying multiple patch parts that have the same updated columns but arrive with different column orders. It makes patch block/header ordering deterministic (normalized by column name) and adds a failpoint-driven regression test to ensure the mismatch is caught.
Changes:
- Normalize patch headers in
getUpdatedHeader()by sorting columns by name before cross-patch compatibility checks. - Make patch-part column reads deterministic by sorting requested column names (and add a failpoint to reverse order for selected patches in tests).
- Add a stateless regression test reproducing issue #1021 and validating correct behavior after merges/updates.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/queries/0_stateless/04034_patch_parts_column_order_mismatch.sql | New regression test that enables a failpoint to force differing patch column orders and validates patch application across Merge/Join modes. |
| tests/queries/0_stateless/04034_patch_parts_column_order_mismatch.reference | Expected output for the new regression test. |
| src/Storages/MergeTree/PatchParts/applyPatches.cpp | Sorts the updated header’s columns by name to avoid positional header mismatches across patches. |
| src/Storages/MergeTree/MergeTreeBlockReadUtils.cpp | Sorts patch column name list to ensure deterministic patch block column order; adds failpoint hook to reverse order for odd-indexed patches. |
| src/Common/FailPoint.cpp | Registers the new patch_parts_reverse_column_order failpoint. |
| fiu_do_on(FailPoints::patch_parts_reverse_column_order, | ||
| { | ||
| /// Reverse the order for odd-indexed patches to expose any code | ||
| /// that incorrectly relies on positional column matching between different patches. | ||
| if (i % 2 == 1) | ||
| std::reverse(patch_columns_to_read_names.begin(), patch_columns_to_read_names.end()); | ||
| }); |
There was a problem hiding this comment.
This doesn't make sense. If we fixed the incompatible headers by using sort in the previous line, then after this point we cannot reshuffle the columns. If the code works correctly after this failpoint, the sort is useless, and the bug is elsewhere.
There was a problem hiding this comment.
The actual fix is in getUpdatedHeader. The sort prior to this failpoint was just meant for defensive programming as an extra check.
I'll remove the sort in L404 and keep the one in applyPatches.cpp.
| @@ -0,0 +1,50 @@ | |||
| -- Regression test for https://github.com/ClickHouse/clickhouse-core-incidents/issues/1021 | |||
There was a problem hiding this comment.
Test doesn't reproduce the bug on the head version.
There was a problem hiding this comment.
I tried reproducing it in master when I created the PR and have tried again in be3032f.
I've added the failpoint and the test and I can easily reproduce the issue 100% of the times:
2026.03.16 17:49:35.092769 [ 191387 ] {4e711fd4-9995-45f3-90da-a9cdcbfdbc03} <Fatal> : Logical error: 'Block structure mismatch in patch parts stream: different names of columns:
a_col String String(size = 0)
c_col Float64 Float64(size = 0)'.
2026.03.16 17:49:35.092848 [ 191387 ] {4e711fd4-9995-45f3-90da-a9cdcbfdbc03} <Fatal> : Format string: 'Block structure mismatch in {} stream: different names of columns:
{}
{}'.
2026.03.16 17:49:35.097154 [ 191126 ] {} <Trace> system.metric_log (9ec522a1-432f-474e-94d6-d6a3799c6063): Trying to reserve 1.00 MiB using storage policy from min volume index 0
2026.03.16 17:49:35.097182 [ 191126 ] {} <Trace> DiskLocal: Reserved 1.00 MiB on local disk `default`, having unreserved 132.52 GiB.
2026.03.16 17:49:35.097218 [ 191126 ] {} <Debug> FakeDiskTransaction: Creating FakeDiskTransaction for disk default
2026.03.16 17:49:35.111446 [ 191387 ] {4e711fd4-9995-45f3-90da-a9cdcbfdbc03} <Fatal> : Stack trace (when copying this message, always include the lines below):
0. ../contrib/llvm-project/libcxx/include/__exception/exception.h:113:14: Poco::Exception::Exception(String&&, int) @ 0x0000000026830f33
1. ./build/../src/Common/Exception.cpp:139:7: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000001638cae9
2. ../src/Common/Exception.h:171:100: DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000e3fa594
3. ../src/Common/Exception.h:57:54: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000e3fa091
4. ../src/Common/Exception.h:189:77: DB::Exception::Exception<std::basic_string_view<char, std::char_traits<char>>&, String, String>(int, FormatStringHelperImpl<std::type_identity<std::basic_string_view<char, std::char_traits<char>>&>::type, std::type_identity<String>::type, std::type_identity<String>::type>, std::basic_string_view<char, std::char_traits<char>>&, String&&, String&&) @ 0x000000001b0b4989
5. ./build/../src/Core/Block.cpp:44:15: void DB::checkColumnStructure<void>(DB::ColumnWithTypeAndName const&, DB::ColumnWithTypeAndName const&, std::basic_string_view<char, std::char_traits<char>>, bool, int) @ 0x000000001b0a8dbd
6. ./build/../src/Core/Block.cpp:155:13: void DB::checkBlockStructure<void>(DB::Block const&, DB::Block const&, std::basic_string_view<char, std::char_traits<char>>, bool) @ 0x000000001b0af97a
7. ./build/../src/Storages/MergeTree/PatchParts/applyPatches.cpp:282:9: DB::applyPatchesToBlock(DB::Block&, DB::Block&, std::vector<std::shared_ptr<DB::PatchToApply const>, std::allocator<std::shared_ptr<DB::PatchToApply const>>> const&, std::vector<String, std::allocator<String>> const&, unsigned long) @ 0x000000001fc83251
8. ./build/../src/Storages/MergeTree/MergeTreeReadersChain.cpp:472:9: DB::MergeTreeReadersChain::applyPatches(DB::Block const&, std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&, DB::Block&, std::optional<unsigned long>, std::optional<unsigned long>, std::vector<std::vector<DB::ColumnForPatch, std::allocator<DB::ColumnForPatch>>, std::allocator<std::vector<DB::ColumnForPatch, std::allocator<DB::ColumnForPatch>>>> const&, std::set<DB::ColumnForPatch::Order, std::less<DB::ColumnForPatch::Order>, std::allocator<DB::ColumnForPatch::Order>> const&, DB::Block const&) const @ 0x000000001fa78620
9. ./build/../src/Storages/MergeTree/MergeTreeReadersChain.cpp:218:9: DB::MergeTreeReadersChain::executeActionsBeforePrewhere(DB::MergeTreeRangeReader::ReadResult&, std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&, DB::MergeTreeRangeReader&, DB::Block const&, unsigned long) const @ 0x000000001fa76b79
10. ./build/../src/Storages/MergeTree/MergeTreeReadersChain.cpp:120:9: DB::MergeTreeReadersChain::read(unsigned long, DB::MarkRanges&, std::vector<DB::MarkRanges, std::allocator<DB::MarkRanges>>&, std::function<void (std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, unsigned long, std::optional<bool>&)> const&) @ 0x000000001fa75a4e
11. ./build/../src/Storages/MergeTree/MergeTreeReadTask.cpp:381:38: DB::MergeTreeReadTask::read() @ 0x000000001fa7261e
12. ../src/Storages/MergeTree/MergeTreeSelectAlgorithms.h:53:103: DB::MergeTreeInOrderSelectAlgorithm::readFromTask(DB::MergeTreeReadTask&) @ 0x000000001fa8a8cc
13. ./build/../src/Storages/MergeTree/MergeTreeSelectProcessor.cpp:232:31: DB::MergeTreeSelectProcessor::readCurrentTask(DB::MergeTreeReadTask&, DB::IMergeTreeSelectAlgorithm&) const @ 0x000000001fa81252
14. ./build/../src/Storages/MergeTree/MergeTreeSelectProcessor.cpp:323:16: DB::MergeTreeSelectProcessor::read() @ 0x000000001fa82298
15. ./build/../src/Storages/MergeTree/MergeTreeSource.cpp:233:41: DB::MergeTreeSource::tryGenerate() @ 0x00000000206f43e1
16. ./build/../src/Processors/ISource.cpp:110:26: DB::ISource::work() @ 0x00000000200e3881
17. ./build/../src/Processors/Executors/ExecutionThreadContext.cpp:53:26: DB::ExecutionThreadContext::executeTask() @ 0x00000000200ffe70
18. ./build/../src/Processors/Executors/PipelineExecutor.cpp:351:26: DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::atomic<bool>*) @ 0x00000000200f1ff2
19. ./build/../src/Processors/Executors/PipelineExecutor.cpp:279:5: DB::PipelineExecutor::executeSingleThread(unsigned long, DB::IAcquiredSlot*) @ 0x00000000200f262c
20. ./build/../src/Processors/Executors/PipelineExecutor.cpp:602:13: DB::PipelineExecutor::executeImpl(unsigned long, bool) @ 0x00000000200f0fa1
21. ./build/../src/Processors/Executors/PipelineExecutor.cpp:136:9: DB::PipelineExecutor::execute(unsigned long, bool) @ 0x00000000200f0681
22. ./build/../src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:77:24: void std::__function::__policy_func<void ()>::__call_func[abi:se210105]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'()>(std::__function::__policy_storage const*) @ 0x00000000201081d7
23. ../contrib/llvm-project/libcxx/include/__functional/function.h:508:12: ? @ 0x000000001651d983
24. ../contrib/llvm-project/libcxx/include/__type_traits/invoke.h:0: void* std::__thread_proxy[abi:se210105]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001652484e
25. start_thread @ 0x0000000000094ac3
26. __GI___clone3 @ 0x00000000001268d0
2026.03.16 17:49:35.111671 [ 190438 ] {} <Trace> BaseDaemon: Received signal 6
2026.03.16 17:49:35.111754 [ 190438 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2026.03.16 17:49:35.111781 [ 190438 ] {} <Fatal> BaseDaemon: (version 26.3.1.1, build id: 85B263BBADD8525F21CC9D7F6F6D979104BB45A5, git hash: be3032fb3e3b35dbff0114041fed7f2e0bac507f, architecture: x86_64) (from thread 191387) Received signal 6 (internal)
2026.03.16 17:49:35.111793 [ 190438 ] {} <Fatal> BaseDaemon: Signal description: Aborted
2026.03.16 17:49:35.111797 [ 190438 ] {} <Fatal> BaseDaemon: Sent by tkill.
The sort in addPatchPartsColumns is unnecessary: the actual fix is sortColumns() in getUpdatedHeader which normalizes column order before the positional assertCompatibleHeader comparison. The failpoint in addPatchPartsColumns simulates the non-deterministic NameSet iteration that causes the bug, and should not be preceded by a sort that masks it.
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 100.00% (22/22, 0 noise lines excluded) |
| @@ -0,0 +1,50 @@ | |||
| -- Regression test for https://github.com/ClickHouse/clickhouse-core-incidents/issues/1021 | |||
|
|
ff560a2
Backport #99164 to 26.2: Fix LOGICAL_ERROR due to patch parts column order mismatch
Backport #99164 to 26.1: Fix LOGICAL_ERROR due to patch parts column order mismatch
Backport #99164 to 25.12: Fix LOGICAL_ERROR due to patch parts column order mismatch
Backport #99164 to 25.8: Fix LOGICAL_ERROR due to patch parts column order mismatch
Linked issues
Issue
In
addPatchPartsColumns, patch column names are collected into aNameSet(std::unordered_set) and then converted to a Names vector. The iteration order of std::unordered_set is non-deterministic and depends on the standard library's hash table implementation. When two patch parts have different system column sets (Merge mode uses{_part, _part_offset, _data_version}, Join mode uses{_block_number, _block_offset, _data_version}), the different hash table layouts can cause user columns to iterate in different orders.getUpdatedHeaderthen compares patch headers positionally viaassertCompatibleHeader, which throws LOGICAL_ERROR on column name mismatch.The fix
Sort columns via
header.sortColumns()ingetUpdatedHeaderbefore the cross-patch assertCompatibleHeader comparison — a defensive second layer, since the downstream consumers (createPatchForColumn,applyPatchesToBlockRaw) use name-based lookups and are unaffected by column order.The test
Add a
patch_parts_reverse_column_orderfailpoint that deliberately reverses column order for odd-indexed patches, making the bug reproducible regardless of the standard library's hash table behavior. The regression test creates a scenario with both Merge and Join mode patches and verifies correct query results.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix LOGICAL_ERROR due to patch parts column order mismatch
Documentation entry for user-facing changes
Note
Medium Risk
Touches MergeTree patch-part read/apply paths; while intended to be order-only, it affects how patch blocks are constructed/compared and could impact lightweight update/query behavior if assumptions change.
Overview
Fixes a
LOGICAL_ERRORwhen applying multiple patch parts by making patch-block column ordering deterministic.Patch-part column lists are now sorted before reading, and
getUpdatedHeaderdefensively sorts filtered patch headers prior to cross-patchassertCompatibleHeaderchecks. A newpatch_parts_reverse_column_orderfailpoint and stateless regression test reproduce the prior mismatch between Merge- and Join-mode patches and verify stable results.Written by Cursor Bugbot for commit 232c29b. This will update automatically on new commits. Configure here.