Fix missing stream exception for nested JSON in wide parts by Avogar · Pull Request #100475 · ClickHouse/ClickHouse

Avogar · 2026-03-23T15:14:08Z

PR #97523 fixed ColumnObject::permute and ColumnDynamic::permute to propagate statistics, but left the same bug in filter, index, replicate, and scatter for both ColumnObject and ColumnDynamic.

When a top-level JSON column containing Array(JSON) is permuted during INSERT (e.g. MergeTree sorting with optimize_on_insert=0), the chain ColumnObject::permute → ColumnArray::permute → indexImpl → ColumnObject::index drops statistics on the inner JSON column. This causes a mismatch: enumerateStreams (stream creation) uses the block_sample column which retains statistics via cloneEmpty and chooses 1 bucket (empty shared data optimization), while serializeBinaryBulkStatePrefix (serialization) uses the permuted column without statistics and chooses N buckets. The subsequent write to bucket 1 fails with "Stream ... not found" (LOGICAL_ERROR).
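The mismatch described above can be modelled in a few lines. The following is a minimal sketch, not ClickHouse code: `Statistics`, `ToyJsonColumn`, `indexDroppingStatistics`, `indexKeepingStatistics`, and `chooseSharedDataBuckets` are hypothetical names, and the bucket rule is an assumption that simplifies the behaviour the description attributes to `enumerateStreams` and `serializeBinaryBulkStatePrefix`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the statistics attached to a JSON column.
struct Statistics
{
    bool shared_data_is_empty = true;
};

// Hypothetical stand-in for a JSON column: rows plus optional statistics.
struct ToyJsonColumn
{
    std::vector<std::string> rows;
    std::shared_ptr<const Statistics> statistics; // null means "no statistics"

    // Models the statement that cloneEmpty retains statistics.
    ToyJsonColumn cloneEmpty() const
    {
        ToyJsonColumn res;
        res.statistics = statistics;
        return res;
    }

    // Buggy variant: reorders rows but forgets to copy statistics.
    ToyJsonColumn indexDroppingStatistics(const std::vector<std::size_t> & indexes) const
    {
        ToyJsonColumn res;
        for (std::size_t i : indexes)
            res.rows.push_back(rows.at(i));
        return res;
    }

    // Fixed variant: same rows, statistics carried over to the result.
    ToyJsonColumn indexKeepingStatistics(const std::vector<std::size_t> & indexes) const
    {
        ToyJsonColumn res = indexDroppingStatistics(indexes);
        res.statistics = statistics;
        return res;
    }
};

// Assumed bucket rule: one bucket when statistics report empty shared data,
// otherwise the configured bucket count.
std::size_t chooseSharedDataBuckets(const ToyJsonColumn & column, std::size_t configured_buckets)
{
    assert(configured_buckets >= 1);
    if (column.statistics && column.statistics->shared_data_is_empty)
        return 1;
    return configured_buckets;
}
```

In this model, a column produced by `cloneEmpty` yields 1 bucket, while the same data passed through `indexDroppingStatistics` yields the configured count. That is the disagreement between stream creation and serialization that ends in the missing-stream error.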

Reported by customer on version 25.12.1.1459:

Code: 49. DB::Exception: Stream light_images.images.Array(JSON(max_dynamic_types=16, max_dynamic_paths=256)).object_shared_data.1.size1 not found. (LOGICAL_ERROR)

Changelog category (leave one):

Critical Bug Fix (crash, data loss, RBAC)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix LOGICAL_ERROR exception "Stream ... not found" when inserting into a table with nested Array(JSON) columns in wide parts with optimize_on_insert=0.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

PR ClickHouse#97523 fixed `ColumnObject::permute` and `ColumnDynamic::permute` to propagate `statistics`, but left the same bug in `filter`, `index`, `replicate`, and `scatter` for both `ColumnObject` and `ColumnDynamic`.

When a top-level JSON column containing `Array(JSON)` is permuted during INSERT (e.g. MergeTree sorting with `optimize_on_insert=0`), the chain `ColumnObject::permute` -> `ColumnArray::permute` -> `indexImpl` -> `ColumnObject::index` drops statistics on the inner JSON column. This causes a mismatch: `enumerateStreams` (stream creation) uses the `block_sample` column which retains statistics via `cloneEmpty` and chooses 1 bucket (empty shared data optimization), while `serializeBinaryBulkStatePrefix` (serialization) uses the permuted column without statistics and chooses N buckets. The subsequent write to bucket 1 fails with "Stream ... not found" (LOGICAL_ERROR).

Co-Authored-By: Claude Opus 4.6 <[email protected]>

clickhouse-gh · 2026-03-23T15:14:52Z

Workflow [PR], commit [0be0ba4]

Summary: ❌

job_name	test_name	status	info	comment
AST fuzzer (amd_debug, targeted)		failure
	Logical error: Filter column for arrayRemove was not evaluated as ColumnUInt8 but as A (STID: 3494-613d)	FAIL	cidb, issue	ISSUE EXISTS
Stress test (amd_tsan)		failure
	Logical error: 'ReadBuffer is canceled. Can't read from it.' (STID: 2508-2913)	FAIL	cidb, issue	ISSUE CREATED
AST fuzzer (amd_debug)		failure
	Logical error: Bad cast from type A to B (STID: 4185-5b0d)	FAIL	cidb, issue	ISSUE EXISTS

AI Review

Summary

This PR fixes a real correctness bug by preserving JSON path statistics through filter, index, replicate, and scatter for ColumnObject and ColumnDynamic, which prevents the "Stream ... not found" LOGICAL_ERROR during inserts of nested Array(JSON) into wide parts. The code changes themselves look consistent and low-risk, but the tests do not cover all of the changed transform paths.

Findings

⚠️ Majors
- [tests/queries/0_stateless/04054_json_nested_shared_data_buckets_missing_stream_bug.sql:5] The new regression test validates the permute -> index failure path, but this PR also changes filter, replicate, and scatter in both ColumnObject and ColumnDynamic. Without dedicated assertions for these paths, regressions in the additional touched methods can slip through.
- Suggested fix: extend this test (or add sibling tests) to trigger and validate statistics propagation through filter, replicate, and scatter code paths as well.
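The kind of coverage requested here can be pictured as one check applied to every transform. This is only a sketch over a toy model, not the ClickHouse test suite: `ToyColumn`, the simplified `filter`, `replicate`, and `scatter` signatures, and `transformsLosingStatistics` are all hypothetical, and the real methods take different arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Statistics {}; // hypothetical placeholder for column statistics

struct ToyColumn
{
    std::vector<int> values;
    std::shared_ptr<const Statistics> statistics; // null means "lost"
};

// Keeps the rows whose mask entry is true and carries statistics over.
ToyColumn filter(const ToyColumn & col, const std::vector<bool> & mask)
{
    ToyColumn res;
    for (std::size_t i = 0; i < col.values.size(); ++i)
        if (i < mask.size() && mask[i])
            res.values.push_back(col.values[i]);
    res.statistics = col.statistics;
    return res;
}

// Repeats every row `times` times and carries statistics over.
ToyColumn replicate(const ToyColumn & col, std::size_t times)
{
    ToyColumn res;
    for (int v : col.values)
        for (std::size_t k = 0; k < times; ++k)
            res.values.push_back(v);
    res.statistics = col.statistics;
    return res;
}

// Splits rows round-robin into `num_parts` columns; each part keeps statistics.
std::vector<ToyColumn> scatter(const ToyColumn & col, std::size_t num_parts)
{
    assert(num_parts > 0);
    std::vector<ToyColumn> parts(num_parts);
    for (auto & part : parts)
        part.statistics = col.statistics;
    for (std::size_t i = 0; i < col.values.size(); ++i)
        parts[i % num_parts].values.push_back(col.values[i]);
    return parts;
}

// Returns the names of the transforms whose result lost statistics.
std::vector<std::string> transformsLosingStatistics(const ToyColumn & col)
{
    std::vector<std::string> failed;
    if (!filter(col, std::vector<bool>(col.values.size(), true)).statistics)
        failed.push_back("filter");
    if (!replicate(col, 2).statistics)
        failed.push_back("replicate");
    for (const auto & part : scatter(col, 2))
        if (!part.statistics)
        {
            failed.push_back("scatter");
            break;
        }
    return failed;
}
```

A regression in any one transform shows up as its name in the returned list, so a single assertion that the list is empty guards all three paths.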

Tests

⚠️ Add targeted regression coverage for filter, replicate, and scatter transformations on nested Array(JSON) data, because these methods were changed in this PR and currently are not directly exercised.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	✅
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅

Final Verdict

Status: ⚠️ Request changes
Minimum required actions:
- Add regression coverage for the additional changed paths (filter, replicate, scatter) so all touched statistics-propagation paths are protected by tests.

clickhouse-gh · 2026-03-23T15:19:12Z

+
+SET allow_experimental_json_type = 1;
+
+-- Regression test for a bug where ColumnObject::index (and filter/replicate/scatter)


The regression test reproduces the permute -> index path, but this PR also changes filter, replicate, and scatter in both ColumnObject and ColumnDynamic. Please add focused coverage for those transformed-column paths as well, otherwise future regressions in those methods will not be caught.

clickhouse-gh · 2026-03-23T17:54:43Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	83.90%	83.90%	+0.00%
Functions	24.50%	24.60%	+0.10%
Branches	76.50%	76.50%	+0.00%

PR changed-lines coverage: 100.00% (33/33, 0 noise lines excluded)

scanhex12 · 2026-03-23T18:09:54Z

@@ -0,0 +1,42 @@
+-- Tags: long
+
+SET allow_experimental_json_type = 1;


Yes. But let's keep it, I need this PR to be backported for a customer in cloud, so don't want to wait for CI rerun


clickhouse-gh bot added the pr-critical-bugfix and pr-must-backport labels Mar 23, 2026

clickhouse-gh bot reviewed Mar 23, 2026


scanhex12 self-assigned this Mar 23, 2026

scanhex12 approved these changes Mar 23, 2026


Avogar added this pull request to the merge queue Mar 24, 2026

Merged via the queue into ClickHouse:master with commit f94a13b Mar 24, 2026
292 of 300 checks passed

Avogar deleted the fix-nested-json-shared-data-missing-stream branch March 24, 2026 13:19

robot-ch-test-poll2 added the pr-must-backport-synced label Mar 24, 2026

robot-ch-test-poll1 added the pr-synced-to-cloud label Mar 24, 2026

robot-clickhouse mentioned this pull request Mar 24, 2026

Cherry pick #100475 to 25.3: Fix missing stream exception for nested JSON in wide parts #100587

Merged

robot-clickhouse added a commit that referenced this pull request Mar 24, 2026

Backport #100475 to 25.3: Fix missing stream exception for nested JSON in wide parts

d8593fa

This was referenced Mar 24, 2026

Backport #100475 to 25.3: Fix missing stream exception for nested JSON in wide parts #100588

Merged

Cherry pick #100475 to 25.8: Fix missing stream exception for nested JSON in wide parts #100589

Merged

robot-clickhouse added a commit that referenced this pull request Mar 24, 2026

Backport #100475 to 25.8: Fix missing stream exception for nested JSON in wide parts

687e609

This was referenced Mar 24, 2026

Backport #100475 to 25.8: Fix missing stream exception for nested JSON in wide parts #100590

Merged

Cherry pick #100475 to 25.12: Fix missing stream exception for nested JSON in wide parts #100591

Merged

robot-clickhouse added a commit that referenced this pull request Mar 24, 2026

Backport #100475 to 25.12: Fix missing stream exception for nested JSON in wide parts

7937af9

This was referenced Mar 24, 2026

Backport #100475 to 25.12: Fix missing stream exception for nested JSON in wide parts #100592

Merged

Cherry pick #100475 to 26.1: Fix missing stream exception for nested JSON in wide parts #100593

Merged

robot-clickhouse added a commit that referenced this pull request Mar 24, 2026

Backport #100475 to 26.1: Fix missing stream exception for nested JSON in wide parts

aaa9d55

robot-clickhouse mentioned this pull request Mar 24, 2026

Backport #100475 to 26.1: Fix missing stream exception for nested JSON in wide parts #100594

Merged

robot-clickhouse mentioned this pull request Mar 24, 2026

Cherry pick #100475 to 26.2: Fix missing stream exception for nested JSON in wide parts #100595

Merged

robot-clickhouse added a commit that referenced this pull request Mar 24, 2026

Backport #100475 to 26.2: Fix missing stream exception for nested JSON in wide parts

8446289

This was referenced Mar 24, 2026

Backport #100475 to 26.2: Fix missing stream exception for nested JSON in wide parts #100596

Merged

Cherry pick #100475 to 26.3: Fix missing stream exception for nested JSON in wide parts #100597

Merged

robot-clickhouse added a commit that referenced this pull request Mar 24, 2026

Backport #100475 to 26.3: Fix missing stream exception for nested JSON in wide parts

e6508af

robot-clickhouse mentioned this pull request Mar 24, 2026

Backport #100475 to 26.3: Fix missing stream exception for nested JSON in wide parts #100598

Merged

robot-ch-test-poll added the pr-backports-created label Mar 24, 2026

clickhouse-gh bot added a commit that referenced this pull request Mar 24, 2026

Merge pull request #100594 from ClickHouse/backport/26.1/100475

92bead1

Backport #100475 to 26.1: Fix missing stream exception for nested JSON in wide parts

clickhouse-gh bot added a commit that referenced this pull request Mar 24, 2026

Merge pull request #100592 from ClickHouse/backport/25.12/100475

c75398f

Backport #100475 to 25.12: Fix missing stream exception for nested JSON in wide parts

clickhouse-gh bot added a commit that referenced this pull request Mar 24, 2026

Merge pull request #100596 from ClickHouse/backport/26.2/100475

37b7b00

Backport #100475 to 26.2: Fix missing stream exception for nested JSON in wide parts

Avogar added a commit that referenced this pull request Mar 24, 2026

Merge pull request #100598 from ClickHouse/backport/26.3/100475

1a95836

Backport #100475 to 26.3: Fix missing stream exception for nested JSON in wide parts

Avogar added a commit that referenced this pull request Mar 24, 2026

Merge pull request #100590 from ClickHouse/backport/25.8/100475

80e3af2

Backport #100475 to 25.8: Fix missing stream exception for nested JSON in wide parts

Avogar added a commit that referenced this pull request Mar 24, 2026

Merge pull request #100588 from ClickHouse/backport/25.3/100475

1334132

Backport #100475 to 25.3: Fix missing stream exception for nested JSON in wide parts

Avogar merged 1 commit into ClickHouse:master from Avogar:fix-nested-json-shared-data-missing-stream

5 participants

