Fix squashing partitioned delta lake data #95773

Merged
kssenii merged 1 commit into master from delta-lake-fix-squashing-partitioned-data on Feb 2, 2026

@kssenii
Member

kssenii commented Feb 2, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix squashing partitioned delta lake data.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Feb 2, 2026

Workflow [PR], commit [e9fdc28]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_binary, ParallelReplicas, s3 storage, parallel) | | failure | | |
| | 03812_join_order_ubsan_overflow | FAIL | cidb, issue | ISSUE EXISTS |
| Stateless tests (arm_binary, parallel) | | failure | | |
| | 03621_pr_distributed_index_analysis | FAIL | cidb, issue | ISSUE CREATED |
| Upgrade check (amd_msan) | | failure | | |
| | Error message in clickhouse-server.log (see upgrade_error_messages.txt) | FAIL | cidb | IGNORED |

clickhouse-gh bot added the pr-bugfix label (Pull request with bugfix, not backported by default) Feb 2, 2026
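```cpp
for (auto & column : columns)
{
    /// Expand const columns to the chunk's row count, so that every column has chunk_num_rows rows.
    if (column->isConst())
        column = column->cloneResized(chunk_num_rows)->convertToFullColumnIfConst();
}
```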
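A minimal sketch of what the loop does, using a toy column type rather than ClickHouse's real `IColumn` interface (`Column`, its fields, and `materializeConstColumns` are hypothetical). It assumes, as the loop implies, that a const column expands to its own logical size, which is why `cloneResized(chunk_num_rows)` runs before `convertToFullColumnIfConst()`: a const column still sized 1 would otherwise become a one-row full column.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified stand-in for a ClickHouse column (not the real IColumn).
// A const column stores one value plus a logical row count; a full column stores every row.
struct Column
{
    bool is_const = false;
    std::vector<long> values;  // full column: all rows; const column: exactly one value
    size_t const_rows = 0;     // logical size of a const column

    size_t size() const { return is_const ? const_rows : values.size(); }

    // Toy analogue of cloneResized: for a const column only the logical size changes;
    // a full column is truncated or padded with default values.
    std::shared_ptr<Column> cloneResized(size_t new_size) const
    {
        auto res = std::make_shared<Column>(*this);
        if (is_const)
            res->const_rows = new_size;
        else
            res->values.resize(new_size);
        return res;
    }

    // Toy analogue of convertToFullColumnIfConst: repeat the single value size() times.
    std::shared_ptr<Column> convertToFullColumnIfConst() const
    {
        auto res = std::make_shared<Column>(*this);
        if (is_const)
        {
            res->is_const = false;
            res->values.assign(const_rows, values.at(0));
            res->const_rows = 0;
        }
        return res;
    }
};

using ColumnPtr = std::shared_ptr<Column>;

// Hypothetical helper: materialize const columns so each one has exactly chunk_num_rows rows.
void materializeConstColumns(std::vector<ColumnPtr> & columns, size_t chunk_num_rows)
{
    for (auto & column : columns)
    {
        if (column->is_const)
            column = column->cloneResized(chunk_num_rows)->convertToFullColumnIfConst();
    }
}
```

In this toy model, calling `convertToFullColumnIfConst()` on a size-1 const column without resizing first yields a single row, which is the kind of row-count mismatch the resize step is meant to avoid.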
Member

Do we also get columns of a different size here, so that we need cloneResized?

In general, such cases are considered a bug; for example, in some places we call checkNumRowsIsConsistent:

```cpp
void Chunk::checkNumRowsIsConsistent()
{
    /// Every column, const or not, must report exactly num_rows rows.
    for (size_t i = 0; i < columns.size(); ++i)
    {
        auto & column = columns[i];
        if (column->size() != num_rows)
            throw Exception(ErrorCodes::LOGICAL_ERROR, "Invalid number of rows in Chunk {} column {} at position {}: expected {}, got {}",
                dumpStructure(), column->getName(), i, num_rows, column->size());
    }
}
```

So even const columns are expected to be set to the correct size, but if this is still some corner case, maybe it's fine.
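The invariant the reviewer refers to can be illustrated with a stripped-down check. This is a sketch, not the real `Chunk`: `SimpleChunk` models each column only by its row count, and `isConsistent` is a hypothetical helper that turns the exception into a boolean. A const column left at size 1 in a 5-row chunk fails the check, which is why such columns are normally resized up front.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified chunk: each column is represented only by its row count.
struct SimpleChunk
{
    std::vector<size_t> column_sizes;
    size_t num_rows = 0;

    // Mirrors the idea of Chunk::checkNumRowsIsConsistent: every column,
    // const or not, must report exactly num_rows rows.
    void checkNumRowsIsConsistent() const
    {
        for (size_t i = 0; i < column_sizes.size(); ++i)
        {
            if (column_sizes[i] != num_rows)
                throw std::logic_error(
                    "Invalid number of rows in column at position " + std::to_string(i)
                    + ": expected " + std::to_string(num_rows)
                    + ", got " + std::to_string(column_sizes[i]));
        }
    }
};

// Hypothetical helper: true if the chunk passes the check, false if it throws.
bool isConsistent(const SimpleChunk & chunk)
{
    try
    {
        chunk.checkNumRowsIsConsistent();
        return true;
    }
    catch (const std::logic_error &)
    {
        return false;
    }
}
```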

Member Author

The const column is created here:

```cpp
auto parsed_transform = visitScanCallbackExpression(
    transform,
    context->read_schema,
    context->expression_schema,
    context->enable_expression_visitor_logging);
LOG_TEST(
    context->log,
    "Scanned file: {}, size: {}, num records: {}, transform: {}, has dv info: {}",
    object->getPath(), size, stats ? DB::toString(stats->num_records) : "Unknown",
    parsed_transform->dumpNames(), dv_info && dv_info->has_vector);
object->data_lake_metadata->schema_transform = std::move(parsed_transform);
```
And yes, it is not resized to the correct size there, for memory efficiency, since quite a lot of such transforms can be created at that point.
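To see why a const column that was never resized matters for squashing, here is a toy model under explicit assumptions: partition values arrive as one-value const columns per file, and squashing concatenates chunks column by column. `ToyColumn`, `ToyChunk`, `materialize`, and `squash` are hypothetical and do not reflect how ClickHouse's squashing is actually implemented. Expanding each const column to its own chunk's `num_rows` before concatenating lets chunks from different partitions merge into one full column with a value per row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy column: either full (all rows stored) or const
// (one stored value; its logical size is taken from the owning chunk).
struct ToyColumn
{
    bool is_const = false;
    std::vector<int> values;
};

// Hypothetical toy chunk; all chunks are assumed to share the same column layout.
struct ToyChunk
{
    std::vector<ToyColumn> columns;
    size_t num_rows = 0;
};

// Expand a const column to exactly `rows` rows; full columns are returned unchanged.
ToyColumn materialize(const ToyColumn & col, size_t rows)
{
    if (!col.is_const)
        return col;
    ToyColumn full;
    full.values.assign(rows, col.values.at(0));
    return full;
}

// Concatenate chunks column by column, materializing const columns per chunk first,
// so chunks with different constant (partition) values squash into one full column.
ToyChunk squash(const std::vector<ToyChunk> & chunks)
{
    ToyChunk result;
    if (chunks.empty())
        return result;
    result.columns.resize(chunks.front().columns.size());
    for (const auto & chunk : chunks)
    {
        for (size_t i = 0; i < chunk.columns.size(); ++i)
        {
            ToyColumn full = materialize(chunk.columns[i], chunk.num_rows);
            auto & dst = result.columns[i].values;
            dst.insert(dst.end(), full.values.begin(), full.values.end());
        }
        result.num_rows += chunk.num_rows;
    }
    return result;
}
```

In this sketch, keeping the const form until squashing preserves the per-file memory saving the author mentions; the expansion cost is paid only when chunks are actually combined.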

vdimir self-assigned this Feb 2, 2026
kssenii added the pr-must-backport label (Pull request should be backported intentionally. Use this label with great care!) Feb 2, 2026
kssenii added this pull request to the merge queue Feb 2, 2026
Merged via the queue into master with commit 5ab350d Feb 2, 2026
130 of 134 checks passed
kssenii deleted the delta-lake-fix-squashing-partitioned-data branch February 2, 2026 16:06
robot-ch-test-poll added the pr-must-backport-synced label (The `*-must-backport` labels are synced into the cloud Sync PR) Feb 2, 2026
robot-ch-test-poll1 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Feb 2, 2026
clickhouse-gh bot added a commit that referenced this pull request Feb 2, 2026
Backport #95773 to 26.1: Fix squashing partitioned delta lake data
robot-ch-test-poll1 added the pr-backports-created label (Backport PRs are successfully created, it won't be processed by CI script anymore) Feb 5, 2026
kssenii added a commit that referenced this pull request Feb 5, 2026
Backport #95773 to 25.12: Fix squashing partitioned delta lake data
clickhouse-gh bot added a commit that referenced this pull request Feb 5, 2026
Backport #95773 to 25.11: Fix squashing partitioned delta lake data
kssenii added a commit that referenced this pull request Feb 6, 2026
Backport #95773 to 25.8: Fix squashing partitioned delta lake data

Labels

  • pr-backports-created: Backport PRs are successfully created, it won't be processed by CI script anymore
  • pr-bugfix: Pull request with bugfix, not backported by default
  • pr-must-backport: Pull request should be backported intentionally. Use this label with great care!
  • pr-must-backport-synced: The `*-must-backport` labels are synced into the cloud Sync PR
  • pr-synced-to-cloud: The PR is synced to the cloud repo


5 participants