
S3 cluster hive pruning#93284

Open
ianton-ru wants to merge 12 commits into ClickHouse:master from ianton-ru:s3_cluster_hive_pruning

Conversation

@ianton-ru
Contributor

@ianton-ru ianton-ru commented Dec 31, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description):

s3Cluster table function optimization with use_hive_partitioning setting.

Documentation entry for user-facing changes

The s3 table function with the use_hive_partitioning setting can skip loading unused objects based on their paths. s3Cluster ignored this setting and loaded all objects.

s3 uses a Filter step to filter out unused files. s3Cluster can't use that step, because the column needed for filtering is not present in the data after loading. This PR adds a new step, ObjectFilter, that restricts which objects are loaded in the first place.
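The pruning idea can be sketched in a few lines of Python (the helpers `hive_partition_values` and `prune_objects` are illustrative names, not part of the PR; ClickHouse implements this in C++ against the object-storage listing):

```python
import re

def hive_partition_values(path):
    """Extract key=value pairs embedded in an object path (Hive layout)."""
    return {k: v for k, v in re.findall(r"([^/=]+)=([^/]+)", path)}

def prune_objects(paths, predicate):
    """Keep only objects whose path-derived partition values satisfy the predicate."""
    return [p for p in paths if predicate(hive_partition_values(p))]

paths = [
    "data/key=1/part-0.parquet",
    "data/key=2/part-0.parquet",
    "data/key=3/part-0.parquet",
]
# WHERE key <= 2: the third object is never listed for download.
kept = prune_objects(paths, lambda parts: int(parts["key"]) <= 2)
```

The point is that the predicate is evaluated against values recovered from the path alone, before any object content is fetched.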

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Touches distributed query planning and filter pushdown for cluster table functions; mistakes could change which objects are scanned or break predicate propagation/serialization across nodes.

Overview
Adds a new query plan step ObjectFilterStep (serializable/registrable) to carry a WHERE predicate specifically for object-path/virtual-column pruning in distributed object-storage reads.

When use_hive_partitioning is enabled, InterpreterSelectQuery now injects ObjectFilterStep on the initiator for ReadFromCluster plans so s3Cluster can skip unrelated Hive-partitioned objects even if the filtering column isn’t present in returned blocks; the primary-key/limit optimization pass is updated to push this step’s predicate down into SourceStepWithFilter.

Refactors ReadFromCluster into IStorageCluster.h and updates cluster filtering to use either locally-added filter DAGs or query_info.filter_actions_dag, plus adjusts object-storage cluster iteration to use getVirtualsList(); adds an integration test validating reduced file reads for s3Cluster with Hive partitioning.
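The pushdown described above can be illustrated with a minimal Python sketch (class and function names here are stand-ins for the C++ ones; the real pass operates on ActionsDAG predicates inside the primary-key/limit optimization):

```python
class SourceStepWithFilter:
    """Stand-in for a plan source step that can accept pushed-down predicates."""
    def __init__(self):
        self.pushed_filters = []

    def push_filter(self, dag):
        self.pushed_filters.append(dag)

class ObjectFilterStep:
    """Carries a WHERE predicate used only for object-path pruning."""
    def __init__(self, dag, child):
        self.dag = dag
        self.child = child

def optimize_push_object_filter(step):
    """Sketch of the optimization pass: if an ObjectFilterStep sits directly
    above a filterable source, hand its predicate to the source so the
    object listing can prune paths before anything is downloaded."""
    if isinstance(step, ObjectFilterStep) and isinstance(step.child, SourceStepWithFilter):
        step.child.push_filter(step.dag)
    return step

source = SourceStepWithFilter()
plan = optimize_push_object_filter(ObjectFilterStep("key <= 2", source))
```

After the pass runs, the source step owns the predicate and can use it while enumerating objects, which is why the filter step itself does not need to touch the data.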

Written by Cursor Bugbot for commit 25cfa06.

@kssenii kssenii self-assigned this Dec 31, 2025
@kssenii kssenii added the can be tested Allows running workflows for external contributors label Dec 31, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Dec 31, 2025

Workflow [PR], commit [ab4a5b6]

Summary:

job_name test_name status info comment
Finish Workflow failure
python3 ./ci/jobs/scripts/workflow_hooks/pr_body_check.py failure

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Dec 31, 2025
@ianton-ru
Contributor Author

Failed tests test_peak_memory_usage and test_backup_restore_on_cluster look unrelated.

@kssenii kssenii requested a review from Copilot January 12, 2026 13:13
Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes the s3Cluster table function by implementing object-level filtering when use_hive_partitioning is enabled. Previously, s3Cluster loaded all objects and only filtered data after loading, unlike the regular s3 function which could skip loading unused objects based on partition paths. This change introduces a new ObjectFilterStep that filters objects before loading them, matching the optimization behavior of the non-cluster variant.

Changes:

  • Introduced ObjectFilterStep to filter S3 objects based on hive partitioning before loading
  • Modified ReadFromCluster class to be defined in the header file instead of inline in the implementation
  • Added integration test to verify the optimization works for both s3 and s3Cluster functions

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

File Description
tests/integration/test_s3_cluster/test.py Added comprehensive test verifying hive partitioning optimization reduces file reads for both s3 and s3Cluster
src/Storages/ObjectStorage/StorageObjectStorageCluster.h Removed unused virtual_columns field
src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp Updated to use getVirtualsList() instead of removed virtual_columns field
src/Storages/IStorageCluster.h Moved ReadFromCluster class definition from .cpp to header for broader visibility
src/Storages/IStorageCluster.cpp Removed inline ReadFromCluster class definition (moved to header)
src/Processors/QueryPlan/QueryPlanStepRegistry.cpp Registered new ObjectFilterStep in the query plan step registry
src/Processors/QueryPlan/Optimizations/optimizePrimaryKeyConditionAndLimit.cpp Added handling for ObjectFilterStep in query plan optimization
src/Processors/QueryPlan/ObjectFilterStep.h New header defining ObjectFilterStep for filtering objects before loading
src/Processors/QueryPlan/ObjectFilterStep.cpp Implementation of ObjectFilterStep with serialization support
src/Planner/Planner.cpp Added logic to insert ObjectFilterStep for cluster queries with hive partitioning
src/Interpreters/InterpreterSelectQuery.cpp Added logic to insert ObjectFilterStep for cluster queries with hive partitioning in legacy interpreter

import os
import shutil
import time
import uuid

Copilot AI Jan 12, 2026


The uuid module is imported but the standard library uuid.uuid4() is only used for generating unique query IDs in the test. Consider using the existing random_string helper from helpers.utility (already imported on line 14) for consistency with the codebase pattern, unless UUID format is specifically required for query ID tracking.

Contributor Author


In most integration tests where query_id is used, it is generated from a uuid.


QueryPipelineBuilderPtr ObjectFilterStep::updatePipeline(QueryPipelineBuilders pipelines, const BuildQueryPipelineSettings & /* settings */)
{
return std::move(pipelines.front());

Copilot AI Jan 12, 2026


The updatePipeline method returns the pipeline unchanged without applying the filter transformation. This means the ObjectFilterStep does not actually filter any data in the pipeline execution. The method should create and add a FilterTransform (already included in headers) using actions_dag and filter_column_name to perform the actual filtering.

Suggested change: replace

    return std::move(pipelines.front());

with

    auto pipeline = std::move(pipelines.front());
    pipeline->addSimpleTransform([this](const Block & header)
    {
        return std::make_shared<FilterTransform>(
            header,
            std::make_shared<ActionsDAG>(actions_dag),
            filter_column_name,
            false);
    });
    return pipeline;

Contributor Author


I created ObjectFilterStep exactly because FilterStep can't be used for this purpose.
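A small Python sketch of the constraint behind this decision (all names here are illustrative, not ClickHouse APIs): row-level filtering, which is what FilterStep does, fails when the partition column exists only in the object path, so the predicate must be applied to paths before loading.

```python
# Rows read back from a Hive-partitioned object: the partition column
# ("key") lives only in the object path, not in the returned rows.
rows = [{"value": 10}, {"value": 20}]

def filter_rows(rows, column, predicate):
    """Row-level filtering requires the column to be present in the data."""
    return [r for r in rows if predicate(r[column])]

# Filtering on "key" after loading fails with KeyError, which is why the
# predicate is only useful earlier, applied to the object paths themselves.
try:
    filter_rows(rows, "key", lambda v: v <= 2)
    missing_column = False
except KeyError:
    missing_column = True
```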

&& !query_processing_info.isFirstStage()
&& expression_analysis_result.hasWhere())
{
if (typeid_cast<ReadFromCluster *>(query_plan.getRootNode()->step.get()))

Copilot AI Jan 12, 2026


The logic to insert ObjectFilterStep is duplicated between Planner.cpp and InterpreterSelectQuery.cpp with nearly identical conditions and implementations. Consider extracting this into a shared helper function to reduce duplication and ensure consistent behavior across both query execution paths.

Contributor Author


I'm not sure this makes sense, given the plans to remove the old analyzer.

&& !expressions.first_stage
&& expressions.hasWhere())
{
if (typeid_cast<ReadFromCluster *>(query_plan.getRootNode()->step.get()))

Copilot AI Jan 12, 2026


The logic to insert ObjectFilterStep is duplicated between InterpreterSelectQuery.cpp and Planner.cpp with nearly identical conditions and implementations. Consider extracting this into a shared helper function to reduce duplication and ensure consistent behavior across both query execution paths.

Contributor Author


The same as above.

namespace DB
{

/// Implements WHERE operation.
Member


Please add a short comment here describing the key difference from FilterStep and why/where it is needed.

FROM s3Cluster(cluster_simple, 'http://minio1:9001/{data_path}/key=**.parquet', 'minio', '{minio_secret_key}', 'Parquet', 'key Int32, value Int32')
WHERE key <= 2
FORMAT TSV
SETTINGS enable_filesystem_cache = 0, use_query_cache = 0, use_cache_for_count_from_files = 0, use_hive_partitioning = 1, allow_experimental_analyzer={allow_experimental_analyzer}
Member


Let's also add a test with partition_strategy='hive', as it works a bit differently from the default hive setting.

Contributor Author


As I understand, partition_strategy makes sense only for INSERT, not for SELECT.

Member

@kssenii kssenii Jan 16, 2026


The difference is also that with that setting the hive columns become physical, while without it they are virtual (unless directly specified in the schema).

@ianton-ru
Contributor Author

Integration tests were interrupted by a timeout:

[2026-01-16 16:01:42] !!!!!!! xdist.dsession.Interrupted: session-timeout: 5400.0 sec exceeded !!!!!!!

@ianton-ru
Contributor Author

@kssenii Is anything more required?

@kssenii
Member

kssenii commented Feb 12, 2026

Hi, sorry for the wait! I wanted to verify something before merging and postponed it for too long, thank you for pinging...
I checked, and it looks like with the new analyzer we can simplify this fix a bit: https://pastila.nl/?00acee2b/6be4308e9c92cd4aee7a8c6e0b301c1d#/YzOlYpLBKaA7nyIcM86DQ==GCM. I verified that your test passes with this change.
I remembered about this because I was fixing partition pruning for deltaLakeCluster, which is basically the same as hive pruning, since Delta Lake stores partition values in the hive partitioning style (#82131; that fix was only for the new analyzer).

@ianton-ru
Contributor Author

ianton-ru commented Mar 10, 2026

@kssenii Does this PR need something else from my side?

@kssenii
Member

kssenii commented Mar 10, 2026

Hi! See my comment above: I suggested simplifying this fix and attached a patch (the pastila link) showing my suggestion.

@ianton-ru
Contributor Author

Sorry, I missed it.
The changes have been added.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

@alexbakharew
Contributor

Hi @ianton-ru,
could you please merge master into your branch? It will resolve the LLVM Coverage issue.

@ianton-ru
Contributor Author

@alexbakharew Done.

@clickhouse-gh
Contributor

clickhouse-gh bot commented Mar 17, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.70% 83.80% +0.10%
Functions 23.90% 23.90% +0.00%
Branches 76.30% 76.30% +0.00%

PR changed-lines coverage: 82.29% (79/96, 0 noise lines excluded)
