Fix condition not being moved to prewhere in case there is a row policy by arthurpassos · Pull Request #85118 · ClickHouse/ClickHouse

arthurpassos · 2025-08-05T18:21:12Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixes #69777 and #83748

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

arthurpassos · 2025-08-05T18:21:59Z

I am not very familiar with WHERE condition analysis, so take this PR with a grain of salt :)

alexey-milovidov · 2025-08-05T18:50:33Z

Is it secure? Check that it's not possible to expose information using the throwIf function or timing attacks.
The problem is that it's not safe to even touch the data that we are not allowed to access (even if the result is discarded).

arthurpassos · 2025-08-05T20:13:05Z

Perhaps it will be safer if I implement it differently and fill row_level_filter instead (ClickHouse/src/Storages/SelectQueryInfo.h, line 46 in 957e0df):

std::optional<ActionsDAG> row_level_filter;

This assumes PrewhereInfo::row_level_filter is checked before PrewhereInfo::prewhere_actions.

UnamedRus · 2025-08-05T22:21:09Z

Wouldn't it be worse?

I.e., if you push the user-defined PREWHERE to the RLF (row-level filter) stage, the user could trigger an exception during the RLF scan.

Just an idea:
Row level filter: tenant = 'XXXXXX'

Query: PREWHERE throwIf(tenant = 'YYYYYY')

If pushed to RLF, it will form a condition like tenant = 'XXXXXX' AND throwIf(tenant = 'YYYYYY')

arthurpassos · 2025-08-06T12:36:40Z

The user-defined condition wouldn't be pushed to the RLF stage. Inside the PrewhereInfo object there are two condition fields: row_level_filter and prewhere_actions. Right now, prewhere_actions is populated with everything and row_level_filter remains empty. What I am suggesting is to actually use row_level_filter for row policies (as it is supposed to be; I don't know why it's not being used) and leave prewhere_actions for the user-defined WHERE.

arthurpassos · 2025-08-06T12:38:31Z

But before doing that, I need to understand why the decision to fill row_level_filter or prewhere_actions is based on the order of invocation instead of relying on "here's a row policy, put it in the row-level filter". Maybe @amosbird knows about it?

https://github.com/arthurpassos/ClickHouse/blob/cbaa9e4e335cfef4b734038193df4e64a0fc0de7/src/Planner/PlannerJoinTree.cpp#L913

arthurpassos · 2025-08-07T00:41:24Z

@alexey-milovidov can you enable CI so I can have a better understanding if I broke something?

arthurpassos · 2025-08-07T00:55:51Z

Query condition cache is causing problems with this fix. The cache will map from prewhere to skipped ranges and ignore the row level filters in PrewhereInfo::row_level_filter. So if the row level filter actually skipped some ranges, the following queries will also skip that range even if the row policy is dropped.

More info on: https://github.com/ClickHouse/ClickHouse/pull/69236/files#r2258613976
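The stale-skip scenario described above can be reproduced with a toy model. This is a minimal sketch, not ClickHouse's query condition cache: `ToyConditionCache`, `Condition`, `Granule`, and `scan` are hypothetical names, the cache key is simply the condition text, and the `key_includes_policy` flag models one possible mitigation (an assumption, not necessarily what ClickHouse does).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT ClickHouse types.
struct Row
{
    std::string tenant;
    int b;
};

struct Condition
{
    std::string text;                        // used as the cache key
    std::function<bool(const Row &)> eval;   // evaluates the condition on one row
};

using Granule = std::vector<Row>;

// Maps a condition key to the granules known to contain no matching rows.
class ToyConditionCache
{
public:
    bool isSkippable(const std::string & key, std::size_t granule) const
    {
        auto it = skipped.find(key);
        return it != skipped.end() && it->second.count(granule) > 0;
    }

    void markSkippable(const std::string & key, std::size_t granule) { skipped[key].insert(granule); }

private:
    std::map<std::string, std::set<std::size_t>> skipped;
};

// Returns the indexes of granules with at least one row passing every filter.
std::vector<std::size_t> scan(
    const std::vector<Granule> & granules,
    const Condition & prewhere,
    const std::optional<Condition> & row_policy,
    ToyConditionCache & cache,
    bool key_includes_policy)
{
    assert(static_cast<bool>(prewhere.eval));

    // The problematic variant keys the cache on the prewhere text alone.
    std::string key = prewhere.text;
    if (key_includes_policy && row_policy)
        key += " AND " + row_policy->text;

    std::vector<std::size_t> matched;
    for (std::size_t i = 0; i < granules.size(); ++i)
    {
        if (cache.isSkippable(key, i))
            continue;  // trusted from an earlier query, possibly stale

        bool any = false;
        for (const Row & row : granules[i])
        {
            if (row_policy && !row_policy->eval(row))
                continue;  // the row policy hides this row
            if (prewhere.eval(row))
            {
                any = true;
                break;
            }
        }

        if (any)
            matched.push_back(i);
        else
            cache.markSkippable(key, i);
    }
    return matched;
}
```

With `key_includes_policy = false`, a granule that was skipped only because of the row policy stays skipped after the policy is dropped, because the next query looks up the same key. Including the policy text in the key keeps the two cases apart in this toy model.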

arthurpassos · 2025-08-07T13:40:48Z

Throughout the codebase, a non-null PrewhereInfo is assumed to contain prewhere_actions. The code literally assumes it is impossible to have PrewhereInfo::row_level_filter != nullopt while PrewhereInfo::prewhere_actions.empty() == true. This looks like a mistake, but I might be missing some important details.

arthurpassos · 2025-08-07T17:58:16Z

Ok, it seems like additional_table_filters are somewhat equivalent to row policies, and that's why the weird logic of filling PrewhereInfo by execution order "exists" / "works".

arthurpassos · 2025-08-08T22:44:07Z

Background

This PR addresses an issue where WHERE conditions were not being pushed to PREWHERE when a row policy was present. This is a regression from 23.8, where such conditions were moved to PREWHERE. Details: #69777, #83748.

Root Cause

Under the new analyzer, ClickHouse instantiated PrewhereInfo with the row policy. Later, when attempting to push the regular WHERE to PREWHERE, it detected an existing PREWHERE and exited early. Removing this early exit was not enough, because the WHERE condition would then override the existing PREWHERE.

Naive solution (not very important)

The fix required:

Removing the early exit.
Merging conditions so that existing PREWHERE clauses are preserved.

However, @alexey-milovidov raised a valid security concern:

Is it secure? Check that it's not possible to expose information using the throwIf function or timing attacks.

It wasn’t secure in the original form. To address this, row policies must be evaluated first, separately from other conditions. ClickHouse already has a row_level_filter field in PrewhereInfo for this purpose, but it was not being used correctly (row policies were stored in prewhere_actions instead).

The real solution

Remove the early exit.
Fill PrewhereInfo::row_level_filter with row policy filters instead of putting them in PrewhereInfo::prewhere_actions.
Make PrewhereInfo::prewhere_actions optional, as the only prewhere condition might be row_level_filter.

By applying the above, the row policies are evaluated first. I have added throwIf tests to validate this behavior — suggestions for additional cases are welcome.
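The difference between evaluating the row policy first and merging it with the user condition can be sketched with a toy model. Everything below is hypothetical and simplified: `Row`, `Predicate`, `ToyPrewhereInfo`, `toyThrowIf`, `filterStaged`, and `filterMerged` are illustration names, predicates stand in for ActionsDAG expressions, and the merged variant assumes both conditions are computed for every row without short-circuiting.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the ClickHouse types.
struct Row
{
    std::string tenant;
    int b;
};

using Predicate = std::function<bool(const Row &)>;

struct ToyPrewhereInfo
{
    std::optional<Predicate> row_level_filter;  // row policy, evaluated first
    std::optional<Predicate> prewhere_actions;  // user-supplied condition
};

// Rough model of throwIf: throws when the argument is true.
void toyThrowIf(bool condition)
{
    if (condition)
        throw std::runtime_error("throwIf: value is non-zero");
}

// Staged evaluation: the user condition only sees rows the row policy allows.
std::vector<Row> filterStaged(const std::vector<Row> & rows, const ToyPrewhereInfo & info)
{
    // A prewhere info with neither filter set would be meaningless.
    assert(info.row_level_filter || info.prewhere_actions);

    std::vector<Row> result;
    for (const Row & row : rows)
    {
        if (info.row_level_filter && !(*info.row_level_filter)(row))
            continue;  // hidden row: the user condition never touches it
        if (info.prewhere_actions && !(*info.prewhere_actions)(row))
            continue;
        result.push_back(row);
    }
    return result;
}

// Merged evaluation: both conditions are computed for every row and then combined.
std::vector<Row> filterMerged(const std::vector<Row> & rows, const ToyPrewhereInfo & info)
{
    std::vector<Row> result;
    for (const Row & row : rows)
    {
        const bool user_ok = !info.prewhere_actions || (*info.prewhere_actions)(row);
        const bool policy_ok = !info.row_level_filter || (*info.row_level_filter)(row);
        if (user_ok && policy_ok)
            result.push_back(row);
    }
    return result;
}
```

With a policy of tenant = 'XXXXXX' and a user condition that calls `toyThrowIf(tenant == "YYYYYY")`, `filterStaged` returns the visible rows without throwing, while `filterMerged` throws as soon as it reaches a hidden 'YYYYYY' row, which is the kind of leak the throwIf tests are meant to catch.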

Why the PR Is Large

PrewhereInfo::prewhere_column_name and PrewhereInfo::prewhere_actions were assumed to be non-null whenever PrewhereInfo was non-null. This assumption fails when only row_level_filter is set. I made these fields optional, similar to row_level_filter, which required changes across multiple files.
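As a rough illustration of why call sites had to change, here is a minimal sketch of a consumer that checks each optional separately instead of assuming prewhere_actions is always present. `SketchPrewhereInfo` and `collectFilterColumns` are hypothetical names, and strings stand in for the real ActionsDAG members.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in; the real PrewhereInfo holds ActionsDAG objects.
struct SketchPrewhereInfo
{
    std::optional<std::string> row_level_filter;  // stands in for the row policy DAG
    std::string row_level_column_name;
    std::optional<std::string> prewhere_actions;  // stands in for the user PREWHERE DAG
    std::string prewhere_column_name;
};

// Collects the filter columns a reading step would register.
// Each optional is checked on its own, so a policy-only info is handled correctly.
std::vector<std::string> collectFilterColumns(const std::shared_ptr<const SketchPrewhereInfo> & info)
{
    std::vector<std::string> columns;
    if (!info)
        return columns;

    // A non-null info should carry at least one of the two filters.
    assert(info->row_level_filter || info->prewhere_actions);

    if (info->row_level_filter)
        columns.push_back(info->row_level_column_name);
    if (info->prewhere_actions)
        columns.push_back(info->prewhere_column_name);
    return columns;
}
```

Code that dereferences prewhere_actions unconditionally would misbehave on an info object that carries only a row policy, which is why every such access needs a guard like the ones above.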

Additional Consideration – additional_table_filters

Some users use additional_table_filters as a shortcut for tenant-based restrictions. These were not designed as a security feature and cannot be safely evaluated with row policies. This PR forces them to be evaluated with the regular WHERE clause. If users have relied on them for security, this may introduce issues. We might want to make this behavior configurable.

arthurpassos · 2025-08-08T22:45:36Z

Perhaps an ordered conjunction of these conditions could also work (i.e., and(row_policy, additional_table_filters, ...)). This would be safe under the assumption that no part of ClickHouse would rearrange it.

ilejn · 2025-08-11T22:42:11Z

tests/queries/0_stateless/03591_optimize_prewhere_row_policy.sql

+
+SELECT * FROM 03591_test WHERE throwIf(b=1, 'Should throw') SETTINGS optimize_move_to_prewhere = 1; -- {serverError FUNCTION_THROW_IF_VALUE_IS_NON_ZERO}
+
+CREATE ROW POLICY 03591_rp ON 03591_test USING b=2 TO CURRENT_USER;


It may be a good idea to have slightly more complex row policies, e.g. another PERMISSIVE and another RESTRICTIVE.

It may be interesting to add a case where columns used in row policies are not selected.

ilejn · 2025-08-14T16:25:45Z

src/Processors/QueryPlan/Optimizations/optimizePrimaryKeyConditionAndLimit.cpp

+            source_step_with_filter->addFilter(storage_prewhere_info->prewhere_actions->clone(), storage_prewhere_info->prewhere_column_name);
        if (storage_prewhere_info->row_level_filter)
            source_step_with_filter->addFilter(storage_prewhere_info->row_level_filter->clone(), storage_prewhere_info->row_level_column_name);
    }


To me this code looks rather worrying (I don't mean the actual change).
It seems that some conditions, like the limit, are applied before the row policy. Is that true? If so, is it risky from a security standpoint?

The limit should be applied only after all the filters.

Also, as far as I can see, it's used only in ReadFromPostgreSQL and ReadFromSystemNumbersStep.

alexey-milovidov · 2025-08-15T21:12:39Z

@arthurpassos We support multiple prewhere steps, so we can make the first (the earliest) prewhere step to only include row policies, and the rest - as usual.

arthurpassos · 2025-08-15T21:13:54Z

Isn't this what I implemented in this PR?

clickhouse-gh · 2025-08-16T07:27:12Z

Workflow [PR], commit [5ceb04a]

Summary: ❌
15 failures out of 107 shown:

job_name	test_name	status
Style check		failure
	cpp	failure
	various	failure
Fast test		failure
	02346_additional_filters	FAIL
	00950_default_prewhere	FAIL
	00910_buffer_prewhere_different_types	FAIL
	02680_illegal_type_of_filter_projection	FAIL
	03143_prewhere_profile_events	FAIL
	01917_prewhere_column_type	FAIL
Build (amd_debug)		dropped
Build (amd_release)		dropped
Build (amd_asan)		dropped
Build (amd_tsan)		dropped
Build (amd_msan)		dropped
Build (amd_ubsan)		dropped
Build (amd_binary)		dropped
Build (arm_release)		dropped
Build (arm_asan)		dropped
Build (arm_coverage)		dropped
Build (arm_binary)		dropped
Build (amd_darwin)		dropped
Build (arm_darwin)		dropped

alexey-milovidov · 2025-08-16T23:12:07Z

Thanks.

arthurpassos · 2025-08-18T19:26:39Z

While working on this, I discovered another bug: #85834

It should probably be fixed before this one (otherwise it is too hard to debug the failing tests).

Unfortunately, I don't have time to continue this work right now

arthurpassos · 2025-08-18T19:29:30Z

src/Processors/QueryPlan/Optimizations/optimizePrewhere.cpp

+        if (prewhere_info->remove_prewhere_column)
+        {
+            removeFromOutput(split_result.first, prewhere_info->prewhere_column_name);
+            // split_result.first.removeUnusedActions();
+        }
+    }
+
+        {
+        std::unordered_set<const ActionsDAG::Node *> first_outputs(
+            split_result.first.getOutputs().begin(), split_result.first.getOutputs().end());
+        for (const auto * input : split_result.first.getInputs())
+        {
+            if (!first_outputs.contains(input))
+            {
+                split_result.first.getOutputs().push_back(input);
+                /// Add column to second actions as input.
+                /// Do not add it to result, so it would be removed.
+                split_result.second.addInput(input->result_name, input->result_type);
+            }
+        }
+    }
+


I still don't know the right procedure to merge these conditions. This was the closest I got (a few lines below, it ANDs the functions together).

arthurpassos · 2025-08-18T19:29:54Z

src/Processors/QueryPlan/Optimizations/optimizePrewhere.cpp

    for (const auto * condition : prewhere_nodes_list)
        conditions.push_back(split_result.split_nodes_mapping.at(condition));

+    /// Is it possible that prewhere_info->prewhere_actions was not empty?


Yes. Just set prewhere manually.

arthurpassos · 2025-08-18T19:32:12Z

src/Processors/QueryPlan/Optimizations/optimizePrewhere.cpp


+    /// Is it possible that prewhere_info->prewhere_actions was not empty?
+    /// Not sure, but just in case let's merge it
+    if (prewhere_info->prewhere_actions)


The thing about this function, splitAndFillPrewhereInfo, is that the previous implementation simply did not care whether PrewhereInfo already contained actions. It would simply override them. (Actually, this was not happening because an early return in optimizePrewhere prevented the code from reaching this function.)

All that needs to be done is to properly construct a DAG that is a conjunction of the optimized filter_expression and the existing prewhere (preserving row-level filters). I just don't know how to do it :).

arthurpassos · 2025-09-26T14:19:43Z

Closing this PR in favor of #87303

fix where conditions not being moved to prewhere when row policy exists

6b09e4e

arthurpassos added 4 commits August 6, 2025 10:28

make it more secure

5c691ff

small adjustment

d876c64

add security tests

ffd0c1c

slightly different comment

362afcd

arthurpassos mentioned this pull request Aug 7, 2025

Implement Query Condition Cache #69236

Merged

arthurpassos mentioned this pull request Aug 7, 2025

Query condition cache does not handle prewhere properly #85222

Closed

arthurpassos added 2 commits August 8, 2025 16:53

add additional_table_filters to the test as well, still flkay

aea5975

make PrewhereInfo::prewhere_actions optional

5ceb04a

ilejn reviewed Aug 11, 2025

ilejn reviewed Aug 14, 2025

amosbird added the can be tested Allows running workflows for external contributors label Aug 16, 2025

clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Aug 16, 2025

wip

b524f9d

arthurpassos marked this pull request as draft August 18, 2025 19:26

arthurpassos commented Aug 18, 2025

KochetovNicolai mentioned this pull request Sep 18, 2025

Fix condition not being moved to PREWHERE in case there is a row policy (version 2) #87303

Merged

1 task

arthurpassos closed this Sep 26, 2025

