Fix condition not being moved to prewhere in case there is a row policy#85118
Fix condition not being moved to prewhere in case there is a row policy#85118arthurpassos wants to merge 8 commits intoClickHouse:masterfrom
Conversation
|
I am not very familiar with where condition analysis, take this PR with a grain of salt :) |
|
Is it secure? Check that it's not possible to expose information using the |
Perhaps if I implement it differently and fill ClickHouse/src/Storages/SelectQueryInfo.h Line 46 in 957e0df PrewhereInfo::row_level_filter is checked before PrewhereInfo::prewhere_actions
|
Wouldn't it be worse? Ie, if you push user defined prewhere to RLF stage, it does mean that user could trigger exception during RLF scan. Just an idea Query: PREWHERE throwIf(tenant = 'YYYYYY') If push to RLF, it will form condition like tenant = 'XXXXXX' AND throwIf(tenant = 'YYYYYY') |
User defined wouldn't be pushed to RLF stage. Inside the |
|
But before doing that, I need to understand why is it that the decision to fill |
|
@alexey-milovidov can you enable CI so I can have a better understanding if I broke something? |
|
Query condition cache is causing problems with this fix. The cache will map from prewhere to skipped ranges and ignore the row level filters in More info on: https://github.com/ClickHouse/ClickHouse/pull/69236/files#r2258613976 |
|
Throughout the codebase the object |
|
Ok, it seems like |
|
Background This PR addresses an issue where WHERE conditions were not being pushed to PREWHERE when a row policy was present. This is a regression from 23.8, where such conditions were moved to PREWHERE. Details: #69777, #83748. Root Cause Under the new analyzer, ClickHouse instantiated PrewhereInfo with the row policy. Later, when attempting to push the regular WHERE to PREWHERE, it detected an existing PREWHERE and exited early. Removing this early exit was not enough, the WHERE condition would then override the existing PREWHERE. Naive solution (not very important) The fix required:
However, @alexey-milovidov raised a valid security concern:
It wasn’t secure in the original form. To address this, row policies must be evaluated first, separately from other conditions. ClickHouse already has a row_level_filters field in PrewhereInfo for this purpose, but it was not being used correctly (row policies were stored in prewhere_actions instead). The real solution
By applying the above, the row policies are evaluated first. I have added throwIf tests to validate this behavior — suggestions for additional cases are welcome. Why the PR Is Large PrewhereInfo::prewhere_column_name and PrewhereInfo::prewhere_actions were assumed to be non-null whenever PrewhereInfo was non-null. This assumption fails when only row_level_filters are set. I made these fields optional, similar to row_level_filters, which required changes across multiple files. Additional Consideration – additional_table_filters Some use additional_table_filters as a shortcut for tenant-based restrictions. These were not designed as a security feature and cannot be safely evaluated with row policies. This PR forces them to be evaluated with the regular WHERE clause. If users have relied on them for security, this may introduce issues. We might want to make this behavior configurable. |
|
Perhaps an ordered conjunction of these conditions could also work (i.e, |
|
|
||
| SELECT * FROM 03591_test WHERE throwIf(b=1, 'Should throw') SETTINGS optimize_move_to_prewhere = 1; -- {serverError FUNCTION_THROW_IF_VALUE_IS_NON_ZERO} | ||
|
|
||
| CREATE ROW POLICY 03591_rp ON 03591_test USING b=2 TO CURRENT_USER; |
There was a problem hiding this comment.
It may be good idea to have a bit more complex row policies, e.g. another PERMISSIVE and another RESTRICTIVE.
There was a problem hiding this comment.
It may be interesting to add a case where columns used in row policies are not selected.
| source_step_with_filter->addFilter(storage_prewhere_info->prewhere_actions->clone(), storage_prewhere_info->prewhere_column_name); | ||
| if (storage_prewhere_info->row_level_filter) | ||
| source_step_with_filter->addFilter(storage_prewhere_info->row_level_filter->clone(), storage_prewhere_info->row_level_column_name); | ||
| } |
There was a problem hiding this comment.
To me this code looks rather worrying (don't mean actual change).
It seems that some conditions like Limit applied before row policy. Is it true? If yes, is it risky from security standpoint?
There was a problem hiding this comment.
The limit should be applied only after all the filters.
Also, it's used only in ReadFromPostgreSQL and ReadFromSystemNumbersStep as I can see.
|
@arthurpassos We support multiple prewhere steps, so we can make the first (the earliest) prewhere step to only include row policies, and the rest - as usual. |
Isn't this what I implemented in this PR? |
|
Workflow [PR], commit [5ceb04a] Summary: ❌
|
|
Thanks. |
|
While working on this, I discovered another bug: #85834 It should probably be fixed before this one (otherwise it is too hard to debug the failing tests). Unfortunately, I don't have time to continue this work right now |
| if (prewhere_info->remove_prewhere_column) | ||
| { | ||
| removeFromOutput(split_result.first, prewhere_info->prewhere_column_name); | ||
| // split_result.first.removeUnusedActions(); | ||
| } | ||
| } | ||
|
|
||
| { | ||
| std::unordered_set<const ActionsDAG::Node *> first_outputs( | ||
| split_result.first.getOutputs().begin(), split_result.first.getOutputs().end()); | ||
| for (const auto * input : split_result.first.getInputs()) | ||
| { | ||
| if (!first_outputs.contains(input)) | ||
| { | ||
| split_result.first.getOutputs().push_back(input); | ||
| /// Add column to second actions as input. | ||
| /// Do not add it to result, so it would be removed. | ||
| split_result.second.addInput(input->result_name, input->result_type); | ||
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
I still don't know what's the right procedure to merge these conditions. This was the closest I got to it (a few lines below it'll and the functions together).
| for (const auto * condition : prewhere_nodes_list) | ||
| conditions.push_back(split_result.split_nodes_mapping.at(condition)); | ||
|
|
||
| /// Is it possible that prewhere_info->prewhere_actions was not empty? |
There was a problem hiding this comment.
Yes. Just set prewhere manually.
|
|
||
| /// Is it possible that prewhere_info->prewhere_actions was not empty? | ||
| /// Not sure, but just in case let's merge it | ||
| if (prewhere_info->prewhere_actions) |
There was a problem hiding this comment.
The thing about this function splitAndFillPrewhereInfo is that the previous implementation simply did not care if PrewhereInfo already contained actions. It would simply override them. (Actually, this was not happening because there was an early return on optimizePrewhere that prevented the code from reaching this function).
All that needs to be done is to properly construct a DAG that is a conjuction of the optimized filter_expression + existing prewhere (preserving row level filters). I just don't know how to do it :).
|
Closing this PR in favor of #87303 |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixes #69777 and #83748
Documentation entry for user-facing changes