Skip to content

Fix match(col, '^…') index analysis with escaped metacharacters to avoid wrong results and crashes#79969

Merged
GrigoryPervakov merged 5 commits intoClickHouse:masterfrom
filimonov:fix_index_match
Jun 6, 2025
Merged

Fix match(col, '^…') index analysis with escaped metacharacters to avoid wrong results and crashes#79969
GrigoryPervakov merged 5 commits intoClickHouse:masterfrom
filimonov:fix_index_match

Conversation

@filimonov
Copy link
Contributor

@filimonov filimonov commented May 8, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixed incorrect query results and out-of-memory crashes when using match(column, '^…') with backslash-escaped characters.

Documentation entry for user-facing changes

Previously, hitting an unsupported “\x” escape in ^… patterns would do pos = end but still run a stray ++pos, so the code walked past the regex buffer, pulled in garbage bytes (the “�i” etc), and could OOM.

Now that stray increment is removed and all single-char escapes are handled correctly.

Backports are needed.

See example of the problem in fiddle: https://fiddle.clickhouse.com/ef60eb3a-8613-4127-b5b4-bb62fc274748

Another symptom is a very quick memory allocations (up to OOM) with a traces like below:

std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::push_back(char)
DB::KeyCondition::$_20::__invoke(DB::KeyCondition::RPNElement&, DB::Field const&)
DB::KeyCondition::extractAtomFromTree(DB::RPNBuilderTreeNode const&, DB::KeyCondition::RPNElement&)
DB::RPNBuilder<DB::KeyCondition::RPNElement>::traverseTree(DB::RPNBuilderTreeNode const&)
DB::RPNBuilder<DB::KeyCondition::RPNElement>::traverseTree(DB::RPNBuilderTreeNode const&)
DB::KeyCondition::KeyCondition(DB::ActionsDAG const*, std::__1::shared_ptr<DB::Context const>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::shared_ptr<DB::ExpressionActions> const&, bool)
DB::buildIndexes(std::__1::optional<DB::ReadFromMergeTree::Indexes>&, DB::ActionsDAG const*, DB::MergeTreeData const&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const>>> const&, std::__1::shared_ptr<DB::MergeTreeData::IMutationsSnapshot const> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryInfo const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<Poco::Logger> const&)
DB::ReadFromMergeTree::applyFilters(DB::ActionDAGNodes)
DB::QueryPlanOptimizations::optimizePrimaryKeyConditionAndLimit(std::__1::vector<DB::QueryPlanOptimizations::Frame, std::__1::allocator<DB::QueryPlanOptimizations::Frame>> const&)
DB::QueryPlanOptimizations::optimizeTreeSecondPass(DB::QueryPlanOptimizationSettings const&, DB::QueryPlan::Node&, std::__1::list<DB::QueryPlan::Node, std::__1::allocator<DB::QueryPlan::Node>>&)
DB::QueryPlan::optimize(DB::QueryPlanOptimizationSettings const&)
DB::QueryPlan::buildQueryPipeline(DB::QueryPlanOptimizationSettings const&, DB::BuildQueryPipelineSettings const&)
DB::InterpreterSelectQueryAnalyzer::buildQueryPipeline()
DB::InterpreterSelectQueryAnalyzer::execute()
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)

- Change loop to `while (pos < end)` to guard against overruns
- Support all single‐char escapes: `\]`, `\}`, `\-`, `\\` alongside existing metachars
- Move `++pos` into the supported‐escape case only, removing stray increments
- Prevent garbage reads past `end` and ensure correct fixed‐prefix for escaped metacharacters
default:
/// all other escape sequences are not supported
pos = end;
break;
Copy link
Contributor Author

@filimonov filimonov May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after that break (which exits the current switch) the increment on line 151 was occurring, moving the pos pointer after end of string.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label May 8, 2025
@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented May 8, 2025

Workflow [PR], commit [d483324]

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label May 8, 2025
@filimonov
Copy link
Contributor Author

Bugfix check fails because of
Unknown setting 'allow_dynamic_cache_resize'

Indeed that setting was missing in 25.4...

@GrigoryPervakov GrigoryPervakov self-assigned this May 12, 2025
Copy link
Member

@GrigoryPervakov GrigoryPervakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but check your test failure. Probably some random settings broke it and it needs no-random-settings to be stable in ci

@GrigoryPervakov
Copy link
Member

Fuzzer error is not related, it reproduces in master #81431

@GrigoryPervakov GrigoryPervakov added this pull request to the merge queue Jun 6, 2025
Merged via the queue into ClickHouse:master with commit 79902f2 Jun 6, 2025
117 of 121 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 6, 2025
zvonand pushed a commit to Altinity/ClickHouse that referenced this pull request Jun 17, 2025
Fix match(col, '^…') index analysis with escaped metacharacters to avoid wrong results and crashes
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Jun 18, 2025
24.8.14 Backport of ClickHouse#79969 -- Fix match(col, '^…') index analysis with escaped metacharacters to avoid wrong results and crashes
@filimonov
Copy link
Contributor Author

Backports? It really can lead to crashes.

@KochetovNicolai KochetovNicolai added pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-cloud labels Oct 31, 2025
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Oct 31, 2025
robot-ch-test-poll3 added a commit that referenced this pull request Oct 31, 2025
Cherry pick #79969 to 25.3: Fix match(col, '^…') index analysis with escaped metacharacters to avoid wrong results and crashes
robot-clickhouse added a commit that referenced this pull request Oct 31, 2025
…aped metacharacters to avoid wrong results and crashes
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Oct 31, 2025
mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request Nov 4, 2025
Fix match(col, '^…') index analysis with escaped metacharacters to avoid wrong results and crashes
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Nov 5, 2025
25.3.8 Backport of ClickHouse#79969 - Fix match(col, '^…') index analysis with escaped metacharacters to avoid wrong results and crashes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-backports-created-cloud deprecated label, NOOP pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants