Skip to content

Fix function execution over const and LowCardinality with GROUP BY const for analyzer#59986

Merged
novikd merged 6 commits intoClickHouse:masterfrom
azat:analyzer/fix-group-by-const
Mar 19, 2024
Merged

Fix function execution over const and LowCardinality with GROUP BY const for analyzer#59986
novikd merged 6 commits intoClickHouse:masterfrom
azat:analyzer/fix-group-by-const

Conversation

@azat
Copy link
Member

@azat azat commented Feb 14, 2024

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix function execution over const and LowCardinality with GROUP BY const for analyzer

Consider the following example:

SELECT concatWithSeparator('|', 'a', concatWithSeparator('|', CAST('x', 'LowCardinality(String)'))) GROUP BY 'a'

Under analyzer it fails, UBsan report:

==15121==WARNING: MemorySanitizer: use-of-uninitialized-value
  ...
  8 0x5555601880ed in void DB::FormatStringImpl::format<true, false>() /src/ch/clickhouse/src/Functions/formatString.h:125:21
  9 0x55556017aeb8 in void DB::FormatStringImpl::formatExecute<>() /src/ch/clickhouse/src/Functions/formatString.h:30:13
  10 0x555560196779 in DB::()::ConcatWithSeparatorImpl<>::executeImpl() const /src/ch/clickhouse/src/Functions/concatWithSeparator.cpp:151:9
  11 0x55555a2ad5b7 in DB::FunctionToExecutableFunctionAdaptor::executeImpl() const /src/ch/clickhouse/src/Functions/IFunctionAdaptor.h:21:26
  12 0x555584312297 in DB::IExecutableFunction::executeWithoutLowCardinalityColumns() const /src/ch/clickhouse/src/Functions/IFunction.cpp:249:15
  13 0x555584317640 in DB::IExecutableFunction::executeWithoutSparseColumns() const /src/ch/clickhouse/src/Functions/IFunction.cpp:283:24
  14 0x55558431bf5c in DB::IExecutableFunction::execute() const /src/ch/clickhouse/src/Functions/IFunction.cpp:380:16
  15 0x555587bf3e20 in DB::executeAction() /src/ch/clickhouse/src/Interpreters/ExpressionActions.cpp:613:60

Uninitialized value was created by a heap allocation
  ...
  6 0x55558b1c1a05 in DB::ColumnString::reserve(unsigned long) /src/ch/clickhouse/src/Columns/ColumnString.cpp:494:13
  7 0x55558980095d in DB::prepareOutputBlockColumns() /src/ch/clickhouse/src/Interpreters/AggregationUtils.cpp:32:25

The problem is that during query analysis
(QueryAnalyzer::resolveFunction()), the return value of the function had been executed as LowCardinality(String), but the 'a' argument that is passed to the concatWithSeparator() is not-const, because it had been reused from the GROUP BY step, and this causes UB, since column 'a' does not have enough rows (it should have 2 rows, since LowCardinality always contains the default, while it has only 1).

So to address this we need to take column as-is for aggregation.

Cc: @kitaisreal
Fixes: #59918
Fixes: #59684
Fixes: #59779

@azat azat added the analyzer Issues and pull-requests related to new analyzer label Feb 14, 2024
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-bugfix Pull request with bugfix, not backported by default label Feb 14, 2024
@robot-ch-test-poll4
Copy link
Contributor

robot-ch-test-poll4 commented Feb 14, 2024

This is an automated comment for commit 0e9c648 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
A SyncThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Mergeable CheckChecks if all other necessary checks are successful✅ success
PR CheckThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success
Check nameDescriptionStatus
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help❌ failure
Bugfix validationChecks that either a new test (functional or integration) or there some changed tests that fail with the binary built on master branch❌ failure
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests❌ failure
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors❌ failure

@azat azat marked this pull request as draft February 14, 2024 19:34
@azat azat force-pushed the analyzer/fix-group-by-const branch from 7c2b8c9 to 9a6c9e5 Compare February 15, 2024 10:40
@azat azat marked this pull request as ready for review February 15, 2024 10:40
@yakov-olkhovskiy
Copy link
Member

I have another solution for this, but not sure it's correct
#60046
@azat could you please take a look?

@azat azat force-pushed the analyzer/fix-group-by-const branch from 9a6c9e5 to ae839af Compare February 22, 2024 17:29
@novikd novikd self-assigned this Feb 23, 2024
@novikd
Copy link
Member

novikd commented Feb 23, 2024

AFAIU the tests fail because of WITH TOTALS. I think it's okay to just update reference files.

@azat azat force-pushed the analyzer/fix-group-by-const branch from ae839af to 6bf203c Compare February 23, 2024 15:46
@azat azat marked this pull request as draft February 23, 2024 15:47
@azat
Copy link
Member Author

azat commented Feb 23, 2024

Not all of them, need to take a more careful look.

@novikd
Copy link
Member

novikd commented Mar 7, 2024

@azat What do you mean? Both tests in Fast tests check are failed due to how we handle constants in TOTALS pocket.

@azat
Copy link
Member Author

azat commented Mar 8, 2024

What do you mean? Both tests in Fast tests check are failed due to how we handle constants in TOTALS pocket.

Yeah, you are right, I guess this comment was for some old failure, that had been fixed in b5c765c (#59986)

Anyway, I've fixed the remaining tests, let's see.

@azat azat marked this pull request as ready for review March 8, 2024 23:24
@azat
Copy link
Member Author

azat commented Mar 9, 2024

ClickHouse build check — 18/19 artifact groups are OK

Mar 09 01:02:38 FAILED: src/CMakeFiles/dbms.dir/Functions/CastOverloadResolver.cpp.o 
Mar 09 01:02:38 LLVM ERROR: out of memory

azat added 4 commits March 12, 2024 12:32
…nst for analyzer

Consider the following example:

    SELECT concatWithSeparator('|', 'a', concatWithSeparator('|', CAST('x', 'LowCardinality(String)'))) GROUP BY 'a'

Under analyzer it fails, UBsan report:

    ==15121==WARNING: MemorySanitizer: use-of-uninitialized-value
      ...
      8 0x5555601880ed in void DB::FormatStringImpl::format<true, false>() /src/ch/clickhouse/src/Functions/formatString.h:125:21
      9 0x55556017aeb8 in void DB::FormatStringImpl::formatExecute<>() /src/ch/clickhouse/src/Functions/formatString.h:30:13
      10 0x555560196779 in DB::()::ConcatWithSeparatorImpl<>::executeImpl() const /src/ch/clickhouse/src/Functions/concatWithSeparator.cpp:151:9
      11 0x55555a2ad5b7 in DB::FunctionToExecutableFunctionAdaptor::executeImpl() const /src/ch/clickhouse/src/Functions/IFunctionAdaptor.h:21:26
      12 0x555584312297 in DB::IExecutableFunction::executeWithoutLowCardinalityColumns() const /src/ch/clickhouse/src/Functions/IFunction.cpp:249:15
      13 0x555584317640 in DB::IExecutableFunction::executeWithoutSparseColumns() const /src/ch/clickhouse/src/Functions/IFunction.cpp:283:24
      14 0x55558431bf5c in DB::IExecutableFunction::execute() const /src/ch/clickhouse/src/Functions/IFunction.cpp:380:16
      15 0x555587bf3e20 in DB::executeAction() /src/ch/clickhouse/src/Interpreters/ExpressionActions.cpp:613:60

    Uninitialized value was created by a heap allocation
      ...
      6 0x55558b1c1a05 in DB::ColumnString::reserve(unsigned long) /src/ch/clickhouse/src/Columns/ColumnString.cpp:494:13
      7 0x55558980095d in DB::prepareOutputBlockColumns() /src/ch/clickhouse/src/Interpreters/AggregationUtils.cpp:32:25

The problem is that during query analysis
(QueryAnalyzer::resolveFunction()), the return value of the function had
been executed as LowCardinality(String), but the 'a' argument that is
passed to the concatWithSeparator() is not-const, because it had been
reused from the GROUP BY step, and this causes UB, since column 'a' does
not have enough rows (it should have 2 rows, since LowCardinality always
contains the default, while it has only 1).

v2: fix GROUPING SETs
Signed-off-by: Azat Khuzhin <[email protected]>
Now it preserves the original header in case of GROUP BY const, though
not for remote queries.

Signed-off-by: Azat Khuzhin <[email protected]>
Fixes:

    select toNullable(os_name) AS os_name, count() from (SELECT CAST('iphone' AS Enum8('iphone' = 1, 'android' = 2)) AS os_name) group by os_name WITH TOTALS settings allow_experimental_analyzer=1
    Code: 10. DB::Exception: Received from localhost:9000. DB::Exception: Not found column __table1.os_name:  in block toNullable(__table1.os_name) Nullable(Enum8('iphone' = 1, 'android' = 2)) Nullable(size = 0, Int8(size = 0), UInt8(size = 0)), count() UInt64 UInt64(size = 0). (NOT_FOUND_COLUMN_IN_BLOCK)

Signed-off-by: Azat Khuzhin <[email protected]>
Signed-off-by: Azat Khuzhin <[email protected]>
@azat azat marked this pull request as draft March 12, 2024 17:39
@azat
Copy link
Member Author

azat commented Mar 12, 2024

Test failures does related

Stateless tests (release, analyzer, s3, DatabaseReplicated) [2/4] — fail: 1, passed: 1557, skipped: 34

  • 02579_fill_empty_chunk

Stateless tests (release, analyzer, s3, DatabaseReplicated) [3/4] — fail: 1, passed: 1484, skipped: 43

  • 01056_predicate_optimizer_bugs

Looking into it

- 00757_enum_defaults - TOTALS
- 02699_polygons_sym_difference_rollup - TOTALS
- 02579_fill_empty_chunk - GROUP BY constX with arrayJoin(constX)

Signed-off-by: Azat Khuzhin <[email protected]>
…imizer_bugs

The behaviour of GROUP BY const in analyzer is slightly different, it
still preserves the const property from the subqueries, and that's why
it simply do not execute filters after GROUP BY, but simply one time for
the const.

The initial bug was about predicate pushdown not about GROUP BY const,
so I will update the test.

I've also tested the initial bug with fiddle and the new test is
idential:
- 19.17 (without the fix) - https://fiddle.clickhouse.com/312a48f4-6a5b-411d-a4bb-4b1a757effaf
- 20.2 (with the fix) - https://fiddle.clickhouse.com/4e1ad2e5-1c03-4ce7-9ac3-e878ccfae83a

Refs: ClickHouse#8503
Refs: ClickHouse#5682

Signed-off-by: Azat Khuzhin <[email protected]>
@azat azat force-pushed the analyzer/fix-group-by-const branch from a84dca3 to 0e9c648 Compare March 14, 2024 09:36
@azat azat marked this pull request as ready for review March 14, 2024 09:39
@azat
Copy link
Member Author

azat commented Mar 14, 2024

  • 02579_fill_empty_chunk - the problem is different behavior in analyzer, but analyzer is correct one
Details
SELECT
    2 AS x,
    arrayJoin([NULL, NULL, NULL])
GROUP BY
    GROUPING SETS (
        (0),
        ([NULL, NULL, NULL]))
ORDER BY x ASC
SETTINGS allow_experimental_analyzer = 1, enable_positional_arguments = 0

Query id: 9b7f209d-f10a-4f35-9815-0c7fc76c9862

┌─x─┬─arrayJoin([NULL, NULL, NULL])─┐
│ 2 │ ᴺᵁᴸᴸ                          │
│ 2 │ ᴺᵁᴸᴸ                          │
│ 2 │ ᴺᵁᴸᴸ                          │
└───┴───────────────────────────────┘
┌─x─┬─arrayJoin([NULL, NULL, NULL])─┐
│ 2 │ ᴺᵁᴸᴸ                          │
│ 2 │ ᴺᵁᴸᴸ                          │
│ 2 │ ᴺᵁᴸᴸ                          │
└───┴───────────────────────────────┘

SELECT
    2 AS x,
    arrayJoin([NULL, NULL, NULL])
GROUP BY
    GROUPING SETS (
        (0),
        ([NULL, NULL, NULL]))
ORDER BY x ASC
SETTINGS allow_experimental_analyzer = 0, enable_positional_arguments = 0

Query id: 4ddcfaa0-6f06-45e6-aab1-96606536c536

┌─x─┬─arrayJoin([NULL, NULL, NULL])─┐
│ 2 │ ᴺᵁᴸᴸ                          │
│ 2 │ ᴺᵁᴸᴸ                          │
│ 2 │ ᴺᵁᴸᴸ                          │
└───┴───────────────────────────────┘
  • 01056_predicate_optimizer_bugs - the problem is that the const property is preserved and the filter does not work, but the original bug (bug with subquery #5682) was about predicate push down, so I've adjusted the test - 0e9c648 (#59986)

@azat
Copy link
Member Author

azat commented Mar 14, 2024

Bugfix validation — New test(s) failed to reproduce a bug

@tavplubix
Copy link
Member

@novikd, @azat looks like this PR has introduced a crash that happens quite often in the CI:
CI DB query

Examples:
segfault: https://s3.amazonaws.com/clickhouse-test-reports/0/fd8fa63abc221532edbf300a47ff3dc246f51f8c/stress_test__asan_.html
use of uninitialized value: https://s3.amazonaws.com/clickhouse-test-reports/0/b8b977e99fefcedcd55900b3198f5a317c7f5090/stress_test__msan_.html

 [ 27510 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
 [ 27510 ] {} <Fatal> BaseDaemon: (version 24.3.1.1 (official build), build id: 64BDB31C750E14EE0A031C85058ECC3C0F8E570F, git hash: fd8fa63abc221532edbf300a47ff3dc246f51f8c) (from thread 25045) Received signal 11
 [ 27510 ] {} <Fatal> BaseDaemon: Signal description: Segmentation fault
 [ 27510 ] {} <Fatal> BaseDaemon: Address: NULL pointer. Access: read. Unknown si_code.
 [ 27510 ] {} <Fatal> BaseDaemon: Stack trace: 0x000055f1e5667af4 0x000055f1e153efbf 0x000055f1e846615c 0x000055f1e83cce84 0x000055f1e83c5f5e 0x000055f1e83a9bdc 0x000055f1e83eb011 0x000055f1efe8280f 0x000055f1efe834b7 0x000055f1f029043c 0x000055f1f0289428 0x000055f1bfe8e91b 0x00007f4456ed0ac3 0x0000
 [ 27510 ] {} <Fatal> BaseDaemon: ########################################
 [ 27510 ] {} <Fatal> BaseDaemon: (version 24.3.1.1 (official build), build id: 64BDB31C750E14EE0A031C85058ECC3C0F8E570F, git hash: fd8fa63abc221532edbf300a47ff3dc246f51f8c) (from thread 25045) (query_id: e50d5b28-c713-444a-accb-7a3bce1b318a) (query: SELECT 12 AS n GROUP BY n WITH ROLLUP;) Received 
 [ 27510 ] {} <Fatal> BaseDaemon: Address: NULL pointer. Access: read. Unknown si_code.
 [ 27510 ] {} <Fatal> BaseDaemon: Stack trace: 0x000055f1e5667af4 0x000055f1e153efbf 0x000055f1e846615c 0x000055f1e83cce84 0x000055f1e83c5f5e 0x000055f1e83a9bdc 0x000055f1e83eb011 0x000055f1efe8280f 0x000055f1efe834b7 0x000055f1f029043c 0x000055f1f0289428 0x000055f1bfe8e91b 0x00007f4456ed0ac3 0x0000
 [ 27510 ] {} <Fatal> BaseDaemon: 3. ./build_docker/./src/Columns/ColumnNullable.cpp:786: DB::ColumnNullable::checkConsistency() const @ 0x000000002ff92af4
 [ 27510 ] {} <Fatal> BaseDaemon: 4. ./build_docker/./src/DataTypes/Serializations/SerializationNullable.cpp:117: DB::SerializationNullable::serializeBinaryBulkWithMultipleStreams(DB::IColumn const&, unsigned long, unsigned long, DB::ISerialization::SerializeBinaryBulkSettings&, std::shared_ptr<DB::
 [ 27510 ] {} <Fatal> BaseDaemon: 5.0. inlined from ./build_docker/./src/Formats/NativeWriter.cpp:64: DB::writeData(DB::ISerialization const&, COW<DB::IColumn>::immutable_ptr<DB::IColumn> const&, DB::WriteBuffer&, unsigned long, unsigned long)
 [ 27510 ] {} <Fatal> BaseDaemon: 5. ./build_docker/./src/Formats/NativeWriter.cpp:164: DB::NativeWriter::write(DB::Block const&) @ 0x0000000032d9115c
 [ 27510 ] {} <Fatal> BaseDaemon: 6. ./build_docker/./src/Server/TCPHandler.cpp:0: DB::TCPHandler::sendData(DB::Block const&) @ 0x0000000032cf7e84
 [ 27510 ] {} <Fatal> BaseDaemon: 7. ./build_docker/./src/Server/TCPHandler.cpp:1056: DB::TCPHandler::processOrdinaryQueryWithProcessors() @ 0x0000000032cf0f5e
 [ 27510 ] {} <Fatal> BaseDaemon: 8. ./build_docker/./src/Server/TCPHandler.cpp:0: DB::TCPHandler::runImpl() @ 0x0000000032cd4bdc
 [ 27510 ] {} <Fatal> BaseDaemon: 9. ./build_docker/./src/Server/TCPHandler.cpp:2331: DB::TCPHandler::run() @ 0x0000000032d16011
 [ 27510 ] {} <Fatal> BaseDaemon: 10. ./build_docker/./base/poco/Net/src/TCPServerConnection.cpp:57: Poco::Net::TCPServerConnection::start() @ 0x000000003a7ad80f
 [ 27510 ] {} <Fatal> BaseDaemon: 11.0. inlined from ./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:48: std::default_delete<Poco::Net::TCPServerConnection>::operator()[abi:v15000](Poco::Net::TCPServerConnection*) const
 [ 27510 ] {} <Fatal> BaseDaemon: 11.1. inlined from ./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:305: std::unique_ptr<Poco::Net::TCPServerConnection, std::default_delete<Poco::Net::TCPServerConnection>>::reset[abi:v15000](Poco::Net::TCPServerConnection*)
 [ 27510 ] {} <Fatal> BaseDaemon: 11.2. inlined from ./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:259: ~unique_ptr
 [ 27510 ] {} <Fatal> BaseDaemon: 11. ./build_docker/./base/poco/Net/src/TCPServerDispatcher.cpp:116: Poco::Net::TCPServerDispatcher::run() @ 0x000000003a7ae4b7
 [ 27510 ] {} <Fatal> BaseDaemon: 12. ./build_docker/./base/poco/Foundation/src/ThreadPool.cpp:0: Poco::PooledThread::run() @ 0x000000003abbb43c
 [ 27510 ] {} <Fatal> BaseDaemon: 13.0. inlined from ./base/poco/Foundation/include/Poco/SharedPtr.h:231: Poco::SharedPtr<Poco::Runnable, Poco::ReferenceCounter, Poco::ReleasePolicy<Poco::Runnable>>::get()
 [ 27510 ] {} <Fatal> BaseDaemon: 13.1. inlined from ./base/poco/Foundation/include/Poco/SharedPtr.h:139: Poco::SharedPtr<Poco::Runnable, Poco::ReferenceCounter, Poco::ReleasePolicy<Poco::Runnable>>::assign(Poco::Runnable*)
 [ 27510 ] {} <Fatal> BaseDaemon: 13.2. inlined from ./base/poco/Foundation/include/Poco/SharedPtr.h:180: Poco::SharedPtr<Poco::Runnable, Poco::ReferenceCounter, Poco::ReleasePolicy<Poco::Runnable>>::operator=(Poco::Runnable*)
 [ 27510 ] {} <Fatal> BaseDaemon: 13. ./base/poco/Foundation/src/Thread_POSIX.cpp:350: Poco::ThreadImpl::runnableEntry(void*) @ 0x000000003abb4428
 [ 27510 ] {} <Fatal> BaseDaemon: 14. asan_thread_start(void*) @ 0x000000000a7b991b
 [ 27510 ] {} <Fatal> BaseDaemon: 15. ? @ 0x00007f4456ed0ac3
 [ 27510 ] {} <Fatal> BaseDaemon: 16. ? @ 0x00007f4456f62850
 [ 27510 ] {} <Fatal> BaseDaemon: Integrity check of the executable successfully passed (checksum: 3DAB30269CD3E9BA6413BEC087983A67)
 [ 27510 ] {} <Fatal> BaseDaemon: Report this error to https://github.com/ClickHouse/ClickHouse/issues
 [ 27510 ] {} <Fatal> BaseDaemon: Changed settings: min_compress_block_size = 132786, max_compress_block_size = 1397473, max_block_size = 80170, max_insert_threads = 8, max_threads = 63, max_read_buffer_size = 844524, connect_timeout_with_failover_ms = 2000, connect_timeout_with_failover_secure_ms =
 [ 3405 ] {} <Fatal> Application: Child process was terminated by signal 11.

@tavplubix
Copy link
Member

And some failures on the last commit in this PR are very suspicious:
AST fuzzer (tsan) — Received signal 11 - Details
Stress test (debug) — Killed by signal (in clickhouse-server.log) - Details

@antonio2368
Copy link
Member

@tavplubix most likely fixed with #61717

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

analyzer Issues and pull-requests related to new analyzer pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

7 participants