Skip to content

Remove redundant sorting#43905

Merged
devcrafter merged 55 commits intomasterfrom
igor/remove_redundant_order_by
Jan 18, 2023
Merged

Remove redundant sorting#43905
devcrafter merged 55 commits intomasterfrom
igor/remove_redundant_order_by

Conversation

@devcrafter
Copy link
Member

@devcrafter devcrafter commented Dec 2, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Remove redundant sorting, for example, sorting related ORDER BY clauses in subqueries. Implemented on top of query plan. It does similar optimization as optimize_duplicate_order_by_and_distinct regarding ORDER BY clauses, but more generic, since it's applied to any redundant sorting steps (not only caused by ORDER BY clause) and applied to subqueries of any depth. Related to #42648

@devcrafter devcrafter marked this pull request as draft December 2, 2022 21:36
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-improvement Pull request with some product improvements label Dec 2, 2022
@novikd novikd self-assigned this Dec 5, 2022
break;

if (!trans->getDataStreamTraits().preserves_sorting)
break;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To-do: this case is not covered by tests yet. Need some query which contain stateful function on top of ORDER BY and expression which doesn't preserve sorting between them in query plan

@devcrafter devcrafter marked this pull request as ready for review December 9, 2022 12:16
devcrafter and others added 13 commits December 12, 2022 13:57
When removing ORDER BY we remove corresponding expression,
so to remove only ORDER BY expression, do it before "mergeExpressions"
optimization is applied
- do not apply optimization in case of UnionStep on top of Sorting

01104_distributed_one_test.sql
+ 01952_optimize_distributed_group_by_sharding_key
+ 02377_optimize_sorting_by_input_stream_properties_explain.sh

class RemoveRedundantOrderBy : public QueryPlanVisitor<RemoveRedundantOrderBy, debug_logging_enabled>
{
std::vector<QueryPlan::Node *> nodes_affect_order;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment would be helpful.

Maybe not very important, but I'd better try to store the last node from nodes_affect_order in the stack frame.
For this, you can specify a template structure that you can store in the stack.

Also, ideally, somehow make a stack private for derived classes.

Copy link
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally ok (if we add FilterStep to check)

@devcrafter devcrafter changed the title Remove redundant ORDER BY clauses in subqueries Remove redundant sorting Jan 9, 2023
@devcrafter
Copy link
Member Author

Fatal messages in stress test, CI link :

/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:46:31.403884 [ 31178 ] {a9875e83-3cf9-413d-826f-9c3d975a2bc5} <Fatal> : Logical error: 'Join is supported only for pipelines with one output port'.
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.413142 [ 47753 ] {} <Fatal> BaseDaemon: ########################################
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.413953 [ 47753 ] {} <Fatal> BaseDaemon: (version 22.13.1.1, build id: 174BAE212B423703828ACC2B98068E63786671A6) (from thread 31178) (query_id: a9875e83-3cf9-413d-826f-9c3d975a2bc5) (query: SELECT bitmapCardinality(day_today) AS today_users, bitmapCardinality(day_before) AS before_users, bitmapOrCardinality(day_today, day_before) AS all_users, bitmapAndCardinality(day_today, day_before) AS old_users, bitmapAndnotCardinality(day_today, day_before) AS new_users, bitmapXorCardinality(day_today, day_before) AS diff_users FROM ( SELECT city_id, groupBitmapState( uid ) AS day_today FROM bitmap_test WHERE pickup_date = '2019-01-02' GROUP BY city_id ORDER BY city_id ) js1 ALL LEFT JOIN ( SELECT city_id, groupBitmapState( uid ) AS day_before FROM bitmap_test WHERE pickup_date = '2019-01-01' GROUP BY city_id ORDER BY city_id ) js2 USING city_id;) Received signal Aborted (6)
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.414340 [ 47753 ] {} <Fatal> BaseDaemon: 
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.414713 [ 47753 ] {} <Fatal> BaseDaemon: Stack trace: 0x7fbed1c6c00b 0x7fbed1c4b859 0x21200656 0x21200715 0x21200865 0x187e562a 0x2819856f 0x2ba8c9bc 0x2baa05df 0x295d4d71 0x29a6f0ea 0x29a6afc4 0x2b38c9b4 0x2b39d185 0x2ff577d9 0x2ff5801c 0x301a7254 0x301a3ffa 0x301a2dde 0x7fbed1e23609 0x7fbed1d48133
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.415120 [ 47753 ] {} <Fatal> BaseDaemon: 4. raise @ 0x7fbed1c6c00b in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.415360 [ 47753 ] {} <Fatal> BaseDaemon: 5. abort @ 0x7fbed1c4b859 in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.821774 [ 47753 ] {} <Fatal> BaseDaemon: 6. /build/build_docker/../src/Common/Exception.cpp:41: DB::abortOnFailedAssertion(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) @ 0x21200656 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.941782 [ 47753 ] {} <Fatal> BaseDaemon: 7. /build/build_docker/../src/Common/Exception.cpp:64: DB::handle_error_code(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, bool, std::__1::vector<void*, std::__1::allocator<void*>> const&) @ 0x21200715 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:57.043891 [ 47753 ] {} <Fatal> BaseDaemon: 8. /build/build_docker/../src/Common/Exception.cpp:78: DB::Exception::Exception(DB::Exception::MessageMasked const&, int, bool) @ 0x21200865 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:57.167260 [ 47753 ] {} <Fatal> BaseDaemon: 9. /build/build_docker/../src/Common/Exception.h:41: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, bool) @ 0x187e562a in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:57.790904 [ 47753 ] {} <Fatal> BaseDaemon: 10. /build/build_docker/../src/QueryPipeline/QueryPipelineBuilder.cpp:354: DB::QueryPipelineBuilder::joinPipelinesYShaped(std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>, std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>, std::__1::shared_ptr<DB::IJoin>, DB::Block const&, unsigned long, std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor>>>*) @ 0x2819856f in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:58.117524 [ 47753 ] {} <Fatal> BaseDaemon: 11. /build/build_docker/../src/Processors/QueryPlan/JoinStep.cpp:38: DB::JoinStep::updatePipeline(std::__1::vector<std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>, std::__1::allocator<std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>>>, DB::BuildQueryPipelineSettings const&) @ 0x2ba8c9bc in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:58.354250 [ 47753 ] {} <Fatal> BaseDaemon: 12. /build/build_docker/../src/Processors/QueryPlan/QueryPlan.cpp:187: DB::QueryPlan::buildQueryPipeline(DB::QueryPlanOptimizationSettings const&, DB::BuildQueryPipelineSettings const&) @ 0x2baa05df in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:58.997904 [ 47753 ] {} <Fatal> BaseDaemon: 13. /build/build_docker/../src/Interpreters/InterpreterSelectWithUnionQuery.cpp:380: DB::InterpreterSelectWithUnionQuery::execute() @ 0x295d4d71 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:59.748408 [ 47753 ] {} <Fatal> BaseDaemon: 14. /build/build_docker/../src/Interpreters/executeQuery.cpp:705: DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x29a6f0ea in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:00.053072 [ 47753 ] {} <Fatal> BaseDaemon: 15. /build/build_docker/../src/Interpreters/executeQuery.cpp:1104: DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x29a6afc4 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:00.693271 [ 47753 ] {} <Fatal> BaseDaemon: 16. /build/build_docker/../src/Server/TCPHandler.cpp:378: DB::TCPHandler::runImpl() @ 0x2b38c9b4 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.184866 [ 47753 ] {} <Fatal> BaseDaemon: 17. /build/build_docker/../src/Server/TCPHandler.cpp:1931: DB::TCPHandler::run() @ 0x2b39d185 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.220573 [ 47753 ] {} <Fatal> BaseDaemon: 18. /build/build_docker/../contrib/poco/Net/src/TCPServerConnection.cpp:43: Poco::Net::TCPServerConnection::start() @ 0x2ff577d9 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.330324 [ 47753 ] {} <Fatal> BaseDaemon: 19. /build/build_docker/../contrib/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x2ff5801c in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.430185 [ 47753 ] {} <Fatal> BaseDaemon: 20. /build/build_docker/../contrib/poco/Foundation/src/ThreadPool.cpp:199: Poco::PooledThread::run() @ 0x301a7254 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.501286 [ 47753 ] {} <Fatal> BaseDaemon: 21. /build/build_docker/../contrib/poco/Foundation/src/Thread.cpp:56: Poco::(anonymous namespace)::RunnableHolder::run() @ 0x301a3ffa in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.551658 [ 47753 ] {} <Fatal> BaseDaemon: 22. /build/build_docker/../contrib/poco/Foundation/src/Thread_POSIX.cpp:345: Poco::ThreadImpl::runnableEntry(void*) @ 0x301a2dde in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.562212 [ 47753 ] {} <Fatal> BaseDaemon: 23. ? @ 0x7fbed1e23609 in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.562543 [ 47753 ] {} <Fatal> BaseDaemon: 24. __clone @ 0x7fbed1d48133 in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.562781 [ 47753 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read.
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:46.742502 [ 1568 ] {} <Fatal> Application: Child process was terminated by signal 6.

cc @vdimir Is it something known?

@vdimir
Copy link
Member

vdimir commented Jan 9, 2023

Checked on commit 042f61ab92264daecccef6c2235c14b4f746fc7e from this PR

Repro:

DROP TABLE IF EXISTS t;
CREATE TABLE t ( city_id UInt32 ) ENGINE = Memory;
INSERT INTO t VALUES (1), (2);

SET max_rows_in_set_to_optimize_join = 0; --- not necessary

SET join_algorithm = 'full_sorting_merge';
SET query_plan_remove_redundant_order_by = 1;

explain plan description = 1, sorting = 1
SELECT *
FROM ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js1
JOIN ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js2
USING city_id;
SET query_plan_remove_redundant_order_by = 0;
SELECT '----';

explain plan description = 1, sorting = 1
SELECT *
FROM ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js1
JOIN ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js2
USING city_id;

But actually streams before join is sorted:

Expression ((Projection + Before ORDER BY))
Sorting (None)
  Join (JOIN YShaped)
  Sorting (None)
    Sorting (Sort Left before JOIN)
    Sorting (Global): city_id ASC
      Expression ((Before JOIN + (Projection + Before ORDER BY)))
      Sorting (None)
        Aggregating
        Sorting (None)
          Expression (Before GROUP BY)
          Sorting (None)
            ReadFromStorage (Memory)
            Sorting (None)
    Sorting (Sort Right before JOIN)
    Sorting (Global): js2.city_id ASC
      Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY))))
      Sorting (None)
        Aggregating
        Sorting (None)
          Expression (Before GROUP BY)
          Sorting (None)
            ReadFromStorage (Memory)
            Sorting (None)
----
Expression ((Projection + Before ORDER BY))
Sorting (None)
  Join (JOIN YShaped)
  Sorting (None)
    Sorting (Sort Left before JOIN)
    Sorting (Global): city_id ASC
      Expression ((Before JOIN + Projection))
      Sorting (Global): city_id ASC
        Sorting (Sorting for ORDER BY)
        Sorting (Global): city_id ASC
          Expression (Before ORDER BY)
          Sorting (None)
            Aggregating
            Sorting (None)
              Expression (Before GROUP BY)
              Sorting (None)
                ReadFromStorage (Memory)
                Sorting (None)
    Sorting (Sort Right before JOIN)
    Sorting (Global): js2.city_id ASC
      Expression ((Joined actions + (Rename joined columns + Projection)))
      Sorting (Global): city_id ASC
        Sorting (Sorting for ORDER BY)
        Sorting (Global): city_id ASC
          Expression (Before ORDER BY)
          Sorting (None)
            Aggregating
            Sorting (None)
              Expression (Before GROUP BY)
              Sorting (None)
                ReadFromStorage (Memory)
                Sorting (None)

Need to add some additional info to error message to check what left and right actually is

if (left->getNumStreams() != 1 || right->getNumStreams() != 1)
throw Exception("Join is supported only for pipelines with one output port", ErrorCodes::LOGICAL_ERROR);

@vdimir vdimir force-pushed the igor/remove_redundant_order_by branch from e98981d to 7347f51 Compare January 16, 2023 18:38
@vdimir
Copy link
Member

vdimir commented Jan 16, 2023

@devcrafter I've pushed commit with fix for the case above, feel free to drop it if you'll find proper solution.

After removing sorting step we need to update sorting properties of
input/ouput streams
@devcrafter devcrafter force-pushed the igor/remove_redundant_order_by branch from 7347f51 to 6328e02 Compare January 17, 2023 13:42
@devcrafter
Copy link
Member Author

@devcrafter I've pushed commit with fix for the case above, feel free to drop it if you'll find proper solution.

@vdimir Thanks! The proper fix is here. Your commit contains better checks during building pipeline for JOIN, so I think, it'd be nice to bring them in but by other PR.

@devcrafter
Copy link
Member Author

Stress test (ubsan) failure related to #44912

@devcrafter devcrafter merged commit 7206684 into master Jan 18, 2023
@devcrafter devcrafter deleted the igor/remove_redundant_order_by branch January 18, 2023 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants