Remove redundant sorting by devcrafter · Pull Request #43905 · ClickHouse/ClickHouse

devcrafter · 2022-12-02T21:36:10Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Remove redundant sorting, for example, sorting related ORDER BY clauses in subqueries. Implemented on top of query plan. It does similar optimization as optimize_duplicate_order_by_and_distinct regarding ORDER BY clauses, but more generic, since it's applied to any redundant sorting steps (not only caused by ORDER BY clause) and applied to subqueries of any depth. Related to #42648

…nt_order_by

+ tests

…nt_order_by

src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp

devcrafter · 2022-12-09T12:15:59Z

src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp

+                break;
+
+            if (!trans->getDataStreamTraits().preserves_sorting)
+                break;


To-do: this case is not covered by tests yet. Need some query which contain stateful function on top of ORDER BY and expression which doesn't preserve sorting between them in query plan

…nt_order_by

When removing ORDER BY we remove corresponding expression, so to remove only ORDER BY expression, do it before "mergeExpressions" optimization is applied

…nt_order_by

- do not apply optimization in case of UnionStep on top of Sorting 01104_distributed_one_test.sql

+ 01952_optimize_distributed_group_by_sharding_key

+ 02377_optimize_sorting_by_input_stream_properties_explain.sh

…nt_order_by

src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp

KochetovNicolai · 2023-01-02T19:04:14Z

src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp

+
+class RemoveRedundantOrderBy : public QueryPlanVisitor<RemoveRedundantOrderBy, debug_logging_enabled>
+{
+    std::vector<QueryPlan::Node *> nodes_affect_order;


Comment would be helpful.

Maybe not very important, but I'd better try to store the last node from nodes_affect_order in the stack frame.
For this, you can specify a template structure that you can store in the stack.

Also, ideally, somehow make a stack private for derived classes.

KochetovNicolai

Generally ok (if we add FilterStep to check)

…nt_order_by

+ cleanup + disable optimization in sort performance test since it removes sorting at all

…nt_order_by

devcrafter · 2023-01-09T11:22:37Z

Fatal messages in stress test, CI link :

/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:46:31.403884 [ 31178 ] {a9875e83-3cf9-413d-826f-9c3d975a2bc5} <Fatal> : Logical error: 'Join is supported only for pipelines with one output port'.
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.413142 [ 47753 ] {} <Fatal> BaseDaemon: ########################################
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.413953 [ 47753 ] {} <Fatal> BaseDaemon: (version 22.13.1.1, build id: 174BAE212B423703828ACC2B98068E63786671A6) (from thread 31178) (query_id: a9875e83-3cf9-413d-826f-9c3d975a2bc5) (query: SELECT bitmapCardinality(day_today) AS today_users, bitmapCardinality(day_before) AS before_users, bitmapOrCardinality(day_today, day_before) AS all_users, bitmapAndCardinality(day_today, day_before) AS old_users, bitmapAndnotCardinality(day_today, day_before) AS new_users, bitmapXorCardinality(day_today, day_before) AS diff_users FROM ( SELECT city_id, groupBitmapState( uid ) AS day_today FROM bitmap_test WHERE pickup_date = '2019-01-02' GROUP BY city_id ORDER BY city_id ) js1 ALL LEFT JOIN ( SELECT city_id, groupBitmapState( uid ) AS day_before FROM bitmap_test WHERE pickup_date = '2019-01-01' GROUP BY city_id ORDER BY city_id ) js2 USING city_id;) Received signal Aborted (6)
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.414340 [ 47753 ] {} <Fatal> BaseDaemon: 
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.414713 [ 47753 ] {} <Fatal> BaseDaemon: Stack trace: 0x7fbed1c6c00b 0x7fbed1c4b859 0x21200656 0x21200715 0x21200865 0x187e562a 0x2819856f 0x2ba8c9bc 0x2baa05df 0x295d4d71 0x29a6f0ea 0x29a6afc4 0x2b38c9b4 0x2b39d185 0x2ff577d9 0x2ff5801c 0x301a7254 0x301a3ffa 0x301a2dde 0x7fbed1e23609 0x7fbed1d48133
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.415120 [ 47753 ] {} <Fatal> BaseDaemon: 4. raise @ 0x7fbed1c6c00b in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.415360 [ 47753 ] {} <Fatal> BaseDaemon: 5. abort @ 0x7fbed1c4b859 in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.821774 [ 47753 ] {} <Fatal> BaseDaemon: 6. /build/build_docker/../src/Common/Exception.cpp:41: DB::abortOnFailedAssertion(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) @ 0x21200656 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:56.941782 [ 47753 ] {} <Fatal> BaseDaemon: 7. /build/build_docker/../src/Common/Exception.cpp:64: DB::handle_error_code(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, bool, std::__1::vector<void*, std::__1::allocator<void*>> const&) @ 0x21200715 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:57.043891 [ 47753 ] {} <Fatal> BaseDaemon: 8. /build/build_docker/../src/Common/Exception.cpp:78: DB::Exception::Exception(DB::Exception::MessageMasked const&, int, bool) @ 0x21200865 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:57.167260 [ 47753 ] {} <Fatal> BaseDaemon: 9. /build/build_docker/../src/Common/Exception.h:41: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, bool) @ 0x187e562a in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:57.790904 [ 47753 ] {} <Fatal> BaseDaemon: 10. /build/build_docker/../src/QueryPipeline/QueryPipelineBuilder.cpp:354: DB::QueryPipelineBuilder::joinPipelinesYShaped(std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>, std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>, std::__1::shared_ptr<DB::IJoin>, DB::Block const&, unsigned long, std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor>>>*) @ 0x2819856f in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:58.117524 [ 47753 ] {} <Fatal> BaseDaemon: 11. /build/build_docker/../src/Processors/QueryPlan/JoinStep.cpp:38: DB::JoinStep::updatePipeline(std::__1::vector<std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>, std::__1::allocator<std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>>>, DB::BuildQueryPipelineSettings const&) @ 0x2ba8c9bc in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:58.354250 [ 47753 ] {} <Fatal> BaseDaemon: 12. /build/build_docker/../src/Processors/QueryPlan/QueryPlan.cpp:187: DB::QueryPlan::buildQueryPipeline(DB::QueryPlanOptimizationSettings const&, DB::BuildQueryPipelineSettings const&) @ 0x2baa05df in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:58.997904 [ 47753 ] {} <Fatal> BaseDaemon: 13. /build/build_docker/../src/Interpreters/InterpreterSelectWithUnionQuery.cpp:380: DB::InterpreterSelectWithUnionQuery::execute() @ 0x295d4d71 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:47:59.748408 [ 47753 ] {} <Fatal> BaseDaemon: 14. /build/build_docker/../src/Interpreters/executeQuery.cpp:705: DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x29a6f0ea in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:00.053072 [ 47753 ] {} <Fatal> BaseDaemon: 15. /build/build_docker/../src/Interpreters/executeQuery.cpp:1104: DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x29a6afc4 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:00.693271 [ 47753 ] {} <Fatal> BaseDaemon: 16. /build/build_docker/../src/Server/TCPHandler.cpp:378: DB::TCPHandler::runImpl() @ 0x2b38c9b4 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.184866 [ 47753 ] {} <Fatal> BaseDaemon: 17. /build/build_docker/../src/Server/TCPHandler.cpp:1931: DB::TCPHandler::run() @ 0x2b39d185 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.220573 [ 47753 ] {} <Fatal> BaseDaemon: 18. /build/build_docker/../contrib/poco/Net/src/TCPServerConnection.cpp:43: Poco::Net::TCPServerConnection::start() @ 0x2ff577d9 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.330324 [ 47753 ] {} <Fatal> BaseDaemon: 19. /build/build_docker/../contrib/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x2ff5801c in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.430185 [ 47753 ] {} <Fatal> BaseDaemon: 20. /build/build_docker/../contrib/poco/Foundation/src/ThreadPool.cpp:199: Poco::PooledThread::run() @ 0x301a7254 in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.501286 [ 47753 ] {} <Fatal> BaseDaemon: 21. /build/build_docker/../contrib/poco/Foundation/src/Thread.cpp:56: Poco::(anonymous namespace)::RunnableHolder::run() @ 0x301a3ffa in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.551658 [ 47753 ] {} <Fatal> BaseDaemon: 22. /build/build_docker/../contrib/poco/Foundation/src/Thread_POSIX.cpp:345: Poco::ThreadImpl::runnableEntry(void*) @ 0x301a2dde in /usr/bin/clickhouse
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.562212 [ 47753 ] {} <Fatal> BaseDaemon: 23. ? @ 0x7fbed1e23609 in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.562543 [ 47753 ] {} <Fatal> BaseDaemon: 24. __clone @ 0x7fbed1d48133 in ?
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:01.562781 [ 47753 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read.
/var/log/clickhouse-server/clickhouse-server.err.log:2023.01.06 21:48:46.742502 [ 1568 ] {} <Fatal> Application: Child process was terminated by signal 6.

cc @vdimir Is it something known?

vdimir · 2023-01-09T12:08:48Z

Checked on commit 042f61ab92264daecccef6c2235c14b4f746fc7e from this PR

Repro:

DROP TABLE IF EXISTS t;
CREATE TABLE t ( city_id UInt32 ) ENGINE = Memory;
INSERT INTO t VALUES (1), (2);

SET max_rows_in_set_to_optimize_join = 0; --- not necessary

SET join_algorithm = 'full_sorting_merge';
SET query_plan_remove_redundant_order_by = 1;

explain plan description = 1, sorting = 1
SELECT *
FROM ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js1
JOIN ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js2
USING city_id;
SET query_plan_remove_redundant_order_by = 0;
SELECT '----';

explain plan description = 1, sorting = 1
SELECT *
FROM ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js1
JOIN ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js2
USING city_id;

But actually streams before join is sorted:

Expression ((Projection + Before ORDER BY))
Sorting (None)
  Join (JOIN YShaped)
  Sorting (None)
    Sorting (Sort Left before JOIN)
    Sorting (Global): city_id ASC
      Expression ((Before JOIN + (Projection + Before ORDER BY)))
      Sorting (None)
        Aggregating
        Sorting (None)
          Expression (Before GROUP BY)
          Sorting (None)
            ReadFromStorage (Memory)
            Sorting (None)
    Sorting (Sort Right before JOIN)
    Sorting (Global): js2.city_id ASC
      Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY))))
      Sorting (None)
        Aggregating
        Sorting (None)
          Expression (Before GROUP BY)
          Sorting (None)
            ReadFromStorage (Memory)
            Sorting (None)
----
Expression ((Projection + Before ORDER BY))
Sorting (None)
  Join (JOIN YShaped)
  Sorting (None)
    Sorting (Sort Left before JOIN)
    Sorting (Global): city_id ASC
      Expression ((Before JOIN + Projection))
      Sorting (Global): city_id ASC
        Sorting (Sorting for ORDER BY)
        Sorting (Global): city_id ASC
          Expression (Before ORDER BY)
          Sorting (None)
            Aggregating
            Sorting (None)
              Expression (Before GROUP BY)
              Sorting (None)
                ReadFromStorage (Memory)
                Sorting (None)
    Sorting (Sort Right before JOIN)
    Sorting (Global): js2.city_id ASC
      Expression ((Joined actions + (Rename joined columns + Projection)))
      Sorting (Global): city_id ASC
        Sorting (Sorting for ORDER BY)
        Sorting (Global): city_id ASC
          Expression (Before ORDER BY)
          Sorting (None)
            Aggregating
            Sorting (None)
              Expression (Before GROUP BY)
              Sorting (None)
                ReadFromStorage (Memory)
                Sorting (None)

Need to add some additional info to error message to check what left and right actually is

ClickHouse/src/QueryPipeline/QueryPipelineBuilder.cpp

Lines 353 to 354 in 119501f

    
           if (left->getNumStreams() != 1 || right->getNumStreams() != 1) 
        
               throw Exception("Join is supported only for pipelines with one output port", ErrorCodes::LOGICAL_ERROR);

…nt_order_by

vdimir · 2023-01-16T18:40:47Z

@devcrafter I've pushed commit with fix for the case above, feel free to drop it if you'll find proper solution.

After removing sorting step we need to update sorting properties of input/ouput streams

…nt_order_by

devcrafter · 2023-01-17T17:42:33Z

@devcrafter I've pushed commit with fix for the case above, feel free to drop it if you'll find proper solution.

@vdimir Thanks! The proper fix is here. Your commit contains better checks during building pipeline for JOIN, so I think, it'd be nice to bring them in but by other PR.

devcrafter · 2023-01-18T12:20:35Z

Stress test (ubsan) failure related to #44912

devcrafter added 4 commits December 2, 2022 19:27

save

47ca855

Something working

07f8ef1

Some tests + setting

e59703a

Add GROUP BY w/o aggrecation functions analyze + test

5a3abaa

devcrafter marked this pull request as draft December 2, 2022 21:36

robot-clickhouse-ci-1 added the pr-improvement Pull request with some product improvements label Dec 2, 2022

devcrafter added 2 commits December 2, 2022 21:46

Fix type

63df878

Fix + CROSS JOIN tests

611de4d

novikd self-assigned this Dec 5, 2022

devcrafter added 5 commits December 7, 2022 11:44

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

5684453

…nt_order_by

GROUP BY with check if aggr func depends on order

4969457

+ tests

Respect stateful functions in the optimization

c2b10e8

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

1658276

…nt_order_by

Polishing

55da7a0

devcrafter commented Dec 9, 2022

View reviewed changes

devcrafter marked this pull request as ready for review December 9, 2022 12:16

devcrafter mentioned this pull request Dec 9, 2022

[Umbrella] Analyzer, Planner migration #42648

Closed

devcrafter and others added 13 commits December 12, 2022 13:57

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

69d1b39

…nt_order_by

Fix: remove ORDER BY before "mergeExpressions" optimization

6f5fc66

When removing ORDER BY we remove corresponding expression, so to remove only ORDER BY expression, do it before "mergeExpressions" optimization is applied

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

1bed506

…nt_order_by

Merge branch 'master' into igor/remove_redundant_order_by

078b8ab

Fix window function case

1cbdce8

Query plan visitor with debug logs

e25053d

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

ad6e824

…nt_order_by

Fix for distributed query

6ca0773

- do not apply optimization in case of UnionStep on top of Sorting 01104_distributed_one_test.sql

Move checks to functions

b5de352

Fix for distributed queries

6dd2849

+ 01952_optimize_distributed_group_by_sharding_key

Update test to make it stable

0cdc7ab

+ 02377_optimize_sorting_by_input_stream_properties_explain.sh

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

50e1879

…nt_order_by

Fix for full sorting merge join

629f317

KochetovNicolai reviewed Jan 2, 2023

View reviewed changes

src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp Outdated Show resolved Hide resolved

KochetovNicolai reviewed Jan 2, 2023

View reviewed changes

src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp Show resolved Hide resolved

KochetovNicolai reviewed Jan 2, 2023

View reviewed changes

KochetovNicolai approved these changes Jan 2, 2023

View reviewed changes

devcrafter added 10 commits January 4, 2023 12:34

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

1edfdfd

…nt_order_by

Fix review comments

d886b56

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

48b58a9

…nt_order_by

Fix typo

d2d5a09

Disable diagnostics

2187bdd

+ cleanup + disable optimization in sort performance test since it removes sorting at all

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

042f61a

…nt_order_by

Test for aggr func w/o group by

856ecca

Rename settings to remove_redundant_sorting

e29cd5d

Renaming according to setting

cec7dc5

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

7298b39

…nt_order_by

devcrafter changed the title ~~Remove redundant ORDER BY clauses in subqueries~~ Remove redundant sorting Jan 9, 2023

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

a34991c

…nt_order_by

vdimir force-pushed the igor/remove_redundant_order_by branch from e98981d to 7347f51 Compare January 16, 2023 18:38

Fix: update input/output stream properties

6328e02

After removing sorting step we need to update sorting properties of input/ouput streams

devcrafter force-pushed the igor/remove_redundant_order_by branch from 7347f51 to 6328e02 Compare January 17, 2023 13:42

devcrafter added 2 commits January 17, 2023 16:24

Rename source file according to implementation

9855504

Merge remote-tracking branch 'origin/master' into igor/remove_redunda…

0cfa08d

…nt_order_by

Merge branch 'master' into igor/remove_redundant_order_by

0db9bf3

devcrafter merged commit 7206684 into master Jan 18, 2023

devcrafter deleted the igor/remove_redundant_order_by branch January 18, 2023 12:25

devcrafter mentioned this pull request Jan 18, 2023

Revert "Remove redundant sorting" #45414

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove redundant sorting#43905

Remove redundant sorting#43905
devcrafter merged 55 commits intomasterfrom
igor/remove_redundant_order_by

devcrafter commented Dec 2, 2022 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

devcrafter Dec 9, 2022

Uh oh!

Uh oh!

Uh oh!

KochetovNicolai Jan 2, 2023

Uh oh!

KochetovNicolai left a comment

Uh oh!

devcrafter commented Jan 9, 2023

Uh oh!

vdimir commented Jan 9, 2023

Uh oh!

vdimir commented Jan 16, 2023

Uh oh!

devcrafter commented Jan 17, 2023

Uh oh!

devcrafter commented Jan 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

devcrafter commented Dec 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Uh oh!

Uh oh!

Uh oh!

devcrafter Dec 9, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

KochetovNicolai Jan 2, 2023

Choose a reason for hiding this comment

Uh oh!

KochetovNicolai left a comment

Choose a reason for hiding this comment

Uh oh!

devcrafter commented Jan 9, 2023

Uh oh!

vdimir commented Jan 9, 2023

Uh oh!

vdimir commented Jan 16, 2023

Uh oh!

devcrafter commented Jan 17, 2023

Uh oh!

devcrafter commented Jan 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

devcrafter commented Dec 2, 2022 •

edited

Loading