Conversation
src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp
Outdated
Show resolved
Hide resolved
src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp
Outdated
Show resolved
Hide resolved
| break; | ||
|
|
||
| if (!trans->getDataStreamTraits().preserves_sorting) | ||
| break; |
There was a problem hiding this comment.
To-do: this case is not covered by tests yet. Need some query which contain stateful function on top of ORDER BY and expression which doesn't preserve sorting between them in query plan
When removing ORDER BY we remove corresponding expression, so to remove only ORDER BY expression, do it before "mergeExpressions" optimization is applied
- do not apply optimization in case of UnionStep on top of Sorting 01104_distributed_one_test.sql
+ 01952_optimize_distributed_group_by_sharding_key
+ 02377_optimize_sorting_by_input_stream_properties_explain.sh
src/Processors/QueryPlan/Optimizations/removeRedundantOrderBy.cpp
Outdated
Show resolved
Hide resolved
|
|
||
| class RemoveRedundantOrderBy : public QueryPlanVisitor<RemoveRedundantOrderBy, debug_logging_enabled> | ||
| { | ||
| std::vector<QueryPlan::Node *> nodes_affect_order; |
There was a problem hiding this comment.
Comment would be helpful.
Maybe not very important, but I'd better try to store the last node from nodes_affect_order in the stack frame.
For this, you can specify a template structure that you can store in the stack.
Also, ideally, somehow make a stack private for derived classes.
KochetovNicolai
left a comment
There was a problem hiding this comment.
Generally ok (if we add FilterStep to check)
+ cleanup + disable optimization in sort performance test since it removes sorting at all
|
Fatal messages in stress test, CI link : cc @vdimir Is it something known? |
|
Checked on commit Repro: DROP TABLE IF EXISTS t;
CREATE TABLE t ( city_id UInt32 ) ENGINE = Memory;
INSERT INTO t VALUES (1), (2);
SET max_rows_in_set_to_optimize_join = 0; --- not necessary
SET join_algorithm = 'full_sorting_merge';
SET query_plan_remove_redundant_order_by = 1;
explain plan description = 1, sorting = 1
SELECT *
FROM ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js1
JOIN ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js2
USING city_id;
SET query_plan_remove_redundant_order_by = 0;
SELECT '----';
explain plan description = 1, sorting = 1
SELECT *
FROM ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js1
JOIN ( SELECT city_id FROM t GROUP BY city_id ORDER BY city_id ) js2
USING city_id;But actually streams before join is sorted: Need to add some additional info to error message to check what ClickHouse/src/QueryPipeline/QueryPipelineBuilder.cpp Lines 353 to 354 in 119501f |
e98981d to
7347f51
Compare
|
@devcrafter I've pushed commit with fix for the case above, feel free to drop it if you'll find proper solution. |
After removing sorting step we need to update sorting properties of input/ouput streams
7347f51 to
6328e02
Compare
@vdimir Thanks! The proper fix is here. Your commit contains better checks during building pipeline for JOIN, so I think, it'd be nice to bring them in but by other PR. |
|
Stress test (ubsan) failure related to #44912 |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Remove redundant sorting, for example, sorting related ORDER BY clauses in subqueries. Implemented on top of query plan. It does similar optimization as
optimize_duplicate_order_by_and_distinctregardingORDER BYclauses, but more generic, since it's applied to any redundant sorting steps (not only caused by ORDER BY clause) and applied to subqueries of any depth. Related to #42648