Skip to content

Revert "Merge pull request #34055 from palegre-tiny/groupSortedArray"#36822

Merged
alexey-milovidov merged 5 commits intomasterfrom
revert-group-array-sorted
May 6, 2022
Merged

Revert "Merge pull request #34055 from palegre-tiny/groupSortedArray"#36822
alexey-milovidov merged 5 commits intomasterfrom
revert-group-array-sorted

Conversation

@alexey-milovidov
Copy link
Member

@alexey-milovidov alexey-milovidov commented Apr 30, 2022

Changelog category (leave one):

  • Bug Fix (user-visible misbehaviour in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Remove function groupArraySorted which has a bug.

This reverts commit f055d7b, reversing
changes made to 4ec3c35.

Bug has been found.
The function will not be returned back unless both of these conditions will be satistied:

  1. Bug will be fixed Fix groupArraySorted() for optimize_aggregation_in_order and WITH TOTALS #36815.
  2. Fuzz testing will be improved to automatically find all these bugs before merge.

@alexey-milovidov
Copy link
Member Author

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented May 1, 2022

update

✅ Branch has been successfully updated

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label May 1, 2022
@azat
Copy link
Member

azat commented May 1, 2022

@alexey-milovidov maybe it worth give this function a try, after #36815? (although implementation looks complex, at least to me)

@alexey-milovidov
Copy link
Member Author

It's not the first time we forgot to test ser/de or merge, let's do fuzzing of table engines (replace every table to distributed table).

@alexey-milovidov
Copy link
Member Author

Btw, why this particular function useful?

@azat
Copy link
Member

azat commented May 1, 2022

Btw, why this particular function useful?

I don't need it, but I guess original developer (@palegre-tiny) will not be happy if it will be removed.
That said that sometimes reverts are too quick, it is great for the code base (I guess), but not for users.
Also this function hadn't been marked as experimental, so someone can use it.

P.S. I don't have any strong opinion here, this is just basic thoughts about almost every "function"

@alexey-milovidov
Copy link
Member Author

Ok, let's merge the fix first.

@alexey-milovidov
Copy link
Member Author

But there are other code cleanups required, maybe you can make full code review and cleanup of this function?

@UnamedRus
Copy link
Contributor

UnamedRus commented May 2, 2022

Btw, why this particular function useful?

Because it should allow you to get top x values by sorting in cheap way.
And doing arraySort(groupArray(xxx)) is costly:

  1. how array functions works in memory.
  2. in case of distributed query the whose groupArray state will be send to the init node.

@palegre-tiny
Copy link
Contributor

palegre-tiny commented May 3, 2022

We need this feature for some scenarios and we didn't see any acceptable alternatives. I could clean up or simplify the code if it's not clear.

@alexey-milovidov
Copy link
Member Author

Do you mind if I will revert it temporarily just because it has memory safety issue and our master branch must be 100% memory safe? Then we will merge #36815 when it will be ready.

@palegre-tiny
Copy link
Contributor

Do you mind if I will revert it temporarily just because it has memory safety issue and our master branch must be 100% memory safe? Then we will merge #36815 when it will be ready.

Sure, lets do like that. Thank you.


[1,1,2,2] 0 0 6000
-- sum() can be compiled, check that compiled version works correctly
select groupArraySorted(partition), parent_key, child_key, sum(value) from data_02233 group by parent_key, child_key with totals order by parent_key, child_key settings optimize_aggregation_in_order=1, compile_aggregate_expressions=1, min_count_to_compile_aggregate_expression=0;
Copy link
Member

@azat azat May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we don't have explicit test for optimize_aggregation_in_order and compiled aggregate functions...
Will address this later.

@Felixoid Felixoid added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jul 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants