Hi,
I'm creating a table with three columns (uid, ev and a timestamp), then running the sequenceMatch function grouped by uid to search for a pattern in the events. The table has 10 million unique uids and 100 events per uid, so 1 billion rows in total. The query below runs out of memory (the test machine has 8 GB). I understand from the docs that a MergeTree table is sorted by the primary key within each part.
Is it guaranteed that all rows for a given uid are stored in a single part, since uid is part of the primary key? If so, could merge and insert be called for each part, rather than once at the end, so that the data can be deallocated as each part finishes?
If a uid is not guaranteed to be in a single part, can you suggest a better alternative for achieving the result below?
Table:
CREATE TABLE ev ( uid String, ev String, t DateTime, d Date) ENGINE = MergeTree(d, (uid, t, d), 8192)
Query:
SELECT count()
FROM
(
SELECT uid
FROM ev
GROUP BY uid
HAVING sequenceMatch('{"pattern":["ev1","ev2"]}')(t, ev)
)
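For reference, here is a minimal sketch of one possible workaround, not a confirmed fix: run the query once per hash bucket of uids, so that each run keeps aggregation state for only a fraction of the uids. The bucket count of 16 is an arbitrary assumption. The sketch also uses the documented sequenceMatch form, where conditions are passed as arguments and referenced as (?N) in the pattern, and it assumes the intent is "ev1 followed at some later point by ev2".

```sql
-- Sketch only: process one bucket of uids per query.
-- 16 buckets is an assumed value; run for buckets 0..15 and sum the counts client-side.
SELECT count()
FROM
(
    SELECT uid
    FROM ev
    WHERE cityHash64(uid) % 16 = 0  -- replace 0 with 1, 2, ..., 15 on later runs
    GROUP BY uid
    -- (?1) and (?2) refer to the first and second conditions after the timestamp.
    -- '.*' allows any number of other events between them (an assumption about the intent).
    HAVING sequenceMatch('(?1).*(?2)')(t, ev = 'ev1', ev = 'ev2')
)
```

Each run still scans the whole table, because the hash of uid is not a prefix of the primary key, so this trades extra reads for lower peak memory. The max_bytes_before_external_group_by setting, which lets GROUP BY spill its state to disk, may be another option worth testing.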
Thanks