
Aggregate function running out of memory #38

@karteek-gamooga

Hi,

I'm creating a table with 3 columns (uid, ev and timestamp) and running the sequenceMatch function grouped by uid to search for a pattern of events. The table has 10 million unique uids and 100 events per uid, for a total of 1 billion rows. The query below runs out of memory (8 GB on the test machine). I understand from the docs that a MergeTree table is sorted by the primary key within each part.
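For scale, the figures above can be put into a rough back-of-envelope estimate. This is only a sketch: it assumes the aggregation keeps state for every (t, ev) pair of every uid until the end of the query, and the per-event state size `BYTES_PER_EVENT` is a hypothetical number chosen for illustration, not a measured value.

```python
# Rough estimate of row count and GROUP BY state size for the setup described above.
UIDS = 10_000_000           # unique uids (from the post)
EVENTS_PER_UID = 100        # events per uid (from the post)
RAM_BYTES = 8 * 1024**3     # 8 GB test machine (from the post)
BYTES_PER_EVENT = 16        # ASSUMPTION: hypothetical state kept per event


def estimate(uids, events_per_uid, bytes_per_event):
    """Return (total rows, total bytes of per-event state held at once)."""
    rows = uids * events_per_uid
    state_bytes = rows * bytes_per_event
    return rows, state_bytes


rows, state_bytes = estimate(UIDS, EVENTS_PER_UID, BYTES_PER_EVENT)
print(f"rows: {rows:,}")
print(f"state: {state_bytes / 1024**3:.1f} GiB, RAM: {RAM_BYTES / 1024**3:.0f} GiB")
print("fits in RAM" if state_bytes <= RAM_BYTES else "does not fit in RAM")
```

Under that assumed state size, holding all groups at once would not fit in memory, which is why releasing state per part (or per batch of uids) matters here.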

Is a uid guaranteed to be present in only a single part, given that it is part of the primary key? If so, could merge and insert be called for each part, rather than at the end, so that the data can be deallocated as each part is finished?

If a uid is not guaranteed to be in a single part, can you suggest a better alternative to achieve the result below?

Table:
CREATE TABLE ev ( uid String, ev String, t DateTime, d Date) ENGINE = MergeTree(d, (uid, t, d), 8192)

Query:
SELECT count()
FROM
(
SELECT uid
FROM ev
GROUP BY uid
HAVING sequenceMatch('{"pattern":["ev1","ev2"]}')(t, ev)
)

Thanks
