Jean-Gabriel Doyon (045d41c5) at 18 Mar 19:57
fix(gitlab-client): prevent panic on re-polling completed response ...
Jean-Gabriel Doyon (c4342a75) at 18 Mar 19:42
refactor(indexer): promote repository resolver logs to info level
Jean-Gabriel Doyon (3c38e038) at 18 Mar 19:13
refactor(indexer): extract MAX_FRAME_LENGTH constant in BlobStream
Jean-Gabriel Doyon (469cd025) at 18 Mar 19:04
fix(indexer): harden repository cache security and error handling
Jean-Gabriel Doyon (ef3240d6) at 18 Mar 18:40
refactor: stream changed paths and reorganize repository module
Jean-Gabriel Doyon (cf8b5ed0) at 18 Mar 18:39
refactor: stream changed paths and reorganize repository module
Jean-Gabriel Doyon (96e41825) at 18 Mar 18:34
refactor: stream changed paths and reorganize repository module
Jean-Gabriel Doyon (ec3780fe) at 18 Mar 18:32
refactor: stream changed paths and reorganize repository module
Jean-Gabriel Doyon (ff1f2684) at 18 Mar 18:28
refactor: stream changed paths and reorganize repository module
Jean-Gabriel Doyon (23a95bf0) at 18 Mar 18:25
refactor: stream changed paths and reorganize repository module
Jean-Gabriel Doyon (4f2061c2) at 18 Mar 18:15
refactor: stream changed paths and reorganize repository module
Good stuff! We just discussed this and it's already done, amazing!
Adds missing bounds to all unbounded arrays, objects, and string values in the query DSL. Prevents query complexity explosion from crafted payloads (e.g. 10,000 filters → 10,000 AND clauses).
| Field | Attack | Bound added |
|---|---|---|
| filters per node | `name='a' AND name='b' AND ...` × 10k | maxProperties: 10 |
| filters per relationship | Same on edge filters | maxProperties: 5 |
| columns list | 1000 column names → huge SELECT | maxItems: 50 |
| aggregations | 1000 aggregate expressions | maxItems: 10 |
| rel_types (path/neighbors) | 1000 types → huge IN clause | maxItems: 10 |
| RelationshipType array | 1000 types per relationship | maxItems: 10 |
| FilterValue strings | 100MB string in a filter value | maxLength: 1024 |
| FilterValue arrays | Nested arrays of values | maxItems: 100 |
| contains/starts_with/ends_with value | 100MB LIKE pattern | maxLength: 1024 |
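The bounds above map onto standard JSON Schema validation keywords. A minimal illustrative fragment (the property names and nesting shown here are hypothetical, not copied from `graph_query.schema.json`):

```json
{
  "type": "object",
  "properties": {
    "filters": {
      "type": "object",
      "maxProperties": 10,
      "additionalProperties": {
        "oneOf": [
          { "type": "string", "maxLength": 1024 },
          { "type": "array", "maxItems": 100 }
        ]
      }
    },
    "columns": {
      "type": "array",
      "maxItems": 50,
      "items": { "type": "string", "maxLength": 1024 }
    }
  }
}
```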
All bounds are enforced at two layers:
- JSON Schema (`graph_query.schema.json`): rejects before parsing
- `check_depth` in `validate.rs`: rejects after parsing, in case the schema was bypassed

| Field | Bound |
|---|---|
| nodes | maxItems: 5 |
| relationships | maxItems: 5 |
| node_ids | maxItems: 500 |
| max_hops | maximum: 3 |
| max_depth | maximum: 3 |
| limit | maximum: 1000 |
| IN values | maxItems: 100 |
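The second layer can be sketched as a plain structural walk over the parsed query. This is an illustrative Rust sketch only; the type and constant names (`Query`, `Node`, `MAX_*`) are hypothetical and not taken from `validate.rs`:

```rust
// Hypothetical post-parse bound check: even if a payload slips past the
// JSON Schema layer, the parsed query is re-validated before execution.

const MAX_NODES: usize = 5;
const MAX_NODE_IDS: usize = 500;
const MAX_FILTERS_PER_NODE: usize = 10;

struct Node {
    // key/value filter pairs attached to this node
    filters: Vec<(String, String)>,
}

struct Query {
    nodes: Vec<Node>,
    node_ids: Vec<u64>,
}

fn check_bounds(q: &Query) -> Result<(), String> {
    if q.nodes.len() > MAX_NODES {
        return Err(format!("too many nodes: {} > {}", q.nodes.len(), MAX_NODES));
    }
    if q.node_ids.len() > MAX_NODE_IDS {
        return Err(format!("too many node_ids: {} > {}", q.node_ids.len(), MAX_NODE_IDS));
    }
    for node in &q.nodes {
        if node.filters.len() > MAX_FILTERS_PER_NODE {
            return Err(format!(
                "too many filters on a node: {} > {}",
                node.filters.len(),
                MAX_FILTERS_PER_NODE
            ));
        }
    }
    Ok(())
}

fn main() {
    // A crafted query with 10,000 filters on one node is rejected.
    let hostile = Query {
        nodes: vec![Node {
            filters: (0..10_000).map(|i| (format!("name{i}"), "a".into())).collect(),
        }],
        node_ids: vec![],
    };
    assert!(check_bounds(&hostile).is_err());

    // A benign query passes.
    let benign = Query {
        nodes: vec![Node { filters: vec![("name".into(), "a".into())] }],
        node_ids: vec![1, 2, 3],
    };
    assert!(check_bounds(&benign).is_ok());
}
```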
Jean-Gabriel Doyon (d9f7e0d8) at 18 Mar 17:57
refactor: consolidate response status checks and improve blob decod...
Jean-Gabriel Doyon (233f0c6a) at 18 Mar 17:55
refactor: consolidate response status checks and improve blob decod...
Jean-Gabriel Doyon (2caa38dd) at 18 Mar 17:26
feat(indexer): stream blob response through LengthDelimitedCodec
Jean-Gabriel Doyon (ff0d4b27) at 18 Mar 17:12
Jean-Gabriel Doyon (c2390b8b) at 18 Mar 17:12
Merge branch 'edge-lowcardinality-and-set-index' into 'main'
... and 1 more commit
Per @ahegyi's feedback on !592:
- `source_kind`, `relationship_kind`, `target_kind` from `String` to `LowCardinality(String)`
- `idx_relationship` from `bloom_filter` to `set(100)`

`relationship_kind` has 21 distinct values; `source_kind`/`target_kind` have ~24. LowCardinality dictionary-encodes these, reducing read bytes. `set(100)` stores the exact value set per block with zero false positives, unlike `bloom_filter`'s probabilistic approach, making it a better fit for low-cardinality columns.
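The changes above can be sketched as ClickHouse DDL. The table name and index granularity here are illustrative assumptions, not the actual migration:

```sql
-- Dictionary-encode the low-cardinality kind columns
ALTER TABLE relationships
    MODIFY COLUMN source_kind LowCardinality(String),
    MODIFY COLUMN relationship_kind LowCardinality(String),
    MODIFY COLUMN target_kind LowCardinality(String);

-- Replace the probabilistic bloom_filter skip index with an exact set index
ALTER TABLE relationships DROP INDEX idx_relationship;
ALTER TABLE relationships
    ADD INDEX idx_relationship relationship_kind TYPE set(100) GRANULARITY 1;
```

`set(100)` keeps up to 100 distinct values per index block; with only ~21–24 distinct values, every block's set fits exactly, so granule skipping has no false positives.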
| Config | Time (median) | Peak Memory | Read Bytes |
|---|---|---|---|
| bloom_filter | 40ms | 207 MiB | 258.80 MiB |
| set(100) | 37ms | 207 MiB | 258.80 MiB |
| set(100) + LowCardinality | 35ms | 204 MiB | 201.28 MiB |
Both index types skip the same granules (775/1426). LowCardinality cuts read bytes by 22% (259 → 201 MiB) via dictionary encoding.
TY for the quick turnaround!