NodeDB's graph engine uses a native CSR (Compressed Sparse Row) adjacency index with interned node IDs (u32) and labels (u32) — not recursive JOINs pretending to be a graph. At 1 billion edges, CSR uses ~10 GB vs ~60 GB for naive adjacency lists (6x improvement). Sub-millisecond multi-hop traversals, 13 native algorithms, Cypher-subset pattern matching, and GraphRAG fusion — all in the same process as every other engine.
Edges are persisted in a redb B-Tree with forward and reverse indexes, both keyed by a (tenant_id, composite) tuple:
Forward Index (EDGES table):
Key: (tenant_id: u32, "src\x00edge_label\x00dst")
Value: edge properties (MessagePack)
Reverse Index (REVERSE_EDGES table):
Key: (tenant_id: u32, "dst\x00edge_label\x00src")
Value: [] (existence check only)
Tenant isolation is structural: the tenant id is a first-class key component, not a lexical prefix on node names. Node names in the composite portion are user-visible strings. Both tables update atomically in a single write transaction, and the null-byte separator enables prefix scans for outbound traversal within a tenant.
At query time the in-memory CSR is partitioned by tenant (ShardedCsrIndex = one CsrIndex per tenant); algorithms and traversals run against a single tenant's partition:
CsrIndex (per tenant):
out_offsets: Vec<u32> [num_nodes + 1] — offset into target array per node
out_targets: Vec<u32> [num_edges] — destination node IDs (contiguous)
out_labels: Vec<u32> [num_edges] — edge labels (parallel array)
out_weights: Vec<f64> [num_edges] — optional, allocated only when weighted
in_offsets / in_targets / in_labels / in_weights — symmetric for inbound
Each CsrIndex gets a unique partition tag at construction. Public APIs that return dense node indices hand out LocalNodeId { id, partition_tag } — using a node id from one partition with another partition's API panics at the boundary.
Writes go to a mutable buffer and become visible immediately. Compaction merges the buffer into the dense CSR arrays via double-buffered swap when the buffer exceeds 10% of the dense size.
GRAPH INSERT EDGE FROM 'users:alice' TO 'users:bob' TYPE 'KNOWS';
-- With properties (JSON string form):
GRAPH INSERT EDGE FROM 'users:alice' TO 'users:bob' TYPE 'KNOWS'
PROPERTIES '{"since": 2020, "weight": 0.9}';
-- Object literal form (equivalent):
GRAPH INSERT EDGE FROM 'users:alice' TO 'users:bob' TYPE 'KNOWS'
PROPERTIES { since: 2020, weight: 0.9 };GRAPH DELETE EDGE FROM 'users:alice' TO 'users:bob' TYPE 'KNOWS';Removes both forward and reverse index entries atomically. Cascading delete is available for all edges touching a node.
GRAPH TRAVERSE FROM 'users:alice' DEPTH 3;
GRAPH TRAVERSE FROM 'users:alice' DEPTH 2 LABEL 'follows' DIRECTION out;Breadth-first search from a start node. Returns discovered nodes at each depth level.
| Parameter | Default | Description |
|---|---|---|
DEPTH |
2 | Maximum hop count |
LABEL |
(any) | Filter by edge label |
DIRECTION |
out | in, out, or both |
GRAPH NEIGHBORS OF 'users:bob';
GRAPH NEIGHBORS OF 'users:bob' LABEL 'follows' DIRECTION both;Returns immediate neighbors only — no hop expansion.
GRAPH PATH FROM 'users:alice' TO 'users:charlie';
GRAPH PATH FROM 'users:alice' TO 'users:charlie' MAX_DEPTH 5 LABEL 'knows';Cross-core BFS path finding. Returns an ordered list of node IDs, or empty array if no path exists within MAX_DEPTH (default 10).
NodeDB embeds a Cypher-subset pattern engine. MATCH queries arrive through any protocol — pgwire, HTTP, or native MessagePack — and are parsed, optimized, and dispatched to the Data Plane like any other query.
Node bindings:
(a) -- named variable, any label
(a:Person) -- named variable, label filter
(:Person) -- anonymous node, label filter
() -- anonymous, any label
Edge bindings with direction:
-[:KNOWS]-> -- outbound
<-[:KNOWS]- -- inbound
-[:KNOWS]- -- undirected (both directions)
-[r:KNOWS]-> -- named edge binding
Variable-length paths:
-[:KNOWS*1..3]-> -- 1 to 3 hops (BFS expansion)
-[:KNOWS*2..2]-> -- exactly 2 hops
-[:FOLLOWS*]-> -- unlimited hops
Clauses:
| Clause | Description |
|---|---|
MATCH |
Required. Pattern to match against the graph. |
OPTIONAL MATCH |
LEFT JOIN semantics — preserves rows with no match. |
WHERE |
Filter on bound variables. Supports =, !=, <, <=, >, >=. |
WHERE NOT EXISTS { MATCH ... } |
Anti-join — exclude rows matching a sub-pattern. |
RETURN |
Project bindings. Supports aliases (AS). Default: all bound variables. |
ORDER BY |
Sort results. ASC or DESC. |
LIMIT |
Cap result count. |
Multiple comma-separated patterns in one MATCH clause act as self-joins.
Friend-of-friend:
MATCH (a:Person)-[:knows]->(b:Person)-[:knows]->(c:Person)
WHERE a.name = 'Alice'
RETURN b.name, c.name;Variable-length recommendations:
MATCH (u:User)-[:follows*2..3]->(recommended:User)
WHERE u.id = 'you'
RETURN DISTINCT recommended.id
LIMIT 10;Multi-pattern query (self-join):
MATCH (a:Person)-[:works_at]->(company:Company),
(a)-[:located_in]->(city:City),
(company)-[:headquartered_in]->(hq_city:City)
WHERE city.id = hq_city.id
RETURN a.name, company.name;Anti-join (negative pattern):
MATCH (a:User)-[:follows]->(b:User)
WHERE NOT EXISTS { MATCH (b)-[:blocked_by]->(a) }
RETURN a.id, b.id;OPTIONAL MATCH:
MATCH (a:Person)-[:knows]->(b:Person)
OPTIONAL MATCH (b)-[:works_at]->(c:Company)
RETURN a.name, b.name, c.name;Fraud detection pattern:
MATCH (a:Account)-[:transfers_to]->(mid:Account)-[:transfers_to]->(b:Account)
WHERE a.risk_score > 0.7 AND b.risk_score > 0.7
RETURN a.id, b.id, count(*) AS paths
ORDER BY paths DESC
LIMIT 20;- Parse — Tokenizes Cypher syntax into AST (
MatchQuery) - Optimize — Reorders triples by selectivity (bind tightly-connected patterns first)
- Serialize — MessagePack the AST for SPSC transport
- Execute — Data Plane resolves bindings against CSR, expands variable-length paths via BFS, applies WHERE predicates, projects RETURN columns
- Return — Results come back as pgwire rows
All algorithms run directly on the CSR index. SIMD-accelerated where applicable (rank scatter, L1 norm delta, fill operations).
GRAPH ALGO PAGERANK ON social_graph DAMPING 0.85 ITERATIONS 20 TOLERANCE 1e-7;
GRAPH ALGO WCC ON knowledge_graph;
GRAPH ALGO SSSP ON routes FROM 'city:chicago';
GRAPH ALGO COMMUNITY ON products ITERATIONS 10 RESOLUTION 1.0;
GRAPH ALGO BETWEENNESS ON network SAMPLE 500;
GRAPH ALGO KCORE ON collaboration;
GRAPH ALGO TRIANGLES ON social MODE global;
GRAPH ALGO DIAMETER ON web;| Algorithm | What it computes | Key Parameters |
|---|---|---|
| PageRank | Node importance via incoming link structure | DAMPING (0.85), ITERATIONS (20), TOLERANCE (1e-7) |
| WCC | Weakly connected components (union-find with path compression) | — |
| Label Propagation | Community detection via iterative label spreading | ITERATIONS (10) |
| LCC | Local clustering coefficient — how tightly neighbors connect | — |
| SSSP | Single-source shortest path (Dijkstra, rejects negative weights) | FROM (required) |
| Betweenness | Bridge nodes with high traffic (Brandes' algorithm) | SAMPLE (optional, for approximation) |
| Closeness | How close a node is to all others (inverse distance sum) | SAMPLE (optional) |
| Harmonic | Like closeness, but handles disconnected graphs | SAMPLE (optional) |
| Degree | Connection count per node | DIRECTION (in/out/both) |
| Louvain | Community detection via modularity optimization | ITERATIONS (10), RESOLUTION (1.0) |
| Triangles | Triangle count (per-node or global) | MODE (global/per_node) |
| Diameter | Longest shortest path in the graph | — |
| k-Core | Coreness decomposition (peeling algorithm) | — |
Combines vector similarity search with graph traversal in a single query pipeline — zero network hops between engines.
Query Vector
|
v
[1] Vector Search (HNSW) --- top-K semantically similar nodes (seed set)
|
v
[2] Graph Expansion (BFS) -- expand seeds along edges to depth N
|
v
[3] RRF Fusion ------------- merge vector rank + graph hop distance
|
v
Final top-N results with (node_id, rrf_score, vector_rank, vector_distance, hop_distance)
GRAPH RAG FUSION ON entities
QUERY $embedding
VECTOR_TOP_K 50
EXPANSION_DEPTH 2
EDGE_LABEL 'related_to'
DIRECTION both
FINAL_TOP_K 10
RRF_K (60.0, 35.0)
MAX_VISITED 1000;| Parameter | Description |
|---|---|
QUERY |
Query embedding vector |
VECTOR_TOP_K |
Number of seed nodes from vector search |
EXPANSION_DEPTH |
BFS hop count from seeds |
EDGE_LABEL |
Optional edge type filter during expansion |
DIRECTION |
in, out, or both for BFS |
FINAL_TOP_K |
Final result count after fusion |
RRF_K |
Weighting constants (vector_k, graph_k) for RRF scoring |
MAX_VISITED |
Memory budget cap — BFS stops early if exceeded |
The response includes a truncation flag if the memory budget forced early termination.
-- 1. Seed retrieval
SELECT id, content FROM chunks
WHERE embedding <-> $query_embedding
LIMIT 10;
-- 2. Graph expansion from seeds
MATCH (c:Chunk)-[:mentions]->(e:Entity)-[:related_to*1..3]->(related:Entity)
WHERE c.id IN ('chunk-042', 'chunk-017')
RETURN DISTINCT related.id, related.description;
-- 3. Assemble expanded context for LLMIn multi-shard deployments, graph algorithms and pattern matching execute via Bulk Synchronous Parallel (BSP) coordination.
Each shard maintains a local rank vector, out-degree array, and dangling node mask. Cross-shard edges are tracked as "ghost edges." Each superstep:
- Scatter — distribute rank to outbound edges
- Boundary exchange — package contributions for ghost edges, send to target shards
- Gather — add incoming contributions from other shards
- Convergence check — L1 norm delta across all shards
- Repeat until convergence or max iterations
Union-find with cross-shard label propagation:
- Each shard computes local WCC
- Boundary edges send component labels to target shards
- Target shards adopt the lexicographically smaller label
- Repeat until no labels change (guaranteed convergence)
Scatter-gather with continuations:
- Coordinator broadcasts MATCH query to all shards
- Each shard executes the pattern on its local CSR
- Ghost edges produce
PatternContinuationmessages (partial bindings + remaining pattern + start node) - Target shards resume from continuations
- Coordinator collects completed rows and new continuations
- Repeat until no pending continuations or max rounds reached
- Vector Search — GraphRAG combines graph traversal with vector similarity
- Full-Text Search — BM25 results can be re-ranked using graph context
- GraphRAG Patterns — Detailed retrieval-augmented generation with graphs