How codedb indexes your codebase for AI agents

Rach Pradhan · 6 min read

Every time an AI agent searches your code, it spawns a process, walks the filesystem, opens every file, and serializes the output. The codebase barely changes between queries. codedb indexes once on startup and serves structural queries at sub-millisecond latency, 1,000x faster than the tools your agent uses today.

The problem

When Claude Code, Codex, or Cursor needs to find a function, it shells out to ripgrep. When it needs an outline, it reads the whole file. When it needs the dependency graph, it greps import statements. Every query pays the same cost: process startup, filesystem walk, regex scan.

This is a known problem. Cursor built client-side trigram indexes to speed up regex search in their editor. OpenAI's Codex runs searches through a sandboxed shell, paying process startup on every call. We'd been working on the same problem independently, and took it further: inverted word indexes for O(1) identifier lookup, structural outlines, reverse dependency graphs, and version tracking. All served over MCP so any agent can use it, not just one editor.

A single ripgrep search on the zigrepper repo (849 .zig files) costs ~59ms and returns ~1KB of raw line matches. Agents make hundreds of these per session. The real expense isn't wall-clock time. It's the token budget burned on unstructured output that the model has to parse.

How it works

codedb is a single Zig binary that starts as an MCP server. On startup it walks the codebase, builds five in-memory indexes, and serves queries over JSON-RPC. A background watcher polls every 2 seconds, checking mtimes first, then content hashes, so the indexes stay fresh without full rescans. Re-indexing a single changed file takes less than 2ms.

MCP / HTTP: stdio JSON-RPC 2.0, REST on :7719
Explorer: 5 in-memory indexes, O(1) lookups
Store: append-only version log, seq counter
Watcher: 2s poll, mtime+hash, <2ms re-index
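The watcher's two-stage change detection can be sketched in a few lines. This is an illustrative Python model, not codedb's Zig implementation: mtimes gate the cheap path, and content hashes confirm a real change before anything is re-indexed.

```python
import hashlib
import os


class Watcher:
    """Sketch of two-stage change detection (illustrative, not codedb's
    actual code): compare mtimes first (cheap), then content hashes,
    and report only files whose bytes really changed."""

    def __init__(self, root):
        self.root = root
        self.mtimes = {}  # path -> last observed mtime (ns)
        self.hashes = {}  # path -> last observed content hash

    def poll(self):
        changed = []
        for dirpath, _, names in os.walk(self.root):
            for name in sorted(names):
                path = os.path.join(dirpath, name)
                mtime = os.stat(path).st_mtime_ns
                if self.mtimes.get(path) == mtime:
                    continue  # mtime unchanged: skip hashing entirely
                self.mtimes[path] = mtime
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                if self.hashes.get(path) != digest:
                    self.hashes[path] = digest
                    changed.append(path)  # content changed: re-index it
        return changed
```

Only files that pass both checks are handed to the re-indexer, which is why a single-file edit costs under 2ms instead of a full rescan.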

The five indexes cover different query patterns:

Outlines (functions, structs, imports): 0.05ms
Word Index (identifier → O(1) lookup): 4ns
Trigram (3-byte ngram → file sets): 110μs
Dep Graph (reverse import tracking): 0.05ms
Content (raw file cache, no disk): 0.04ms

The trigram index maps every 3-byte sequence to the set of files containing it. Searching for handleAuth intersects the trigram sets and scans only the candidate files, 5.5x faster than brute force. For exact identifiers, the inverted word index is faster still: a single hash lookup returns all locations in 4 nanoseconds.
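A toy model of the two indexes makes the difference concrete (Python for illustration; the shapes are the idea, not codedb's actual data layout):

```python
import re
from collections import defaultdict


def trigrams(text):
    """Every 3-byte substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class Index:
    """Illustrative sketch: a trigram index narrows substring search to
    candidate files; an inverted word index answers exact-identifier
    lookups with a single hash probe."""

    def __init__(self):
        self.tri = defaultdict(set)     # trigram -> files containing it
        self.words = defaultdict(list)  # identifier -> [(file, line)]
        self.files = {}                 # file -> raw content

    def add(self, path, text):
        self.files[path] = text
        for t in trigrams(text):
            self.tri[t].add(path)
        for lineno, line in enumerate(text.splitlines(), 1):
            for word in re.findall(r"[A-Za-z_]\w*", line):
                self.words[word].append((path, lineno))

    def search(self, needle):
        # Intersect the trigram sets, then scan only surviving candidates.
        candidates = set.intersection(*(self.tri[t] for t in trigrams(needle)))
        return sorted(p for p in candidates if needle in self.files[p])

    def lookup(self, ident):
        # Exact identifier: one hash lookup, no file scan at all.
        return self.words.get(ident, [])
```

Candidate filtering is the whole trick: the scan touches only files that contain every trigram of the query, and the exact-identifier path skips scanning entirely.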

Brute force: 603 μs
Trigram index: 110 μs
Word index: 4 ns

Synthetic benchmark: 500 files, 100K lines.

Performance

All benchmarks on Apple M4 Pro. MCP queries hit pre-built in-memory indexes (20 iterations averaged). CLI tools include process startup cost.

Query              CLI tools   codedb MCP   Speedup
File tree          52.9 ms     0.04 ms      1,323x
Full-text search   5.3 ms      0.05 ms      106x
Symbol search      6.3 ms      0.1 ms       63x
Word lookup        7.2 ms      0.04 ms      180x
Dep graph          2.2 ms      0.05 ms      44x

codedb2 repo, 20 files. Apple M4 Pro. MCP = pre-indexed, 20 iterations avg.

The latency gap matters, but the token gap matters more. ripgrep dumps raw lines. codedb returns structured results, exactly what the agent needs.

Query                grep / ripgrep   codedb MCP   Reduction
Search: 'allocator'  32,564 tok       20 tok       1,628x fewer
Search: 'init'       18,200 tok       15 tok       1,213x fewer
Outline: main.zig    4,800 tok        45 tok       107x fewer

Token estimates based on response byte size.
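The byte arithmetic is easy to see in miniature. Here is a hypothetical comparison (made-up file names and match counts, not the benchmark data above): the same search answered as raw grep-style lines versus a compact structured summary.

```python
import json

# Hypothetical illustration: 80 raw match lines vs a structured reply.
raw = "\n".join(
    f"src/file{i}.zig:{line}: const buf = try allocator.alloc(u8, n);"
    for i in range(40) for line in (12, 88)
)

structured = json.dumps({
    "query": "allocator",
    "files": 40,
    "matches": 80,
    "top_files": ["src/file0.zig", "src/file1.zig", "src/file2.zig"],
})

# Bytes roughly track tokens, so the structured reply costs a small
# fraction of the raw dump while answering the same question.
ratio = len(raw) / len(structured)
```

The model rarely needs every match line; it needs to know where the hits cluster, which the summary carries in two orders of magnitude fewer bytes.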

Cold start scales linearly. Snapshots make subsequent startups instant.

codedb: 16 files, 56KB, <50 ms
merjs: 100 files, 17.3k lines, ~80 ms
vitessio/vitess: 5,028 files, 2.18M lines, ~2.5 s
openclaw/openclaw: 7,364 files, 128MB, 2.9 s

After indexing, hot query latencies on openclaw (7,364 files, 128MB):

codedb_status: 0.1ms
codedb_word: 0.2ms
codedb_outline: 0.7ms
codedb_deps: 1.3ms
codedb_symbol: 3.9ms
codedb_tree: 29ms

muonry + codedb

muonry is stateless: SIMD search, structural reads, atomic edits, all at 0.7ms per op. codedb is stateful: in-memory indexes, sub-millisecond lookups. They solve different problems. Together they're the full stack.

Here's the difference on a real task: editing the handleBatch function in a 1,184-line file.

Traditional (rg + cat):
  rg "handleBatch" src/  →  264 B
  cat mcp.zig  →  59,120 B
  <agent edits>
  cat mcp.zig (verify)  →  59,120 B
  Total: 118,504 bytes

codedb + muonry:
  codedb_word  →  100 B (0.007ms)
  muonry symbol  →  12,033 B
  muonry edit  →  ~500 B diff
  codedb_deps  →  200 B (0.05ms)
  Total: 12,833 bytes (9x less)

The traditional approach sends 118K bytes to the model, most of it the full file, twice. With codedb + muonry, the agent reads only the function it needs, edits return the diff inline, and the word lookup is a 7-microsecond hash hit instead of a 59ms filesystem scan.

They also share a wire protocol. When muonry edits a file, codedb's indexes update instantly. No polling delay, no stale results. The agent doesn't coordinate this. It just works.

Edit → queryable index:
  Without crosstalk: 2,020ms (edit + 2s poll delay)
  With crosstalk: 20ms (edit + instant sync), 100x faster

Measured on zigrepper repo (849 .zig files). Apple M4 Pro.

Get started

# Install codedb (open source, MIT license)
curl -fsSL https://codedb.codegraff.com/install.sh | sh

# Install muonry + the full zig* toolchain
curl -fsSL https://codegraff.com/install.sh | sh

Both installers auto-register as MCP servers. Your agent gets codedb's 16 tools: the core 12 (tree, outline, symbol, search, word, hot, deps, read, edit, changes, status, snapshot) plus codedb_remote for querying any public GitHub repo without cloning, codedb_bundle for full codebase snapshots, codedb_projects for multi-project management, and codedb_index for on-demand re-indexing.
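For reference, this is roughly what a tool invocation looks like on the wire. MCP servers speak JSON-RPC 2.0 with a tools/call method; the argument name "word" below is an assumption for illustration, not documented codedb API.

```python
import json


def tool_call(tool, arguments, req_id=1):
    """Build an MCP tools/call request (JSON-RPC 2.0, the framing MCP
    servers such as codedb read over stdio). The argument shape for the
    specific tool is assumed, not taken from codedb's docs."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


request = tool_call("codedb_word", {"word": "handleAuth"})
```

Because every tool rides the same envelope, any MCP-capable agent gets all 16 tools without editor-specific integration.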

codedb_remote is worth calling out: it lets your agent query any public GitHub repo through the codedb.codegraff.com cloud service. File trees, symbol outlines, and code search on external repos, no cloning required. Think DeepWiki, but built into your agent's toolchain.

codedb is fully open source at github.com/justrach/codedb. 12,853 lines of Zig, 199 unit tests, zero runtime dependencies. We built it because AI agents deserve the same structural code intelligence that IDEs have had for decades. Stars and PRs welcome.