Every time an AI agent searches your code, it spawns a process, walks the filesystem, opens every file, and serializes the output. The codebase barely changes between queries. codedb indexes once on startup and serves structural queries at sub-millisecond latency, 1,000x faster than the tools your agent uses today.
The problem
When Claude Code, Codex, or Cursor needs to find a function, it shells out to ripgrep. When it needs an outline, it reads the whole file. When it needs the dependency graph, it greps import statements. Every query pays the same cost: process startup, filesystem walk, regex scan.
This is a known problem. Cursor built client-side trigram indexes to speed up regex search in their editor. OpenAI's Codex runs searches through a sandboxed shell, paying process startup on every call. We'd been working on the same problem independently, and took it further: inverted word indexes for O(1) identifier lookup, structural outlines, reverse dependency graphs, and version tracking. All served over MCP so any agent can use it, not just one editor.
A single ripgrep search on the zigrepper repo (849 .zig files) costs ~59ms and returns ~1KB of raw line matches. Agents make hundreds of these per session. The real expense isn't wall-clock time. It's the token budget burned on unstructured output that the model has to parse.
How it works
codedb is a single Zig binary that starts as an MCP server. On startup it walks the codebase, builds five in-memory indexes, and serves queries over JSON-RPC. A background watcher polls every 2 seconds, checking mtimes first, then content hashes, so the indexes stay fresh without full rescans. Re-indexing a single changed file takes less than 2ms.
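The two-stage freshness check can be sketched in a few lines. This is an illustrative Python sketch of the idea, not codedb's actual Zig implementation: compare mtimes first (cheap stat), fall back to a content hash (exact), and re-index only files whose content actually changed.

```python
import hashlib
import os

class Watcher:
    """Sketch of a poll-based freshness check: mtime first, hash second."""

    def __init__(self):
        self.mtimes = {}  # path -> last observed mtime (ns)
        self.hashes = {}  # path -> last observed content hash

    def changed_files(self, paths):
        dirty = []
        for path in paths:
            mtime = os.stat(path).st_mtime_ns
            if mtime == self.mtimes.get(path):
                continue  # mtime unchanged: skip the expensive read
            self.mtimes[path] = mtime
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest != self.hashes.get(path):
                self.hashes[path] = digest
                dirty.append(path)  # content really changed: re-index it
        return dirty
```

A touched-but-identical file passes the mtime check but fails the hash check, so it never triggers a re-index; that is why the poll stays cheap between real edits.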
The five indexes cover different query patterns:
[Index lookup latencies: 0.05ms · 4ns · 110μs · 0.05ms · 0.04ms]
The trigram index maps every 3-byte sequence to the set of files containing it. Searching for handleAuth intersects the trigram sets and scans only the candidates, 5.5x faster than brute force. For exact identifiers, the inverted word index is faster still: a single hash lookup returns all locations in 4 nanoseconds.
Synthetic benchmark: 500 files, 100K lines.
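The two structures described above can be sketched together. This is a minimal Python illustration of the shapes involved (the class and field names are ours, not codedb's): a trigram index that narrows a substring search to candidate files, and an inverted word index that answers exact-identifier queries with one dictionary lookup.

```python
import re
from collections import defaultdict

class Index:
    def __init__(self):
        self.trigrams = defaultdict(set)  # 3-byte window -> files containing it
        self.words = defaultdict(list)    # identifier -> [(file, line)]
        self.texts = {}                   # file -> raw content

    def add(self, path, text):
        self.texts[path] = text
        for i in range(len(text) - 2):
            self.trigrams[text[i:i + 3]].add(path)
        for lineno, line in enumerate(text.splitlines(), 1):
            for word in re.findall(r"[A-Za-z_]\w*", line):
                self.words[word].append((path, lineno))

    def search(self, needle):
        # Intersect the trigram sets to get candidates, then verify
        # each candidate with a real substring scan.
        grams = [needle[i:i + 3] for i in range(len(needle) - 2)]
        if grams:
            candidates = set.intersection(*(self.trigrams[g] for g in grams))
        else:
            candidates = set(self.texts)  # needle shorter than a trigram
        return sorted(p for p in candidates if needle in self.texts[p])
```

For a known identifier, `idx.words["handleAuth"]` skips even the candidate scan: it is a single hash lookup, which is why the exact-word path is orders of magnitude faster than substring search.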
Performance
All benchmarks on Apple M4 Pro. MCP queries hit pre-built in-memory indexes (20 iterations averaged). CLI tools include process startup cost.
codedb2 repo, 20 files.
The latency gap matters, but the token gap matters more. ripgrep dumps raw lines. codedb returns structured results, exactly what the agent needs.
Token estimates based on response byte size.
Cold start scales linearly. Snapshots make subsequent startups instant.
After indexing, hot query latencies on openclaw (7,364 files, 128MB):
muonry + codedb
muonry is stateless: SIMD search, structural reads, atomic edits, all at 0.7ms per op. codedb is stateful: in-memory indexes, sub-millisecond lookups. They solve different problems. Together they're the full stack.
Here's the difference on a real task: editing the handleBatch function in a 1,184-line file.
Traditional (rg + cat)
codedb + muonry
The traditional approach sends 118K bytes to the model, most of it the full file, twice. With codedb + muonry, the agent reads only the function it needs, edits return the diff inline, and the word lookup is a 7-microsecond hash hit instead of a 59ms filesystem scan.
They also share a wire protocol. When muonry edits a file, codedb's indexes update instantly. No polling delay, no stale results. The agent doesn't coordinate this. It just works.
Edit → queryable index
Measured on zigrepper repo (849 .zig files). Apple M4 Pro.
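The push-based update path can be sketched as follows. This is an assumed design, not the actual wire protocol: because the editor and the index share a channel, a successful write calls straight into the indexer for that one file, so the next query sees the edit without waiting for a poll tick.

```python
class LiveIndex:
    """Sketch: an index that is updated by edit notifications, not polling."""

    def __init__(self):
        self.files = {}  # path -> list of lines

    def index_file(self, path, text):
        self.files[path] = text.splitlines()

    def on_edit(self, path, new_text):
        # Invoked by the editor right after a successful write:
        # re-index just this file, immediately.
        self.index_file(path, new_text)

    def word(self, needle):
        return [(p, n) for p, lines in self.files.items()
                for n, line in enumerate(lines, 1) if needle in line]
```

The design choice is the interesting part: polling bounds staleness at the poll interval, while a notification bounds it at the cost of one single-file re-index, which the benchmarks above put under 2ms.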
Get started
# Install codedb (open source, MIT license)
curl -fsSL https://codedb.codegraff.com/install.sh | sh

# Install muonry + the full zig* toolchain
curl -fsSL https://codegraff.com/install.sh | sh
Both installers auto-register as MCP servers. Your agent gets codedb's 16 tools: the core 12 (tree, outline, symbol, search, word, hot, deps, read, edit, changes, status, snapshot) plus codedb_remote for querying any public GitHub repo without cloning, codedb_bundle for full codebase snapshots, codedb_projects for multi-project management, and codedb_index for on-demand re-indexing.
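Because codedb speaks MCP, any client invokes these tools with a standard JSON-RPC 2.0 `tools/call` request. A minimal sketch, using the `word` tool name from the list above; the argument key is an assumption, not codedb's documented schema:

```python
import json

def tools_call(tool, arguments, request_id=1):
    # MCP requests are JSON-RPC 2.0 messages sent to the server's stdin.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical query for an exact identifier:
req = tools_call("word", {"query": "handleBatch"})
```

The agent never shells out: the request hits the pre-built in-memory index and the structured result comes back in the JSON-RPC response.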
codedb_remote is worth calling out: it lets your agent query any public GitHub repo through the codedb.codegraff.com cloud service. File trees, symbol outlines, and code search on external repos, with no cloning required. Think DeepWiki, but built into your agent's toolchain.
codedb is fully open source at github.com/justrach/codedb. 12,853 lines of Zig, 199 unit tests, zero runtime dependencies. We built it because AI agents deserve the same structural code intelligence that IDEs have had for decades. Stars and PRs welcome.