How codedb indexes your codebase for AI agents

Rach Pradhan · 6 min read

Every time an AI agent searches your code, it spawns a process, walks the filesystem, opens every file, and serializes the output. The codebase barely changes between queries. codedb indexes once on startup and serves structural queries at sub-millisecond latency, 1,000x faster than the tools your agent uses today.

The problem

When Claude Code, Codex, or Cursor needs to find a function, it shells out to ripgrep. When it needs an outline, it reads the whole file. When it needs the dependency graph, it greps import statements. Every query pays the same cost: process startup, filesystem walk, regex scan.

This is a known problem. Cursor built client-side trigram indexes to speed up regex search in their editor. OpenAI's Codex runs searches through a sandboxed shell, paying process startup on every call. We'd been working on the same problem independently, and took it further: inverted word indexes for O(1) identifier lookup, structural outlines, reverse dependency graphs, and version tracking. All served over MCP so any agent can use it, not just one editor.

A single ripgrep search on the zigrepper repo (849 .zig files) costs ~59ms and returns ~1KB of raw line matches. Agents make hundreds of these per session. The real expense isn't wall-clock time. It's the token budget burned on unstructured output that the model has to parse.

How it works

codedb is a single Zig binary that starts as an MCP server. On startup it walks the codebase, builds five in-memory indexes, and serves queries over JSON-RPC. A background watcher polls every 2 seconds, checking mtimes first, then content hashes, so the indexes stay fresh without full rescans. Re-indexing a single changed file takes less than 2ms.

MCP / HTTP: stdio JSON-RPC 2.0, REST on :7719
Explorer: 5 in-memory indexes, O(1) lookups
Store: append-only version log, seq counter
Watcher: 2s poll, mtime+hash, <2ms re-index
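The watcher's two-stage change detection can be sketched in a few lines. This is an illustrative Python model, not codedb's Zig implementation: mtimes gate the cheap path, and content hashes confirm a real change before anything is re-indexed.

```python
import hashlib
import os


class Watcher:
    """Sketch of two-stage change detection (illustrative, not codedb's
    actual code): compare mtimes first (cheap), then content hashes,
    and report only files whose bytes really changed."""

    def __init__(self, root):
        self.root = root
        self.mtimes = {}  # path -> last observed mtime (ns)
        self.hashes = {}  # path -> last observed content hash

    def poll(self):
        changed = []
        for dirpath, _, names in os.walk(self.root):
            for name in sorted(names):
                path = os.path.join(dirpath, name)
                mtime = os.stat(path).st_mtime_ns
                if self.mtimes.get(path) == mtime:
                    continue  # mtime unchanged: skip hashing entirely
                self.mtimes[path] = mtime
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                if self.hashes.get(path) != digest:
                    self.hashes[path] = digest
                    changed.append(path)  # content changed: re-index it
        return changed
```

Only files that pass both checks are handed to the re-indexer, which is why a single-file edit costs under 2ms instead of a full rescan.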

The five indexes cover different query patterns:

Outlines (functions, structs, imports): 0.05ms
Word Index (identifier → O(1) lookup): 4ns
Trigram (3-byte ngram → file sets): 110μs
Dep Graph (reverse import tracking): 0.05ms
Content (raw file cache, no disk): 0.04ms

The trigram index maps every 3-byte sequence to the set of files containing it. Searching for handleAuth intersects the trigram sets and scans only the candidate files, 5.5x faster than brute force. For exact identifiers, the inverted word index is faster still: a single hash lookup returns all locations in 4 nanoseconds.
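A toy model of the two indexes makes the difference concrete (Python for illustration; the shapes are the idea, not codedb's actual data layout):

```python
import re
from collections import defaultdict


def trigrams(text):
    """Every 3-byte substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class Index:
    """Illustrative sketch: a trigram index narrows substring search to
    candidate files; an inverted word index answers exact-identifier
    lookups with a single hash probe."""

    def __init__(self):
        self.tri = defaultdict(set)     # trigram -> files containing it
        self.words = defaultdict(list)  # identifier -> [(file, line)]
        self.files = {}                 # file -> raw content

    def add(self, path, text):
        self.files[path] = text
        for t in trigrams(text):
            self.tri[t].add(path)
        for lineno, line in enumerate(text.splitlines(), 1):
            for word in re.findall(r"[A-Za-z_]\w*", line):
                self.words[word].append((path, lineno))

    def search(self, needle):
        # Intersect the trigram sets, then scan only surviving candidates.
        candidates = set.intersection(*(self.tri[t] for t in trigrams(needle)))
        return sorted(p for p in candidates if needle in self.files[p])

    def lookup(self, ident):
        # Exact identifier: one hash lookup, no file scan at all.
        return self.words.get(ident, [])
```

Candidate filtering is the whole trick: the scan touches only files that contain every trigram of the query, and the exact-identifier path skips scanning entirely.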

Brute force: 603 μs
Trigram index: 110 μs
Word index: 4 ns

Synthetic benchmark: 500 files, 100K lines.

Performance

All benchmarks on Apple M4 Pro. MCP queries hit pre-built in-memory indexes (20 iterations averaged). CLI tools include process startup cost.

Query              CLI tools   codedb MCP   Speedup
File tree          52.9 ms     0.04 ms      1,323x
Full-text search   5.3 ms      0.05 ms      106x
Symbol search      6.3 ms      0.1 ms       63x
Word lookup        7.2 ms      0.04 ms      180x
Dep graph          2.2 ms      0.05 ms      44x

codedb2 repo, 20 files. Apple M4 Pro. MCP = pre-indexed, 20 iterations avg.

The latency gap matters, but the token gap matters more. ripgrep dumps raw lines. codedb returns structured results, exactly what the agent needs.

Query                grep / ripgrep   codedb MCP   Reduction
Search: 'allocator'  32,564 tok       20 tok       1,628x fewer
Search: 'init'       18,200 tok       15 tok       1,213x fewer
Outline: main.zig    4,800 tok        45 tok       107x fewer

Token estimates based on response byte size.
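The byte arithmetic is easy to see in miniature. Here is a hypothetical comparison (made-up file names and match counts, not the benchmark data above): the same search answered as raw grep-style lines versus a compact structured summary.

```python
import json

# Hypothetical illustration: 80 raw match lines vs a structured reply.
raw = "\n".join(
    f"src/file{i}.zig:{line}: const buf = try allocator.alloc(u8, n);"
    for i in range(40) for line in (12, 88)
)

structured = json.dumps({
    "query": "allocator",
    "files": 40,
    "matches": 80,
    "top_files": ["src/file0.zig", "src/file1.zig", "src/file2.zig"],
})

# Bytes roughly track tokens, so the structured reply costs a small
# fraction of the raw dump while answering the same question.
ratio = len(raw) / len(structured)
```

The model rarely needs every match line; it needs to know where the hits cluster, which the summary carries in two orders of magnitude fewer bytes.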

Cold start scales linearly. Snapshots make subsequent startups instant.

codedb: 16 files, 56KB, <50 ms
merjs: 100 files, 17.3k lines, ~80 ms
vitessio/vitess: 5,028 files, 2.18M lines, ~2.5 s
openclaw/openclaw: 7,364 files, 128MB, 2.9 s

After indexing, hot query latencies on openclaw (7,364 files, 128MB):

codedb_status: 0.1ms
codedb_word: 0.2ms
codedb_outline: 0.7ms
codedb_deps: 1.3ms
codedb_symbol: 3.9ms
codedb_tree: 29ms

muonry + codedb

muonry is stateless: SIMD search, structural reads, atomic edits, all at 0.7ms per op. codedb is stateful: in-memory indexes, sub-millisecond lookups. They solve different problems. Together they're the full stack.

Here's the difference on a real task: editing the handleBatch function in a 1,184-line file.

Traditional (rg + cat):
  rg "handleBatch" src/  →  264 B
  cat mcp.zig  →  59,120 B
  <agent edits>
  cat mcp.zig (verify)  →  59,120 B
  Total: 118,504 bytes

codedb + muonry:
  codedb_word  →  100 B (0.007ms)
  muonry symbol  →  12,033 B
  muonry edit  →  ~500 B diff
  codedb_deps  →  200 B (0.05ms)
  Total: 12,833 bytes (9x less)

The traditional approach sends 118K bytes to the model, most of it the full file, twice. With codedb + muonry, the agent reads only the function it needs, edits return the diff inline, and the word lookup is a 7-microsecond hash hit instead of a 59ms filesystem scan.

They also share a wire protocol. When muonry edits a file, codedb's indexes update instantly. No polling delay, no stale results. The agent doesn't coordinate this. It just works.

Edit → queryable index:
  Without crosstalk: 2,020ms (edit + 2s poll delay)
  With crosstalk: 20ms (edit + instant sync), 100x faster

Measured on zigrepper repo (849 .zig files). Apple M4 Pro.

Get started

# Install codedb (open source, MIT license)
curl -fsSL https://codedb.codegraff.com/install.sh | sh

# Install muonry + the full zig* toolchain
curl -fsSL https://codegraff.com/install.sh | sh

Both installers auto-register as MCP servers. Your agent gets codedb's 16 tools: the core 12 (tree, outline, symbol, search, word, hot, deps, read, edit, changes, status, snapshot) plus codedb_remote for querying any public GitHub repo without cloning, codedb_bundle for full codebase snapshots, codedb_projects for multi-project management, and codedb_index for on-demand re-indexing.
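For reference, this is roughly what a tool invocation looks like on the wire. MCP servers speak JSON-RPC 2.0 with a tools/call method; the argument name "word" below is an assumption for illustration, not documented codedb API.

```python
import json


def tool_call(tool, arguments, req_id=1):
    """Build an MCP tools/call request (JSON-RPC 2.0, the framing MCP
    servers such as codedb read over stdio). The argument shape for the
    specific tool is assumed, not taken from codedb's docs."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


request = tool_call("codedb_word", {"word": "handleAuth"})
```

Because every tool rides the same envelope, any MCP-capable agent gets all 16 tools without editor-specific integration.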

codedb_remote is worth calling out: it lets your agent query any public GitHub repo through the codedb.codegraff.com cloud service. File trees, symbol outlines, and code search on external repos, no cloning required. Think DeepWiki, but built into your agent's toolchain.

codedb is fully open source at github.com/justrach/codedb. 12,853 lines of Zig, 199 unit tests, zero runtime dependencies. We built it because AI agents deserve the same structural code intelligence that IDEs have had for decades. Stars and PRs welcome.