justrach/codedb

codedb

Code intelligence server for AI agents. Zig core. MCP native. Zero dependencies.

Structural indexing · Trigram search · Word index · Dependency graph · File watching · MCP + HTTP

Status · Install · Quick Start · MCP Tools · Benchmarks · Architecture · Data & Privacy · Building


Status

Alpha software: the API is stabilizing but may change.

codedb works and is used daily in production AI workflows, but:

  • Language support: Zig, Python, TypeScript/JavaScript, Rust, PHP, C# (more planned)
  • No auth: the HTTP server binds to localhost only
  • Snapshot format may change between versions
  • MCP protocol is JSON-RPC 2.0 over stdio (stable)
| What works today | What's in progress |
|---|---|
| 16 MCP tools for full codebase intelligence | Additional language parsers (HCL/Go) |
| Trigram v2: integer doc IDs, batch-accumulate, merge intersect | Incremental segment-based indexing |
| 538x faster than ripgrep on pre-indexed queries | WASM target for Cloudflare Workers |
| O(1) inverted word index for identifier lookup | Multi-project support |
| Structural outlines (functions, structs, imports) | mmap-backed trigram index |
| Reverse dependency graph | |
| Atomic line-range edits with version tracking | |
| Auto-registration in Claude, Codex, Gemini, Cursor | |
| Polling file watcher with filtered directory walker | |
| Portable snapshot for instant MCP startup | |
| Singleton MCP with PID lock + 2min idle timeout | |
| Sensitive file blocking (.env, credentials, keys) | |
| Codesigned + notarized macOS binaries | |
| SHA256 checksum verification in installer | |
| Cross-platform: macOS (ARM/x86), Linux (ARM/x86) | |

⚑ Install

curl -fsSL https://codedb.codegraff.com/install.sh | sh

Downloads the binary for your platform and auto-registers codedb as an MCP server in Claude Code, Codex, Gemini CLI, and Cursor.

| Platform | Binary | Signed |
|---|---|---|
| macOS ARM64 (Apple Silicon) | codedb-darwin-arm64 | ✅ codesigned + notarized |
| macOS x86_64 (Intel) | codedb-darwin-x86_64 | ✅ codesigned + notarized |
| Linux ARM64 | codedb-linux-arm64 | - |
| Linux x86_64 | codedb-linux-x86_64 | - |

Or install manually from GitHub Releases.


⚑ Quick Start

As an MCP server (recommended)

After installing, codedb is automatically registered. Just open a project and the 16 MCP tools are available to your AI agent.

# Manual MCP start (auto-configured by install script)
codedb mcp /path/to/your/project

As an HTTP server

codedb serve /path/to/your/project
# listening on localhost:7719

CLI

codedb tree /path/to/project          # file tree with symbol counts
codedb outline src/main.zig           # symbols in a file
codedb find AgentRegistry             # find symbol definitions
codedb search "handleAuth"            # full-text search (trigram-accelerated)
codedb word Store                     # exact word lookup (inverted index, O(1))
codedb hot                            # recently modified files

🔧 MCP Tools

16 tools over the Model Context Protocol (JSON-RPC 2.0 over stdio):

| Tool | Description |
|---|---|
| codedb_tree | Full file tree with language, line counts, symbol counts |
| codedb_outline | Symbols in a file: functions, structs, imports, with line numbers |
| codedb_symbol | Find where a symbol is defined across the codebase |
| codedb_search | Trigram-accelerated full-text search (supports regex, scoped results) |
| codedb_word | O(1) inverted index word lookup |
| codedb_hot | Most recently modified files |
| codedb_deps | Reverse dependency graph (which files import this file) |
| codedb_read | Read file content (supports line ranges, hash-based caching) |
| codedb_edit | Apply line-range edits (replace, insert, delete; atomic writes) |
| codedb_changes | Changed files since a sequence number |
| codedb_status | Index status (file count, current sequence) |
| codedb_snapshot | Full pre-rendered JSON snapshot of the codebase |
| codedb_bundle | Batch multiple read-only queries in one call (max 20 ops) |
| codedb_remote | Query any GitHub repo via cloud intelligence, no local clone needed |
| codedb_projects | List all locally indexed projects on this machine |
| codedb_index | Index a local folder and create a codedb.snapshot |
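
For readers wiring up a client by hand, here is a minimal sketch of how a call to one of the tools above could be framed as a JSON-RPC 2.0 message. The `tools/call` method and the `name`/`arguments` layout come from the MCP specification; the argument key `query` is an assumption made for illustration, not codedb's documented schema (a client would normally discover the real schema via `tools/list`).

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique per connection

def make_tool_call(tool_name, arguments):
    """Build one newline-delimited JSON-RPC 2.0 message for the MCP stdio transport."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # stdio transport: one JSON object per line, with no embedded newlines
    return json.dumps(request, separators=(",", ":")) + "\n"

# "query" is a hypothetical argument name used only for illustration.
line = make_tool_call("codedb_search", {"query": "handleAuth"})
print(line, end="")
```

The resulting line would be written to the stdin of a `codedb mcp` process after the usual MCP `initialize` handshake, which is omitted here.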

codedb_remote: Cloud Intelligence

Query any public GitHub repo without cloning it. Powered by codedb.codegraff.com.

# Get the file tree of an external repo
codedb_remote repo="vercel/next.js" action="tree"

# Search for code in a dependency
codedb_remote repo="justrach/merjs" action="search" query="handleRequest"

# Get symbol outlines
codedb_remote repo="justrach/merjs" action="outline"

# Get repo metadata
codedb_remote repo="justrach/merjs" action="meta"

Actions: tree, outline, search, meta

Note: This tool calls codedb.codegraff.com via HTTPS. No API key required. The service must be available for this tool to work.

CLI Commands

| Command | Description |
|---|---|
| codedb tree | Show file tree with language and symbol counts |
| codedb outline <path> | List all symbols in a file |
| codedb find <name> | Find where a symbol is defined |
| codedb search <query> | Full-text search (trigram, case-insensitive) |
| codedb search --regex <pattern> | Regex search |
| codedb word <identifier> | Exact word lookup via inverted index |
| codedb hot | Recently modified files |
| codedb snapshot | Write codedb.snapshot to project root |
| codedb serve | HTTP daemon on :7719 |
| codedb mcp [path] | JSON-RPC/MCP server over stdio |
| codedb update | Self-update via install script |
| codedb --version | Print version |

Options: --no-telemetry (or set CODEDB_NO_TELEMETRY env var)

Example: agent explores a codebase

# 1. Get the file tree
curl localhost:7719/tree
# → src/main.zig      (zig, 55L, 4 symbols)
#   src/store.zig     (zig, 156L, 12 symbols)
#   src/agent.zig     (zig, 135L, 8 symbols)

# 2. Drill into a file
curl "localhost:7719/outline?path=src/store.zig"
# → L20: struct_def Store
#   L30: function init
#   L55: function recordSnapshot

# 3. Find a symbol across the codebase
curl "localhost:7719/symbol?name=AgentRegistry"
# → {"path":"src/agent.zig","line":30,"kind":"struct_def"}

# 4. Full-text search
curl "localhost:7719/search?q=handleAuth&max=10"

# 5. Check what changed
curl "localhost:7719/changes?since=42"

📊 Benchmarks

Measured on Apple M4 Pro, 48GB RAM. MCP = pre-indexed warm queries (20 iterations avg). CLI/external tools include process startup (3 iterations avg). Ground truth verified against Python reference implementation.

Latency: codedb MCP vs codedb CLI vs ast-grep vs ripgrep vs grep

codedb repo (20 files, 12.6k lines):

| Query | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | MCP speedup |
|---|---|---|---|---|---|---|
| File tree | 0.04 ms | 52.9 ms | - | - | - | 1,253x vs CLI |
| Symbol search (init) | 0.10 ms | 54.1 ms | 3.2 ms | 6.3 ms | 6.5 ms | 549x vs CLI |
| Full-text search (allocator) | 0.05 ms | 60.7 ms | 3.2 ms | 5.3 ms | 6.6 ms | 1,340x vs CLI |
| Word index (self) | 0.04 ms | 59.7 ms | n/a | 7.2 ms | 6.5 ms | 1,404x vs CLI |
| Structural outline | 0.05 ms | 53.5 ms | 3.1 ms | - | 2.4 ms | 1,143x vs CLI |
| Dependency graph | 0.05 ms | 2.2 ms | n/a | n/a | n/a | 45x vs CLI |

merjs repo (100 files, 17.3k lines):

| Query | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | MCP speedup |
|---|---|---|---|---|---|---|
| File tree | 0.05 ms | 54.0 ms | - | - | - | 1,173x vs CLI |
| Symbol search (init) | 0.07 ms | 54.4 ms | 3.4 ms | 6.3 ms | 3.6 ms | 758x vs CLI |
| Full-text search (allocator) | 0.03 ms | 54.1 ms | 2.9 ms | 5.1 ms | 3.7 ms | 1,554x vs CLI |
| Word index (self) | 0.04 ms | 54.7 ms | n/a | 6.3 ms | 4.2 ms | 1,518x vs CLI |
| Structural outline | 0.04 ms | 54.9 ms | 3.4 ms | - | 2.5 ms | 1,243x vs CLI |

rtk-ai/rtk repo (329 files): codedb vs rtk vs ripgrep vs grep

| Tool | Search "agent" | Speedup |
|---|---|---|
| codedb (pre-indexed) | 0.065 ms | baseline |
| rtk | 37 ms | 569x slower |
| ripgrep | 45 ms | 692x slower |
| grep | 80 ms | 1,231x slower |

Token Efficiency

codedb returns structured, relevant results rather than raw line dumps. For AI agents, this means dramatically fewer tokens per query:

| Repo | codedb MCP | ripgrep / grep | Reduction |
|---|---|---|---|
| codedb (search allocator) | ~20 tokens | ~32,564 tokens | 1,628x fewer |
| merjs (search allocator) | ~20 tokens | ~4,007 tokens | 200x fewer |

Indexing Speed

codedb v0.2.52+ uses trigram v2 (integer doc IDs, batch-accumulate, sorted merge intersection):

| Repo | Files | Lines | Cold start | Per file | vs v0.2.3 |
|---|---|---|---|---|---|
| codedb | 20 | 12.6k | 17 ms | 0.85 ms | - |
| merjs | 100 | 17.3k | 16 ms | 0.16 ms | - |
| 5,200 mixed files | 5,200 | - | 310 ms | 0.06 ms | -36% |
| openclaw/openclaw | 11,281 | 2.29M | 2.9 s | 0.26 ms | - |

Indexes are built once on startup. After that, the file watcher keeps them updated incrementally (single-file re-index: <2ms). Queries never re-scan the filesystem. For repos >1000 files, file contents are released after indexing to save ~300-500MB.
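
To make the trigram v2 ideas concrete (integer doc IDs, sorted posting lists, merge intersection), here is a minimal sketch in Python. It illustrates the general technique only and is not codedb's Zig implementation; the function names are hypothetical and the batch-accumulate step is reduced to a single pass over the documents.

```python
from collections import defaultdict

def trigrams(text):
    """Return the set of lowercase 3-character substrings of text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def build_index(docs):
    """docs: list of strings; each list position serves as the integer doc ID."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):   # IDs are visited in ascending order,
        for gram in trigrams(text):        # so every posting list stays sorted
            index[gram].append(doc_id)
    return index

def intersect_sorted(a, b):
    """Two-pointer merge intersection of two sorted ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search(index, docs, query):
    """Narrow candidates via trigrams, then confirm with a real substring check."""
    grams = trigrams(query)
    if not grams:                          # query shorter than 3 chars: check every doc
        candidates = list(range(len(docs)))
    else:
        lists = sorted((index.get(g, []) for g in grams), key=len)
        candidates = lists[0]              # start from the rarest trigram
        for posting in lists[1:]:
            candidates = intersect_sorted(candidates, posting)
    q = query.lower()
    return [d for d in candidates if q in docs[d].lower()]

docs = ["const allocator = std.heap.page_allocator;", "fn init() void {}", "allocate(n)"]
index = build_index(docs)
print(search(index, docs, "allocator"))  # → [0]
```

Starting the intersection from the shortest posting list keeps intermediate results small, and the final substring check removes false positives that merely share all of the query's trigrams.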

Why codedb is fast

  • MCP server indexes once on startup → all queries hit in-memory data structures (O(1) hash lookups)
  • CLI pays ~55ms process startup + full filesystem scan on every invocation
  • ast-grep re-parses all files through tree-sitter on every call (~3ms)
  • ripgrep/grep brute-force scan every file on every call (~5-7ms)
  • The MCP advantage: index once, query thousands of times at sub-millisecond latency
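
As an illustration of the "index once, query many times" point, the following sketch builds an inverted word index in which each lookup is a single hash-table access (O(1) on average). The tokenizer regex and the `(path, line)` posting layout are assumptions for this example, not codedb's actual data model.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # identifier-like tokens

def build_word_index(files):
    """files: {path: source text} -> {word: [(path, line_number), ...]}"""
    index = defaultdict(list)
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for word in set(WORD.findall(line)):  # record each word once per line
                index[word].append((path, line_no))
    return index

files = {
    "src/store.zig": "pub const Store = struct {\n    pub fn init() Store {}\n};",
    "src/main.zig": "const Store = @import(\"store.zig\").Store;",
}
index = build_word_index(files)  # paid once, at startup

# Each query is one dict access, independent of how many files were indexed.
print(index.get("Store", []))
```

A brute-force tool has to re-read every file for each query, whereas here the scan cost is paid once when the index is built.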

Feature Matrix

| Feature | codedb MCP | codedb CLI | ast-grep | ripgrep | grep | ctags |
|---|---|---|---|---|---|---|
| Structural parsing | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Trigram search index | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Inverted word index | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Dependency graph | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Version tracking | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multi-agent locking | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Pre-indexed (warm) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| No process startup | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| MCP protocol | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Full-text search | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Atomic file edits | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| File watcher | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |

codedb = tree-sitter + search index + dependency graph + agent runtime. Zero external dependencies. Pure Zig. Single binary.


🏗️ Architecture

┌─────────────┐     ┌─────────────┐
│  HTTP :7719 │     │  MCP stdio  │
│  server.zig │     │  mcp.zig    │
└──────┬──────┘     └──────┬──────┘
       │                   │
       └───────┬───────────┘
               │
    ┌──────────▼──────────┐
    │     Explorer        │
    │   explore.zig       │
    │  ┌───────────────┐  │
    │  │ WordIndex     │  │
    │  │ TrigramIndex  │  │
    │  │ Outlines      │  │
    │  │ Contents      │  │
    │  │ DepGraph      │  │
    │  └───────────────┘  │
    └──────────┬──────────┘
               │
    ┌──────────▼──────────┐
    │      Store          │──── data.log
    │    store.zig        │
    └──────────┬──────────┘
               │
    ┌──────────▼──────────┐
    │     Watcher         │ ← polls every 2s
    │   watcher.zig       │
    │  (FilteredWalker)   │
    └─────────────────────┘

No SQLite. No dependencies. Purpose-built data model:

  • Explorer: structural index engine. Parses Zig, Python, TypeScript/JavaScript, Rust, PHP, and C#. Maintains outlines, trigram index, inverted word index, content cache, and dependency graph behind a single mutex.
  • Store: append-only version log. Every mutation (snapshot, edit, delete) gets a monotonically increasing sequence number. Version history is capped at 100 entries per file.
  • Watcher: polling file watcher (2s interval). FilteredWalker prunes .git, node_modules, zig-cache, __pycache__, etc. before descending.
  • Agents: first-class structs with cursors, heartbeats, and exclusive file locks. Stale agents are reaped after 30s.
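
The Store's behavior described above can be sketched in a few lines. This is an in-memory illustration with a hypothetical API (`record`, `changes_since`); persistence to data.log is omitted, and the tuple layout is an assumption.

```python
from collections import defaultdict, deque

class Store:
    """In-memory sketch of an append-only version log (hypothetical API)."""

    def __init__(self, max_versions=100):
        self.seq = 0  # global, monotonically increasing sequence number
        # a bounded deque drops the oldest entry once the per-file cap is reached
        self.history = defaultdict(lambda: deque(maxlen=max_versions))

    def record(self, path, kind, content_hash=None):
        """Append a mutation ('snapshot', 'edit', or 'delete'); return its sequence number."""
        self.seq += 1
        self.history[path].append((self.seq, kind, content_hash))
        return self.seq

    def changes_since(self, since):
        """Paths whose most recent mutation is newer than `since`."""
        return sorted(path for path, versions in self.history.items()
                      if versions and versions[-1][0] > since)

store = Store()
store.record("src/main.zig", "snapshot")
mark = store.record("src/store.zig", "snapshot")
store.record("src/main.zig", "edit")
print(store.changes_since(mark))  # → ['src/main.zig']
```

Because sequence numbers only grow, a client can remember the last number it saw and later ask for everything newer, which is the pattern behind the `changes?since=42` query shown earlier.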

Threading Model

| Thread | Role |
|---|---|
| Main | HTTP accept loop or MCP read loop |
| Watcher | Polls filesystem every 2s via FilteredWalker |
| ISR | Rebuilds snapshot when stale flag is set |
| Reap | Cleans up stale agents every 5s |
| Per-connection | HTTP server spawns a thread per connection |

All threads share a shutdown: atomic.Value(bool) for graceful termination.
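
The watcher thread and the shared shutdown flag can be illustrated as follows. This is a sketch in Python rather than Zig: `threading.Event` stands in for `atomic.Value(bool)`, change detection by modification time is an assumption, and all function names are hypothetical.

```python
import os
import threading

SKIP_DIRS = {".git", "node_modules", "zig-cache", "__pycache__"}  # from the Watcher description

def filtered_walk(root):
    """Yield file paths under root, pruning skipped directories before descending."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]  # in-place prune
        for name in filenames:
            yield os.path.join(dirpath, name)

def scan(root):
    """Map each path to its modification time in nanoseconds."""
    snapshot = {}
    for path in filtered_walk(root):
        try:
            snapshot[path] = os.stat(path).st_mtime_ns
        except OSError:  # file vanished mid-scan or is a broken symlink
            continue
    return snapshot

def diff(old, new):
    """Return (changed_or_added, removed) paths between two scans."""
    changed = sorted(p for p, mtime in new.items() if old.get(p) != mtime)
    removed = sorted(p for p in old if p not in new)
    return changed, removed

def watch(root, on_change, shutdown, interval=2.0):
    """Poll every `interval` seconds until the shared shutdown flag is set."""
    seen = scan(root)
    while not shutdown.wait(interval):  # wakes early and returns True once set
        current = scan(root)
        changed, removed = diff(seen, current)
        if changed or removed:
            on_change(changed, removed)
        seen = current

shutdown = threading.Event()
watcher = threading.Thread(target=watch, args=(".", print, shutdown), daemon=True)
watcher.start()
shutdown.set()   # graceful termination: every loop checks the same flag
watcher.join()
```

Waiting on the flag instead of sleeping lets the thread exit promptly on shutdown rather than finishing a full polling interval.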


🔒 Data & Privacy

codedb collects anonymous usage telemetry to improve the tool. Telemetry is on by default: events are written to ~/.codedb/telemetry.ndjson and periodically synced to the codedb analytics endpoint. No source code, file contents, file paths, or search queries are collected, only aggregate tool call counts, latency, and startup stats.

| Location | Contents | Purpose |
|---|---|---|
| ~/.codedb/projects/<hash>/ | Trigram index, frequency table, data log | Persistent index cache |
| ~/.codedb/telemetry.ndjson | Aggregate tool calls and startup stats | Local telemetry log |
| ./codedb.snapshot | File tree, outlines, content, frequency table | Portable snapshot for instant MCP startup |

Not stored: No source code is sent anywhere. No file contents, file paths, or search queries are collected in telemetry. Sensitive files auto-excluded (.env*, credentials.json, secrets.*, .pem, .key, SSH keys, AWS configs).
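
A filter of this kind can be sketched with glob patterns. The pattern list below is transcribed from the exclusions named above, reading `.pem` and `.key` as file suffixes; the SSH key file names are assumed examples, AWS configs are left out, and codedb's real matching rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns from the exclusion list above; "*.pem" and "*.key" are an interpretation.
SENSITIVE_PATTERNS = [".env*", "credentials.json", "secrets.*", "*.pem", "*.key"]
SENSITIVE_NAMES = {"id_rsa", "id_ed25519"}  # assumed examples of "SSH keys"

def is_sensitive(path):
    """True if the file name (not the directory part) matches a sensitive pattern."""
    name = PurePosixPath(path).name
    if name in SENSITIVE_NAMES:
        return True
    return any(fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS)

for p in ["src/main.zig", ".env.local", "deploy/server.pem", "config/secrets.yaml"]:
    print(p, is_sensitive(p))
```

Matching on the file name alone means the check works the same way at any directory depth.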

To disable telemetry: set CODEDB_NO_TELEMETRY=1 or pass --no-telemetry.

To sync the local NDJSON file into Postgres for analysis or dashboards, use scripts/sync-telemetry.py with the schema in docs/telemetry/postgres-schema.sql. The data flow is documented in docs/telemetry.md.

rm -rf ~/.codedb/          # clear all cached indexes
rm -f codedb.snapshot      # remove snapshot from project

🔨 Building from Source

Requirements: Zig 0.15+

git clone https://github.com/justrach/codedb.git
cd codedb
zig build                              # debug build
zig build -Doptimize=ReleaseFast       # release build
zig build test                         # run tests
zig build bench                        # run benchmarks

Binary: zig-out/bin/codedb

Cross-compilation

zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-linux
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-macos

Releasing

./release.sh 0.2.0              # build, codesign, notarize, upload to GitHub Releases
./release.sh 0.2.0 --dry-run    # preview without executing

License

See LICENSE for details.
