Code-RAG is a CLI tool that makes codebases semantically searchable. It converts source code into vector embeddings, stores them in a vector database, and lets you query them in natural language.
Example: Instead of grepping for function names, ask "authentication logic" and find all relevant auth code.
```
       ┌─────────────┐
       │  CLI / MCP  │  Entry points
       └──────┬──────┘
              │
       ┌──────▼──────┐
       │     API     │  Orchestration layer (CodeRAGAPI)
       └──────┬──────┘
              │
    ┌─────────┼─────────┬─────────┐
    │         │         │         │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│Process│ │Search │ │Manage │ │ Embed │
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
    │         │         │         │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│Chunker│ │Rerank │ │ Index │ │Storage│
└───────┘ └───────┘ └───────┘ └───────┘
```
Key design: an orchestrated plugin architecture. The `CodeRAGAPI` centralizes orchestration, while specialized components handle chunking, indexing, search analysis, and storage.
- What: The central hub for all Code-RAG operations.
- How: Integrates embedding, database, reranking, and indexing logic. Used by both CLI and MCP.
- Features: Session tracking, auto-generated collection names, and unified indexing flow.
- What: Discovers source files and breaks them into logical chunks.
- How: Uses `SyntaxChunker` (tree-sitter based) for code-aware splitting, falling back to line-based chunking.
- Output: Text chunks with rich metadata (file path, line numbers, symbol names).
- What: Tracks the state of indexed files for incremental updates.
- How: Stores `mtime`, `size`, and `sha256` hashes.
- Benefit: Only re-indexes modified files, significantly speeding up subsequent runs.
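A minimal sketch of that check, assuming the index stores the three fields per file (function names here are illustrative):

```python
import hashlib
import os

def fingerprint(path: str) -> dict:
    """Capture mtime, size, and sha256 for a file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"mtime": st.st_mtime, "size": st.st_size, "sha256": digest}

def needs_reindex(path: str, index: dict, verify_hash: bool = False) -> bool:
    """Skip unchanged files; optionally confirm with a content hash."""
    old = index.get(path)
    if old is None or not os.path.exists(path):
        return True
    st = os.stat(path)
    # Cheap stat-based checks first; hashing is opt-in, in the spirit of
    # CODE_RAG_VERIFY_CHANGES_WITH_HASH.
    if st.st_mtime != old["mtime"] or st.st_size != old["size"]:
        return True
    if verify_hash:
        return fingerprint(path)["sha256"] != old["sha256"]
    return False
```

The stat comparison is nearly free, so on an unchanged codebase the second run touches only the index, not the embedding model.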
- What: Improves search relevance by combining vector search with exact identifier matching.
- How: `QueryAnalyzer` detects code identifiers (CamelCase, snake_case) in queries and boosts results containing those identifiers.
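The detection itself can be approximated with two regexes. The patterns and boost weight below are assumptions for illustration, not `QueryAnalyzer`'s actual rules:

```python
import re

# Illustrative patterns -- the real QueryAnalyzer may use different ones.
CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b|\b(?:[A-Z][a-z0-9]+){2,}\b")
SNAKE_CASE = re.compile(r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b")

def extract_identifiers(query: str) -> set[str]:
    """Pull likely code identifiers out of a natural-language query."""
    return set(CAMEL_CASE.findall(query)) | set(SNAKE_CASE.findall(query))

def boost_score(base: float, chunk_text: str, identifiers: set[str],
                boost: float = 0.15) -> float:
    """Add a fixed boost for each detected identifier appearing verbatim."""
    hits = sum(1 for ident in identifiers if ident in chunk_text)
    return base + boost * hits
```

The effect is hybrid retrieval: vector similarity finds conceptually related code, while exact identifier hits push the chunks that literally contain the named symbol to the top.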
- What: Refines search results using Cross-Encoder models.
- How: Re-scores top-K candidates from vector search for higher precision.
- Models: Defaults to `jinaai/jina-reranker-v3`.
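The re-scoring step is independent of the model. A sketch of the flow, with the cross-encoder abstracted behind a scoring callable (in Code-RAG this would be the reranker model's prediction call):

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 5) -> list[str]:
    """Re-score vector-search candidates and keep the best top_k.

    score_fn(query, candidate) -> float stands in for a Cross-Encoder,
    which scores the query/candidate pair jointly for higher precision
    than comparing two independently computed embeddings.
    """
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:top_k]
```

Because the cross-encoder only sees the top-K candidates from the vector search, its higher per-pair cost stays bounded regardless of corpus size.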
- Embeddings: Swappable backends (SentenceTransformers, OpenAI, or Shared HTTP).
- Databases: ChromaDB (default) or Qdrant.
- Features: Automatic dimension mismatch handling and model idle timeouts.
- What: FastAPI-based server that hosts both embedding and reranker models.
- Why: Prevents multiple MCP instances from each loading ~500MB+ of models into RAM.
- Management: Auto-spawns on first request, auto-terminates after idle timeout, uses heartbeats.
- Files: `src/code_rag/embedding_server.py`, `http_embedding.py`, `http_reranker.py`.
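The lifecycle logic amounts to tracking the time of the last heartbeat. A minimal sketch (class name and timings are illustrative, not the server's actual implementation):

```python
import threading
import time

class IdleTracker:
    """Tracks heartbeats so a shared model server can shut down when unused."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def heartbeat(self) -> None:
        """Called on each client request (or periodic client ping)."""
        with self._lock:
            self._last = time.monotonic()

    def is_idle(self) -> bool:
        """Checked by a background loop; True would trigger shutdown."""
        with self._lock:
            return time.monotonic() - self._last > self.timeout_s
```

As long as any MCP instance keeps sending heartbeats, the shared server stays up; once all clients go quiet, the timeout reclaims the model memory.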
```bash
# Install
pip install -e .

# Run MCP server (default)
code-rag

# Index and start querying via CLI
code-rag-cli

# Index a specific repo with a different database
code-rag-cli --path /path/to/repo --database qdrant

# Force reindexing
code-rag-cli --reindex
```

Add a new embedding model?
→ Extend EmbeddingInterface, add to CodeRAGAPI._create_embedding_model
Add a new reranker?
→ Extend RerankerInterface, update CodeRAGAPI.__init__
Adjust reindexing behavior?
→ Modify CODE_RAG_REINDEX_DEBOUNCE_MINUTES or CODE_RAG_VERIFY_CHANGES_WITH_HASH in config.
Change identifier boosting?
→ Update QueryAnalyzer.get_boost_score in src/code_rag/search/query_analyzer.py
Add support for new languages?
→ Extend SyntaxChunker.LANGUAGE_PACKAGES with tree-sitter bindings.
Add new MCP tools?
→ Update list_tools() and call_tool() in src/code_rag/mcp_server.py
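Extending the embedding side follows the plugin pattern. The interface shape below is an assumption for illustration (check `EmbeddingInterface` for the real signature):

```python
class EmbeddingInterface:
    """Assumed shape of Code-RAG's embedding interface."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        raise NotImplementedError

class ConstantDimEmbedding(EmbeddingInterface):
    """Toy backend: fixed-dimension character-code vectors.

    Useless for real search, but shows what a new backend must provide
    before being wired into CodeRAGAPI._create_embedding_model.
    """

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, texts):
        vectors = []
        for t in texts:
            padded = (t + "\0" * self.dim)[: self.dim]  # pad/truncate to dim
            vectors.append([float(ord(ch)) for ch in padded])
        return vectors
```

A real backend would wrap a model or HTTP client the same way: implement `embed`, then register it so the API can instantiate it by name.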
- Implementation details: See `IMPLEMENTATION.md`
- Code: Start with `src/code_rag/main.py` (CLI orchestration)
- Tests: Run `pytest` to execute the test suite
- Questions? Read the code - it's well-structured and follows the plugin pattern
- Plugin architecture: New implementations extend interfaces, wired in at initialization
- Idempotency: Checks if codebase is already processed before re-embedding
- Batch processing: Chunks processed in batches for efficiency
- Metadata tracking: Every chunk knows its source file and position
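Batch processing in this spirit is just windowed iteration (a generic sketch, not the project's exact helper):

```python
def batched(items: list, batch_size: int):
    """Yield fixed-size batches so each embedding call amortizes model overhead."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```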
That's it. Now go build something.