Document Search
Index Markdown and text files for semantic search (RAG).
While Codanna's primary focus is code intelligence, the document search feature lets you index documentation, notes, and other text files for semantic search. This is useful for:
- Searching project documentation alongside code
- Building RAG (Retrieval-Augmented Generation) pipelines
- Finding relevant context from Markdown files
# Add a collection
codanna documents add-collection docs docs/
# Index all collections
codanna documents index
# Search
codanna documents search "error handling"

Document settings live in .codanna/settings.toml:
[documents]
enabled = true
[documents.defaults]
strategy = "hybrid" # Paragraph-based with size constraints
min_chunk_chars = 200 # Merge small paragraphs
max_chunk_chars = 1500 # Split large paragraphs
overlap_chars = 100 # Context overlap when splitting
[documents.search]
preview_mode = "kwic" # "kwic" or "full"
preview_chars = 600 # Preview window size
highlight = true # Highlight keywords with **markers**
[documents.collections.docs]
paths = ["docs/"]
patterns = ["**/*.md"]

Collections organize documents into searchable groups.

codanna documents add-collection <name> <path>

Examples:
# Index project docs
codanna documents add-collection docs docs/
# Index external documentation
codanna documents add-collection rust-book /path/to/rust-book/
# Index multiple paths (edit settings.toml)
[documents.collections.guides]
paths = ["docs/", "examples/", "README.md"]
patterns = ["**/*.md", "**/*.txt"]

codanna documents remove-collection <name>

This removes the collection from settings.toml. Run documents index to sync the index.

codanna documents list

codanna documents stats <name>

codanna documents index

With progress display:

codanna documents index --progress

codanna documents index --collection docs

The index command syncs with settings.toml:
- Collections in config but not indexed: indexed
- Collections indexed but not in config: removed
- Files deleted from disk: chunks removed
- Files modified: re-indexed
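The four sync rules above amount to set differences between what is configured and what is indexed. The following is a minimal sketch of that reconciliation, not Codanna's actual implementation: `SyncPlan`, `plan_sync`, and the use of a per-file content hash to detect modification are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Actions implied by the four sync rules (hypothetical structure)."""
    collections_to_index: set = field(default_factory=set)
    collections_to_remove: set = field(default_factory=set)
    files_to_drop: set = field(default_factory=set)
    files_to_reindex: set = field(default_factory=set)


def plan_sync(configured, indexed, indexed_files, disk_files):
    """Compare config and disk state against the index.

    configured / indexed: sets of collection names.
    indexed_files / disk_files: dicts mapping file path to a content hash.
    The hash is an assumed change marker; the real tool may use something else.
    """
    plan = SyncPlan()
    # In config but not indexed: index it.
    plan.collections_to_index = set(configured) - set(indexed)
    # Indexed but no longer in config: remove it.
    plan.collections_to_remove = set(indexed) - set(configured)
    # Indexed files that no longer exist on disk: drop their chunks.
    plan.files_to_drop = set(indexed_files) - set(disk_files)
    # Files present in both whose content changed: re-index.
    plan.files_to_reindex = {
        path
        for path, digest in disk_files.items()
        if path in indexed_files and indexed_files[path] != digest
    }
    return plan
```

The sketch covers only the four cases listed above; how newly created files inside an existing collection are handled is not stated here, so it is left out.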
The hybrid strategy preserves document structure:
Document with paragraphs:
┌─────────────────────────────────────────┐
│ Short paragraph (50 chars) │ ← merged
│ Another short one (80 chars) │ ← merged } = 1 chunk
│ Medium paragraph (120 chars) │ ← merged
├─────────────────────────────────────────┤
│ Normal paragraph (400 chars) │ = 1 chunk
├─────────────────────────────────────────┤
│ Very long paragraph (2000 chars)... │ = 2 chunks
│ ...with 100 char overlap │ (overlap)
└─────────────────────────────────────────┘
- min_chunk_chars (200): Paragraphs smaller than this merge with neighbors
- max_chunk_chars (1500): Paragraphs larger than this split with overlap
- overlap_chars (100): Context preserved between split chunks
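To make the interaction of the three parameters concrete, here is a rough sketch of a paragraph-based chunker that follows the rules above. It is an illustration under assumptions, not Codanna's code: the function name `chunk_hybrid`, splitting paragraphs on blank lines, and flushing a merge buffer once it reaches `min_chars` are all choices made for this example.

```python
def chunk_hybrid(text, min_chars=200, max_chars=1500, overlap=100):
    """Split text into chunks: merge small paragraphs, split large ones."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")

    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    buffer = ""  # accumulates consecutive small paragraphs

    for para in paragraphs:
        if len(para) > max_chars:
            # Flush pending small paragraphs before the long one.
            if buffer:
                chunks.append(buffer)
                buffer = ""
            # Sliding window: each piece repeats the last `overlap` chars.
            start = 0
            while True:
                chunks.append(para[start:start + max_chars])
                if start + max_chars >= len(para):
                    break
                start += max_chars - overlap
        elif len(para) < min_chars:
            # Merge with neighboring small paragraphs.
            buffer = f"{buffer}\n\n{para}" if buffer else para
            if len(buffer) >= min_chars:
                chunks.append(buffer)
                buffer = ""
        else:
            if buffer:
                chunks.append(buffer)
                buffer = ""
            chunks.append(para)

    if buffer:
        chunks.append(buffer)
    return chunks
```

Run on the document in the diagram (paragraphs of 50, 80, 120, 400, and 2000 characters), this sketch yields four chunks: the three short paragraphs merged, the 400-character paragraph alone, and the long paragraph split in two with a 100-character overlap.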
codanna documents search "authentication flow"

codanna documents search "error handling" --collection docs

codanna documents search "configuration" --limit 5

codanna documents search "setup guide" --json

The "kwic" preview mode is the default. It centers the preview window around the first keyword match:
1. docs/auth.md (score: 0.72)
Preview: ...the **authentication** flow handles user login and session...
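A keyword-in-context preview of this kind can be sketched in a few lines. The example below is an approximation written for illustration, assuming keywords are the whitespace-separated words of the query and matching is case-insensitive; `kwic_preview` is a hypothetical helper, not a Codanna function.

```python
import re


def kwic_preview(chunk, query, preview_chars=600, highlight=True):
    """Return a window of `chunk` centered on the first keyword match."""
    keywords = [re.escape(word) for word in query.split()]
    if not keywords:
        return chunk[:preview_chars]
    pattern = re.compile("|".join(keywords), re.IGNORECASE)

    match = pattern.search(chunk)
    if match is None:
        start = 0  # no keyword found: fall back to the start of the chunk
    else:
        center = (match.start() + match.end()) // 2
        start = max(0, center - preview_chars // 2)
    # Keep the window inside the chunk.
    start = min(start, max(0, len(chunk) - preview_chars))
    end = min(len(chunk), start + preview_chars)

    window = chunk[start:end]
    if highlight:
        # Wrap every keyword occurrence in **markers**.
        window = pattern.sub(lambda m: f"**{m.group(0)}**", window)

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(chunk) else ""
    return f"{prefix}{window}{suffix}"
```

A chunk shorter than `preview_chars` is returned whole, with no ellipses, so the difference between the two preview modes only shows up on longer chunks.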
The "full" preview mode shows the entire chunk content:
[documents.search]
preview_mode = "full"

Keywords are wrapped with **markers**:
Preview: ## **Parser** Technology
Codanna uses **tree-sitter** for AST parsing...
Disable with:
[documents.search]
highlight = false

| Command | Description |
|---|---|
| `documents add-collection <name> <path>` | Add collection to settings.toml |
| `documents remove-collection <name>` | Remove collection from settings.toml |
| `documents index` | Index all configured collections |
| `documents index --collection <name>` | Index specific collection |
| `documents index --progress` | Show progress during indexing |
| `documents search <query>` | Search indexed documents |
| `documents search <query> --collection <name>` | Search within collection |
| `documents search <query> --limit <n>` | Limit results |
| `documents search <query> --json` | JSON output |
| `documents list` | List indexed collections |
| `documents stats <name>` | Show collection statistics |

Document search is also available as an MCP tool for AI assistants:
codanna mcp search_documents query:"authentication flow" limit:5

See MCP Tools for details.
- Chunk size considerations: Larger chunks give more context but coarser matches. Smaller chunks give more precise matches but may lose context. Choose based on your use case.
- Collection organization: Group related documents. Search within a collection for focused results.
- Re-indexing: Only changed files are re-indexed. Delete .codanna/index/documents/ for a full rebuild.
- Progress display: Use --progress for large collections to see two-phase progress (file processing, then embedding generation).