documents.md

Documentation / User Guide / Document Search

Document Search

Index Markdown and text files for semantic search (RAG).

Overview

While Codanna's primary focus is code intelligence, the document search feature lets you index documentation, notes, and other text files for semantic search. This is useful for:

Searching project documentation alongside code
Building RAG (Retrieval-Augmented Generation) pipelines
Finding relevant context from Markdown files

Quick Start

# Add a collection
codanna documents add-collection docs docs/

# Index all collections
codanna documents index

# Search
codanna documents search "error handling"

Configuration

Document settings live in .codanna/settings.toml:

[documents]
enabled = true

[documents.defaults]
strategy = "hybrid"      # Paragraph-based with size constraints
min_chunk_chars = 200    # Merge small paragraphs
max_chunk_chars = 1500   # Split large paragraphs
overlap_chars = 100      # Context overlap when splitting

[documents.search]
preview_mode = "kwic"    # "kwic" or "full"
preview_chars = 600      # Preview window size
highlight = true         # Highlight keywords with **markers**

[documents.collections.docs]
paths = ["docs/"]
patterns = ["**/*.md"]

Collections

Collections organize documents into searchable groups.

Add a Collection

codanna documents add-collection <name> <path>

Examples:

# Index project docs
codanna documents add-collection docs docs/

# Index external documentation
codanna documents add-collection rust-book /path/to/rust-book/

# Index multiple paths (edit settings.toml)
[documents.collections.guides]
paths = ["docs/", "examples/", "README.md"]
patterns = ["**/*.md", "**/*.txt"]

Remove a Collection

codanna documents remove-collection <name>

This removes the collection from settings.toml. Run documents index to sync the index.

List Collections

codanna documents list

Collection Stats

codanna documents stats <name>

Indexing

Index All Collections

codanna documents index

With progress display:

codanna documents index --progress

Index Specific Collection

codanna documents index --collection docs

Sync Behavior

The index command syncs with settings.toml:

Collections in H.P.009-CONFIG but not indexed: indexed
Collections indexed but not in H.P.009-CONFIG: removed
Files deleted from disk: chunks removed
Files modified: re-indexed

Chunking Strategy

The hybrid strategy preserves document structure:

Document with paragraphs:
┌─────────────────────────────────────────┐
│ Short paragraph (50 chars)              │ ← merged
│ Another short one (80 chars)            │ ← merged  } = 1 chunk
│ Medium paragraph (120 chars)            │ ← merged
├─────────────────────────────────────────┤
│ Normal paragraph (400 chars)            │ = 1 chunk
├─────────────────────────────────────────┤
│ Very long paragraph (2000 chars)...     │ = 2 chunks
│ ...with 100 char overlap                │   (overlap)
└─────────────────────────────────────────┘

min_chunk_chars (200): Paragraphs smaller than this merge with neighbors
max_chunk_chars (1500): Paragraphs larger than this split with overlap
overlap_chars (100): Context preserved between split chunks

Search

Basic Search

codanna documents search "authentication flow"

Filter by Collection

codanna documents search "error handling" --collection docs

Limit Results

codanna documents search "H.P.009-CONFIGuration" --limit 5

JSON Output

codanna documents search "setup guide" --json

Search Result Display

KWIC (Keyword In Context)

Default mode. Centers the preview window around the first keyword match:

1. docs/auth.md (score: 0.72)
   Preview: ...the **authentication** flow handles user login and session...

Full Preview

Shows entire chunk content:

[documents.search]
preview_mode = "full"

Keyword Highlighting

Keywords are wrapped with **markers**:

Preview: ## **Parser** Technology

Codanna uses **tree-sitter** for AST parsing...

Disable with:

[documents.search]
highlight = false

Commands Reference

Command	Description
`documents add-collection <name> <path>`	Add collection to settings.toml
`documents remove-collection <name>`	Remove collection from settings.toml
`documents index`	Index all H.P.009-CONFIGured collections
`documents index --collection <name>`	Index specific collection
`documents index --progress`	Show progress during indexing
`documents search <query>`	Search indexed documents
`documents search <query> --collection <name>`	Search within collection
`documents search <query> --limit <n>`	Limit results
`documents search <query> --json`	JSON output
`documents list`	List indexed collections
`documents stats <name>`	Show collection statistics

MCP Tool

Document search is also available as an MCP tool for AI assistants:

codanna mcp search_documents query:"authentication flow" limit:5

See MCP Tools for details.

Tips

Chunk size considerations: Larger chunks = more context but coarser matches. Smaller chunks = precise matches but may lose context. Choose based on your use case.
Collection organization: Group related documents. Search within a collection for focused results.
Re-indexing: Only changed files are re-indexed. Delete .codanna/index/documents/ for full rebuild.
Progress display: Use --progress for large collections to see two-phase progress (file processing, then embedding generation).

Back to User Guide

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document Search

Overview

Quick Start

Configuration

Collections

Add a Collection

Remove a Collection

List Collections

Collection Stats

Indexing

Index All Collections

Index Specific Collection

Sync Behavior

Chunking Strategy

Search

Basic Search

Filter by Collection

Limit Results

JSON Output

Search Result Display

KWIC (Keyword In Context)

Full Preview

Keyword Highlighting

Commands Reference

MCP Tool

Tips

FilesExpand file tree

documents.md

Latest commit

History

documents.md

File metadata and controls

Document Search

Overview

Quick Start

Configuration

Collections

Add a Collection

Remove a Collection

List Collections

Collection Stats

Indexing

Index All Collections

Index Specific Collection

Sync Behavior

Chunking Strategy

Search

Basic Search

Filter by Collection

Limit Results

JSON Output

Search Result Display

KWIC (Keyword In Context)

Full Preview

Keyword Highlighting

Commands Reference

MCP Tool

Tips