A RAG (Retrieval-Augmented Generation) service for Obsidian vaults.
The KnowledgeBaseService provides question-answering over your vault using RAG (Retrieval-Augmented Generation).
- Retrieval: Uses `SearchService` with hybrid search to find the most relevant chunks for the question
- Context Building: Assembles retrieved chunks into a context string with source references
- Generation: Sends the question + context to the Ollama LLM to generate an answer
- Response: Returns the answer along with source references
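The context-building step can be sketched as follows. This is a minimal illustration, not the service's actual code; `SearchResult` is a hypothetical stand-in for whatever chunk type `SearchService` returns:

```java
import java.util.List;

public class ContextBuilder {
    // Hypothetical stand-in for the chunk data returned by SearchService.
    record SearchResult(String path, int chunkIndex, String content) {}

    // Assemble retrieved chunks into a numbered context string so the
    // generated answer can cite sources as [1], [2], ...
    static String buildContext(List<SearchResult> results) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < results.size(); i++) {
            SearchResult r = results.get(i);
            sb.append("[").append(i + 1).append("] ")
              .append(r.path()).append(" (chunk ").append(r.chunkIndex()).append("):\n")
              .append(r.content()).append("\n\n");
        }
        return sb.toString();
    }
}
```

The context string is then prepended to the question in the prompt sent to the Ollama model.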
```properties
spring.ai.ollama.chat.options.model=llama3.2
spring.ai.ollama.chat.options.temperature=0.7
knowledgebase.excerpt-length=200
```

| Parameter | Default | Description |
|---|---|---|
| `spring.ai.ollama.chat.options.model` | `llama3.2` | Ollama model for answer generation |
| `spring.ai.ollama.chat.options.temperature` | `0.7` | Controls response creativity (0-1) |
| `knowledgebase.excerpt-length` | `200` | Max characters for source excerpts in the response |
```
GET /api/kb/ask?question=<question>
```

Parameters:

- `question` (required): The question to ask about your vault

Response:

```json
{
  "answer": "Based on your notes...",
  "sources": [
    {
      "path": "notes/topic.md",
      "chunkIndex": 2,
      "relevanceScore": 0.85,
      "excerpt": "Preview of the source content..."
    }
  ],
  "modelUsed": "llama3.2"
}
```

The system uses token-based text chunking via Spring AI's `TokenTextSplitter` to split markdown files into smaller pieces for embedding and retrieval.
Chunking is a two-step process:
1. Markdown Parsing: `MarkdownDocumentReader` parses the file with configuration that:
   - Preserves code blocks
   - Preserves blockquotes
   - Creates document boundaries at horizontal rules
2. Token Splitting: `TokenTextSplitter` splits the parsed content into token-based chunks

Each chunk is stored with a sequential `chunkIndex` to preserve document order for reassembly.
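To make the splitting step concrete, here is a deliberately simplified sketch of the same idea: break text into fixed-size pieces and tag each with a sequential index. It splits on whitespace, whereas `TokenTextSplitter` counts real model tokens, so treat this as an illustration only:

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleChunker {
    // A chunk of text plus its position in the original document.
    record Chunk(int chunkIndex, String text) {}

    // Naive illustration: split on whitespace and group a fixed number of
    // "tokens" per chunk. The real splitter uses model tokens, not words.
    static List<Chunk> chunk(String text, int tokensPerChunk) {
        String[] tokens = text.trim().split("\\s+");
        List<Chunk> chunks = new ArrayList<>();
        for (int start = 0; start < tokens.length; start += tokensPerChunk) {
            int end = Math.min(start + tokensPerChunk, tokens.length);
            String piece = String.join(" ",
                    java.util.Arrays.copyOfRange(tokens, start, end));
            chunks.add(new Chunk(chunks.size(), piece));
        }
        return chunks;
    }
}
```

The sequential `chunkIndex` here plays the same role as in the real pipeline: it lets chunks from one file be reassembled in document order.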
```properties
chunker.token.chunk-size=800
chunker.token.min-chunk-size-chars=200
chunker.token.min-chunk-length-to-embed=50
chunker.token.max-num-chunks=1000
chunker.token.keep-separator=false
```

| Parameter | Default | Description |
|---|---|---|
| `chunk-size` | `800` | Target chunk size in tokens |
| `min-chunk-size-chars` | `200` | Minimum character count for a chunk |
| `min-chunk-length-to-embed` | `50` | Minimum length required for embedding generation |
| `max-num-chunks` | `1000` | Maximum chunks allowed per document |
| `keep-separator` | `false` | Whether to preserve separators between chunks |
- File is chunked via `MarkdownChunker.chunkFile()`
- Each chunk becomes a `ChunkEntity` with a sequential index
- Chunks are persisted to PostgreSQL
- Embeddings are generated for each chunk via Ollama
- Embeddings are stored in pgvector for semantic search
The VaultSyncService keeps the search index synchronized with your Obsidian vault by detecting file changes and triggering re-indexing.
On application startup, the service performs a full reconcile:
1. Scans all markdown files in the vault
2. Loads the previous snapshot (file metadata stored on disk)
3. Computes a diff to find created, updated, or deleted files
4. Deletes index entries for removed files
5. Re-indexes new or modified files
6. Saves the new snapshot
7. Starts the file watcher
After startup, the service watches for file changes:
- File Watcher (`LocalVaultWatcher`): Monitors the vault directory for file system events
- Debouncer (`VaultEventsDebouncer`): Batches rapid changes to avoid redundant indexing
- Batch Processing: Applies updates in batches (create/update/delete)
- Overflow Handling: If too many events occur, falls back to a full reconcile
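The core of debouncing is coalescing: rapid events for the same file collapse into one. The sketch below shows that idea only; `VaultEventsDebouncer`'s actual implementation (timers, threading) is not shown here:

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class EventCoalescer {
    enum Kind { CREATE, UPDATE, DELETE }

    // A burst of saves to one file collapses to a single pending event,
    // so the indexer runs once instead of once per save.
    private final Map<Path, Kind> pending = new LinkedHashMap<>();

    void record(Path file, Kind kind) {
        pending.put(file, kind); // last event for a file wins
    }

    // Called once the debounce delay (e.g. vault.debounce.ms) has expired
    // with no new events; returns the batch and resets the buffer.
    Map<Path, Kind> flush() {
        Map<Path, Kind> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```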
Files are considered changed when either:
- `lastModifiedMillis` differs from the snapshot
- `size` differs from the snapshot
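The snapshot diff described above can be sketched like this. Record and field names are hypothetical; only the change rules (mtime or size differs) come from the service's documented behavior:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SnapshotDiff {
    // Hypothetical per-file metadata mirroring the change rules above.
    record FileMeta(long lastModifiedMillis, long size) {}
    record Diff(List<String> created, List<String> updated, List<String> deleted) {}

    static Diff diff(Map<String, FileMeta> snapshot, Map<String, FileMeta> current) {
        List<String> created = new ArrayList<>();
        List<String> updated = new ArrayList<>();
        List<String> deleted = new ArrayList<>();
        for (var e : current.entrySet()) {
            FileMeta old = snapshot.get(e.getKey());
            if (old == null) {
                created.add(e.getKey()); // not in snapshot: new file
            } else if (old.lastModifiedMillis() != e.getValue().lastModifiedMillis()
                    || old.size() != e.getValue().size()) {
                updated.add(e.getKey()); // changed if either mtime or size differs
            }
        }
        for (String path : snapshot.keySet()) {
            if (!current.containsKey(path)) deleted.add(path); // gone from disk
        }
        return new Diff(created, updated, deleted);
    }
}
```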
```properties
vault.path=/path/to/obsidian/vault
vault.debounce.ms=750
```

| Parameter | Default | Description |
|---|---|---|
| `vault.path` | (required) | Absolute path to your Obsidian vault |
| `vault.debounce.ms` | `750` | Delay in ms to batch rapid file changes |
The SearchService provides three search modes for querying indexed markdown chunks.
Uses PostgreSQL's built-in full-text search via `chunkRepository.searchFullText()`.
- Searches for exact/stemmed word matches in chunk content
- Good for finding specific terms
- Uses PostgreSQL's `ts_rank` for relevance scoring
Uses vector embeddings via `vectorStore.similaritySearch()`.
- Finds conceptually similar content (not just keyword matches)
- Example: searching "AI assistant" might find chunks about "chatbot" or "LLM"
- Uses Ollama with the `nomic-embed-text` model for embeddings
- Configurable similarity threshold to filter low-relevance results
Combines FTS and semantic search for best results:
- Runs both FTS and semantic search (each with `limit * 2`)
- Normalizes scores to the 0-1 range using min-max normalization
- Combines scores with configurable weights:
  - FTS: 30% (default)
  - Semantic: 70% (default)
- Returns the top `limit` results sorted by combined score
`ts_rank` is a PostgreSQL function that scores how well a document matches a text query.
It considers:
- Term frequency: How often query words appear in the document
- Document length: Normalizes for shorter vs longer documents
- Word proximity: How close matching words are to each other
Returns a float (usually 0.0 to ~0.1) - higher means better match.
Measures the angle between two vectors in high-dimensional space.
similarity = cos(θ) = (A · B) / (||A|| × ||B||)
How it works:
- Text is converted to a vector (an array of 768 numbers with `nomic-embed-text`)
- Similar meanings → vectors point in similar directions
- The cosine of the angle between them is the similarity score
Score range:
- `1.0` = identical direction (same meaning)
- `0.0` = perpendicular (unrelated)
- `-1.0` = opposite direction (opposite meaning)
In practice, text embeddings usually range 0.0 to 1.0.
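The formula above translates directly to code. This is the standard cosine similarity computation, shown on small 2-dimensional vectors rather than real 768-dimensional embeddings:

```java
public class Cosine {
    // similarity = (A . B) / (||A|| * ||B||)
    static double similarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // A . B
            normA += a[i] * a[i]; // ||A||^2
            normB += b[i] * b[i]; // ||B||^2
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Identical vectors score `1.0`, perpendicular vectors `0.0`, and opposite vectors `-1.0`, matching the score range above.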
| | ts_rank | Cosine Similarity |
|---|---|---|
| Matches | Exact words (stemmed) | Conceptual meaning |
| "AI assistant" finds "chatbot" | No | Yes |
| "claude" finds "Claude" | Yes | Yes |
| Speed | Fast (index lookup) | Slower (vector math) |
The hybrid search uses min-max normalization to combine FTS and semantic scores fairly.
FTS and semantic search return scores on different scales:
- FTS (`ts_rank`): typically 0.0 to ~0.1
- Semantic (cosine similarity): typically 0.0 to 1.0
Without normalization, semantic scores would dominate even with lower weight.
normalized = (score - min) / (max - min)
Example with FTS scores [0.02, 0.05, 0.08]:
- min = 0.02, max = 0.08, range = 0.06
- 0.02 → (0.02 - 0.02) / 0.06 = 0.0
- 0.05 → (0.05 - 0.02) / 0.06 = 0.5
- 0.08 → (0.08 - 0.02) / 0.06 = 1.0
Now both FTS and semantic scores are 0-1, making the weighted combination fair.
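The worked example above can be reproduced in code. This is a sketch, not `SearchService`'s actual implementation; in particular, returning `1.0` for every score when all scores are equal (zero range) is an assumed convention:

```java
public class HybridScore {
    // Min-max normalization: scale scores into [0, 1]. If all scores are
    // equal the range is zero; returning 1.0 for each is an assumption here.
    static double[] normalize(double[] scores) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = range == 0 ? 1.0 : (scores[i] - min) / range;
        }
        return out;
    }

    // Weighted combination of normalized scores, e.g. 0.3 * fts + 0.7 * semantic.
    static double combine(double fts, double semantic,
                          double ftsWeight, double semanticWeight) {
        return ftsWeight * fts + semanticWeight * semantic;
    }
}
```

Running `normalize` on the FTS scores `[0.02, 0.05, 0.08]` reproduces the `[0.0, 0.5, 1.0]` result from the example above.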
```properties
# Search weights for hybrid mode
search.hybrid.fts-weight=0.3
search.hybrid.semantic-weight=0.7
```

```
GET /api/search?query=<query>&limit=<limit>
```

Parameters:

- `query` (required): Search query string
- `limit` (optional): Maximum results to return (default: `20`)
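A caller might build the request URL like this; the `localhost:8080` base URL in the usage example is an assumption, not something the service documents:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrl {
    // Build the /api/search URL, percent-encoding the query string so
    // spaces and special characters survive the round trip.
    static String build(String baseUrl, String query, int limit) {
        return baseUrl + "/api/search?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&limit=" + limit;
    }
}
```

For example, `SearchUrl.build("http://localhost:8080", "AI assistant", 10)` yields `http://localhost:8080/api/search?query=AI+assistant&limit=10`.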