Engram is a local-first MCP memory server that gives AI agents genuinely useful long-term memory. It uses real semantic embeddings for search, provides three-tier retrieval to keep token costs proportional to need, and includes an intelligent layer that prevents duplicates, tracks access patterns, surfaces stale memories, indexes codebases automatically, and captures session outcomes for human approval.
- Semantic search via sentence-transformers (all-MiniLM-L6-v2) — local, offline, zero cost
- Three-tier retrieval — snippets, chunks, or full content based on what's actually needed
- Markdown-aware chunking — splits on headers first, then paragraphs, max 800 chars per chunk
- Dual storage — JSON flat files (source of truth) + ChromaDB vector index (search)
- Web dashboard at localhost:5000 — full CRUD, search, tag filtering, memory templates
- Deduplication gate — blocks near-duplicate stores (cosine similarity >= 0.92, configurable) with `force=True` override
- Access tracking — `last_accessed` timestamp on every retrieval (fire-and-forget, non-blocking)
- Relationship links — `related_to` field with bidirectional `get_related_memories` queries
- Staleness detection — surfaces time-stale (not accessed in N days) and code-stale memories via WebUI tab and MCP tool
- Architectural synthesis — `engram_index.py` uses Claude Code CLI to synthesize "Model B" understanding (why, decisions, patterns, watch-outs) from any codebase
- Three modes — bootstrap (full synthesis), evolve (incremental hash-diff), full (re-index everything)
- Per-project config — `.engram/config.json` with custom domains, file globs, and synthesis questions
- Auto-generated skills — thin skill files that trigger Engram retrieval when editing domain files
- Git hook — post-commit hook runs evolve mode automatically in the background
- Stop hook — evaluates every Claude Code session after it ends against configurable criteria
- Approval gate — worthy sessions produce a memory draft saved to `.engram/pending_memories/`
- Next-session surfacing — `engram-pending` skill auto-loads and presents drafts for approval
- Dedup-protected — deduplication gate runs automatically before any draft is stored
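The header-first, then-paragraph chunking rule can be sketched in a few lines. This is an illustrative approximation, not the actual `core/chunker.py`:

```python
import re

def chunk_markdown(text, max_chars=800):
    # Split before each markdown header, then pack paragraphs into
    # chunks of at most max_chars (sketch, not Engram's real chunker).
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        if not section.strip():
            continue
        if len(section) <= max_chars:
            chunks.append(section.strip())
            continue
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) > max_chars:
                chunks.append(buf.strip())
                buf = ""
            buf += para + "\n\n"
        if buf.strip():
            chunks.append(buf.strip())
    return chunks
```

Splitting on headers first keeps each chunk semantically coherent, which matters more for embedding quality than hitting the size cap exactly.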
Agents should pay token costs proportional to what they actually need.
```
Tier 1 — search_memories("dispatch calendar")
         -> 5 scored snippets, ~50 tokens each
         -> identify the right key + chunk_id

Tier 2 — retrieve_chunk("sylvara_scheduler", chunk_id=3)
         -> one relevant section, ~200 tokens
         -> usually sufficient

Tier 3 — retrieve_memory("sylvara_scheduler")
         -> full content, intentional and explicit
```
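Client-side, the escalation pattern might look like this sketch, where `mcp_call` is a stand-in for whatever MCP client dispatch you use (not part of Engram):

```python
def fetch_context(mcp_call, query, need_full=False):
    # Tier 1: cheap scored snippets identify the right key + chunk_id
    hits = mcp_call("search_memories", query=query, limit=5)
    if not hits:
        return None
    best = max(hits, key=lambda h: h["score"])
    if need_full:
        # Tier 3: explicit, intentional full retrieval
        return mcp_call("retrieve_memory", key=best["key"])
    # Tier 2: one relevant chunk, usually sufficient
    return mcp_call("retrieve_chunk", key=best["key"], chunk_id=best["chunk_id"])
```

The point is that the expensive tier is an explicit decision, never the default.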
Engram uses all-MiniLM-L6-v2 for local embeddings and ChromaDB for vector storage. No API calls, no privacy exposure, no ongoing cost. The model runs on CPU and downloads once (~80MB).
```python
# These match with zero keyword overlap:
search_memories("CRM overlap with competitor")
# -> "Arbostar integration decision"

search_memories("audio transcription pipeline")
# -> "VoIP-first WebRTC architecture"
```

When storing a memory, Engram automatically checks for near-duplicates:
```python
store_memory("billing_fix", content="...")
# -> "WARNING: Similar memory exists: billing_webhook_pattern (score: 0.94)"
# -> Memory NOT stored. Use force=True to override.

store_memory("billing_fix", content="...", force=True)
# -> Stored (dedup overridden)
```

The threshold (default 0.92) is configurable in `config.json`.
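The gate itself is simple to picture. Here is a minimal sketch of the dedup logic with plain-Python cosine similarity and a dict standing in for the vector index; it is not the actual `memory_manager.py`:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def dedup_gate(new_vec, existing, threshold=0.92, force=False):
    # existing: {key: embedding vector}. Returns the blocking
    # (key, score) pair when a near-duplicate exists, else None.
    if force:
        return None  # force=True bypasses the gate entirely
    scored = [(key, cosine(new_vec, vec)) for key, vec in existing.items()]
    if not scored:
        return None
    key, score = max(scored, key=lambda kv: kv[1])
    return (key, score) if score >= threshold else None
```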
Engram exposes 8 tools to any MCP-compatible agent:
| Tool | Signature | Purpose | Token Cost |
|---|---|---|---|
| `search_memories` | `(query, limit=5)` | Semantic search, returns scored snippets | Low |
| `list_all_memories` | `()` | Full directory: keys, titles, tags, timestamps | Very low |
| `retrieve_chunk` | `(key, chunk_id)` | Single chunk by key + chunk_id | Medium |
| `retrieve_memory` | `(key)` | Full memory content | High (intentional) |
| `store_memory` | `(key, content, tags, title, related_to, force)` | Create or update a memory | -- |
| `get_related_memories` | `(key)` | Bidirectional relationship traversal | Low |
| `get_stale_memories` | `(days=90, type='all')` | Surface stale memories (time/code/all) | Low |
| `delete_memory` | `(key)` | Permanently delete a memory | -- |
- Python 3.10+
- Git
- Claude Code CLI (for codebase indexer and session evaluator)
```
git clone https://github.com/ckwich/Engram.git
cd Engram
python install.py
```

The installer creates a virtual environment, installs dependencies, pre-downloads the embedding model, and generates your MCP config.
```
claude mcp add engram --scope user \
  /path/to/engram/venv/bin/python \
  /path/to/engram/server.py
```

Windows:

```
claude mcp add engram --scope user `
  "C:\path\to\engram\venv\Scripts\python.exe" `
  "C:\path\to\engram\server.py"
```

Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "engram": {
      "command": "/path/to/engram/venv/bin/python",
      "args": ["/path/to/engram/server.py"]
    }
  }
}
```

SSE transport for remote access:

```
python server.py --transport sse --port 5100
```

Engram installs two Claude Code skills at `~/.claude/skills/`:
Create memories naturally mid-session:
```
/engramize the billing webhook race condition fix we just figured out
```
Claude looks back at the session context, drafts a properly formatted memory (key, title, tags, content), shows it for approval, then stores. Enforces naming conventions (snake_case keys, em-dash titles, three-tag standard).
Auto-loads at session start. Checks for pending memory drafts from the session evaluator and presents them for approval, editing, or deletion.
Synthesize architectural understanding from any codebase into Engram memories.
```
# Interactive setup — auto-detects domains, you confirm
python engram_index.py --project /path/to/project --init

# Full synthesis from planning docs + source code
python engram_index.py --project /path/to/project --mode bootstrap

# Incremental — only changed domains since last run
python engram_index.py --project /path/to/project --mode evolve

# Complete re-index
python engram_index.py --project /path/to/project --mode full

# Preview without synthesizing
python engram_index.py --project /path/to/project --dry-run

# Re-index a specific domain
python engram_index.py --project /path/to/project --domain billing

# Install git post-commit hook for automatic evolve
python engram_index.py --project /path/to/project --install-hook
```

Create `.engram/config.json` in your project root (or use `--init`):
```json
{
  "project_name": "sylvara",
  "domains": {
    "billing": {
      "file_globs": ["src/billing/**", "src/stripe/**"],
      "questions": [
        "How does the billing pipeline work?",
        "What are the key integration points?"
      ]
    },
    "auth": {
      "file_globs": ["src/auth/**", "src/middleware/auth*"],
      "questions": [
        "How does authentication flow work?",
        "What session management decisions were made?"
      ]
    }
  },
  "planning_paths": [".planning/", "docs/"],
  "model": "sonnet",
  "max_file_size_kb": 100
}
```

- Bootstrap reads planning artifacts + source files per domain
- Sends context to Claude Code CLI (`claude -p`) for synthesis — uses your Max plan, zero extra cost
- Stores memories in the `codebase_{project}_{domain}_architecture` namespace
- Generates thin skill files at `~/.claude/skills/{project}-{domain}-context/` that trigger Engram retrieval when editing matching files
- Tracks file hashes in `.engram/index.json` for incremental re-indexing
Manual edits to Engram memories always win over re-indexing (unless `--force` is passed).
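Evolve mode's hash-diff can be sketched like so; `changed_domains` and its dict shapes are illustrative, not the real `engram_index.py` internals:

```python
import hashlib
from pathlib import Path

def file_hash(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def changed_domains(domains, index):
    # domains: {name: [file paths]}; index: {path: hash from last run}.
    # A domain needs re-synthesis if any of its files was added or
    # modified since the hashes were recorded (sketch only).
    stale = []
    for name, files in domains.items():
        if any(index.get(str(f)) != file_hash(f) for f in files):
            stale.append(name)
    return stale
```

Hashing content rather than mtimes keeps the diff stable across checkouts and clones.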
Automatically captures significant session outcomes as memories.
Register the Stop hook in Claude Code settings (one-time):
```jsonc
// In ~/.claude/settings.json, add to hooks.Stop array:
{
  "type": "command",
  "command": "C:/Dev/Engram/venv/Scripts/python.exe C:/Dev/Engram/hooks/engram_stop.py"
}
```

- After every session, the Stop hook fires and spawns a detached evaluator subprocess (never blocks)
- Evaluator reads `last_assistant_message` from the session and calls Claude CLI with configured criteria
- If criteria are met (bug resolved, new capability, architectural decision, milestone), a memory draft is written to `.engram/pending_memories/`
- Dedup gate runs before writing — if a near-duplicate exists, it's noted in the draft
- Next session, the `engram-pending` skill surfaces drafts for approval

- `stop_hook_active` check is the absolute first action — prevents infinite evaluation loops
- Evaluator runs as a detached subprocess — hook exits in under 10 seconds
- `auto_approve_threshold: 0.0` means always ask (configurable per project)
- No memory is ever stored without explicit human approval (unless threshold is raised)
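A minimal sketch of the non-blocking pattern; the function name and `evaluator` path are illustrative, not the actual `hooks/engram_stop.py`:

```python
import subprocess
import sys

def on_stop(hook_input, evaluator="hooks/engram_evaluator.py"):
    # Guard first: stop_hook_active is set when a Stop hook itself
    # triggered the stop, so bail out to avoid infinite evaluation loops.
    if hook_input.get("stop_hook_active"):
        return False
    # Detach the evaluator so this hook can exit immediately.
    # (start_new_session is POSIX-only; on Windows use
    # creationflags=subprocess.DETACHED_PROCESS instead.)
    subprocess.Popen(
        [sys.executable, evaluator],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    return True
```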
Add to your project's `.engram/config.json`:

```json
{
  "session_evaluator": {
    "logic_win_triggers": [
      "bug resolved",
      "new system capability added",
      "architectural decision made"
    ],
    "milestone_triggers": [
      "phase completed",
      "feature shipped",
      "significant refactor done"
    ],
    "auto_approve_threshold": 0.0
  }
}
```

Full-featured web UI at http://localhost:5000:

```
python webui.py
```

- Grid and List views with metadata cards
- Semantic search with relevance scores and three-tier expansion
- Full CRUD from the browser with dedup warnings
- Related memories displayed as clickable links on detail view
- Stale Memories tab showing time-stale and code-stale memories with Mark Reviewed action
- Memory templates for common types (Project, Decision, Reference, Snippet)
- Tag filtering sidebar
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/search?q=...&limit=10` | Semantic search |
| GET | `/api/chunk/<key>/<chunk_id>` | Retrieve a single chunk |
| GET | `/api/memory/<key>` | Retrieve full memory |
| POST | `/api/memory` | Create a memory |
| PUT | `/api/memory/<key>` | Update a memory |
| DELETE | `/api/memory/<key>` | Delete a memory |
| GET | `/api/related/<key>` | Get related memories |
| GET | `/api/stale` | List stale memories |
| POST | `/api/memory/<key>/reviewed` | Mark memory as reviewed |
| GET | `/api/stats` | Memory count, chunk count |
| GET | `/health` | Health check |
```
# MCP server (stdio transport, default)
python server.py

# SSE transport for remote access
python server.py --transport sse --port 5100

# Rebuild ChromaDB index from JSON (recovery)
python server.py --rebuild-index

# Export/import all memories
python server.py --export
python server.py --import-file engram_export_2026-03-16.json

# Health check and self-test
python server.py --health
python server.py --self-test

# Generate MCP client config
python server.py --generate-config
```

```
engram/
├── core/
│   ├── embedder.py              # sentence-transformers wrapper
│   ├── chunker.py               # markdown-aware content chunker
│   └── memory_manager.py        # storage engine (JSON + ChromaDB, dedup, relationships, staleness)
├── hooks/
│   ├── engram_stop.py           # Claude Code Stop hook entry point
│   ├── engram_evaluator.py      # detached session evaluator
│   └── test_engram_evaluator.py
├── data/
│   ├── memories/                # JSON flat files — source of truth
│   └── chroma/                  # ChromaDB vector index
├── templates/
│   └── index.html               # web dashboard
├── static/
│   └── style.css                # dashboard styles
├── server.py                    # FastMCP server (8 MCP tools)
├── webui.py                     # Flask web dashboard + REST API
├── engram_index.py              # codebase indexer CLI
├── config.json                  # runtime config (dedup threshold, stale days, evaluator criteria)
├── install.py                   # setup wizard
└── requirements.txt             # pinned dependencies
```
| Package | Version | Purpose |
|---|---|---|
| `fastmcp` | ~3.1.1 | MCP server layer (stdio + SSE) |
| `sentence-transformers` | ~5.3.0 | Local semantic embeddings |
| `chromadb` | ~1.5.5 | Persistent vector store |
| `flask` | ~3.1.3 | Web dashboard |
No additional dependencies for the codebase indexer or session evaluator — both use the Claude Code CLI (`claude -p`), which runs under your existing subscription.
JSON as source of truth. If the vector index is corrupted, `--rebuild-index` reconstructs it from JSON. Your memories are never solely in a binary database.
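The recovery pass is straightforward to picture. This sketch uses a plain dict as a stand-in for the ChromaDB collection; it is not the actual `--rebuild-index` implementation:

```python
import json
from pathlib import Path

def rebuild_index(memories_dir, index):
    # Re-derive the search index from the JSON flat files, which
    # remain the source of truth (sketch; dict stands in for ChromaDB).
    count = 0
    for path in sorted(Path(memories_dir).glob("*.json")):
        mem = json.loads(path.read_text(encoding="utf-8"))
        index[mem["key"]] = mem["content"]
        count += 1
    return count
```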
Local embeddings only. No external API calls. The model runs on CPU, works offline, zero ongoing cost. Memory contents never leave your machine.
Dedup before store. Every `store_memory` call checks for semantic near-duplicates. The threshold (0.92 cosine) is configurable. Self-updates (same key) are always allowed through.
Fire-and-forget access tracking. `last_accessed` updates run in background tasks — retrieval is never slowed down by tracking writes.
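A minimal sketch of that pattern, using a background thread and a dict standing in for the store (illustrative only, not Engram's actual implementation):

```python
import threading
import time

def touch_last_accessed(key, store):
    # Update last_accessed off the hot path so retrieval never waits
    # on the tracking write (sketch; store is a stand-in dict).
    def _write():
        store[key] = time.strftime("%Y-%m-%dT%H:%M:%S")
    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t
```

A daemon thread means a slow or failed tracking write can never keep the process alive or block a response.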
CLI-based synthesis. The codebase indexer and session evaluator use `claude -p` subprocess calls instead of the Anthropic API. This uses your existing Claude Code subscription with zero marginal cost.
Non-blocking hooks. The Stop hook spawns the evaluator as a detached subprocess and exits immediately. Sessions are never blocked by evaluation.
Human approval for automated captures. The session evaluator never stores memories directly — it writes drafts to pending files. A human must approve, edit, or delete each draft.
Memories are stored as plain JSON:
```json
{
  "key": "sylvara_architecture",
  "title": "Sylvara — Architecture and Technical Decisions",
  "content": "## Stack\n...",
  "tags": ["sylvara", "architecture", "decisions"],
  "created_at": "2026-03-16T14:23:00-07:00",
  "updated_at": "2026-03-16T14:23:00-07:00",
  "last_accessed": "2026-04-01T09:15:00-07:00",
  "related_to": ["sylvara_ops", "sylvara_billing"],
  "potentially_stale": false,
  "chunk_count": 19,
  "chars": 7099,
  "lines": 142
}
```

Human-readable, portable, and editable with any text editor.
Issues and PRs welcome. If you find a bug or have a feature idea, open an issue.
MIT — see LICENSE for details.
Built by Cole Wichman