
Engram

Intelligent Semantic Memory for AI Agents

Python 3.10+ · MCP Compliant · License: MIT · Platform: Windows | macOS | Linux

Engram is a local-first MCP memory server that gives AI agents genuinely useful long-term memory. It uses real semantic embeddings for search, provides three-tier retrieval to keep token costs proportional to need, and includes an intelligent layer that prevents duplicates, tracks access patterns, surfaces stale memories, indexes codebases automatically, and captures session outcomes for human approval.


Features

Core Memory Server

  • Semantic search via sentence-transformers (all-MiniLM-L6-v2) — local, offline, zero cost
  • Three-tier retrieval — snippets, chunks, or full content based on what's actually needed
  • Markdown-aware chunking — splits on headers first, then paragraphs, max 800 chars per chunk
  • Dual storage — JSON flat files (source of truth) + ChromaDB vector index (search)
  • Web dashboard at localhost:5000 — full CRUD, search, tag filtering, memory templates
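The header-first chunking strategy is simple enough to sketch. This is an illustration of the splitting order described above (headers first, then paragraphs, 800-char cap), not the actual `chunker.py` implementation:

```python
# Simplified sketch of markdown-aware chunking: split on headers first,
# then split oversized sections on blank lines (paragraphs), capping each
# chunk at 800 characters. Paragraphs longer than the cap are kept whole
# in this sketch; the real chunker.py may handle them differently.
import re

MAX_CHARS = 800

def chunk_markdown(text: str) -> list[str]:
    # First pass: split immediately before every markdown header line.
    sections = re.split(r"(?m)^(?=#)", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= MAX_CHARS:
            chunks.append(section)
            continue
        # Second pass: pack paragraphs greedily up to the cap.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > MAX_CHARS:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return chunks
```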

Memory Quality Layer

  • Deduplication gate — blocks near-duplicate stores (cosine similarity >= 0.92, configurable) with force=True override
  • Access tracking — last_accessed timestamp on every retrieval (fire-and-forget, non-blocking)
  • Relationship links — related_to field with bidirectional get_related_memories queries
  • Staleness detection — surfaces time-stale (not accessed in N days) and code-stale memories via WebUI tab and MCP tool
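Time-staleness is essentially a timestamp comparison. A minimal sketch, where the field names mirror the storage layout shown later in this README (the shipped tool also applies code-staleness checks):

```python
# Sketch of time-staleness detection: a memory is stale when it has not
# been accessed in the last `days` days. Illustrative only; Engram's
# get_stale_memories tool also detects code-stale memories.
from datetime import datetime, timedelta, timezone

def time_stale(memories: list[dict], days: int = 90) -> list[str]:
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    stale = []
    for mem in memories:
        last = datetime.fromisoformat(mem["last_accessed"])
        if last < cutoff:
            stale.append(mem["key"])
    return stale
```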

Codebase Indexer

  • Architectural synthesis — engram_index.py uses Claude Code CLI to synthesize "Model B" understanding (why, decisions, patterns, watch-outs) from any codebase
  • Three modes — bootstrap (full synthesis), evolve (incremental hash-diff), full (re-index everything)
  • Per-project config — .engram/config.json with custom domains, file globs, and synthesis questions
  • Auto-generated skills — thin skill files that trigger Engram retrieval when editing domain files
  • Git hook — post-commit hook runs evolve mode automatically in the background

Session Evaluator

  • Stop hook — evaluates every Claude Code session after it ends against configurable criteria
  • Approval gate — worthy sessions produce a memory draft saved to .engram/pending_memories/
  • Next-session surfacing — engram-pending skill auto-loads and presents drafts for approval
  • Dedup-protected — deduplication gate runs automatically before any draft is stored

How It Works

Three-Tier Retrieval

Agents should pay token costs proportional to what they actually need.

Tier 1 — search_memories("dispatch calendar")
         -> 5 scored snippets, ~50 tokens each
         -> identify the right key + chunk_id

Tier 2 — retrieve_chunk("sylvara_scheduler", chunk_id=3)
         -> one relevant section, ~200 tokens
         -> usually sufficient

Tier 3 — retrieve_memory("sylvara_scheduler")
         -> full content, intentional and explicit

Semantic Search

Engram uses all-MiniLM-L6-v2 for local embeddings and ChromaDB for vector storage. No API calls, no privacy exposure, no ongoing cost. The model runs on CPU and downloads once (~80MB).

# These match with zero keyword overlap:
search_memories("CRM overlap with competitor")
# -> "Arbostar integration decision"

search_memories("audio transcription pipeline")
# -> "VoIP-first WebRTC architecture"

Deduplication Gate

When storing a memory, Engram automatically checks for near-duplicates:

store_memory("billing_fix", content="...")
# -> "WARNING: Similar memory exists: billing_webhook_pattern (score: 0.94)"
# -> Memory NOT stored. Use force=True to override.

store_memory("billing_fix", content="...", force=True)
# -> Stored (dedup overridden)

The threshold (default 0.92) is configurable in config.json.
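Conceptually the gate is a cosine-similarity comparison against existing embeddings. A toy sketch with plain float vectors (the real gate compares all-MiniLM-L6-v2 embeddings stored in ChromaDB):

```python
# Toy sketch of the deduplication gate: block a store when the new
# memory's embedding is too close (cosine >= threshold) to an existing
# one, unless force=True. Vectors here are plain lists for illustration.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dedup_check(new_vec, existing: dict[str, list[float]], threshold=0.92, force=False):
    """Return (allowed, closest_key, score) for a proposed store."""
    best_key, best = None, -1.0
    for key, vec in existing.items():
        score = cosine(new_vec, vec)
        if score > best:
            best_key, best = key, score
    allowed = force or best < threshold
    return allowed, best_key, best
```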


MCP Tools

Engram exposes 8 tools to any MCP-compatible agent:

| Tool | Signature | Purpose | Token Cost |
|------|-----------|---------|------------|
| search_memories | (query, limit=5) | Semantic search, returns scored snippets | Low |
| list_all_memories | () | Full directory: keys, titles, tags, timestamps | Very low |
| retrieve_chunk | (key, chunk_id) | Single chunk by key + chunk_id | Medium |
| retrieve_memory | (key) | Full memory content | High (intentional) |
| store_memory | (key, content, tags, title, related_to, force) | Create or update a memory | -- |
| get_related_memories | (key) | Bidirectional relationship traversal | Low |
| get_stale_memories | (days=90, type='all') | Surface stale memories (time/code/all) | Low |
| delete_memory | (key) | Permanently delete a memory | -- |

Installation

Prerequisites

  • Python 3.10+
  • Git
  • Claude Code CLI (for codebase indexer and session evaluator)

Quick Start

git clone https://github.com/ckwich/Engram.git
cd Engram
python install.py

The installer creates a virtual environment, installs dependencies, pre-downloads the embedding model, and generates your MCP config.

Connect to Claude Code

claude mcp add engram --scope user \
  /path/to/engram/venv/bin/python \
  /path/to/engram/server.py

Windows:

claude mcp add engram --scope user `
  "C:\path\to\engram\venv\Scripts\python.exe" `
  "C:\path\to\engram\server.py"

Connect to Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "engram": {
      "command": "/path/to/engram/venv/bin/python",
      "args": ["/path/to/engram/server.py"]
    }
  }
}

SSE Mode (Remote / Homelab)

python server.py --transport sse --port 5100

Skills

Engram installs two Claude Code skills at ~/.claude/skills/:

/engramize

Create memories naturally mid-session:

/engramize the billing webhook race condition fix we just figured out

Claude looks back at the session context, drafts a properly formatted memory (key, title, tags, content), shows it for approval, then stores. Enforces naming conventions (snake_case keys, em-dash titles, three-tag standard).

engram-pending

Auto-loads at session start. Checks for pending memory drafts from the session evaluator and presents them for approval, editing, or deletion.


Codebase Indexer

Synthesize architectural understanding from any codebase into Engram memories.

# Interactive setup — auto-detects domains, you confirm
python engram_index.py --project /path/to/project --init

# Full synthesis from planning docs + source code
python engram_index.py --project /path/to/project --mode bootstrap

# Incremental — only changed domains since last run
python engram_index.py --project /path/to/project --mode evolve

# Complete re-index
python engram_index.py --project /path/to/project --mode full

# Preview without synthesizing
python engram_index.py --project /path/to/project --dry-run

# Re-index a specific domain
python engram_index.py --project /path/to/project --domain billing

# Install git post-commit hook for automatic evolve
python engram_index.py --project /path/to/project --install-hook

Per-Project Config

Create .engram/config.json in your project root (or use --init):

{
  "project_name": "sylvara",
  "domains": {
    "billing": {
      "file_globs": ["src/billing/**", "src/stripe/**"],
      "questions": [
        "How does the billing pipeline work?",
        "What are the key integration points?"
      ]
    },
    "auth": {
      "file_globs": ["src/auth/**", "src/middleware/auth*"],
      "questions": [
        "How does authentication flow work?",
        "What session management decisions were made?"
      ]
    }
  },
  "planning_paths": [".planning/", "docs/"],
  "model": "sonnet",
  "max_file_size_kb": 100
}

How It Works

  1. Bootstrap reads planning artifacts + source files per domain
  2. Sends context to Claude Code CLI (claude -p) for synthesis — uses your Max plan, zero extra cost
  3. Stores memories in the codebase_{project}_{domain}_architecture namespace
  4. Generates thin skill files at ~/.claude/skills/{project}-{domain}-context/ that trigger Engram retrieval when editing matching files
  5. Tracks file hashes in .engram/index.json for incremental re-indexing

Manual edits to Engram memories always win over re-indexing (unless --force is passed).
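The evolve mode's hash diff can be sketched with hashlib. The index format below is illustrative; the real index lives in .engram/index.json and may differ:

```python
# Sketch of evolve-mode change detection: hash the tracked files, compare
# against the previous run's hashes, and report which files changed.
# The dict `previous` stands in for .engram/index.json (an assumption).
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(paths: list[Path], previous: dict[str, str]) -> list[str]:
    changed = []
    for p in paths:
        digest = file_hash(p)
        if previous.get(str(p)) != digest:
            changed.append(str(p))
        previous[str(p)] = digest  # update the index in place for the next run
    return changed
```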


Session Evaluator

Automatically captures significant session outcomes as memories.

Setup

Register the Stop hook in Claude Code settings (one-time):

// In ~/.claude/settings.json, add to hooks.Stop array:
{
  "type": "command",
  "command": "C:/Dev/Engram/venv/Scripts/python.exe C:/Dev/Engram/hooks/engram_stop.py"
}

How It Works

  1. After every session, the Stop hook fires and spawns a detached evaluator subprocess (never blocks)
  2. Evaluator reads last_assistant_message from the session and calls Claude CLI with configured criteria
  3. If criteria are met (bug resolved, new capability, architectural decision, milestone), a memory draft is written to .engram/pending_memories/
  4. Dedup gate runs before writing — if a near-duplicate exists, it's noted in the draft
  5. Next session, the engram-pending skill surfaces drafts for approval

Safety

  • stop_hook_active check is the absolute first action — prevents infinite evaluation loops
  • Evaluator runs as a detached subprocess — hook exits in under 10 seconds
  • auto_approve_threshold: 0.0 means always ask (configurable per project)
  • No memory is ever stored without explicit human approval (unless threshold is raised)
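Spawning the evaluator so the hook can return immediately is a standard detached-subprocess pattern. A generic sketch, not the exact hook code:

```python
# Sketch of the non-blocking hook pattern: launch a script as a detached
# process and return at once. start_new_session detaches the child from
# the hook's session on POSIX; on Windows the rough equivalent is
# creationflags=subprocess.DETACHED_PROCESS.
import subprocess
import sys

def spawn_detached(script: str) -> int:
    proc = subprocess.Popen(
        [sys.executable, script],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: child survives the hook's exit
    )
    return proc.pid  # the hook can exit now; the evaluator keeps running
```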

Configuration

Add to your project's .engram/config.json:

{
  "session_evaluator": {
    "logic_win_triggers": [
      "bug resolved",
      "new system capability added",
      "architectural decision made"
    ],
    "milestone_triggers": [
      "phase completed",
      "feature shipped",
      "significant refactor done"
    ],
    "auto_approve_threshold": 0.0
  }
}

Web Dashboard

Full-featured web UI at http://localhost:5000:

python webui.py

  • Grid and List views with metadata cards
  • Semantic search with relevance scores and three-tier expansion
  • Full CRUD from the browser with dedup warnings
  • Related memories displayed as clickable links on detail view
  • Stale Memories tab showing time-stale and code-stale memories with Mark Reviewed action
  • Memory templates for common types (Project, Decision, Reference, Snippet)
  • Tag filtering sidebar

Web API

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/search?q=...&limit=10 | Semantic search |
| GET | /api/chunk/&lt;key&gt;/&lt;chunk_id&gt; | Retrieve a single chunk |
| GET | /api/memory/&lt;key&gt; | Retrieve full memory |
| POST | /api/memory | Create a memory |
| PUT | /api/memory/&lt;key&gt; | Update a memory |
| DELETE | /api/memory/&lt;key&gt; | Delete a memory |
| GET | /api/related/&lt;key&gt; | Get related memories |
| GET | /api/stale | List stale memories |
| POST | /api/memory/&lt;key&gt;/reviewed | Mark memory as reviewed |
| GET | /api/stats | Memory count, chunk count |
| GET | /health | Health check |

CLI Utilities

# MCP server (stdio transport, default)
python server.py

# SSE transport for remote access
python server.py --transport sse --port 5100

# Rebuild ChromaDB index from JSON (recovery)
python server.py --rebuild-index

# Export/import all memories
python server.py --export
python server.py --import-file engram_export_2026-03-16.json

# Health check and self-test
python server.py --health
python server.py --self-test

# Generate MCP client config
python server.py --generate-config

Architecture

engram/
├── core/
│   ├── embedder.py          # sentence-transformers wrapper
│   ├── chunker.py           # markdown-aware content chunker
│   └── memory_manager.py    # storage engine (JSON + ChromaDB, dedup, relationships, staleness)
├── hooks/
│   ├── engram_stop.py       # Claude Code Stop hook entry point
│   ├── engram_evaluator.py  # detached session evaluator
│   └── test_engram_evaluator.py
├── data/
│   ├── memories/            # JSON flat files — source of truth
│   └── chroma/              # ChromaDB vector index
├── templates/
│   └── index.html           # web dashboard
├── static/
│   └── style.css            # dashboard styles
├── server.py                # FastMCP server (8 MCP tools)
├── webui.py                 # Flask web dashboard + REST API
├── engram_index.py          # codebase indexer CLI
├── config.json              # runtime config (dedup threshold, stale days, evaluator criteria)
├── install.py               # setup wizard
└── requirements.txt         # pinned dependencies

Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| fastmcp | ~3.1.1 | MCP server layer (stdio + SSE) |
| sentence-transformers | ~5.3.0 | Local semantic embeddings |
| chromadb | ~1.5.5 | Persistent vector store |
| flask | ~3.1.3 | Web dashboard |

No additional dependencies for the codebase indexer or session evaluator — both use the Claude Code CLI (claude -p) which runs under your existing subscription.

Design Decisions

JSON as source of truth. If the vector index is corrupted, --rebuild-index reconstructs it from JSON. Your memories are never solely in a binary database.

Local embeddings only. No external API calls. The model runs on CPU, works offline, zero ongoing cost. Memory contents never leave your machine.

Dedup before store. Every store_memory call checks for semantic near-duplicates. The threshold (0.92 cosine) is configurable. Self-updates (same key) are always allowed through.

Fire-and-forget access tracking. last_accessed updates run in background tasks — retrieval is never slowed down by tracking writes.
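A minimal sketch of the fire-and-forget pattern using a daemon thread (the actual mechanism in memory_manager.py may differ):

```python
# Sketch of fire-and-forget access tracking: the retrieval path hands the
# timestamp write to a daemon thread and returns without waiting for it.
import threading
from datetime import datetime, timezone

def touch_last_accessed(memory: dict) -> threading.Thread:
    def _write():
        memory["last_accessed"] = datetime.now(timezone.utc).isoformat()
        # a real implementation would persist the memory's JSON file here
    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t  # returned only so callers can join in tests; retrieval ignores it
```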

CLI-based synthesis. The codebase indexer and session evaluator use claude -p subprocess calls instead of the Anthropic API. This uses your existing Claude Code subscription with zero marginal cost.

Non-blocking hooks. The Stop hook spawns the evaluator as a detached subprocess and exits immediately. Sessions are never blocked by evaluation.

Human approval for automated captures. The session evaluator never stores memories directly — it writes drafts to pending files. A human must approve, edit, or delete each draft.


Storage Layout

Memories are stored as plain JSON:

{
  "key": "sylvara_architecture",
  "title": "Sylvara — Architecture and Technical Decisions",
  "content": "## Stack\n...",
  "tags": ["sylvara", "architecture", "decisions"],
  "created_at": "2026-03-16T14:23:00-07:00",
  "updated_at": "2026-03-16T14:23:00-07:00",
  "last_accessed": "2026-04-01T09:15:00-07:00",
  "related_to": ["sylvara_ops", "sylvara_billing"],
  "potentially_stale": false,
  "chunk_count": 19,
  "chars": 7099,
  "lines": 142
}

Human-readable, portable, and editable with any text editor.
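Because JSON is the source of truth, recovery amounts to re-reading every file. A toy sketch of the idea behind --rebuild-index, with a plain dict standing in for the ChromaDB vector index:

```python
# Toy sketch of index rebuild: walk data/memories/*.json and reconstruct
# an in-memory index keyed by memory key. Engram's --rebuild-index does
# the same against ChromaDB (re-embedding content) instead of a dict.
import json
from pathlib import Path

def rebuild_index(memories_dir: Path) -> dict[str, dict]:
    index = {}
    for path in sorted(memories_dir.glob("*.json")):
        mem = json.loads(path.read_text())
        index[mem["key"]] = mem
    return index
```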


Contributing

Issues and PRs welcome. If you find a bug or have a feature idea, open an issue.


License

MIT — see LICENSE for details.


Built by Cole Wichman
