OpenExp

Skills tell your AI how. OpenExp teaches it what works.
Outcome-based learning for AI agents. Q-learning memory that gets smarter with every session.

Quick Start · How It Works · MCP Tools · Configuration · Architecture · Contributing

You wrote a skill: "how to work with CRM." Your agent follows it perfectly. But it doesn't know that approach A closed deals and approach B didn't. Tomorrow it'll do the same thing as yesterday — even if yesterday didn't work.

Skills say how. OpenExp teaches what works.

Every outcome — commit, closed deal, resolved ticket — feeds back as a reward signal. Memories that led to results get higher Q-values and surface first next time. Noise sinks.

Example: sales agent

Your agent sent 200 emails this month. Which formulations got replies? Which approaches closed deals? Skills don't know — there's no feedback loop.

# .openexp.yaml in your sales project
experience: sales

1. Define your pipeline:  lead → contacted → qualified → proposal → won
2. Work normally — Claude remembers client preferences, deal context, pricing
3. Deal closes → all memories tagged with that client get rewarded
4. Next similar deal → the insights that led to the close surface first

After a month, your agent "knows" not just how to write emails — but which emails lead to results.

The Problem

Skills and CLAUDE.md solve the "agent doesn't remember" problem. But they're static instructions — written once, never learning from outcomes. Your agent follows the playbook perfectly, but doesn't know which plays actually work.

Existing memory tools (Mem0, Zep, LangMem) add storage — but every memory is equally important. A two-month-old note about a deleted feature has the same weight as yesterday's critical architecture decision.

The missing piece: there's no learning. No feedback loop from outcomes to retrieval quality.

The Solution

OpenExp adds a closed-loop learning system with outcome-based rewards:

Session starts → recall memories (ranked by Q-value)
    ↓
Agent works → observations + decisions captured automatically
    ↓
Outcomes happen → deal closes, prediction verified, retrospective runs
    ↓
    WIN  → memories that contributed get rewarded (Q-values go up)
    LOSS → memories that misled get penalized (Q-values go down)
    ↓
Next session → better memories surface first

Five Reward Paths

OpenExp doesn't rely on heuristics. It learns from real outcomes through five distinct reward paths:

Path	Trigger	Example
Prediction	`log_outcome` resolves a prediction	"Predicted client would accept proposal" → confirmed → +0.8
Business	CRM stage transition detected	Deal moved negotiation → won → +0.8 to tagged memories
Calibration	Manual Q-value override	Expert judgment: "this insight was critical" → set q=0.9
Retrospective	Daily LLM analysis (Opus 4.6)	Cross-session patterns: promote undervalued, demote noise
Decision extraction	Session end (async)	Opus 4.6 reads transcript, extracts strategic decisions

add_memory(content="Acme prefers Google stack", client_id="comp-acme")
    ↓
... weeks of work ...
    ↓
CRM: Acme deal moves negotiation → won
    ↓
resolve_outcomes → finds memories tagged comp-acme → reward +0.8

After a few sessions, OpenExp learns what context actually helps you get work done.

Experience Library

Memories capture individual moments. The Experience Library captures entire journeys — from first contact to final outcome — and distills them into reusable lessons.

Raw conversations (26K messages)
    ↓ chunk into ~200K token batches
18 chunks
    ↓ Opus extracts topics per chunk
170 topics
    ↓ group across chunks by work thread
36 threads (e.g., "Enterprise Chatbot Deal", "Document Automation Pipeline")
    ↓ Opus labels each thread
269 experience labels (context → actions → outcome → lesson)
    ↓ stored in Qdrant as type="experience"
Searchable via search_memory

Each experience label is a structured training triplet:

{
  "context": {
    "situation": "Client needs automated report generation from 40-page template",
    "constraints": ["Non-technical operators", "14 regional offices"],
    "stakeholders": ["Client PM", "Builder (you)"]
  },
  "actions": [
    {"what": "Built 7-stage pipeline with --auto flag", "why": "Remove human bottleneck"}
  ],
  "outcome": {
    "result": "Pipeline generates documents end-to-end, demo successful",
    "success": true
  },
  "lesson": {
    "insight": "When human is bottleneck, make the agent the worker — give it tools + DoD",
    "applies_when": "Manual data entry is blocking a pipeline that otherwise works"
  }
}

When a new situation arises, search_memory finds relevant experiences by matching the situation, not keywords — so "document automation client" finds lessons from a completely different industry project because the pattern matches.

Three levels of use:

Now: Experience layer as system prompt — skill queries Qdrant, formats advice
Soon: Compress with compresr.ai to fit all 269 labels in context
Later: LoRA fine-tune on labeled data (context→actions→outcome format)

Why OpenExp?

Feature	OpenExp	Mem0	Zep/Graphiti	LangMem
Learns from outcomes	Yes — Q-learning from real business results	No	No	No
Process-aware	Define pipeline stages with reward signals	No	No	No
Memory type filtering	Reward only decisions/insights, not noise	No	No	No
Outcome-based rewards	CRM deal closes → tagged memories get rewarded	No	No	No
Claude Code native	Zero-config hooks, works out of the box	Requires integration	Requires integration	Requires integration
Local-first	Qdrant + FastEmbed, no cloud, no API key for core	Cloud API	Cloud or self-hosted	Cloud API
Hybrid retrieval	BM25 + vector + recency + importance + Q-value (5 signals)	Vector only	Graph + vector	Vector only
Privacy	All data stays on your machine	Data sent to cloud	Depends on setup	Data sent to cloud

The key difference: skills say how. Memory tools store. OpenExp learns what works — from real outcomes.

Quick Start

git clone https://github.com/anthroos/openexp.git
cd openexp
./setup.sh

That's it. Open Claude Code in any project — it now has memory.

Tip

No API key needed for core functionality. Embeddings run locally via FastEmbed. An Anthropic API key is optional — it enables auto-enrichment (type classification, tags, validity windows) but everything works great without it.

Prerequisites: Python 3.11+, Docker, jq

What You'll See

When you open Claude Code after a few sessions:

# OpenExp Memory (Q-value ranked)
Query: my-project | Monday 2026-03-22

## Relevant Context
[sim=0.82 q=0.73] Fixed auth bug by adding token refresh logic in api/auth.py
[sim=0.76 q=0.65] Project uses FastAPI + PostgreSQL, deployed on Railway
[sim=0.71 q=0.58] User prefers pytest with fixtures, not unittest

q=0.73 means this memory consistently leads to productive sessions. q=0.31 means it's been recalled but didn't help — it'll rank lower next time.

How It Works

Three hooks integrate with Claude Code automatically:

Hook	When	What
SessionStart	Session opens	Searches Qdrant for relevant memories, injects top results as context
UserPromptSubmit	Every message	Lightweight recall — adds relevant memories to each prompt
PostToolUse	After Write/Edit/Bash	Captures what Claude does as observations (JSONL)
SessionEnd	Session closes	Summary → ingest → reward → decision extraction (async)

After each session, Opus 4.6 reads the conversation transcript and extracts decisions (not actions) — strategic choices, insights, and commitments that have value for future similar situations. See Decision Extraction.

The MCP server provides 16 tools for memory operations, introspection, and calibration.

The Learning Loop

┌──────────────────────────────────────────────────────────────┐
│                                                              │
│   ┌─────────┐    search     ┌────────┐    inject    ┌─────┐ │
│   │ Qdrant  │──────────────→│ Scorer │────────────→│ LLM │ │
│   │ (384d)  │               │        │              │     │ │
│   └────┬────┘               └────────┘              └──┬──┘ │
│        │                    BM25 10%                    │    │
│        │                    Vector 30%                  │    │
│   Q-values                  Recency 15%            observations
│   updated                   Importance 15%             │    │
│        │                    Q-value 30%                 │    │
│        │                                               │    │
│   ┌────┴────┐   reward    ┌──────────┐   ingest   ┌───┴──┐ │
│   │ Q-Cache │←────────────│ Reward   │←───────────│ JSONL│ │
│   │  (LRU)  │             │ Tracker  │            │ obs  │ │
│   └─────────┘             └──────────┘            └──────┘ │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Q-Learning Details

Every memory has a Q-value (starts at 0.0 — earn value from zero). Three layers capture different aspects:

Layer	Weight	Measures
action	50%	Did recalling this help get work done?
hypothesis	20%	Was the information accurate?
fit	30%	Was it relevant to the context?

Update rule:

Q_new = clamp(Q_old + α × reward, floor, ceiling)

α = 0.25 (learning rate)
reward ∈ [-1.0, 1.0] (productivity signal)
floor = -0.5, ceiling = 1.0

Retrieval scoring combines five signals:

score = 0.30 × vector_similarity    # semantic match
      + 0.10 × bm25_score           # keyword match
      + 0.15 × recency              # exponential decay (90-day half-life)
      + 0.15 × importance           # type-weighted metadata
      + 0.30 × q_value              # learned quality

With 10% epsilon-greedy exploration — occasionally surfaces low-Q memories to give them another chance.

MCP Tools

Five focused tools (hippocampus model — write everything, retrieve selectively):

Tool	Description
`search_memory`	Hybrid search: BM25 + vector + recency + importance + Q-value reranking. Filter by type (e.g., `type="experience"` for experience labels)
`add_memory`	Store memory with auto-enrichment (type, tags, validity). Supports `client_id` for entity tagging
`log_prediction`	Track a prediction for later outcome resolution
`log_outcome`	Resolve prediction with reward → updates Q-values
`memory_stats`	Collection stats, point counts by source/type, session count

CLI

# Search memories
openexp search -q "authentication flow" -n 5

# Search only experience labels
openexp search -q "client demo" -n 5 -t experience

# Ingest transcripts into Qdrant
openexp ingest

# Experience Library pipeline
openexp chunk                    # chunk transcripts into ~200K token batches
openexp topics                   # extract topics per chunk via LLM
# Thread grouping + experience labeling via scripts/batch_label.py

# Show stats
openexp stats

# Memory compaction (merge similar memories)
openexp compact --dry-run

# Manage experience profiles
openexp experience list
openexp experience show sales

# Visualization
openexp viz --replay latest      # session replay

Configuration

All settings via environment variables (.env):

Variable	Default	Description
`QDRANT_HOST`	`localhost`	Qdrant server host
`QDRANT_PORT`	`6333`	Qdrant server port
`QDRANT_API_KEY`	(none)	Optional: Qdrant auth (also passed to Docker)
`OPENEXP_COLLECTION`	`openexp_memories`	Qdrant collection name
`OPENEXP_DATA_DIR`	`~/.openexp/data`	Q-cache, predictions, retrieval logs
`OPENEXP_OBSERVATIONS_DIR`	`~/.openexp/observations`	Where hooks write observations
`OPENEXP_SESSIONS_DIR`	`~/.openexp/sessions`	Session summary files
`OPENEXP_EMBEDDING_MODEL`	`BAAI/bge-small-en-v1.5`	Embedding model (local, free)
`OPENEXP_EMBEDDING_DIM`	`384`	Embedding dimensions
`OPENEXP_INGEST_BATCH_SIZE`	`50`	Batch size for ingestion
`OPENEXP_OUTCOME_RESOLVERS`	(none)	Outcome resolvers (format: `module:Class`)
`OPENEXP_CRM_DIR`	(none)	CRM directory for CRMCSVResolver
`ANTHROPIC_API_KEY`	(none)	Optional: enables LLM-based enrichment
`OPENEXP_ENRICHMENT_MODEL`	`claude-haiku-4-5-20251001`	Model for auto-enrichment

Anthropic API key is optional. Without it, memories get default metadata. With it, each memory is automatically classified (type, importance, tags, validity window).

Architecture

openexp/
├── core/                       # Q-learning memory engine
│   ├── q_value.py              # Q-learning: QCache, QValueUpdater, QValueScorer
│   ├── direct_search.py        # FastEmbed (384d) + Qdrant vector search
│   ├── hybrid_search.py        # BM25 keyword + vector + Q-value hybrid scoring
│   ├── scoring.py              # Composite relevance: similarity × recency × importance
│   ├── lifecycle.py            # 8-state memory lifecycle (active→confirmed→archived→...)
│   ├── experience.py           # Per-domain Q-value contexts (default, sales, dealflow)
│   ├── enrichment.py           # Auto-metadata extraction (LLM or defaults)
│   ├── explanation.py          # L4: LLM-generated reward explanations
│   ├── reward_log.py           # L3: cold storage of reward events
│   ├── compaction.py           # Memory merging/clustering
│   ├── v7_extensions.py        # Lifecycle filter + hybrid scoring integration
│   └── config.py               # Environment-based configuration
│
├── ingest/                     # Observation → Qdrant pipeline
│   ├── observation.py          # JSONL observations → embeddings → Qdrant
│   ├── session_summary.py      # Session .md files → memory objects
│   ├── reward.py               # Reward utilities (used by outcome resolvers)
│   ├── retrieval_log.py        # Closed-loop: which memories were recalled
│   ├── watermark.py            # Idempotent ingestion tracking
│   ├── filters.py              # Filter trivial observations
│   └── extract_decisions.py    # Opus 4.6 decision extraction from transcripts
│
├── resolvers/                  # Outcome resolvers (pluggable)
│   └── crm_csv.py              # CRM CSV stage transition → reward events
│
├── data/experiences/           # Shipped experience configs
│   ├── default.yaml            # Software engineering
│   ├── sales.yaml              # Sales & outreach
│   └── dealflow.yaml           # Deal pipeline
│
├── outcome.py                  # Outcome resolution framework
│
├── hooks/                      # Claude Code integration
│   ├── session-start.sh        # Inject Q-ranked memories at startup
│   ├── user-prompt-recall.sh   # Per-message context recall
│   ├── post-tool-use.sh        # Capture observations from tool calls
│   └── session-end.sh          # Summary + ingest + reward (closes the loop)
│
├── mcp_server.py               # MCP STDIO server (16 tools, JSON-RPC 2.0)
├── reward_tracker.py           # Prediction → outcome → Q-value updates
├── viz.py                      # Visualization + session replay
└── cli.py                      # CLI: search, ingest, stats, viz, compact, experience

Memory Lifecycle

Memories move through 8 states to prevent stale context:

active ──→ confirmed ──→ outdated ──→ archived ──→ deleted
  │            │                          ↑
  ├──→ contradicted ──────────────────────┘
  ├──→ merged
  └──→ superseded

Only active and confirmed memories are returned in searches. Status weights affect scoring: confirmed=1.2×, active=1.0×, outdated=0.5×, archived=0.3×.

Data Flow

PostToolUse hook                                  SessionStart hook
      │                                                 ↑
      ↓                                                 │
~/.openexp/observations/*.jsonl                Qdrant search (top 10)
      │                                          + Q-value reranking
      ↓                                                 ↑
SessionEnd hook ──→ summary .md                         │
      │                                                 │
      ↓ (async)                                         │
openexp ingest ──→ FastEmbed ──→ Qdrant ─────────────────┘
      │                            ↑
      ↓                            │
Q-Cache (q_cache.json) ←── reward signal ←── outcomes (CRM, predictions, retro)

Technical Details

Component	Choice	Why
Embeddings	FastEmbed (BAAI/bge-small-en-v1.5)	Local, free, no API key, 384 dimensions
Vector DB	Qdrant	Fast ANN search, payload filtering, Docker-ready
Q-Cache	In-memory LRU (100K entries)	Fast lookup, delta-based persistence for concurrent sessions
Transport	MCP STDIO (JSON-RPC 2.0)	Native Claude Code integration
Hooks	Bash scripts	Minimal dependencies, shell-level integration

Troubleshooting

Docker / Qdrant won't start:

# Check Docker is running
docker info

# Check Qdrant container
docker ps -a | grep openexp-qdrant
docker logs openexp-qdrant

Hooks not firing:

# Verify hooks are registered
cat ~/.claude/settings.local.json | jq '.hooks'

# Re-run setup to fix registration
./setup.sh

No memories appearing: Memories need to be ingested first. After a few Claude Code sessions:

openexp ingest --dry-run   # preview what will be ingested
openexp ingest             # ingest into Qdrant
openexp stats              # check Q-cache state

Experiences — Define Your Process

Not everyone writes code. An Experience defines what "productive" means for your workflow, including pipeline stages and which memory types matter.

Experience	Process	Top Signals
`default`	backlog → in_progress → review → merged → deployed	commits, PRs, tests
`sales`	lead → contacted → qualified → proposal → negotiation → won	decisions, emails, follow-ups
`dealflow`	lead → discovery → nda → proposal → negotiation → invoice → paid	proposals, invoices, payments

Switch with one env var:

export OPENEXP_EXPERIENCE=dealflow

Each experience also controls which memory types get rewarded — sales rewards decisions and insights, not raw tool actions. This means the system learns faster because it focuses on the signal, not the noise.

Create your own with the interactive wizard:

openexp experience create
# Pick a process type (dev/sales/support/content)
# Customize stages, signal weights, memory type filters

See the Experiences Guide for full details.

Documentation

Detailed docs are available in the docs/ directory:

How It Works — the 4-phase learning cycle
Decision Extraction — Opus 4.6 extracts decisions, not actions
Storage System — 5-level pyramid (L0-L4), all 5 reward paths
Experiences — domain-specific reward profiles (create your own)
Architecture — system design and data flow
Configuration — all environment variables and options

Contributing

This project is in early stages. See CONTRIBUTING.md for setup and workflow.

Key areas where help is welcome:

New experiences — domain-specific reward profiles (DevOps, writing, research, etc.)
Outcome resolvers — new integrations beyond CRM (Jira, Linear, GitHub Issues)
Multi-project learning — sharing relevant context across projects
Benchmarks — measuring retrieval quality improvement over time
Automated lifecycle transitions — contradiction detection, staleness heuristics

Research

OpenExp implements value-driven memory retrieval inspired by MemRL, adapted for episodic memory in AI coding assistants.

Core insight: treating memory retrieval as a reinforcement learning problem — where the reward signal comes from real session outcomes — produces better context selection than similarity-only search.

Citation

If you use OpenExp in your research, please cite:

@article{pasichnyk2026yerkes,
  title={The Yerkes-Dodson Curve for AI Agents: Optimal Pressure in Multi-Agent Survival Games},
  author={Pasichnyk, Ivan},
  journal={arXiv preprint arXiv:2603.07360},
  year={2026},
  url={https://arxiv.org/abs/2603.07360}
}

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
.github		.github
docs		docs
openexp		openexp
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
backlog.yaml		backlog.yaml
backlog_cli.py		backlog_cli.py
landing.html		landing.html
openexp-architecture.html		openexp-architecture.html
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenExp

Example: sales agent

The Problem

The Solution

Five Reward Paths

Experience Library

Why OpenExp?

Quick Start

What You'll See

How It Works

The Learning Loop

Q-Learning Details

MCP Tools

CLI

Configuration

Architecture

Memory Lifecycle

Data Flow

Technical Details

Troubleshooting

Experiences — Define Your Process

Documentation

Contributing

Research

Citation

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenExp

Example: sales agent

The Problem

The Solution

Five Reward Paths

Experience Library

Why OpenExp?

Quick Start

What You'll See

How It Works

The Learning Loop

Q-Learning Details

MCP Tools

CLI

Configuration

Architecture

Memory Lifecycle

Data Flow

Technical Details

Troubleshooting

Experiences — Define Your Process

Documentation

Contributing

Research

Citation

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages