How to integrate Ghost into custom agents, bots, and scripts. For Claude Code specifically, see quickstart-claude-code.md.
```sh
# Homebrew (macOS / Linux)
brew install rcliao/tap/ghost

# Or from source (requires Go)
go install github.com/rcliao/ghost/cmd/ghost@latest
```

Pre-built binaries for all platforms are available on GitHub Releases.
| Method | Best For | Setup |
|---|---|---|
| Go Library | Custom Go agents, Telegram bots, orchestrators | `import memory "github.com/rcliao/ghost"` |
| CLI | Shell scripts, cron jobs, any language via subprocess | `ghost put`, `ghost search`, `ghost context` |
| MCP Server | Any MCP-compatible client | `ghost mcp-serve` as stdio transport |
```go
import memory "github.com/rcliao/ghost"

store, err := memory.NewSQLiteStore("~/.ghost/memory.db")
if err != nil {
	log.Fatal(err)
}
defer store.Close()
```

```go
mem, err := store.Put(ctx, memory.PutParams{
	NS:         "project:myapp",
	Key:        "auth-architecture",
	Content:    "Using JWT with refresh tokens, 15min access / 7d refresh",
	Kind:       "semantic", // semantic | episodic | procedural
	Priority:   "high",     // low | normal | high | critical
	Importance: 0.8,        // 0.0-1.0, affects retrieval ranking
	Tier:       "ltm",      // sensory | stm (default) | ltm | dormant
	Pinned:     false,      // true = always loaded in context, exempt from decay
	Tags:       []string{"auth", "architecture"},
})
```

```go
result, err := store.Context(ctx, memory.ContextParams{
	NS:     "project:myapp",
	Query:  "authentication flow",
	Budget: 2000, // max tokens
})
for _, mem := range result.Memories {
	fmt.Printf("[%s] %s (score: %.2f)\n", mem.Key, mem.Content, mem.Score)
}
if result.CompactionSuggested {
	// Too many memories competing for budget — run reflect
	store.Reflect(ctx, memory.ReflectParams{NS: "project:myapp"})
}
```

For conversational agents, store exchanges as episodic memory with TTL:
```go
store.Put(ctx, memory.PutParams{
	NS:         "agent:mybot",
	Key:        fmt.Sprintf("exchange-%d", time.Now().UnixMilli()),
	Content:    fmt.Sprintf("User: %s\nAssistant: %s", userMsg, response),
	Kind:       "episodic",
	Tags:       []string{"chat:123"},
	TTL:        "7d",
	Importance: 0.3,
})
```

```go
result, err := store.Curate(ctx, memory.CurateParams{
	NS:  "project:myapp",
	Key: "auth-architecture",
	Op:  "promote", // promote | demote | boost | diminish | archive | delete | pin | unpin
})
fmt.Printf("%s: %s → %s\n", result.Op, result.OldTier, result.NewTier)
```

```go
result, err := store.Reflect(ctx, memory.ReflectParams{
	NS:     "project:myapp", // optional, empty = all namespaces
	DryRun: false,
})
fmt.Printf("Evaluated: %d, Promoted: %d, Decayed: %d, Deleted: %d\n",
	result.MemoriesEvaluated, result.Promoted, result.Decayed, result.Deleted)
```

Load pinned memories into the system prompt on every request:
```go
result, _ := store.Context(ctx, memory.ContextParams{
	NS:     "agent:mybot",
	Query:  "", // empty query = pinned only
	Budget: 2000,
})
systemPrompt := "## Core Knowledge\n"
for _, m := range result.Memories {
	systemPrompt += "- " + m.Content + "\n"
}
```

Prepend relevant memories to the user message before sending to the LLM:
```go
result, _ := store.Context(ctx, memory.ContextParams{
	NS:     "agent:mybot",
	Query:  userMessage,
	Tags:   []string{"chat:123"},
	Budget: 2000,
})
augmented := "[Relevant memories]\n"
for _, m := range result.Memories {
	augmented += "- " + m.Content + "\n"
}
augmented += "[End memories]\n\n" + userMessage
```

For non-Go agents, use the CLI via subprocess. All output is JSON by default.
```sh
# Store a memory
ghost put -n "project:myapp" -k "decision-db" \
  --kind semantic --importance 0.8 \
  "Chose PostgreSQL over MySQL for JSONB support"

# Get context for a task
ghost context -n "project:myapp" -q "database setup" --budget 2000

# Retrieve a specific memory by key
ghost get -n "project:myapp" -k "decision-db"

# Search for specific knowledge
ghost search "PostgreSQL" -n "project:myapp"

# Consolidate related memories into a summary
ghost consolidate -n "project:myapp" --summary-key db-overview \
  --keys "decision-db,db-migration,db-indexing" \
  --content "Database: PostgreSQL with JSONB, UUID PKs, GIN indexes"

# Curate a specific memory
ghost curate -n "project:myapp" -k "old-decision" --op archive
ghost curate -n "project:myapp" -k "key-insight" --op promote
ghost curate -n "project:myapp" -k "critical-fact" --op boost

# Manage edges
ghost edge -n "project:myapp" --from-key db-decision --to-key db-migration -r depends_on
ghost edge -n "project:myapp" --from-key db-decision --list

# Run lifecycle maintenance
ghost reflect --ns "project:myapp"
ghost reflect --dry-run  # preview without applying
```

A minimal Python wrapper:

```python
import subprocess, json

def ghost_put(ns, key, content, importance=0.5):
    subprocess.run([
        "ghost", "put", "-n", ns, "-k", key,
        "--importance", str(importance), content
    ])

def ghost_context(query, ns=None, budget=2000):
    cmd = ["ghost", "context", "-q", query, "--budget", str(budget)]
    if ns:
        cmd.extend(["-n", ns])
    result = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(result.stdout)
```

After N conversational exchanges, consolidate old episodic memories into a single semantic summary:
- List episodic exchanges for the chat namespace
- Keep the most recent 5 exchanges intact
- Merge older exchanges into a summary
- Store the summary as semantic memory
- Delete the individual exchanges
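The steps above can be sketched as a pure selection step — which exchanges to merge and which to keep — with the actual `ghost` calls elided. The function and field names here are illustrative, not part of Ghost's API:

```python
def plan_consolidation(exchanges, keep_recent=5):
    """Given episodic exchanges sorted oldest-first, decide what to merge.

    Each exchange is a dict with "key" and "content" (illustrative shape).
    Returns the keys to delete and the merged text to store as a summary.
    """
    if len(exchanges) <= keep_recent:
        return [], None  # nothing old enough to merge yet
    old, recent = exchanges[:-keep_recent], exchanges[-keep_recent:]
    summary = "\n".join(e["content"] for e in old)
    return [e["key"] for e in old], summary

# Example: 7 exchanges — the 2 oldest get merged, the latest 5 stay intact
exchanges = [{"key": f"exchange-{i}", "content": f"turn {i}"} for i in range(7)]
doomed, summary = plan_consolidation(exchanges)
```

An agent would then store `summary` via `ghost put --kind semantic` and delete the keys in `doomed`, matching the list above.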
Use `ghost_search` or `ghost_context` to surface memories, then `ghost_curate` to act on specific ones:
- Query memories for a topic: `ghost_search(query="deployment")`
- Review each result — is it still accurate? Still useful?
- Promote valuable ones: `ghost_curate(ns, key, op="promote")`
- Archive outdated ones: `ghost_curate(ns, key, op="archive")`
- Boost frequently-needed facts: `ghost_curate(ns, key, op="boost")`
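One way to mechanize the review step is a small triage function mapping a reviewed result to a curation op. The thresholds and field names below are assumptions for illustration, not Ghost behavior:

```python
def triage(mem, still_accurate):
    """Map a reviewed memory to a curate op, or None to leave it alone.

    `mem` is one search result dict; `access_count`, `tier`, and
    `importance` are assumed fields, and the cutoffs are arbitrary.
    """
    if not still_accurate:
        return "archive"  # outdated — remove it from retrieval
    if mem.get("access_count", 0) >= 5:
        return "boost"    # frequently needed — rank it higher
    if mem.get("tier") == "stm" and mem.get("importance", 0) >= 0.7:
        return "promote"  # valuable — move toward long-term memory
    return None

op = triage({"tier": "stm", "importance": 0.8, "access_count": 1}, True)
```

The returned op feeds straight into `ghost_curate(ns, key, op=op)`.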
When `ghost_context` returns `compaction_suggested: true`:
- Run `ghost_reflect` to promote/decay/prune
- Optionally run `ghost gc` to hard-delete expired memories
- Re-query context — it should now fit better within budget
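That loop can be wired up generically. Here `fetch`, `reflect`, and `gc` are injected callables standing in for whichever Ghost interface (library, CLI, MCP) you use — they are parameters of this sketch, not Ghost names:

```python
def context_with_maintenance(fetch, reflect, gc=None):
    """Fetch context; on a compaction signal, run maintenance and retry once."""
    result = fetch()
    if result.get("compaction_suggested"):
        reflect()
        if gc is not None:
            gc()
        result = fetch()  # re-query — should now fit the budget better
    return result

# Simulated run: the first fetch signals compaction, the second one fits
responses = iter([{"compaction_suggested": True}, {"memories": ["fact"]}])
calls = []
out = context_with_maintenance(
    fetch=lambda: next(responses),
    reflect=lambda: calls.append("reflect"),
    gc=lambda: calls.append("gc"),
)
```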
Ghost returns full session chunks optimized for grep-style retrieval. For a question-answering agent, inserting an LLM between Ghost retrieval and the answering call consistently improves quality: it compresses the retrieved context into a bullet list of query-relevant facts. This is the pattern that wins the LoCoMo-Plus cognitive-memory benchmark (see docs/eval.md).
Pipeline:
- `ghost search -q "<user question>" -n <ns> --limit 15` — wider recall than you'd normally show, since compression filters noise.
- Feed the JSON results to an LLM with a short prompt: "Extract every fact from the memories that could inform the question. Preserve specific details (names, dates, preferences). Output 1-8 bullets. Do not answer — just extract."
- Prepend the compressed bullets to the user's question and send to the answering model.
```python
# Pseudocode — adapt to your LLM client
mems = ghost_cli(["search", "-q", question, "-n", ns, "--limit", "15"])
bullets = llm_call(system=COMPRESS_PROMPT,
                   user=f"QUESTION: {question}\n\nMEMORIES:\n{mems}")
answer = llm_call(system=ANSWER_PROMPT,
                  user=f"{bullets}\n\n{question}")
```

Why it works: raw chunks contain off-topic conversation. The compression step surfaces the latent signal (user values, past decisions, emotional context) that the answering model would otherwise drown out. Wider recall (top-15 vs top-5) gives the compressor more candidates; because its output is short, the answering prompt stays small.

Cost: ~2× the LLM calls of a direct answer, but typically fewer answering input tokens because the bullets are shorter than raw chunks.
`relates_to` edges are created automatically by embedding similarity, but they don't distinguish topical similarity from causal reasoning. To enrich the graph with typed reasoning edges (`caused_by`, `prevents`, `implies`):

```sh
# Uses claude -p by default; ANTHROPIC_API_KEY overrides to API
ghost infer-edges --ns agent:mybot --max-pairs 100 --dry-run

# When happy with the proposals, re-run without --dry-run
ghost infer-edges --ns agent:mybot --max-pairs 100
```

This is an offline batch step — Ghost's hot path (Search, Context) remains LLM-free. Idempotent: pairs that already have a reasoning edge are skipped.
When many memories accumulate on a topic:
- Run `ghost clusters -n <ns>` to discover groups (CLI), or `ghost_expand(ns=...)` with no key (MCP) to see existing consolidation nodes
- Review each cluster — do they belong together?
- Write a summary and consolidate (creates summary + `contains` edges in one call):

```sh
# CLI
ghost consolidate -n agent:mybot --summary-key auth-overview \
  --keys "auth-jwt,auth-expiry,auth-cookies" \
  --content "Auth: JWT+RSA256, 24h expiry, refresh via cookies"
```

```python
# MCP tool
ghost_consolidate(ns="agent:mybot", summary_key="auth-overview",
                  source_keys=["auth-jwt", "auth-expiry", "auth-cookies"],
                  content="Auth: JWT+RSA256, 24h expiry, refresh via cookies")
```

- The summary gets `contains` edges to its sources; in context assembly, the summary replaces its children (parent boosting + child suppression)
- Use `ghost_expand(ns="agent:mybot", key="auth-overview")` to drill into the summary and see its children
Ghost ships with 7 built-in rules. Pinned memories are exempt from all rules. Add custom rules:
```sh
# Archive procedural memories older than 30 days (720h) with low access
ghost rule set \
  --name "archive-old-procedures" \
  --cond-tier stm \
  --cond-kind procedural \
  --cond-age-gt 720 \
  --cond-access-lt 2 \
  --action ARCHIVE

# Promote high-importance memories quickly
ghost rule set \
  --name "fast-promote-important" \
  --cond-tier stm \
  --cond-age-gt 12 \
  --cond-access-gt 2 \
  --action PROMOTE

# Merge similar episodic memories (deduplication)
ghost rule set \
  --name "merge-similar-episodes" \
  --cond-tier stm \
  --cond-kind episodic \
  --cond-similarity-gt 0.85 \
  --action MERGE \
  --action-params '{"strategy": "keep_highest_importance"}'
```

Rules are evaluated in two passes during `ghost reflect`:
- Per-memory pass — first matching rule wins per memory
- Similarity merge pass — rules with `--cond-similarity-gt` compare embeddings pairwise and consolidate
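The two-pass shape can be sketched in Python. Rule and memory fields are simplified stand-ins for Ghost's internals, and `similarity` is injected rather than computed from embeddings:

```python
def evaluate_rules(memories, rules, merge_thresholds, similarity):
    """Two-pass evaluation: per-memory first-match-wins, then pairwise merge.

    `rules` are (predicate, action) pairs; `merge_thresholds` come from
    similarity-conditioned rules; `similarity(a, b)` returns a float.
    """
    actions = {}
    # Pass 1: per-memory — the first rule whose predicate matches wins
    for mem in memories:
        for predicate, action in rules:
            if predicate(mem):
                actions[mem["key"]] = action
                break
    # Pass 2: pairwise similarity merge over memories not already acted on
    merges = []
    pending = [m for m in memories if m["key"] not in actions]
    for i, a in enumerate(pending):
        for b in pending[i + 1:]:
            if any(similarity(a, b) > t for t in merge_thresholds):
                merges.append((a["key"], b["key"]))
    return actions, merges

mems = [{"key": "a", "age": 800}, {"key": "b", "age": 10}, {"key": "c", "age": 11}]
actions, merges = evaluate_rules(
    mems,
    rules=[(lambda m: m["age"] > 720, "ARCHIVE")],
    merge_thresholds=[0.85],
    similarity=lambda a, b: 0.9,  # pretend every pending pair is near-duplicate
)
```

First-match-wins means rule order matters; similarity merges only consider memories the first pass left untouched.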
Namespaces — Agent identity. Each namespace is isolated. Use `agent:<name>` for per-agent memory.
Tags — First-class metadata: `identity`, `lore`, `project:<name>`, `chat:<id>`, `learning`, `convention`, `user:<name>`.
Pinned memories — Always loaded in context (Phase 1). Exempt from lifecycle decay.
Compaction signal — `ghost_context` returns `compaction_suggested: true` when the budget is exhausted. Trigger `ghost_reflect`.
Token budgets — `ghost_context` accepts a `budget` parameter (default 4000 tokens).
Curate vs Reflect — `curate` acts on a specific memory (intent-driven); `reflect` applies rules across all memories (rule-driven).