Ghost Integration Guide

How to integrate Ghost into custom agents, bots, and scripts. For Claude Code specifically, see quickstart-claude-code.md.

Install

# Homebrew (macOS / Linux)
brew install rcliao/tap/ghost

# Or from source (requires Go)
go install github.com/rcliao/ghost/cmd/ghost@latest

Pre-built binaries for all platforms are available on GitHub Releases.

Integration Methods

Method     | Best For                                              | Setup
-----------|-------------------------------------------------------|------
Go Library | Custom Go agents, Telegram bots, orchestrators        | import memory "github.com/rcliao/ghost"
CLI        | Shell scripts, cron jobs, any language via subprocess | ghost put, ghost search, ghost context
MCP Server | Any MCP-compatible client                             | ghost mcp-serve as stdio transport

Go Library

import memory "github.com/rcliao/ghost"

store, err := memory.NewSQLiteStore("~/.ghost/memory.db")
if err != nil { log.Fatal(err) }
defer store.Close()

Storing Memories

mem, err := store.Put(ctx, memory.PutParams{
    NS:         "project:myapp",
    Key:        "auth-architecture",
    Content:    "Using JWT with refresh tokens, 15min access / 7d refresh",
    Kind:       "semantic",       // semantic | episodic | procedural
    Priority:   "high",           // low | normal | high | critical
    Importance: 0.8,              // 0.0-1.0, affects retrieval ranking
    Tier:       "ltm",            // sensory | stm (default) | ltm | dormant
    Pinned:     false,            // true = always loaded in context, exempt from decay
    Tags:       []string{"auth", "architecture"},
})

Retrieving Context

result, err := store.Context(ctx, memory.ContextParams{
    NS:     "project:myapp",
    Query:  "authentication flow",
    Budget: 2000, // max tokens
})

for _, mem := range result.Memories {
    fmt.Printf("[%s] %s (score: %.2f)\n", mem.Key, mem.Content, mem.Score)
}

if result.CompactionSuggested {
    // Too many memories competing for budget — run reflect
    store.Reflect(ctx, memory.ReflectParams{NS: "project:myapp"})
}

Logging Exchanges

For conversational agents, store exchanges as episodic memory with TTL:

store.Put(ctx, memory.PutParams{
    NS:         "agent:mybot",
    Key:        fmt.Sprintf("exchange-%d", time.Now().UnixMilli()),
    Content:    fmt.Sprintf("User: %s\nAssistant: %s", userMsg, response),
    Kind:       "episodic",
    Tags:       []string{"chat:123"},
    TTL:        "7d",
    Importance: 0.3,
})

Curating Memories

result, err := store.Curate(ctx, memory.CurateParams{
    NS:  "project:myapp",
    Key: "auth-architecture",
    Op:  "promote",  // promote | demote | boost | diminish | archive | delete | pin | unpin
})
fmt.Printf("%s: %s → %s\n", result.Op, result.OldTier, result.NewTier)

Running Reflect

result, err := store.Reflect(ctx, memory.ReflectParams{
    NS:     "project:myapp",   // optional, empty = all namespaces
    DryRun: false,
})
fmt.Printf("Evaluated: %d, Promoted: %d, Decayed: %d, Deleted: %d\n",
    result.MemoriesEvaluated, result.Promoted, result.Decayed, result.Deleted)

System Prompt Injection

Load pinned memories into the system prompt on every request:

result, _ := store.Context(ctx, memory.ContextParams{
    NS:     "agent:mybot",
    Query:  "",        // empty query = pinned only
    Budget: 2000,
})

systemPrompt := "## Core Knowledge\n"
for _, m := range result.Memories {
    systemPrompt += "- " + m.Content + "\n"
}

Per-Query Context Injection (RAG)

Prepend relevant memories to the user message before sending to the LLM:

result, _ := store.Context(ctx, memory.ContextParams{
    NS:    "agent:mybot",
    Query: userMessage,
    Tags:  []string{"chat:123"},
    Budget: 2000,
})

augmented := "[Relevant memories]\n"
for _, m := range result.Memories {
    augmented += "- " + m.Content + "\n"
}
augmented += "[End memories]\n\n" + userMessage

CLI Integration

For non-Go agents, use the CLI via subprocess. All output is JSON by default.

# Store a memory
ghost put -n "project:myapp" -k "decision-db" \
  --kind semantic --importance 0.8 \
  "Chose PostgreSQL over MySQL for JSONB support"

# Get context for a task
ghost context -n "project:myapp" -q "database setup" --budget 2000

# Retrieve a specific memory by key
ghost get -n "project:myapp" -k "decision-db"

# Search for specific knowledge
ghost search "PostgreSQL" -n "project:myapp"

# Consolidate related memories into a summary
ghost consolidate -n "project:myapp" --summary-key db-overview \
  --keys "decision-db,db-migration,db-indexing" \
  --content "Database: PostgreSQL with JSONB, UUID PKs, GIN indexes"

# Curate a specific memory
ghost curate -n "project:myapp" -k "old-decision" --op archive
ghost curate -n "project:myapp" -k "key-insight" --op promote
ghost curate -n "project:myapp" -k "critical-fact" --op boost

# Manage edges
ghost edge -n "project:myapp" --from-key db-decision --to-key db-migration -r depends_on
ghost edge -n "project:myapp" --from-key db-decision --list

# Run lifecycle maintenance
ghost reflect --ns "project:myapp"
ghost reflect --dry-run  # preview without applying

Example: Python Agent

import subprocess, json

def ghost_put(ns, key, content, importance=0.5):
    subprocess.run([
        "ghost", "put", "-n", ns, "-k", key,
        "--importance", str(importance), content
    ], check=True)  # raise if the CLI exits non-zero

def ghost_context(query, ns=None, budget=2000):
    cmd = ["ghost", "context", "-q", query, "--budget", str(budget)]
    if ns:
        cmd.extend(["-n", ns])
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

Memory Lifecycle Patterns

Session Summarization

After N conversational exchanges, consolidate older episodic memories into a single semantic summary (see the sketch after this list):

  1. List episodic exchanges for the chat namespace
  2. Keep the most recent 5 exchanges intact
  3. Merge older exchanges into a summary
  4. Store the summary as semantic memory
  5. Delete the individual exchanges
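A minimal Go sketch of these five steps, using only the store methods shown earlier. Two loud assumptions: Context with a tag filter stands in for a dedicated list API (this guide shows none for Go), and summarize() is a hypothetical helper for your own LLM call:

// Needs "strings" and "log" in addition to the imports above.
res, err := store.Context(ctx, memory.ContextParams{
    NS:     "agent:mybot",
    Query:  "recent conversation",
    Tags:   []string{"chat:123"},
    Budget: 8000,
})
if err != nil {
    log.Fatal(err)
}
if len(res.Memories) > 5 {
    old := res.Memories[5:] // keep the 5 most recent intact; assumes recency ordering, verify for your store
    var sb strings.Builder
    for _, m := range old {
        sb.WriteString(m.Content + "\n")
    }
    // summarize() is a hypothetical helper, e.g. an LLM call that merges
    // the old exchanges into a few sentences.
    store.Put(ctx, memory.PutParams{
        NS:      "agent:mybot",
        Key:     "chat-123-summary",
        Content: summarize(sb.String()),
        Kind:    "semantic",
        Tags:    []string{"chat:123"},
    })
    for _, m := range old {
        store.Curate(ctx, memory.CurateParams{NS: "agent:mybot", Key: m.Key, Op: "delete"})
    }
}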

Memory Review with Curate

Use ghost_search or ghost_context to surface memories, then ghost_curate to act on specific ones (sketched in Go below):

  1. Query memories for a topic: ghost_search(query="deployment")
  2. Review each result — is it still accurate? still useful?
  3. Promote valuable ones: ghost_curate(ns, key, op="promote")
  4. Archive outdated ones: ghost_curate(ns, key, op="archive")
  5. Boost frequently-needed facts: ghost_curate(ns, key, op="boost")
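The same loop in Go, using the Context and Curate calls shown earlier; review() is a hypothetical judgment step (yours or an LLM's):

res, err := store.Context(ctx, memory.ContextParams{
    NS:     "project:myapp",
    Query:  "deployment",
    Budget: 4000,
})
if err != nil {
    log.Fatal(err)
}
for _, m := range res.Memories {
    // review() is a hypothetical helper (a human prompt or an LLM call)
    // returning "promote", "archive", "boost", or "" to leave the memory as-is.
    if op := review(m); op != "" {
        store.Curate(ctx, memory.CurateParams{NS: "project:myapp", Key: m.Key, Op: op})
    }
}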

Compaction-Triggered Reflect

When ghost_context returns compaction_suggested: true (see the sketch after these steps):

  1. Run ghost_reflect to promote/decay/prune
  2. Optionally run ghost gc to hard-delete expired memories
  3. Re-query context — it should now fit better within budget
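As a Go sketch; gc is shown only as a CLI command in this guide, so it is shelled out here:

res, err := store.Context(ctx, memory.ContextParams{
    NS:     "project:myapp",
    Query:  "database setup",
    Budget: 2000,
})
if err != nil {
    log.Fatal(err)
}
if res.CompactionSuggested {
    store.Reflect(ctx, memory.ReflectParams{NS: "project:myapp"})
    // Optional hard-delete of expired memories; needs "os/exec".
    _ = exec.Command("ghost", "gc").Run()
    // Re-query: the result should now fit the budget better.
    res, err = store.Context(ctx, memory.ContextParams{
        NS:     "project:myapp",
        Query:  "database setup",
        Budget: 2000,
    })
}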

Contextual Compression (recommended for question-answering agents)

Ghost returns full session chunks optimized for grep-style retrieval. For a question-answering agent, inserting an LLM pass between Ghost retrieval and the answering call consistently improves quality: it compresses the retrieved context into a bullet list of query-relevant facts. This is the pattern that wins the LoCoMo-Plus cognitive-memory benchmark (see docs/eval.md).

Pipeline:

  1. ghost search -q "<user question>" -n <ns> --limit 15 — wider recall than you'd normally put in the prompt, since compression filters noise.
  2. Feed the JSON results to an LLM with a short prompt:

    "Extract every fact from the memories that could inform the question. Preserve specific details (names, dates, preferences). Output 1-8 bullets. Do not answer — just extract."

  3. Prepend the compressed bullets to the user's question and send to the answering model.

# Pseudocode — adapt to your LLM client.
# ghost_cli() is a thin wrapper around subprocess + json.loads, like
# ghost_context in the Python agent example above; llm_call() is whatever
# chat-completion helper your stack provides.
mems = ghost_cli(["search", "-q", question, "-n", ns, "--limit", "15"])
bullets = llm_call(system=COMPRESS_PROMPT,
                   user=f"QUESTION: {question}\n\nMEMORIES:\n{mems}")
answer = llm_call(system=ANSWER_PROMPT,
                  user=f"{bullets}\n\n{question}")

Why it works: Raw chunks contain off-topic conversation. The compression step surfaces the latent signal (user values, past decisions, emotional context) that would otherwise be drowned out in the answering model's prompt. Wider recall (top-15 vs top-5) gives the compressor more candidates; because its output is short, the answering prompt stays small.

Cost: ~2× the LLM calls of a direct answer, but typically fewer answering input tokens because the bullets are shorter than raw chunks.

Reasoning Edge Inference

relates_to edges are created automatically by embedding similarity, but they don't distinguish topical similarity from causal reasoning. To enrich the graph with typed reasoning edges (caused_by, prevents, implies):

# Uses claude -p by default; ANTHROPIC_API_KEY overrides to API
ghost infer-edges --ns agent:mybot --max-pairs 100 --dry-run

# When happy with the proposals, re-run without --dry-run
ghost infer-edges --ns agent:mybot --max-pairs 100

This is an offline batch step — Ghost's hot path (Search, Context) remains LLM-free. Idempotent: pairs that already have a reasoning edge are skipped.
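Because the step is offline and idempotent, a long-running Go agent can simply shell out to it on a timer. A minimal sketch; the daily cadence is an arbitrary choice:

// Periodic offline edge inference from a background goroutine.
// Needs "os/exec", "time", and "log".
go func() {
    for range time.Tick(24 * time.Hour) {
        out, err := exec.Command("ghost", "infer-edges",
            "--ns", "agent:mybot", "--max-pairs", "100").CombinedOutput()
        if err != nil {
            log.Printf("ghost infer-edges: %v\n%s", err, out)
        }
    }
}()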

Consolidation Workflow

When many memories accumulate on a topic:

  1. Run ghost clusters -n <ns> to discover groups (CLI), or ghost_expand(ns=...) with no key (MCP) to see existing consolidation nodes
  2. Review each cluster — do they belong together?
  3. Write a summary and consolidate (creates summary + contains edges in one call):
    # CLI
    ghost consolidate -n agent:mybot --summary-key auth-overview \
      --keys "auth-jwt,auth-expiry,auth-cookies" \
      --content "Auth: JWT+RSA256, 24h expiry, refresh via cookies"
    # MCP tool
    ghost_consolidate(ns="agent:mybot", summary_key="auth-overview",
      source_keys=["auth-jwt", "auth-expiry", "auth-cookies"],
      content="Auth: JWT+RSA256, 24h expiry, refresh via cookies")
  4. The summary gets contains edges to sources; in context assembly, the summary replaces its children (parent boosting + child suppression)
  5. Use ghost_expand(ns="agent:mybot", key="auth-overview") to drill into the summary and see its children

Reflect Rules

Ghost ships with 7 built-in rules. Pinned memories are exempt from all rules. Add custom rules:

# Archive procedural memories older than 30 days with low access
ghost rule set \
  --name "archive-old-procedures" \
  --cond-tier stm \
  --cond-kind procedural \
  --cond-age-gt 720 \
  --cond-access-lt 2 \
  --action ARCHIVE

# Promote high-importance memories quickly
ghost rule set \
  --name "fast-promote-important" \
  --cond-tier stm \
  --cond-age-gt 12 \
  --cond-access-gt 2 \
  --action PROMOTE

# Merge similar episodic memories (deduplication)
ghost rule set \
  --name "merge-similar-episodes" \
  --cond-tier stm \
  --cond-kind episodic \
  --cond-similarity-gt 0.85 \
  --action MERGE \
  --action-params '{"strategy": "keep_highest_importance"}'

Rules are evaluated in two passes during ghost reflect:

  1. Per-memory pass — the first matching rule wins for each memory
  2. Similarity merge pass — rules with --cond-similarity-gt compare embeddings pairwise and consolidate

Key Concepts

Namespaces — Agent identity. Each namespace is isolated. Use agent:<name> for per-agent memory.

Tags — First-class metadata: identity, lore, project:<name>, chat:<id>, learning, convention, user:<name>.

Pinned memories — Always loaded in context (Phase 1). Exempt from lifecycle decay.

Compaction signalghost_context returns compaction_suggested: true when budget is exhausted. Trigger ghost_reflect.

Token budgetsghost_context accepts a budget parameter (default 4000 tokens).

Curate vs Reflectcurate acts on a specific memory (intent-driven). reflect applies rules across all memories (rule-driven).