agent-trace

strace for AI agents. Capture and replay every tool call, prompt, and response from Claude Code, Cursor, or any MCP client.

Why

A coding agent rewrites 20 files in a background session. You get a pull request. You do not get the story. Which files did it read first? Why did it call the same tool three times? What failed before it found the fix?

Most tools trace LLM calls. That is one layer. The gap is everything around it: tool calls, file operations, decision points, error recovery, the actual commands the agent ran. agent-strace captures the full session and lets you replay it later. Export to Datadog, Honeycomb, New Relic, or Splunk when you need production observability.

Install

# With uv (recommended)
uv tool install agent-strace

# Or with pip
pip install agent-strace

# Or run without installing
uvx agent-strace replay

Zero dependencies. Python 3.10+ standard library only.

Quick start

Option 1: Claude Code hooks (full session capture)

Captures everything: user prompts, assistant responses, and every tool call (Bash, Edit, Write, Read, Agent, Grep, Glob, WebFetch, WebSearch, all MCP tools).

agent-strace setup        # prints hooks config JSON
agent-strace setup --global  # for all projects

Add the printed JSON to .claude/settings.json, or paste this config in manually:

{
  "hooks": {
    "UserPromptSubmit": [{ "hooks": [{ "type": "command", "command": "agent-strace hook user-prompt" }] }],
    "PreToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook pre-tool" }] }],
    "PostToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool" }] }],
    "PostToolUseFailure": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool-failure" }] }],
    "Stop": [{ "hooks": [{ "type": "command", "command": "agent-strace hook stop" }] }],
    "SessionStart": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-start" }] }],
    "SessionEnd": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-end" }] }]
  }
}

Then use Claude Code normally.

agent-strace list     # list sessions
agent-strace replay   # replay the latest
agent-strace stats    # tool call frequency and timing

Option 2: MCP proxy (any MCP client)

Wraps any MCP server. Works with Cursor, Windsurf, or any MCP client.

agent-strace record -- npx -y @modelcontextprotocol/server-filesystem /tmp
agent-strace replay

Option 3: Python decorator

Wraps your tool functions directly. No MCP required.

from agent_trace import trace_tool, trace_llm_call, start_session, end_session, log_decision

start_session(name="my-agent")  # add redact=True to strip secrets

@trace_tool
def search_codebase(query: str) -> str:
    return search(query)

@trace_llm_call
def call_llm(messages: list, model: str = "claude-4") -> str:
    return client.chat(messages=messages, model=model)

# Log decision points explicitly
log_decision(
    choice="read_file_first",
    reason="Need to understand current implementation before making changes",
    alternatives=["read_file_first", "search_codebase", "write_fix_directly"],
)

search_codebase("authenticate")
call_llm([{"role": "user", "content": "Fix the bug"}])

meta = end_session()
print(f"Replay with: agent-strace replay {meta.session_id}")

CLI commands

agent-strace setup [--redact] [--global]   Generate Claude Code hooks config
agent-strace hook <event>                  Handle a Claude Code hook event (internal)
agent-strace record -- <command>           Record an MCP stdio server session
agent-strace record-http <url> [--port N]  Record an MCP HTTP/SSE server session
agent-strace replay [session-id]           Replay a session (default: latest)
agent-strace list                          List all sessions
agent-strace stats [session-id]            Show tool call frequency and timing
agent-strace inspect <session-id>          Dump full session as JSON
agent-strace export <session-id>           Export as JSON, CSV, NDJSON, or OTLP

Secret redaction

Pass --redact to strip API keys, tokens, and credentials from traces before they hit disk.

# Stdio proxy with redaction
agent-strace record --redact -- npx -y @modelcontextprotocol/server-filesystem /tmp

# HTTP proxy with redaction
agent-strace record-http https://mcp.example.com --redact

Detected patterns: OpenAI (sk-*), GitHub (ghp_*, github_pat_*), AWS (AKIA*), Anthropic (sk-ant-*), Slack (xox*), JWTs, Bearer tokens, connection strings (postgres://, mysql://), and any value under keys like password, secret, token, api_key, authorization.
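The filter can be thought of as a recursive walk over each event: sensitive key names get their values replaced outright, and string values are scanned for secret-shaped patterns. A minimal sketch of that idea, with an illustrative (not exhaustive) pattern list that does not claim to match the library's internals:

```python
import re

# Illustrative subset of the detected patterns listed above.
VALUE_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),  # Anthropic
    re.compile(r"sk-[A-Za-z0-9]{20,}"),        # OpenAI
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),       # GitHub
    re.compile(r"AKIA[A-Z0-9]{16}"),           # AWS access key
]
SENSITIVE_KEYS = {"password", "secret", "token", "api_key", "authorization"}

def redact(obj):
    """Recursively replace secret-looking values with [REDACTED]."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    if isinstance(obj, str):
        for pat in VALUE_PATTERNS:
            obj = pat.sub("[REDACTED]", obj)
        return obj
    return obj
```

Because the walk happens before the event is serialized, the original value never reaches disk, which matches the guarantee described under "How it works" below.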

HTTP/SSE proxy

For MCP servers that use HTTP transport instead of stdio:

# Proxy a remote MCP server
agent-strace record-http https://mcp.example.com --port 3100

# Your agent connects to http://127.0.0.1:3100 instead of the remote server
# All JSON-RPC messages are captured, tool call latency is measured

The proxy forwards POST /message and GET /sse to the remote server, capturing every JSON-RPC message in both directions.

Replay output

A real Claude Code session captured with hooks:

Session Summary
──────────────────────────────────────────────────
  Session:    201da364-edd6-49
  Command:    claude-code (startup)
  Agent:      claude-code
  Duration:   112.54s
  Tool calls: 8
  Errors:     3
──────────────────────────────────────────────────

+  0.00s ▶ session_start
+  0.07s 👤 user_prompt
              "how many tests does this project have? run them and tell me the results"
+  3.55s → tool_call Glob
              **/*.test.*
+  3.55s → tool_call Glob
              **/test_*.*
+  3.60s ← tool_result Glob (51ms)
+  6.06s → tool_call Bash
              $ python -m pytest tests/ -v 2>&1
+ 27.65s ✗ error Bash
              Command failed with exit code 1
+ 29.89s → tool_call Bash
              $ python3 -m pytest tests/ -v 2>&1
+ 40.56s ✗ error Bash
              No module named pytest
+ 45.96s → tool_call Bash
              $ which pytest || ls /Users/siddhant/Desktop/test-agent-trace/ 2>&1
+ 46.01s ← tool_result Bash (51ms)
+ 48.18s → tool_call Read
              /Users/siddhant/Desktop/test-agent-trace/pyproject.toml
+ 48.23s ← tool_result Read (43ms)
+ 51.43s → tool_call Bash
              $ uv run --with pytest pytest tests/ -v 2>&1
+1m43.67s ← tool_result Bash (5.88s)
              75 tests, all passing in 3.60s
+1m52.54s 🤖 assistant_response
              "75 tests, all passing in 3.60s. Breakdown by file: ..."

Tool calls show actual values: commands, file paths, glob patterns. Errors show what failed. Assistant responses are stripped of markdown.

Filtering

# Show only tool calls and errors
agent-strace replay --filter tool_call,error

# Replay with timing (watch it unfold)
agent-strace replay --live --speed 2

Export

# JSON array
agent-strace export a84664 --format json

# CSV (for spreadsheets)
agent-strace export a84664 --format csv

# NDJSON (for streaming pipelines)
agent-strace export a84664 --format ndjson

Trace format

Traces are stored as directories in .agent-traces/:

.agent-traces/
  a84664242afa4516/
    meta.json        # session metadata
    events.ndjson    # newline-delimited JSON events

Each event is a single JSON line:

{
  "event_type": "tool_call",
  "timestamp": 1773562735.09,
  "event_id": "bf1207728ee6",
  "session_id": "a84664242afa4516",
  "data": {
    "tool_name": "read_file",
    "arguments": {"path": "src/auth.py"}
  }
}
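Appending one self-contained JSON object per line is what makes the store safe for the hooks model, where every event comes from a separate short-lived process. A sketch of such an append-only writer (`append_event` is a hypothetical helper, not the library's API):

```python
import json
import time
import uuid

def append_event(path, event_type, session_id, data, parent_id=None):
    """Sketch of NDJSON event storage: one JSON object per line,
    opened in append mode so writes from separate processes stack up
    rather than clobber each other."""
    event = {
        "event_type": event_type,
        "timestamp": time.time(),
        "event_id": uuid.uuid4().hex[:12],
        "session_id": session_id,
        "data": data,
    }
    if parent_id:
        event["parent_id"] = parent_id
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")
    return event["event_id"]
```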

Event types

Type                 Description
session_start        Trace session began
session_end          Trace session ended
user_prompt          User submitted a prompt to the agent
assistant_response   Agent produced a text response
tool_call            Agent invoked a tool
tool_result          Tool returned a result
llm_request          Agent sent a prompt to an LLM
llm_response         LLM returned a completion
file_read            Agent read a file
file_write           Agent wrote a file
decision             Agent chose between alternatives
error                Something failed

Events link to each other. A tool_result has a parent_id pointing to its tool_call. This lets you measure latency per tool and trace the full call chain.
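The per-tool latency measurement falls out of this link directly: index events by event_id, then join each tool_result back to its tool_call and subtract timestamps. A minimal sketch over the NDJSON lines (`tool_latencies` is a hypothetical helper, not part of the CLI):

```python
import json

def tool_latencies(ndjson_lines):
    """Join each tool_result to its tool_call via parent_id and
    return (tool_name, seconds) pairs."""
    events = {e["event_id"]: e for e in map(json.loads, ndjson_lines)}
    out = []
    for e in events.values():
        if e["event_type"] == "tool_result" and e.get("parent_id") in events:
            call = events[e["parent_id"]]
            out.append((call["data"]["tool_name"],
                        e["timestamp"] - call["timestamp"]))
    return out
```

This is essentially the computation behind the `(51ms)` annotations in the replay output above.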

Use with Claude Code, Cursor, Windsurf

Claude Code (hooks, recommended)

Captures the full session: prompts, responses, and every tool call. See examples/claude_code_config.md for the full config.

agent-strace setup                    # per-project config
agent-strace setup --redact --global  # all projects, with secret redaction

Cursor

Edit ~/.cursor/mcp.json (global) or .cursor/mcp.json (per-project):

{
  "mcpServers": {
    "filesystem": {
      "command": "agent-strace",
      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "filesystem": {
      "command": "agent-strace",
      "args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}

Any MCP client

The pattern is the same for any tool that uses MCP over stdio:

  1. Replace the server command with agent-strace
  2. Prepend record --name <label> -- to the original args
  3. Use the tool normally
  4. Run agent-strace replay to see what happened

See the examples/ directory for full config files.

Production tracing (OTLP export)

Export sessions as OpenTelemetry spans to your existing observability stack. Sessions become traces. Tool calls become spans with duration and inputs. Errors get exception events. Zero new dependencies.

Datadog

# Via the Datadog Agent's OTLP receiver (port 4318)
agent-strace export <session-id> --format otlp \
  --endpoint http://localhost:4318

# Or via Datadog's OTLP intake directly
agent-strace export <session-id> --format otlp \
  --endpoint https://http-intake.logs.datadoghq.com:443 \
  --header "DD-API-KEY: $DD_API_KEY"

Honeycomb

agent-strace export <session-id> --format otlp \
  --endpoint https://api.honeycomb.io \
  --header "x-honeycomb-team: $HONEYCOMB_API_KEY" \
  --service-name my-agent

New Relic

agent-strace export <session-id> --format otlp \
  --endpoint https://otlp.nr-data.net \
  --header "api-key: $NEW_RELIC_LICENSE_KEY"

Splunk

agent-strace export <session-id> --format otlp \
  --endpoint https://ingest.<realm>.signalfx.com \
  --header "X-SF-Token: $SPLUNK_ACCESS_TOKEN"

Grafana Tempo / Jaeger

# Local collector
agent-strace export <session-id> --format otlp \
  --endpoint http://localhost:4318

Dump OTLP JSON without sending

# Inspect the OTLP payload
agent-strace export <session-id> --format otlp > trace.json

How it maps

agent-trace              OpenTelemetry
session                  trace
tool_call + tool_result  span (with duration)
error                    span with error status + exception event
user_prompt              event on root span
assistant_response       event on root span
session_id               trace ID
event_id                 span ID
parent_id                parent span ID
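Concretely, a tool_call/tool_result pair becomes one span in the OTLP/HTTP JSON encoding, with the pair's timestamps as the span's start and end. A sketch of that mapping, assuming the field names of the OTLP JSON encoding; padding the IDs out to the required hex lengths is an illustrative shortcut, not necessarily how the exporter derives them:

```python
import json

def event_to_span(call, result):
    """Sketch: map a tool_call/tool_result pair onto an OTLP span dict.
    OTLP trace IDs are 16 bytes (32 hex chars) and span IDs 8 bytes
    (16 hex chars); timestamps are unix-nanoseconds."""
    return {
        "traceId": call["session_id"].ljust(32, "0"),
        "spanId": call["event_id"].ljust(16, "0"),
        "name": call["data"]["tool_name"],
        "startTimeUnixNano": int(call["timestamp"] * 1e9),
        "endTimeUnixNano": int(result["timestamp"] * 1e9),
        "attributes": [
            {"key": "tool.arguments",
             "value": {"stringValue": json.dumps(call["data"].get("arguments", {}))}},
        ],
    }
```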

How it works

Claude Code hooks

Claude Code agentic loop
  ├── UserPromptSubmit   → agent-strace hook user-prompt
  ├── PreToolUse         → agent-strace hook pre-tool
  ├── PostToolUse        → agent-strace hook post-tool
  ├── PostToolUseFailure → agent-strace hook post-tool-failure
  ├── Stop               → agent-strace hook stop
  ├── SessionStart       → agent-strace hook session-start
  └── SessionEnd         → agent-strace hook session-end
                               ↓
                         .agent-traces/

Claude Code fires hook events at every stage of its agentic loop. agent-strace registers as a handler, reads JSON from stdin, and writes trace events. Each hook runs as a separate process. Session state lives in .agent-traces/.active-session so PreToolUse and PostToolUse can be correlated for latency measurement.

MCP stdio proxy

Agent ←→ agent-strace proxy ←→ MCP Server (stdio)
              ↓
         .agent-traces/

The proxy reads JSON-RPC messages (Content-Length framed or newline-delimited), classifies each one, and writes a trace event. Messages are forwarded unchanged. The agent and server do not know the proxy exists.
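The Content-Length framing is the LSP-style header format: a header block, a blank line, then exactly that many bytes of JSON. A sketch of a reader for one framed message (newline-delimited mode, which the proxy also supports, is just `json.loads(stream.readline())` and is omitted here):

```python
import io
import json

def read_framed_message(stream):
    """Read one Content-Length-framed JSON-RPC message from a binary
    stream. Sketch only: headers until a blank line, then the body."""
    length = None
    while True:
        line = stream.readline()
        if not line or line in (b"\r\n", b"\n"):
            break  # blank line (or EOF) ends the header block
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        return None  # EOF before any headers
    return json.loads(stream.read(length))
```

A proxy loop would read a message like this from one side, log a trace event, and write the original bytes unchanged to the other side, which is why neither agent nor server can observe it.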

MCP HTTP/SSE proxy

Agent ←→ agent-strace proxy (localhost:3100) ←→ Remote MCP Server (HTTPS)
              ↓
         .agent-traces/

Same idea, different transport. Listens on a local port, forwards POST and SSE requests to the remote server, captures every JSON-RPC message in both directions.

Decorator mode

@trace_tool
def my_function(x):
    return x * 2

The decorator logs a tool_call event before execution and a tool_result after. Errors and timing are captured automatically.
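The shape of such a decorator is straightforward; a minimal illustrative stand-in (`trace_tool_sketch` takes an explicit log callback here for clarity, whereas the real `@trace_tool` writes to the active session):

```python
import functools
import time

def trace_tool_sketch(log):
    """Sketch of @trace_tool: emit a tool_call event before the call,
    then a tool_result with duration (or an error event) after."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            log({"event_type": "tool_call", "tool_name": fn.__name__})
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                log({"event_type": "error", "tool_name": fn.__name__,
                     "message": str(exc)})
                raise
            log({"event_type": "tool_result", "tool_name": fn.__name__,
                 "duration_s": time.monotonic() - start})
            return result
        return wrapper
    return decorator
```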

Secret redaction

When --redact is enabled (or redact=True in the decorator API), trace events pass through a redaction filter before hitting disk. The filter checks key names (password, api_key) and value patterns (sk-*, ghp_*, JWTs). Redacted values become [REDACTED]. The original data is never stored.

Project structure

src/agent_trace/
  __init__.py       # version
  models.py         # TraceEvent, SessionMeta, EventType
  store.py          # NDJSON file storage
  hooks.py          # Claude Code hooks integration
  proxy.py          # MCP stdio proxy
  http_proxy.py     # MCP HTTP/SSE proxy
  redact.py         # secret redaction
  otlp.py           # OTLP/HTTP JSON exporter
  replay.py         # terminal replay and display
  decorator.py      # @trace_tool, @trace_llm_call, log_decision
  cli.py            # CLI entry point

Running tests

python -m unittest discover -s tests -v

Development

git clone https://github.com/Siddhant-K-code/agent-trace.git
cd agent-trace

# Run tests
python -m unittest discover -s tests -v

# Run the example
PYTHONPATH=src python examples/basic_agent.py

# Replay the example
PYTHONPATH=src python -m agent_trace.cli replay

# Build the package
uv build

# Install locally for testing
uv tool install -e .

Sponsor

If agent-trace saves you time debugging agent sessions, consider sponsoring the project. It helps me keep building tools like this and releasing them for free.

License

MIT. Use it however you want.