strace for AI agents. Capture and replay every tool call, prompt, and response from Claude Code, Cursor, or any MCP client.
A coding agent rewrites 20 files in a background session. You get a pull request. You do not get the story. Which files did it read first? Why did it call the same tool three times? What failed before it found the fix?
Most tools trace LLM calls. That is one layer. The gap is everything around it: tool calls, file operations, decision points, error recovery, the actual commands the agent ran. agent-strace captures the full session and lets you replay it later. Export to Datadog, Honeycomb, New Relic, or Splunk when you need production observability.
# With uv (recommended)
uv tool install agent-strace
# Or with pip
pip install agent-strace
# Or run without installing
uvx agent-strace replay

Zero dependencies. Python 3.10+ standard library only.
Captures everything: user prompts, assistant responses, and every tool call (Bash, Edit, Write, Read, Agent, Grep, Glob, WebFetch, WebSearch, all MCP tools).
agent-strace setup # prints hooks config JSON
agent-strace setup --global # for all projects

Add the output to .claude/settings.json, or paste it manually:
{
"hooks": {
"UserPromptSubmit": [{ "hooks": [{ "type": "command", "command": "agent-strace hook user-prompt" }] }],
"PreToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook pre-tool" }] }],
"PostToolUse": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool" }] }],
"PostToolUseFailure": [{ "matcher": "", "hooks": [{ "type": "command", "command": "agent-strace hook post-tool-failure" }] }],
"Stop": [{ "hooks": [{ "type": "command", "command": "agent-strace hook stop" }] }],
"SessionStart": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-start" }] }],
"SessionEnd": [{ "hooks": [{ "type": "command", "command": "agent-strace hook session-end" }] }]
}
}

Then use Claude Code normally.
agent-strace list # list sessions
agent-strace replay # replay the latest
agent-strace stats # tool call frequency and timing

Wraps any MCP server. Works with Cursor, Windsurf, or any MCP client.
agent-strace record -- npx -y @modelcontextprotocol/server-filesystem /tmp
agent-strace replay

Wraps your tool functions directly. No MCP required.
from agent_trace import trace_tool, trace_llm_call, start_session, end_session, log_decision
start_session(name="my-agent") # add redact=True to strip secrets
@trace_tool
def search_codebase(query: str) -> str:
return search(query)
@trace_llm_call
def call_llm(messages: list, model: str = "claude-4") -> str:
return client.chat(messages=messages, model=model)
# Log decision points explicitly
log_decision(
choice="read_file_first",
reason="Need to understand current implementation before making changes",
alternatives=["read_file_first", "search_codebase", "write_fix_directly"],
)
search_codebase("authenticate")
call_llm([{"role": "user", "content": "Fix the bug"}])
meta = end_session()
print(f"Replay with: agent-strace replay {meta.session_id}")

agent-strace setup [--redact] [--global] Generate Claude Code hooks config
agent-strace hook <event> Handle a Claude Code hook event (internal)
agent-strace record -- <command> Record an MCP stdio server session
agent-strace record-http <url> [--port N] Record an MCP HTTP/SSE server session
agent-strace replay [session-id] Replay a session (default: latest)
agent-strace list List all sessions
agent-strace stats [session-id] Show tool call frequency and timing
agent-strace inspect <session-id> Dump full session as JSON
agent-strace export <session-id> Export as JSON, CSV, NDJSON, or OTLP
Pass --redact to strip API keys, tokens, and credentials from traces before they hit disk.
# Stdio proxy with redaction
agent-strace record --redact -- npx -y @modelcontextprotocol/server-filesystem /tmp
# HTTP proxy with redaction
agent-strace record-http https://mcp.example.com --redact

Detected patterns: OpenAI (sk-*), GitHub (ghp_*, github_pat_*), AWS (AKIA*), Anthropic (sk-ant-*), Slack (xox*), JWTs, Bearer tokens, connection strings (postgres://, mysql://), and any value under keys like password, secret, token, api_key, authorization.
For MCP servers that use HTTP transport instead of stdio:
# Proxy a remote MCP server
agent-strace record-http https://mcp.example.com --port 3100
# Your agent connects to http://127.0.0.1:3100 instead of the remote server
# All JSON-RPC messages are captured, tool call latency is measured

The proxy forwards POST /message and GET /sse to the remote server, capturing every JSON-RPC message in both directions.
A real Claude Code session captured with hooks:
Session Summary
──────────────────────────────────────────────────
Session: 201da364-edd6-49
Command: claude-code (startup)
Agent: claude-code
Duration: 112.54s
Tool calls: 8
Errors: 3
──────────────────────────────────────────────────
+ 0.00s ▶ session_start
+ 0.07s 👤 user_prompt
"how many tests does this project have? run them and tell me the results"
+ 3.55s → tool_call Glob
**/*.test.*
+ 3.55s → tool_call Glob
**/test_*.*
+ 3.60s ← tool_result Glob (51ms)
+ 6.06s → tool_call Bash
$ python -m pytest tests/ -v 2>&1
+ 27.65s ✗ error Bash
Command failed with exit code 1
+ 29.89s → tool_call Bash
$ python3 -m pytest tests/ -v 2>&1
+ 40.56s ✗ error Bash
No module named pytest
+ 45.96s → tool_call Bash
$ which pytest || ls /Users/siddhant/Desktop/test-agent-trace/ 2>&1
+ 46.01s ← tool_result Bash (51ms)
+ 48.18s → tool_call Read
/Users/siddhant/Desktop/test-agent-trace/pyproject.toml
+ 48.23s ← tool_result Read (43ms)
+ 51.43s → tool_call Bash
$ uv run --with pytest pytest tests/ -v 2>&1
+1m43.67s ← tool_result Bash (5.88s)
75 tests, all passing in 3.60s
+1m52.54s 🤖 assistant_response
"75 tests, all passing in 3.60s. Breakdown by file: ..."
Tool calls show actual values: commands, file paths, glob patterns. Errors show what failed. Assistant responses are stripped of markdown.
# Show only tool calls and errors
agent-strace replay --filter tool_call,error
# Replay with timing (watch it unfold)
agent-strace replay --live --speed 2

# JSON array
agent-strace export a84664 --format json
# CSV (for spreadsheets)
agent-strace export a84664 --format csv
# NDJSON (for streaming pipelines)
agent-strace export a84664 --format ndjson

Traces are stored as directories in .agent-traces/:
.agent-traces/
a84664242afa4516/
meta.json # session metadata
events.ndjson # newline-delimited JSON events
Each event is a single JSON line:
{
"event_type": "tool_call",
"timestamp": 1773562735.09,
"event_id": "bf1207728ee6",
"session_id": "a84664242afa4516",
"data": {
"tool_name": "read_file",
"arguments": {"path": "src/auth.py"}
}
}

| Type | Description |
|---|---|
| session_start | Trace session began |
| session_end | Trace session ended |
| user_prompt | User submitted a prompt to the agent |
| assistant_response | Agent produced a text response |
| tool_call | Agent invoked a tool |
| tool_result | Tool returned a result |
| llm_request | Agent sent a prompt to an LLM |
| llm_response | LLM returned a completion |
| file_read | Agent read a file |
| file_write | Agent wrote a file |
| decision | Agent chose between alternatives |
| error | Something failed |
Events link to each other. A tool_result has a parent_id pointing to its tool_call. This lets you measure latency per tool and trace the full call chain.
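The parent_id link makes per-tool latency a simple join over events.ndjson. A minimal sketch, assuming the field names from the event example above and a top-level `parent_id` on tool_result events (`tool_latencies` is a hypothetical helper, not part of the library):

```python
import json

def tool_latencies(ndjson_lines):
    # Join each tool_result to its tool_call parent via parent_id
    # and return (tool_name, latency_seconds) pairs.
    calls = {}       # event_id -> (tool_name, timestamp)
    latencies = []
    for line in ndjson_lines:
        ev = json.loads(line)
        if ev["event_type"] == "tool_call":
            calls[ev["event_id"]] = (ev["data"]["tool_name"], ev["timestamp"])
        elif ev["event_type"] == "tool_result":
            parent = calls.get(ev.get("parent_id"))
            if parent:
                name, started = parent
                latencies.append((name, ev["timestamp"] - started))
    return latencies
```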
Captures the full session: prompts, responses, and every tool call. See examples/claude_code_config.md for the full config.
agent-strace setup # per-project config
agent-strace setup --redact --global # all projects, with secret redaction

Edit ~/.cursor/mcp.json (global) or .cursor/mcp.json (per-project):
{
"mcpServers": {
"filesystem": {
"command": "agent-strace",
"args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
}
}
}

Edit ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"filesystem": {
"command": "agent-strace",
"args": ["record", "--name", "filesystem", "--", "npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
}
}
}

The pattern is the same for any tool that uses MCP over stdio:
- Replace the server `command` with `agent-strace`
- Prepend `record --name <label> --` to the original args
- Use the tool normally
- Run `agent-strace replay` to see what happened
See the examples/ directory for full config files.
Export sessions as OpenTelemetry spans to your existing observability stack. Sessions become traces. Tool calls become spans with duration and inputs. Errors get exception events. Zero new dependencies.
# Via the Datadog Agent's OTLP receiver (port 4318)
agent-strace export <session-id> --format otlp \
--endpoint http://localhost:4318
# Or via Datadog's OTLP intake directly
agent-strace export <session-id> --format otlp \
--endpoint https://http-intake.logs.datadoghq.com:443 \
--header "DD-API-KEY: $DD_API_KEY"

# Honeycomb
agent-strace export <session-id> --format otlp \
--endpoint https://api.honeycomb.io \
--header "x-honeycomb-team: $HONEYCOMB_API_KEY" \
--service-name my-agent

# New Relic
agent-strace export <session-id> --format otlp \
--endpoint https://otlp.nr-data.net \
--header "api-key: $NEW_RELIC_LICENSE_KEY"

# Splunk Observability Cloud
agent-strace export <session-id> --format otlp \
--endpoint https://ingest.<realm>.signalfx.com \
--header "X-SF-Token: $SPLUNK_ACCESS_TOKEN"

# Local collector
agent-strace export <session-id> --format otlp \
--endpoint http://localhost:4318

# Inspect the OTLP payload
agent-strace export <session-id> --format otlp > trace.json

| agent-trace | OpenTelemetry |
|---|---|
| session | trace |
| tool_call + tool_result | span (with duration) |
| error | span with error status + exception event |
| user_prompt | event on root span |
| assistant_response | event on root span |
| session_id | trace ID |
| event_id | span ID |
| parent_id | parent span ID |
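The table above boils down to pairing a tool_call with its tool_result and emitting one span. A sketch of that mapping using the OTLP/JSON span shape; `event_pair_to_span` is hypothetical, and the real exporter (otlp.py) handles ID formatting and more attributes:

```python
def event_pair_to_span(call, result, trace_id):
    # One tool_call + its tool_result -> one OTLP span with a duration.
    # OTLP timestamps are nanoseconds since the Unix epoch.
    return {
        "traceId": trace_id,
        "spanId": call["event_id"],
        "name": call["data"]["tool_name"],
        "startTimeUnixNano": int(call["timestamp"] * 1e9),
        "endTimeUnixNano": int(result["timestamp"] * 1e9),
        "attributes": [
            {"key": "tool.arguments",
             "value": {"stringValue": str(call["data"].get("arguments", {}))}},
        ],
    }
```

(Real OTLP span IDs are 16 hex characters; a production exporter would pad or re-derive the event_id accordingly.)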
Claude Code agentic loop
├── UserPromptSubmit → agent-strace hook user-prompt
├── PreToolUse → agent-strace hook pre-tool
├── PostToolUse → agent-strace hook post-tool
├── PostToolUseFailure → agent-strace hook post-tool-failure
├── Stop → agent-strace hook stop
├── SessionStart → agent-strace hook session-start
└── SessionEnd → agent-strace hook session-end
↓
.agent-traces/
Claude Code fires hook events at every stage of its agentic loop. agent-strace registers as a handler, reads JSON from stdin, and writes trace events. Each hook runs as a separate process. Session state lives in .agent-traces/.active-session so PreToolUse and PostToolUse can be correlated for latency measurement.
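The hook flow can be sketched as a tiny handler: read the JSON payload Claude Code pipes in, append one NDJSON event. This is a simplified assumption of the shape of `agent-strace hook`, not its actual internals; the session-directory name and `.active-session` bookkeeping are omitted:

```python
import json
import pathlib
import time

def handle_hook(event_type, payload, trace_dir=".agent-traces"):
    # Append one trace event for a hook invocation. The real command
    # resolves the session from .agent-traces/.active-session; here we
    # use a fixed example directory.
    session = pathlib.Path(trace_dir) / "example-session"
    session.mkdir(parents=True, exist_ok=True)
    event = {"event_type": event_type, "timestamp": time.time(), "data": payload}
    with open(session / "events.ndjson", "a") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

Each hook runs as its own short-lived process, so appending a single line is all the state a handler needs to write.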
Agent ←→ agent-strace proxy ←→ MCP Server (stdio)
↓
.agent-traces/
The proxy reads JSON-RPC messages (Content-Length framed or newline-delimited), classifies each one, and writes a trace event. Messages are forwarded unchanged. The agent and server do not know the proxy exists.
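Content-Length framing is the fiddly half of that. A sketch of the parsing step only, assuming standard LSP-style framing (`read_framed_message` is a hypothetical name; the real proxy also handles newline-delimited messages):

```python
import json

def read_framed_message(buf: bytes):
    # Parse one Content-Length framed JSON-RPC message from a byte buffer.
    # Returns (message, remaining_bytes), or (None, buf) if incomplete.
    header_end = buf.find(b"\r\n\r\n")
    if header_end == -1:
        return None, buf
    length = None
    for line in buf[:header_end].decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    body_start = header_end + 4
    if length is None or len(buf) < body_start + length:
        return None, buf
    body = buf[body_start:body_start + length]
    return json.loads(body), buf[body_start + length:]
```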
Agent ←→ agent-strace proxy (localhost:3100) ←→ Remote MCP Server (HTTPS)
↓
.agent-traces/
Same idea, different transport. Listens on a local port, forwards POST and SSE requests to the remote server, captures every JSON-RPC message in both directions.
@trace_tool
def my_function(x):
return x * 2

The decorator logs a tool_call event before execution and a tool_result after. Errors and timing are captured automatically.
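A decorator like this can be sketched in a dozen lines. This is a simplified illustration of the pattern, not the library's implementation; `emit` stands in for the real event writer and just collects dicts here:

```python
import functools
import time

events = []

def emit(event_type, data):
    # Stand-in for the library's event writer.
    events.append({"event_type": event_type, "data": data})

def trace_tool(fn):
    # Log a tool_call before execution, a tool_result (with duration)
    # after, and an error event on exception.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        emit("tool_call", {"tool_name": fn.__name__,
                           "arguments": {"args": args, "kwargs": kwargs}})
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            emit("error", {"tool_name": fn.__name__, "message": str(exc)})
            raise
        emit("tool_result", {"tool_name": fn.__name__,
                             "duration_s": time.monotonic() - start})
        return result
    return wrapper
```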
When --redact is enabled (or redact=True in the decorator API), trace events pass through a redaction filter before hitting disk. The filter checks key names (password, api_key) and value patterns (sk-*, ghp_*, JWTs). Redacted values become [REDACTED]. The original data is never stored.
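The two-pronged check (key names plus value patterns) can be sketched as a recursive filter. The key set and regexes below are loosely modeled on the documented list and are illustrative only; the real filter's rules live in redact.py and may differ:

```python
import re

SENSITIVE_KEYS = {"password", "secret", "token", "api_key", "authorization"}
SENSITIVE_VALUES = re.compile(
    r"(sk-[A-Za-z0-9-]+|ghp_[A-Za-z0-9]+|github_pat_[A-Za-z0-9_]+"
    r"|AKIA[A-Z0-9]+|xox[a-z]-[A-Za-z0-9-]+|Bearer\s+\S+"
    r"|(?:postgres|mysql)://\S+)"
)

def redact(obj):
    # Recursively replace sensitive keys and value patterns with
    # [REDACTED] before an event is serialized to disk.
    if isinstance(obj, dict):
        return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    if isinstance(obj, str) and SENSITIVE_VALUES.search(obj):
        return "[REDACTED]"
    return obj
```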
src/agent_trace/
__init__.py # version
models.py # TraceEvent, SessionMeta, EventType
store.py # NDJSON file storage
hooks.py # Claude Code hooks integration
proxy.py # MCP stdio proxy
http_proxy.py # MCP HTTP/SSE proxy
redact.py # secret redaction
otlp.py # OTLP/HTTP JSON exporter
replay.py # terminal replay and display
decorator.py # @trace_tool, @trace_llm_call, log_decision
cli.py # CLI entry point
python -m unittest discover -s tests -v

git clone https://github.com/Siddhant-K-code/agent-trace.git
cd agent-trace
# Run tests
python -m unittest discover -s tests -v
# Run the example
PYTHONPATH=src python examples/basic_agent.py
# Replay the example
PYTHONPATH=src python -m agent_trace.cli replay
# Build the package
uv build
# Install locally for testing
uv tool install -e .

- The agent observability gap (blog) - the problem this tool addresses
- The agent observability gap (thread) - discussion on X
- The Agentic Engineering Guide - chapters 7, 9, 10 cover agent security; chapters 14, 15, 16 cover observability
- OpenTelemetry GenAI - semantic conventions for LLM tracing (complementary)
If agent-trace saves you time debugging agent sessions, consider sponsoring the project. It helps me keep building tools like this and releasing them for free.
MIT. Use it however you want.