See how you really use AI.
X-ray your AI coding sessions across Claude Code, Cursor, ChatGPT, and 6 more tools. Discover your patterns, find wasted tokens, catch leaked secrets — all locally, nothing leaves your machine.
```bash
pip install ctxray

ctxray scan       # discover prompts from your AI tools
ctxray wrapped    # your AI coding persona + shareable card
ctxray insights   # your patterns vs research-optimal
ctxray privacy    # what sensitive data you've exposed
```

Drop ctxray into your CI as a prompt quality gate. No LLM, no API key, no network — <50ms per prompt.
```yaml
# .github/workflows/prompt-quality.yml
- uses: ctxray/ctxray@main
  with:
    score-threshold: 43   # experimentally validated quality threshold
    model: claude         # model-specific rules (claude/gpt/gemini)
    comment-on-pr: true
```

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/ctxray/ctxray
    rev: v3.0.0
    hooks:
      - id: ctxray-lint-score        # fail below quality threshold
      # or: id: ctxray-lint-claude   # Claude-specific rules + threshold
```

- Deterministic — same prompt, same score, every run. No flaky LLM-based checks.
- Air-gapped — runs in offline and private networks. All analysis stays on your infrastructure.
- Configurable — `.ctxray.toml` or `[tool.ctxray.lint]` in pyproject.toml. Per-project rules.
Full setup: GitHub Action · pre-commit · .ctxray.toml
ctxray wrapped generates a Spotify Wrapped-style report of your AI interactions — your persona (Debugger? Architect? Explorer?), top patterns, and a shareable card.
ctxray insights compares your actual prompting habits against research-backed benchmarks. Are your prompts specific enough? Do you front-load instructions? How much context do you provide?
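For a feel of one such comparison, here is a toy front-loading check in Python. The 0.80 benchmark value and the verb heuristic are illustrative assumptions, not ctxray's actual benchmarks or detectors:

```python
# Toy "habit vs benchmark" check -- benchmark value and heuristic are invented.
RESEARCH_OPTIMAL_FRONT_LOAD = 0.80  # share of prompts with the instruction up front

def front_load_ratio(prompts: list[str]) -> float:
    """Fraction of prompts whose opening fifth contains an imperative verb."""
    verbs = ("fix", "add", "refactor", "write", "explain", "implement")
    hits = sum(
        any(v in p[: max(1, len(p) // 5)].lower() for v in verbs)
        for p in prompts
    )
    return hits / len(prompts) if prompts else 0.0

mine = front_load_ratio(["fix the auth bug in login.ts", "the tests fail, refactor them"])
print(f"you: {mine:.0%} vs research-optimal: {RESEARCH_OPTIMAL_FRONT_LOAD:.0%}")  # you: 50% vs research-optimal: 80%
```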
ctxray privacy --deep scans every prompt you've sent for API keys, tokens, passwords, and PII. See exactly what you've shared with which AI tool.
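Detection of this kind is typically regex-driven. A minimal sketch of what that looks like — the pattern names and rules below are common heuristics for illustration, not ctxray's actual detector set:

```python
import re

# Common secret/PII heuristics -- illustrative, not ctxray's detector set.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "api_key_prefix": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    "password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scan_prompt(text: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs for one prompt."""
    return [
        (name, m.group())
        for name, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]

hits = scan_prompt("fix login: password=hunter2 and key AKIAABCDEFGHIJKLMNOP")
print(hits)  # [('aws_access_key', 'AKIAABCDEFGHIJKLMNOP'), ('password', 'password=hunter2')]
```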
ctxray check "your prompt" scores, lints, and rewrites in one command — no LLM, <50ms.
Experimentally validated on 3000+ LLM calls across 8 models (1.5B → 27B): prompts at or above score 43 hit ~93% pass rate on executable code tests. Below 43 they average 72% or lower. ctxray tells you which side you're on and what to fix — see experiments/RESULTS.md for the full cross-model data.
ctxray check "fix the auth bug in login.ts" # threshold pass/fail + diagnostics
ctxray check "fix bug" --model claude # model-specific scoring for Claude
ctxray check "refactor middleware" --threshold 50 # custom threshold for stricter teams| Command | Description |
|---|---|
ctxray wrapped |
AI coding persona + shareable card |
ctxray insights |
Personal patterns vs research-optimal benchmarks |
ctxray tools |
Cross-tool comparison — how your Claude Code / Cursor / ChatGPT habits differ |
ctxray sessions |
Session quality scores with frustration signal detection |
ctxray agent |
Agent workflow analysis — error loops, tool patterns, efficiency |
ctxray repetition |
Cross-session repetition detection — spot recurring prompts |
ctxray patterns |
Personal prompt weaknesses — recurring gaps by task type |
ctxray distill |
Extract important turns from conversations with 6-signal scoring |
ctxray projects |
Per-project quality breakdown |
ctxray style |
Prompting fingerprint with --trends for evolution tracking |
ctxray privacy |
See what data you sent where — file paths, errors, PII exposure |
| Command | Description |
|---|---|
| `ctxray check "prompt"` | Full diagnostic — score + lint + rewrite + threshold pass/fail |
| `ctxray score "prompt"` | Research-backed 0-100 scoring with 30+ features |
| `ctxray score "prompt" --model claude` | Model-specific scoring — Claude, GPT, or Gemini adjustments |
| `ctxray rewrite "prompt"` | Rule-based improvement — filler removal, restructuring, hedging cleanup |
| `ctxray build "task"` | Build prompts from components — task, context, files, errors, constraints |
| `ctxray compress "prompt"` | 4-layer prompt compression (40-60% token savings typical) |
| `ctxray compare "a" "b"` | Side-by-side prompt analysis (or `--best-worst` for auto-selection) |
| `ctxray lint` | Configurable linter with CI/GitHub Action support |
| Command | Description |
|---|---|
| `ctxray` | Instant dashboard — prompts, sessions, avg score, top categories |
| `ctxray scan` | Auto-discover prompts from 9 AI tools |
| `ctxray report` | Full analytics: hot phrases, clusters, patterns (`--html` for dashboard) |
| `ctxray digest` | Weekly summary comparing current vs previous period |
| `ctxray template save\|list\|use` | Save and reuse your best prompts |
| `ctxray distill --export` | Recover context when a session runs out — paste into new session |
| `ctxray init` | Generate .ctxray.toml config for your project |
| Tool | Format | Auto-discovered by scan |
|---|---|---|
| Claude Code | JSONL | Yes |
| Codex CLI | JSONL | Yes |
| Cursor | .vscdb | Yes |
| Aider | Markdown | Yes |
| Gemini CLI | JSON | Yes |
| Cline (VS Code) | JSON | Yes |
| OpenClaw / OpenCode | JSON | Yes |
| ChatGPT | JSON | Via ctxray import |
| Claude.ai | JSON/ZIP | Via ctxray import |
```bash
pip install ctxray            # core (all features, zero config)
pip install ctxray[chinese]   # + Chinese prompt analysis (jieba)
pip install ctxray[mcp]       # + MCP server for Claude Code / Continue.dev / Zed

ctxray install-hook           # adds post-session hook to Claude Code
```

Capture prompts from ChatGPT, Claude.ai, and Gemini directly in your browser. Live quality badge shows prompt tier as you type — click "Rewrite & Apply" to improve and replace the text directly in the input box.
- Install the extension from Chrome Web Store or Firefox Add-ons
- Connect to the CLI: `ctxray install-extension`
- Verify: `ctxray extension-status`
Captured prompts sync locally via Native Messaging — nothing leaves your machine.
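Native Messaging hosts exchange frames over stdio: a 4-byte little-endian length prefix followed by UTF-8 JSON. A minimal sketch of such a host in Python — the message schema (a `prompt` field) is an assumption for illustration, not ctxray's actual protocol:

```python
import json
import struct
import sys

def read_message():
    """Read one Native Messaging frame: 4-byte little-endian length, then UTF-8 JSON."""
    raw_len = sys.stdin.buffer.read(4)
    if len(raw_len) < 4:
        return None  # browser closed the pipe
    (length,) = struct.unpack("<I", raw_len)
    return json.loads(sys.stdin.buffer.read(length).decode("utf-8"))

def send_message(payload: dict) -> None:
    data = json.dumps(payload).encode("utf-8")
    sys.stdout.buffer.write(struct.pack("<I", len(data)) + data)
    sys.stdout.buffer.flush()

while (msg := read_message()) is not None:
    # A real host would append to the local prompt store; the "prompt"
    # field name here is an assumption about the message schema.
    send_message({"ok": True, "chars": len(msg.get("prompt", ""))})
```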
```yaml
# .github/workflows/prompt-lint.yml
name: Prompt Quality
on: pull_request
jobs:
  lint:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: ctxray/ctxray@main
        with:
          score-threshold: 43   # experimentally validated (below = 83% failure rate)
          model: claude         # optional: model-specific rules
          strict: true
          comment-on-pr: true
```

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/ctxray/ctxray
    rev: v3.0.0
    hooks:
      - id: ctxray-lint-score      # quality threshold gate (score >= 43)
      # - id: ctxray-lint-claude   # Claude-specific rules + threshold
      # - id: ctxray-lint-gpt      # GPT-specific rules + threshold
```

```bash
ctxray lint --score-threshold 43   # exit 1 below experimentally validated threshold
ctxray lint --score-threshold 50   # or set your own bar
ctxray lint --model claude         # model-specific lint rules
ctxray lint --strict               # exit 1 on warnings
ctxray lint --json                 # machine-readable output

ctxray init                        # generates .ctxray.toml with all rules documented
```

```toml
# .ctxray.toml (or [tool.ctxray.lint] in pyproject.toml)
[lint]
score-threshold = 43   # experimentally validated quality threshold
model = "claude"       # model-specific rules (claude/gpt/gemini)

[lint.rules]
min-length = 20
short-prompt = 40
vague-prompt = true
debug-needs-reference = true
```

Prompt Science — research foundation
Scoring is calibrated against 10 peer-reviewed papers covering 30+ features across 5 dimensions:
| Dimension | What it measures | Key papers |
|---|---|---|
| Structure | Markdown, code blocks, explicit constraints | Prompt Report (2406.06608) |
| Context | File paths, error messages, I/O specs, edge cases | Zi+ (2508.03678), Google (2512.14982) |
| Position | Instruction placement relative to context | Stanford (2307.03172), Veseli+ (2508.07479), Chowdhury (2603.10123) |
| Repetition | Redundancy that degrades model attention | Google (2512.14982) |
| Clarity | Readability, sentence length, ambiguity | SPELL (EMNLP 2023), PEEM (2603.10477) |
Cross-validated findings that inform our engine:
- Position bias is architectural — present at initialization, not learned. Front-loading instructions is effective for prompts under 50% of context window (3 papers agree)
- Moderate compression improves output — rule-based filler removal doesn't just save tokens, it enhances LLM performance (2505.00019); see the sketch after this list
- Prompt quality is independently measurable — prompt-only scoring predicts output quality without seeing the response (ACL 2025, 2503.10084)
- Quality threshold at score ~43 — our own experiment (30 prompts, 5 tiers, 2 models) found a step function: below 43, 83% failure rate; above 43, 94% success (Pearson r=0.56, Spearman ρ=0.64)
- Format preferences are model-dependent — XML benefits Claude, Markdown benefits GPT, but having any structure matters more than the specific format (PromptBridge 2512.01420)
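To make the compression finding concrete, here is a minimal filler-removal sketch. The filler list and regexes are illustrative assumptions, not ctxray's actual rewrite rules:

```python
import re

# Illustrative filler rules -- the real rewrite layer has many more.
FILLER = [
    r"\b(?:please|kindly)\b",
    r"\b(?:just|really|very|basically|actually|maybe|perhaps)\b",
    r"\bif (?:possible|you can)\b",
]

def strip_filler(prompt: str) -> str:
    for pattern in FILLER:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    prompt = re.sub(r"\s+([?.!,])", r"\1", prompt)  # re-attach punctuation
    return re.sub(r"\s{2,}", " ", prompt).strip()

print(strip_filler("Could you please maybe just fix the auth bug if possible?"))
# -> "Could you fix the auth bug?"
```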
Model-specific scoring (--model claude/gpt/gemini) applies research-backed adjustments for each model's known preferences and sensitivities.
All analysis runs locally in <1ms per prompt. No LLM calls, no network requests.
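As an illustration of how prompt-only, rule-based scoring can be this fast and deterministic, here is a minimal sketch. The feature checks and weights are invented for the example; the real engine uses 30+ research-calibrated features across the five dimensions above:

```python
import re

# Minimal sketch of deterministic rule-based scoring. Feature checks and
# weights are invented for illustration, not ctxray's real rules.

def starts_with_instruction(p: str) -> bool:
    words = p.lower().split()
    return bool(words) and words[0] in {"fix", "add", "write", "refactor", "implement"}

FEATURES = [  # (check, weight, dimension it approximates)
    (lambda p: bool(re.search(r"\b[\w/.-]+\.\w{1,4}\b", p)), 20, "context: file path"),
    (lambda p: bool(re.search(r"(?i)error|exception|traceback", p)), 15, "context: error text"),
    (starts_with_instruction, 20, "position: instruction first"),
    (lambda p: len(p.split()) >= 8, 15, "clarity: not a stub"),
    (lambda p: "```" in p or "\n- " in p, 10, "structure: markdown"),
]

def score(prompt: str) -> int:
    """Pure functions of the text: same prompt, same score, every run."""
    return sum(w for check, w, _ in FEATURES if check(prompt))

print(score("fix the TypeError in src/auth/login.ts when the token is expired"))  # 70
```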
How it works — architecture
Data sources:
```
┌───────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│Claude Code│ │  Cursor  │ │  Aider   │ │ ChatGPT  │ │ 5 more.. │
└─────┬─────┘ └─────┬────┘ └─────┬────┘ └─────┬────┘ └─────┬────┘
      └─────────────┴────────────┼────────────┴────────────┘
                                 │
                 scan -> dedup -> store -> analyze
                                 │
               ┌─────────────────┼─────────────────┐
               v                 v                 v
          ┌──────────┐    ┌──────────────┐    ┌──────────┐
          │ insights │    │   patterns   │    │ sessions │
          │ wrapped  │    │  repetition  │    │ projects │
          │ style    │    │   privacy    │    │  agent   │
          └──────────┘    └──────────────┘    └──────────┘
```
Key design decisions:
- Pure rules, no LLM — scoring and rewriting use regex + TF-IDF + research heuristics. Deterministic, private, <1ms per prompt.
- Adapter pattern — each AI tool gets a parser that normalizes to a common `Prompt` model. Adding a new tool = one file.
- Two-layer dedup — SHA-256 for exact matches, TF-IDF cosine similarity for near-dupes (see the sketch after this list).
- Research-calibrated — 10 peer-reviewed papers inform the scoring weights.
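A minimal sketch of the two-layer dedup, assuming an illustrative 0.9 similarity threshold (scikit-learn supplies the TF-IDF and cosine-similarity pieces):

```python
import hashlib

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def dedup(prompts: list[str], near: float = 0.9) -> list[str]:
    """Layer 1: SHA-256 drops exact duplicates. Layer 2: greedy pass drops
    prompts whose TF-IDF cosine similarity to an already-kept prompt is
    >= `near` (0.9 is an illustrative choice, not ctxray's setting)."""
    seen, unique = set(), []
    for p in prompts:
        h = hashlib.sha256(p.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(p)
    if len(unique) < 2:
        return unique
    sims = cosine_similarity(TfidfVectorizer().fit_transform(unique))
    kept: list[int] = []
    for i in range(len(unique)):
        if all(sims[i, j] < near for j in kept):
            kept.append(i)
    return [unique[i] for i in kept]

print(dedup(["fix the auth bug", "fix the auth bug", "fix the auth bug!", "add dark mode"]))
# -> ['fix the auth bug', 'add dark mode']
```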
Conversation Distillation
ctxray distill scores every turn in a conversation using 6 signals:
- Position — first/last turns carry framing and conclusions
- Length — substantial turns contain more information
- Tool trigger — turns that cause tool calls are action-driving
- Error recovery — turns that follow errors show problem-solving
- Semantic shift — topic changes mark conversation boundaries
- Uniqueness — novel phrasing vs repetitive follow-ups
Session type (debugging, feature-dev, exploration, refactoring) is auto-detected and signal weights adapt accordingly.
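A sketch of what 6-signal scoring with type-adaptive weights could look like — the weight values, signal scales, and turn fields are assumptions for illustration:

```python
# Sketch of 6-signal turn scoring with session-type-adaptive weights.
# Weight values, signal scales, and turn fields are illustrative.
WEIGHTS = {
    "debugging":   {"position": 1, "length": 1, "tool_trigger": 2,
                    "error_recovery": 3, "semantic_shift": 1, "uniqueness": 1},
    "feature-dev": {"position": 2, "length": 2, "tool_trigger": 2,
                    "error_recovery": 1, "semantic_shift": 1, "uniqueness": 1},
}

def signals(turn: dict, index: int, total: int) -> dict[str, float]:
    return {
        "position": 1.0 if index in (0, total - 1) else 0.0,   # first/last turn
        "length": min(len(turn["text"].split()) / 100, 1.0),   # substantial turns
        "tool_trigger": 1.0 if turn.get("tool_calls") else 0.0,
        "error_recovery": 1.0 if turn.get("follows_error") else 0.0,
        "semantic_shift": turn.get("topic_shift", 0.0),
        "uniqueness": turn.get("novelty", 0.0),
    }

def turn_score(turn: dict, index: int, total: int, session_type: str) -> float:
    w = WEIGHTS[session_type]
    return sum(w[name] * value for name, value in signals(turn, index, total).items())

turn = {"text": "the fix failed with a KeyError, retry with .get()", "follows_error": True}
print(turn_score(turn, index=3, total=10, session_type="debugging"))  # 3.09
```

In a debugging session the error-recovery turn above scores high; under the feature-dev weights the same turn would rank much lower.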
After Promptfoo joined OpenAI and Humanloop joined Anthropic, ctxray is the independent, open-source alternative for understanding your AI interactions.
- 100% local — your prompts never leave your machine
- No LLM required — pure rule-based analysis, <50ms per prompt
- 9 AI tools — the only tool that works across Claude Code, Cursor, ChatGPT, and more
- Research-backed — calibrated against 10 peer-reviewed papers, not vibes
Previously published as `reprompt-cli`. Same tool, new name, clean namespace.
- All analysis runs locally. No prompts leave your machine.
- `ctxray privacy` shows exactly what you've sent to which AI tool.
- Optional telemetry sends only anonymous feature vectors — never prompt text.
- Open source: audit exactly what's collected.
- PyPI: ctxray
- Chrome Extension: Chrome Web Store
- Firefox Add-on: Firefox Add-ons
- Changelog: CHANGELOG.md
See CONTRIBUTING.md for development setup and guidelines.
MIT
