A set of tools and documentation for examining what AI platforms (Claude, ChatGPT, Gemini, Grok, DeepSeek, Perplexity) are doing during your sessions, using standard browser developer tools.
When you interact with an AI chatbot, your browser exchanges data with the platform's servers. This data includes not only your conversation, but also system prompts, experiment assignments, rate limit configurations, model routing decisions, and analytics events — most of which are never shown in the UI.
This toolkit provides:
- A step-by-step guide for capturing and analyzing this data using F12 / DevTools
- Per-platform replication checklists with exact endpoints and field paths for six platforms
- Python scripts for parsing HAR files, extracting conversations, classifying fields, and identifying sensitive data
- A field classification registry covering 3,919 unique fields across platforms
- Forensic methodology and documentation from structured privacy mode testing conducted March 14, 2026
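As a first orientation, a capture can be summarized by destination host before any deeper parsing. The following is a minimal sketch of that idea (not one of the repo's scripts) — it counts HAR requests per host, which is enough to separate first-party API traffic from third-party analytics traffic:

```python
import json
from collections import Counter
from urllib.parse import urlparse

def hosts_in_har(path):
    """Count requests per destination host in a HAR capture.

    A quick first look at where a session actually sends data:
    first-party API hosts vs. third-party analytics hosts.
    HAR 1.2 nests requests under log -> entries -> request.
    """
    with open(path, encoding="utf-8") as f:
        har = json.load(f)
    return Counter(
        urlparse(entry["request"]["url"]).netloc
        for entry in har["log"]["entries"]
    )
```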
On March 14, 2026, structured forensic tests were conducted across three paid AI platforms — Claude (Anthropic), ChatGPT (OpenAI), and Grok (xAI) — while using each platform's privacy mode feature (Incognito Chat, Temporary Chat, and Private Chat respectively).
For each platform, paired privacy-mode and normal-mode sessions were captured under comparable conditions. All captures were recorded contemporaneously, preserved, and cryptographically hashed immediately after collection. Platform privacy-mode feature disclosures were archived at the time of testing.
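The hashing step can be sketched as follows. This is an illustrative SHA-256 routine for fixing a capture file's contents at collection time, not the exact tooling used during the tests:

```python
import hashlib

def sha256_of(path: str) -> str:
    """SHA-256 hex digest of a capture file, read in chunks.

    Recording this digest immediately after capture lets anyone
    later verify the file has not been altered.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```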
Pattern confirmed across all three platforms:
In the captured sessions, privacy-mode use was still associated with session-linked analytics or monitoring transmissions to third-party recipients not named in the privacy-mode feature disclosures reviewed for those platforms. In the reviewed captures, activating the privacy feature did not produce an observable reduction in the account-identity or session-linked fields transmitted to the identified third-party recipients, compared to equivalent normal-mode sessions. In two of the three platforms, activation of the privacy feature was itself followed immediately by a third-party analytics transmission associated with that interaction.
The captures show transmission. They do not establish downstream use, retention, intent, or legal violation, and do not establish that the platforms' statements about training or history were false on their own stated terms.
The issue is not only what the platform says about training or history, but whether the consumer-facing disclosure meaningfully captures the broader session architecture that still operates when a user deliberately activates a privacy-protective feature.
The docs/ folder contains the full methodology, narrative, and regulatory submission documents from this work.
- Read `har_forensics_guide.md` — the full guide, organized for reproducibility
- Capture a HAR file from any AI platform (instructions in the guide, Section 2)
- Run `scripts/har_universal_explorer.py` on your HAR file to see every field it contains
- Use the platform-specific checklists (Section 3) to find conversation data, experiment assignments, and rate limits
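Conceptually, the field-enumeration step works by walking each parsed JSON body and collecting dotted field paths. A simplified sketch of that idea (not the actual `har_universal_explorer.py`, which handles many more cases):

```python
def field_paths(obj, prefix=""):
    """Recursively collect dotted field paths from parsed JSON.

    Arrays are collapsed to `[]` so repeated elements map to a
    single path, which keeps the resulting set deduplicated.
    """
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= field_paths(item, prefix + "[]")
    return paths
```

Run over every JSON request and response body in a HAR, the union of these sets is the kind of inventory the field registry is built from.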
├── har_forensics_guide.md # The full guide (start here)
├── docs/
│ ├── AI_FORENSIC_METHODOLOGY.md # What was done and why — no scripts
│ ├── AI_FORENSIC_TECHNICAL_DOC.md # Scripts, commands, replication guide
│ ├── AI_PRIVACY_NARRATIVE_MASTER_v1.2.md # Three-version consumer narrative
│ ├── CAISI_LISTENING_SESSION_REQUEST.md # Regulatory submission template
│ └── DSAR_EMAILS_ALL_FOUR.md # Deletion request templates
├── scripts/
│ ├── har_universal_explorer.py # Extract all unique field paths
│ ├── har_conversation_extractor.py # Extract conversation content
│ ├── har_raw_capture.py # Raw field extraction
│ ├── har_field_classifier.py # Apply classification rules
│ ├── build_schema.py # Generate deduplicated JSON schema
│ ├── generate_schema.py # Schema generation (genson)
│ ├── parse_har_sse.py # Parse SSE streaming events
│ ├── har_separate_outputs.py # Split HAR into category files
│ └── redact_for_github.py # Scan and redact sensitive content
├── registry/
│ └── FIELD_REGISTRY.csv # Pattern-based field classification
└── examples/
├── classification_report.md
└── category_map.json
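Several of the scripts above deal with streaming responses, which appear in a HAR as `text/event-stream` bodies made of newline-delimited `data:` frames. A simplified sketch of the SSE-parsing idea behind `parse_har_sse.py` (the real script handles more edge cases, such as frames split across chunks):

```python
import json

def parse_sse_body(body: str):
    """Parse a text/event-stream response body into JSON payloads.

    Each `data:` line carries one payload; non-JSON sentinels such
    as `[DONE]` are skipped.
    """
    events = []
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            pass  # partial or non-JSON frame; a real parser may buffer these
    return events
```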
| Platform | Conversation endpoint | Analytics vendor | Documentation |
|---|---|---|---|
| Claude (Anthropic) | chat_conversations REST API | Segment via a-api.anthropic.com | Full field paths |
| ChatGPT (OpenAI) | backend-api/conversation + SSE | Segment via chatgpt.com/ces/v1/ | Full field paths |
| Gemini (Google) | batchexecute RPC (non-standard) | Not Statsig-based | Parsing notes |
| Grok (xAI) | Multi-step REST fetch | Sentry + _data/v1/events | Field paths |
| DeepSeek | biz_data envelope REST API | Not documented | Field paths |
| Perplexity | Thread-based REST API | Not documented | Field paths |
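One way to put the endpoint column above to work is to filter HAR entries by URL substring. A minimal sketch, assuming substrings drawn from the table (exact paths may change as platform architectures evolve):

```python
import json

def entries_matching(har_path, needle):
    """Return HAR entries whose request URL contains `needle`.

    E.g. needle="chat_conversations" for Claude or
    needle="backend-api/conversation" for ChatGPT, per the
    endpoint table; these values are illustrative, not stable.
    """
    with open(har_path, encoding="utf-8") as f:
        har = json.load(f)
    return [
        entry for entry in har["log"]["entries"]
        if needle in entry["request"]["url"]
    ]
```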
- Python 3.8+
- Optional: `genson` library (`pip install genson`) for `generate_schema.py`
- A web browser with developer tools (Chrome, Edge, or Firefox)
- No proprietary tools required
- No platform cooperation required
- No privileged access required
Do not commit HAR files to this repo. HAR files contain session tokens, cookies, email addresses, conversation content, and other sensitive data. The scripts in this repo process HAR files locally — the files themselves stay on your machine.
The `redact_for_github.py` script scans files for sensitive patterns (user paths, emails, UUIDs in identity contexts, tokens, API keys, IP addresses) and produces clean copies with placeholders. Run it on any file before sharing publicly.
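The redaction approach can be sketched as a set of regex substitutions. The patterns below are simplified illustrations, not the full rule set in `redact_for_github.py`:

```python
import re

# Simplified illustrative patterns; the real script covers more
# contexts (user paths, identity-linked UUIDs, tokens, API keys).
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
                r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]

def redact(text: str) -> str:
    """Replace each sensitive pattern with a stable placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```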
All findings in the guide and in docs/ are verifiable by following the procedures on your own accounts using standard browser developer tools. Nothing in this repo requires unauthorized access or special privileges.
The methodology documented in docs/AI_FORENSIC_METHODOLOGY.md and docs/AI_FORENSIC_TECHNICAL_DOC.md is fully reproducible by any researcher with an authenticated paid account on the target platform, Google Chrome, a screen recording tool, and approximately 30-45 minutes per platform.
This repository and its contents are provided for educational and research purposes only, as-is without warranty of any kind. The author is not a lawyer and nothing here constitutes legal advice. Platform architectures may change at any time. Use at your own risk. See the guide's Disclaimer section and LICENSE for full terms.
MIT License. See LICENSE for details.
Work conducted between January and March 2026. The evaluation frameworks and measurement methods built on top of these findings are the subject of ongoing research to be published later this year.