A set of tools and documentation for examining what AI platforms (Claude, ChatGPT, Gemini, Grok, DeepSeek, Perplexity) are doing during your sessions, using standard browser developer tools.
When you interact with an AI chatbot, your browser exchanges data with the platform's servers. This data includes not only your conversation, but also system prompts, experiment assignments, rate limit configurations, model routing decisions, and analytics events — most of which are never shown in the UI.
This toolkit provides:
- A step-by-step guide for capturing and analyzing this data using F12 / DevTools
- Per-platform replication checklists with exact endpoints and field paths for six platforms
- Python scripts for parsing HAR files, extracting conversations, classifying fields, and identifying sensitive data
- A field classification registry covering 3,919 unique fields across platforms
- Forensic methodology and documentation from structured privacy mode testing conducted March 14, 2026
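As a first orientation, a capture can be summarized by destination host before any deeper parsing. The following is a minimal sketch of that idea (not one of the repo's scripts) — it counts HAR requests per host, which is enough to separate first-party API traffic from third-party analytics traffic:

```python
import json
from collections import Counter
from urllib.parse import urlparse

def hosts_in_har(path):
    """Count requests per destination host in a HAR capture.

    A quick first look at where a session actually sends data:
    first-party API hosts vs. third-party analytics hosts.
    HAR 1.2 nests requests under log -> entries -> request.
    """
    with open(path, encoding="utf-8") as f:
        har = json.load(f)
    return Counter(
        urlparse(entry["request"]["url"]).netloc
        for entry in har["log"]["entries"]
    )
```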
On March 14, 2026, structured forensic tests were conducted across three paid AI platforms — Claude (Anthropic), ChatGPT (OpenAI), and Grok (xAI) — while using each platform's privacy mode feature (Incognito Chat, Temporary Chat, and Private Chat respectively).
For each platform, paired privacy-mode and normal-mode sessions were captured under comparable conditions. All captures were recorded contemporaneously, preserved, and cryptographically hashed immediately after collection. Platform privacy-mode feature disclosures were archived at the time of testing.
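The hashing step can be sketched as follows. This is an illustrative SHA-256 routine for fixing a capture file's contents at collection time, not the exact tooling used during the tests:

```python
import hashlib

def sha256_of(path: str) -> str:
    """SHA-256 hex digest of a capture file, read in chunks.

    Recording this digest immediately after capture lets anyone
    later verify the file has not been altered.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```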
Pattern confirmed across all three platforms:
In the captured sessions, privacy-mode use was still associated with session-linked analytics or monitoring transmissions to third-party recipients not named in the privacy-mode feature disclosures reviewed for those platforms. In the reviewed captures, activating the privacy feature did not produce an observable reduction in the account-identity or session-linked fields transmitted to the identified third-party recipients, compared to equivalent normal-mode sessions. In two of the three platforms, activation of the privacy feature was itself followed immediately by a third-party analytics transmission associated with that interaction.
The captures show transmission. They do not establish downstream use, retention, intent, or legal violation, and do not establish that the platforms' statements about training or history were false on their own stated terms.
The issue is not only what the platform says about training or history, but whether the consumer-facing disclosure meaningfully captures the broader session architecture that still operates when a user deliberately activates a privacy-protective feature.
The docs/ folder contains the full methodology, narrative, and regulatory submission documents from this work.
- Read `har_forensics_guide.md` — the full guide, organized for reproducibility
- Capture a HAR file from any AI platform (instructions in the guide, Section 2)
- Run `scripts/har_universal_explorer.py` on your HAR file to see every field it contains
- Use the platform-specific checklists (Section 3) to find conversation data, experiment assignments, and rate limits
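Conceptually, the field-enumeration step works by walking each parsed JSON body and collecting dotted field paths. A simplified sketch of that idea (not the actual `har_universal_explorer.py`, which handles many more cases):

```python
def field_paths(obj, prefix=""):
    """Recursively collect dotted field paths from parsed JSON.

    Arrays are collapsed to `[]` so repeated elements map to a
    single path, which keeps the resulting set deduplicated.
    """
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= field_paths(item, prefix + "[]")
    return paths
```

Run over every JSON request and response body in a HAR, the union of these sets is the kind of inventory the field registry is built from.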
├── har_forensics_guide.md # The full guide (start here)
├── docs/
│ ├── AI_FORENSIC_METHODOLOGY.md # What was done and why — no scripts
│ ├── AI_FORENSIC_TECHNICAL_DOC.md # Scripts, commands, replication guide
│ ├── AI_PRIVACY_NARRATIVE_MASTER_v1.2.md # Three-version consumer narrative
│ ├── CAISI_LISTENING_SESSION_REQUEST.md # Regulatory submission template
│ └── DSAR_EMAILS_ALL_FOUR.md # Deletion request templates
├── scripts/
│ ├── har_universal_explorer.py # Extract all unique field paths
│ ├── har_conversation_extractor.py # Extract conversation content
│ ├── har_raw_capture.py # Raw field extraction
│ ├── har_field_classifier.py # Apply classification rules
│ ├── build_schema.py # Generate deduplicated JSON schema
│ ├── generate_schema.py # Schema generation (genson)
│ ├── parse_har_sse.py # Parse SSE streaming events
│ ├── har_separate_outputs.py # Split HAR into category files
│ └── redact_for_github.py # Scan and redact sensitive content
├── registry/
│ └── FIELD_REGISTRY.csv # Pattern-based field classification
└── examples/
├── classification_report.md
└── category_map.json
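Several of the scripts above deal with streaming responses, which appear in a HAR as `text/event-stream` bodies made of newline-delimited `data:` frames. A simplified sketch of the SSE-parsing idea behind `parse_har_sse.py` (the real script handles more edge cases, such as frames split across chunks):

```python
import json

def parse_sse_body(body: str):
    """Parse a text/event-stream response body into JSON payloads.

    Each `data:` line carries one payload; non-JSON sentinels such
    as `[DONE]` are skipped.
    """
    events = []
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            pass  # partial or non-JSON frame; a real parser may buffer these
    return events
```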
| Platform | Conversation endpoint | Analytics vendor | Documentation |
|---|---|---|---|
| Claude (Anthropic) | chat_conversations REST API | Segment via a-api.anthropic.com | Full field paths |
| ChatGPT (OpenAI) | backend-api/conversation + SSE | Segment via chatgpt.com/ces/v1/ | Full field paths |
| Gemini (Google) | batchexecute RPC (non-standard) | Not Statsig-based | Parsing notes |
| Grok (xAI) | Multi-step REST fetch | Sentry + _data/v1/events | Field paths |
| DeepSeek | biz_data envelope REST API | Not documented | Field paths |
| Perplexity | Thread-based REST API | Not documented | Field paths |
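One way to put the endpoint column above to work is to filter HAR entries by URL substring. A minimal sketch, assuming substrings drawn from the table (exact paths may change as platform architectures evolve):

```python
import json

def entries_matching(har_path, needle):
    """Return HAR entries whose request URL contains `needle`.

    E.g. needle="chat_conversations" for Claude or
    needle="backend-api/conversation" for ChatGPT, per the
    endpoint table; these values are illustrative, not stable.
    """
    with open(har_path, encoding="utf-8") as f:
        har = json.load(f)
    return [
        entry for entry in har["log"]["entries"]
        if needle in entry["request"]["url"]
    ]
```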
- Python 3.8+
- Optional: `genson` library (`pip install genson`) for `generate_schema.py`
- A web browser with developer tools (Chrome, Edge, or Firefox)
- No proprietary tools required
- No platform cooperation required
- No privileged access required
Do not commit HAR files to this repo. HAR files contain session tokens, cookies, email addresses, conversation content, and other sensitive data. The scripts in this repo process HAR files locally — the files themselves stay on your machine.
The `redact_for_github.py` script scans files for sensitive patterns (user paths, emails, UUIDs in identity contexts, tokens, API keys, IP addresses) and produces clean copies with placeholders. Run it on any file before sharing publicly.
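The redaction approach can be sketched as a set of regex substitutions. The patterns below are simplified illustrations, not the full rule set in `redact_for_github.py`:

```python
import re

# Simplified illustrative patterns; the real script covers more
# contexts (user paths, identity-linked UUIDs, tokens, API keys).
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
                r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]

def redact(text: str) -> str:
    """Replace each sensitive pattern with a stable placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```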
All findings in the guide and in docs/ are verifiable by following the procedures on your own accounts using standard browser developer tools. Nothing in this repo requires unauthorized access or special privileges.
The methodology documented in docs/AI_FORENSIC_METHODOLOGY.md and docs/AI_FORENSIC_TECHNICAL_DOC.md is fully reproducible by any researcher with an authenticated paid account on the target platform, Google Chrome, a screen recording tool, and approximately 30-45 minutes per platform.
This repository and its contents are provided for educational and research purposes only, as-is without warranty of any kind. The author is not a lawyer and nothing here constitutes legal advice. Platform architectures may change at any time. Use at your own risk. See the guide's Disclaimer section and LICENSE for full terms.
MIT License. See LICENSE for details.
Work conducted between January and March 2026. The evaluation frameworks and measurement methods built on top of these findings are the subject of ongoing research to be published later this year.