The missing API for any website. Sits between any agent (human or AI) and any website. Compiles pages into structured data, enforces policy, and receipts every action.
SiteSitter is part of Authensor — the open-source safety stack for AI agents. While Authensor handles action authorization and policy enforcement, SiteSitter provides web governance for browsing agents.
- Authensor — Policy engine & control plane for agent action authorization
- SafeClaw — Local agent gating with approval workflows
- SiteSitter — Web governance for browsing agents (you are here)
AI agents can browse the web. But they browse it blind:
- No structure — An agent sees raw HTML. A "Submit" button on a search form and a "Submit" button on a wire transfer look identical.
- No policy — Nothing stops an agent from clicking a dark-pattern upsell it didn't recognize. Agents fall for dark patterns 41% of the time.
- No receipts — If an agent fills a form with your SSN, there's no record of what was sent, where, or why.
- No memory — The agent doesn't know you saw this product cheaper elsewhere last week.
Humans have the same problems. We just suffer through them manually.
SiteSitter is a local HTTP server + browser extension that creates a governance layer for web browsing. Three things happen on every page:
Any webpage goes in. A typed, machine-readable Page IR comes out — entities, actions, risk levels, dark patterns, accessibility issues, provenance for every claim:
{
  "pageKind": "checkout",
  "entities": [
    { "type": "product", "name": "Wireless Headphones", "price": "$79.99", "confidence": 0.94 }
  ],
  "actions": [
    { "type": "purchase", "riskClass": "high_consequence", "requiresApproval": true },
    { "type": "add_warranty", "riskClass": "mutation", "darkPatterns": ["preselection"] }
  ]
}
No site-specific code needed. Works on any webpage via an 8-stage compilation pipeline, with a 16-stage universal extraction fallback.
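To make the shape of the Page IR concrete, here is a minimal sketch of how a consumer might type the example above and pick out the actions that deserve scrutiny. The type names and the `flaggedActions` helper are hypothetical, inferred only from the JSON shown here; the real definitions live in `@sitesitter/web-ir` and may differ.

```typescript
// Hypothetical types inferred from the JSON example; not the actual
// @sitesitter/web-ir definitions.
type RiskClass = "read" | "soft_interaction" | "mutation" | "high_consequence";

interface Entity {
  type: string;
  name: string;
  price?: string;
  confidence: number;
}

interface Action {
  type: string;
  riskClass: RiskClass;
  requiresApproval?: boolean;
  darkPatterns?: string[];
}

interface PageIR {
  pageKind: string;
  entities: Entity[];
  actions: Action[];
}

// Return the actions an agent should not take blindly: anything that
// requires approval or carries at least one dark-pattern flag.
function flaggedActions(page: PageIR): Action[] {
  return page.actions.filter(
    (a) => a.requiresApproval === true || (a.darkPatterns?.length ?? 0) > 0
  );
}

// The checkout example from above, expressed as a typed value.
const page: PageIR = {
  pageKind: "checkout",
  entities: [
    { type: "product", name: "Wireless Headphones", price: "$79.99", confidence: 0.94 },
  ],
  actions: [
    { type: "purchase", riskClass: "high_consequence", requiresApproval: true },
    { type: "add_warranty", riskClass: "mutation", darkPatterns: ["preselection"] },
  ],
};

const flaggedTypes = flaggedActions(page).map((a) => a.type);
```

In this example both actions are flagged: `purchase` because it requires approval, and `add_warranty` because of the `preselection` dark pattern.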
Every action is evaluated against a browsing constitution: 26 enforceable rules across 6 categories, including privacy, safety, financial, healthcare, and consumer protection. Actions are risk-classified:
| Risk Class | Example | Requires |
|---|---|---|
| `read` | View a page | Nothing |
| `soft_interaction` | Filter search results | Nothing |
| `mutation` | Submit a form | Policy check |
| `high_consequence` | Wire transfer, delete account | Human approval |
Dark patterns are detected and flagged. Agents get clean, classified data instead of adversarial HTML.
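The risk table above can be read as a small decision function. The sketch below is an illustration only, not the `@sitesitter/policy` implementation: the `gate` function, its parameters, and the assumption that a failed policy check blocks a high-consequence action before it reaches a human are all hypothetical. The three outcomes (allow, escalate, block) are the ones named in the architecture diagram.

```typescript
type RiskClass = "read" | "soft_interaction" | "mutation" | "high_consequence";
type Requirement = "nothing" | "policy_check" | "human_approval";
type Decision = "allow" | "escalate" | "block";

// Mirrors the "Requires" column of the risk table.
const REQUIREMENTS: Record<RiskClass, Requirement> = {
  read: "nothing",
  soft_interaction: "nothing",
  mutation: "policy_check",
  high_consequence: "human_approval",
};

// Hypothetical gate. `policyAllows` stands in for the result of evaluating
// the constitutional rules; `humanApproved` for a resolved approval request.
function gate(riskClass: RiskClass, policyAllows: boolean, humanApproved: boolean): Decision {
  switch (REQUIREMENTS[riskClass]) {
    case "nothing":
      return "allow";
    case "policy_check":
      return policyAllows ? "allow" : "block";
    case "human_approval":
      // Assumption: policy is checked first, then the action waits for a human.
      if (!policyAllows) return "block";
      return humanApproved ? "allow" : "escalate";
  }
}

const pendingPurchase = gate("high_consequence", true, false); // "escalate"
```

A high-consequence action that passes policy but has no human approval yet is escalated rather than allowed or blocked.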
Every observation, policy decision, and action produces an immutable receipt in a content-addressed chain:
- Who saw what, when
- What was proposed, what policy said, what happened
- W3C Verifiable Credentials with Ed25519 proofs
- Merkle tree verification for any subset
- GDPR-compliant erasure via key deletion
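As a rough illustration of what "content-addressed chain" means in the list above, the sketch below links receipts by hash so that altering any earlier receipt invalidates the chain. It is a simplified model under stated assumptions: the `Receipt` shape, SHA-256 as the hash, and plain `JSON.stringify` as the serialization are all hypothetical, and it leaves out the Ed25519 proofs, Merkle tree, and key-deletion erasure entirely.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape; the real @sitesitter/receipts format may differ.
interface Receipt {
  id: string;          // content address: hash of (prev, payload)
  prev: string | null; // id of the previous receipt, null for the first
  payload: string;     // serialized event
}

function hashReceipt(prev: string | null, payload: string): string {
  return createHash("sha256")
    .update(prev ?? "")
    .update("\n")
    .update(payload)
    .digest("hex");
}

// Append an event, returning a new chain (the input is not mutated).
// JSON.stringify is used for brevity; it is not a canonical serialization.
function append(chain: Receipt[], event: Record<string, unknown>): Receipt[] {
  const last = chain[chain.length - 1];
  const prev = last ? last.id : null;
  const payload = JSON.stringify(event);
  return [...chain, { id: hashReceipt(prev, payload), prev, payload }];
}

// Recompute every link and every content address.
function verify(chain: Receipt[]): boolean {
  return chain.every((r, i) => {
    const before = chain[i - 1];
    const expectedPrev = before ? before.id : null;
    return r.prev === expectedPrev && r.id === hashReceipt(r.prev, r.payload);
  });
}

let chain: Receipt[] = [];
chain = append(chain, { kind: "observation", url: "https://shop.example.com" });
chain = append(chain, { kind: "policy_decision", decision: "escalate" });

const intact = verify(chain);

// Rewriting the first payload breaks its content address.
const tampered = chain.map((r, i) => (i === 0 ? { ...r, payload: '{"kind":"forged"}' } : r));
const tamperedValid = verify(tampered);
```

Here `intact` is `true` and `tamperedValid` is `false`, because the forged payload no longer matches the stored content address.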
git clone https://github.com/AUTHENSOR/SiteSitter.git
cd SiteSitter
pnpm install && pnpm build
# Start the governance server
node packages/runtime/dist/cli.js --db-path ./data.db
# Load the Chrome extension:
# chrome://extensions → Developer mode → Load unpacked → extensions/chrome/
# Navigate to any page → click SiteSitter → Capture
# Compile raw HTML into Page IR
node packages/cli/dist/main.js compile-html page.html --url https://example.com
# Accessibility audit
node packages/cli/dist/main.js audit bundle.json
# Dark pattern scan with regulatory citations
node packages/cli/dist/main.js compliance-report bundle.json
node packages/mcp-server/dist/cli.js
Every extracted action becomes an MCP tool with risk metadata. Claude sees structured entities and policy-evaluated actions instead of raw HTML.
# One-command start with persistent data
docker compose up -d
# Server is running at http://localhost:3838
curl http://localhost:3838/health
Data persists in a Docker volume. Set SPIRO_AUTH_TOKEN in your environment for token-based auth.
# Compile a page
curl -X POST http://localhost:3838/compile \
-H "Content-Type: application/json" \
-d @observation-bundle.json
# Evaluate an action against policy
curl -X POST http://localhost:3838/evaluate \
-H "Content-Type: application/json" \
-d '{"action": {"actionType": "purchase", "riskClass": "high_consequence"}, "context": {"url": "https://shop.example.com"}}'
# Get receipts
curl http://localhost:3838/receipts
90+ API endpoints. Full reference in CLAUDE.md.
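For callers working from Node rather than curl, the `/evaluate` call above could be assembled as follows. This is a sketch: `buildEvaluateRequest` and its types are hypothetical helpers, and only the path, header, and body fields shown in the curl example are taken from this README. The response format is not documented here, so the sketch stops at building the request.

```typescript
// Body shape copied from the curl example; other fields may exist.
interface EvaluateBody {
  action: { actionType: string; riskClass: string };
  context: { url: string };
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: build the POST /evaluate request without sending it.
function buildEvaluateRequest(baseUrl: string, body: EvaluateBody): HttpRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/evaluate`, // tolerate a trailing slash
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const req = buildEvaluateRequest("http://localhost:3838/", {
  action: { actionType: "purchase", riskClass: "high_consequence" },
  context: { url: "https://shop.example.com" },
});
```

On Node.js 20 or later, the result can be passed to the built-in `fetch`, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.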
| Layer | Without SiteSitter | With SiteSitter |
|---|---|---|
| What agents see | Raw HTML, adversarial CSS, dark patterns | Structured entities, classified actions, risk levels |
| Policy enforcement | None — agents act freely | Constitutional rules evaluated before every action |
| Dark pattern defense | None — agents are more susceptible than humans | 8-category detection with FTC/EU DSA citations |
| Audit trail | None — browsers have no accountability | Immutable receipt chain with Merkle proofs |
| Memory | None — every session starts blank | Entity fingerprinting, cross-site matching, temporal recall |
| Approval gates | None — no distinction between read and delete | Risk-classified actions with human-in-the-loop for high consequence |
Existing tools solve fragments: ad blockers filter, accessibility tools audit, browser automation acts. Nothing governs the full loop from observation to action to proof.
Browser Extension (capture)
│
▼ ObservationBundle (DOM + AX tree + screenshots)
┌──────────────────────┐
│ Compiler (8 stages) │ ── adapters, site families, dark pattern detection
└──────────┬───────────┘
▼ Page IR (entities, actions, regions, provenance)
┌──────────────────────┐
│ Policy Engine │ ── 26 constitutional rules, risk classification
└──────────┬───────────┘
▼ PolicyEvaluation (allow / escalate / block)
┌──────────────────────┐
│ Runtime Server │ ── HTTP API, SQLite, approval queue, SSE, auth
└──────────┬───────────┘
▼ Receipt (content-addressed, Merkle tree, W3C VC)
┌──────────────────────┐
│ Replay / Eval │ ── benchmark, trace, diff, remix views
└──────────────────────┘
| Package | What it does |
|---|---|
| `@sitesitter/web-ir` | Core types: Page, Entity, Action, Region, Provenance, State |
| `@sitesitter/compiler` | 8-stage pipeline: classify, extract, detect dark patterns, validate |
| `@sitesitter/policy` | Constitutional browsing rules, risk evaluation, approval gates |
| `@sitesitter/receipts` | Immutable receipt chain, Merkle tree, W3C VCs, GDPR erasure |
| `@sitesitter/runtime` | HTTP server, SQLite, SSE, search, memory, reputation, federation |
| `@sitesitter/replay` | Record/replay, benchmarks, traces, differential testing |
| `@sitesitter/adapters` | Site-specific extraction, 20 built-in adapters, federated registry |
| `@sitesitter/mcp-server` | MCP bridge for Claude and compatible clients |
| `@sitesitter/cli` | 33+ headless commands for compilation, audit, and analysis |
| `@sitesitter/playwright` | Playwright integration: compile pages in E2E tests |
| `@sitesitter/remix` | 12 alternative HTML views (readable, spreadsheet, voice-nav, etc.) |
| Chrome extension | MV3: capture, inspect, approve, investigate |
| Firefox extension | MV2: cross-browser parity |
Compilation — Any webpage to structured IR. No per-site code. 16-stage universal extraction as fallback. Site family grammars for common patterns (e-commerce, news, social, forums, SaaS, healthcare, finance, academic).
Dark Patterns — 8 categories (confirmshaming, urgency, preselection, sneaking, obstruction, trick wording, social proof, misdirection). Regulatory references (FTC Act, EU DSA Art. 25, GDPR, CCPA). Evidence packages for filing complaints.
Streaming — MutationObserver-based real-time capture. Incremental compilation. Server-Sent Events push updates to clients. Works with SPAs and infinite scroll.
Memory — Entity fingerprinting across sites and time. Ebbinghaus-inspired forgetting engine. Deja vu detection ("you saw this product 3 weeks ago for $47 less"). Workflow memory for repeated tasks.
Federation — Federated dark pattern observatory with Ed25519-signed reports. Peer-to-peer receipt witnesses. Consensus health scoring across instances.
Governance — Self-writing policies that learn from your decisions. Consent auto-handler. Web reputation scoring (8 dimensions). All learned rules are user-overridable.
Compliance — EU AI Act audit logging. GDPR receipt erasure. WCAG accessibility audits. FINRA/SOX/EDRM evidence exports. Berkeley Protocol case files.
Security — Prompt injection sanitizer for LLM pipelines. Ed25519 adapter signing with enforced verification. AES-256-GCM credential vault. Configurable CORS. Rate limiting. Circuit breakers.
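The Memory feature above mentions an Ebbinghaus-inspired forgetting engine. The README does not say how it is implemented, so the following is only a sketch of the classic forgetting curve, retention = exp(-t / S), applied to remembered entities. The `Memory` shape, the `stabilityDays` parameter, and the pruning threshold are all assumptions made for illustration.

```typescript
// Classic exponential forgetting curve: retention falls from 1 toward 0
// as elapsed time grows relative to a "stability" constant.
function retention(elapsedDays: number, stabilityDays: number): number {
  if (stabilityDays <= 0) throw new RangeError("stabilityDays must be positive");
  return Math.exp(-Math.max(0, elapsedDays) / stabilityDays);
}

// Hypothetical memory record; not the runtime's actual schema.
interface Memory {
  fingerprint: string;   // entity fingerprint
  lastSeenDays: number;  // days since the entity was last observed
  stabilityDays: number; // larger values decay more slowly
}

// Keep only memories whose retention is still at or above the threshold.
function prune(memories: Memory[], threshold: number): Memory[] {
  return memories.filter((m) => retention(m.lastSeenDays, m.stabilityDays) >= threshold);
}

const remembered = prune(
  [
    { fingerprint: "headphones-a", lastSeenDays: 7, stabilityDays: 7 },  // exp(-1), about 0.37
    { fingerprint: "headphones-b", lastSeenDays: 21, stabilityDays: 7 }, // exp(-3), about 0.05
  ],
  0.2
);
```

With a threshold of 0.2, only `headphones-a` survives. A real engine would presumably also raise stability each time an entity is seen again, which this sketch does not model.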
- Read-only by default. Mutation is opt-in.
- High-consequence actions always require human approval.
- Websites are treated as adversarial.
- Inference is separated from execution.
- Provenance on every claim. No silent escalations.
- Constitutional rules enforced below the prompt level.
- SiteSitter does not "solve prompt injection." It contains, measures, and governs around it.
pnpm install # install dependencies
pnpm build # build all packages
pnpm test # 1,003 tests
pnpm typecheck # 19 typecheck tasks
pnpm lint # ESLint
pnpm format # Prettier
Requires Node.js >= 20 and pnpm >= 9.
We welcome contributions! See CONTRIBUTING.md for setup instructions and PR process.
For architecture documentation and API reference, see CLAUDE.md.
SiteSitter is licensed under the Apache License 2.0 (LICENSE-APACHE).
All packages, including the runtime server, are Apache 2.0. You can freely use, modify, and distribute SiteSitter for any purpose.