SiteSitter

The missing API for any website. SiteSitter sits between any agent (human or AI) and any website: it compiles pages into structured data, enforces policy, and issues a receipt for every action.

Part of the Authensor Safety Stack

SiteSitter is part of Authensor — the open-source safety stack for AI agents. While Authensor handles action authorization and policy enforcement, SiteSitter provides web governance for browsing agents.

Authensor — Policy engine & control plane for agent action authorization
SafeClaw — Local agent gating with approval workflows
SiteSitter — Web governance for browsing agents (you are here)

The Problem

AI agents can browse the web. But they browse it blind:

No structure — An agent sees raw HTML. A "Submit" button on a search form and a "Submit" button on a wire transfer look identical.
No policy — Nothing stops an agent from clicking a dark-pattern upsell it didn't recognize. Agents fall for dark patterns 41% of the time.
No receipts — If an agent fills a form with your SSN, there's no record of what was sent, where, or why.
No memory — The agent doesn't know you saw this product cheaper elsewhere last week.

Humans have the same problems. We just suffer through them manually.

What SiteSitter Does

SiteSitter is a local HTTP server + browser extension that creates a governance layer for web browsing. Three things happen on every page:

1. Compile: HTML in, structured data out

Any webpage goes in. A typed, machine-readable Page IR (intermediate representation) comes out, covering entities, actions, risk levels, dark patterns, accessibility issues, and provenance for every claim:

{
  "pageKind": "checkout",
  "entities": [
    { "type": "product", "name": "Wireless Headphones", "price": "$79.99", "confidence": 0.94 }
  ],
  "actions": [
    { "type": "purchase", "riskClass": "high_consequence", "requiresApproval": true },
    { "type": "add_warranty", "riskClass": "mutation", "darkPatterns": ["preselection"] }
  ]
}

No site-specific code needed. Works on any webpage via an 8-stage compilation pipeline with 16-stage universal extraction fallback.
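To make the shape of the Page IR concrete, here is a minimal TypeScript sketch of how a consumer might type the example above and pick out the actions that deserve extra scrutiny. The interface names and the `flaggedActions` helper are assumptions inferred from the JSON sample; the real definitions live in `@sitesitter/web-ir` and may differ.

```typescript
// Hypothetical types inferred from the JSON example; not the actual
// @sitesitter/web-ir definitions.
type RiskClass = "read" | "soft_interaction" | "mutation" | "high_consequence";

interface Entity {
  type: string;
  name: string;
  price?: string;
  confidence: number; // extraction confidence between 0 and 1
}

interface Action {
  type: string;
  riskClass: RiskClass;
  requiresApproval?: boolean;
  darkPatterns?: string[];
}

interface PageIR {
  pageKind: string;
  entities: Entity[];
  actions: Action[];
}

// Returns actions that need approval or carry at least one dark-pattern flag.
function flaggedActions(page: PageIR): Action[] {
  return page.actions.filter(
    (a) => a.requiresApproval === true || (a.darkPatterns?.length ?? 0) > 0,
  );
}

const page: PageIR = {
  pageKind: "checkout",
  entities: [
    { type: "product", name: "Wireless Headphones", price: "$79.99", confidence: 0.94 },
  ],
  actions: [
    { type: "purchase", riskClass: "high_consequence", requiresApproval: true },
    { type: "add_warranty", riskClass: "mutation", darkPatterns: ["preselection"] },
  ],
};

console.log(flaggedActions(page).map((a) => a.type)); // [ 'purchase', 'add_warranty' ]
```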

2. Govern: Policy before action

Every action is evaluated against a browsing constitution: 26 enforceable rules across 6 categories (including privacy, safety, financial, healthcare, and consumer protection). Actions are risk-classified:

Risk Class	Example	Requires
`read`	View a page	Nothing
`soft_interaction`	Filter search results	Nothing
`mutation`	Submit a form	Policy check
`high_consequence`	Wire transfer, delete account	Human approval

Dark patterns are detected and flagged. Agents get clean, classified data instead of adversarial HTML.
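The risk table above can be read as a small decision function. The sketch below is illustrative only: the `Rule` shape, the `decide` function, and the sample rule are assumptions, and the outcome names (allow, escalate, block) are borrowed from the architecture diagram rather than from the policy package's actual API.

```typescript
type RiskClass = "read" | "soft_interaction" | "mutation" | "high_consequence";
type Decision = "allow" | "escalate" | "block";

interface ProposedAction {
  actionType: string;
  riskClass: RiskClass;
}

// A rule returns true when it objects to the action (hypothetical shape).
type Rule = (action: ProposedAction) => boolean;

function decide(action: ProposedAction, rules: Rule[]): Decision {
  // High-consequence actions always go to a human.
  if (action.riskClass === "high_consequence") return "escalate";
  // Mutations pass only if no rule objects.
  if (action.riskClass === "mutation") {
    return rules.some((objects) => objects(action)) ? "block" : "allow";
  }
  // read and soft_interaction require nothing.
  return "allow";
}

// Example rule, invented for illustration: refuse warranty add-ons.
const noWarrantyUpsell: Rule = (a) => a.actionType === "add_warranty";

console.log(decide({ actionType: "view_page", riskClass: "read" }, [noWarrantyUpsell])); // allow
console.log(decide({ actionType: "add_warranty", riskClass: "mutation" }, [noWarrantyUpsell])); // block
console.log(decide({ actionType: "purchase", riskClass: "high_consequence" }, [noWarrantyUpsell])); // escalate
```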

3. Receipt: Cryptographic proof of everything

Every observation, policy decision, and action produces an immutable receipt in a content-addressed chain:

Who saw what, when
What was proposed, what policy said, what happened
W3C Verifiable Credentials with Ed25519 proofs
Merkle tree verification for any subset
GDPR-compliant erasure via key deletion
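As a rough illustration of what a content-addressed chain means, the sketch below derives each receipt's ID from a SHA-256 hash of its payload and the previous receipt's ID, so editing any earlier receipt breaks verification. The `Receipt` shape and function names are assumptions for illustration; Ed25519 signatures, Merkle proofs, and the W3C Verifiable Credential envelope are deliberately left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape; not the @sitesitter/receipts format.
interface Receipt {
  id: string; // SHA-256 over [prev, payload]: the content address
  prev: string | null; // id of the previous receipt, null for the first
  payload: string; // serialized observation, decision, or action
}

function receiptId(prev: string | null, payload: string): string {
  return createHash("sha256").update(JSON.stringify([prev, payload])).digest("hex");
}

// Returns a new chain with one more receipt linked to the current tail.
function append(chain: Receipt[], payload: string): Receipt[] {
  const last = chain[chain.length - 1];
  const prev = last ? last.id : null;
  return [...chain, { id: receiptId(prev, payload), prev, payload }];
}

// Recomputes every id and checks every back-link.
function verify(chain: Receipt[]): boolean {
  let prev: string | null = null;
  for (const r of chain) {
    if (r.prev !== prev || r.id !== receiptId(r.prev, r.payload)) return false;
    prev = r.id;
  }
  return true;
}

let chain: Receipt[] = [];
chain = append(chain, JSON.stringify({ kind: "observation", url: "https://shop.example.com" }));
chain = append(chain, JSON.stringify({ kind: "decision", result: "escalate" }));

console.log(verify(chain)); // true

// Changing an earlier payload without recomputing ids is detected.
const tampered = chain.map((r, i) => (i === 0 ? { ...r, payload: "{}" } : r));
console.log(verify(tampered)); // false
```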

Quick Start

Option A: Server + Browser Extension (interactive use)

git clone https://github.com/AUTHENSOR/SiteSitter.git
cd SiteSitter
pnpm install && pnpm build

# Start the governance server
node packages/runtime/dist/cli.js --db-path ./data.db

# Load the Chrome extension:
# chrome://extensions → Developer mode → Load unpacked → extensions/chrome/
# Navigate to any page → click SiteSitter → Capture

Option B: CLI (headless / scripting)

# Compile raw HTML into Page IR
node packages/cli/dist/main.js compile-html page.html --url https://example.com

# Accessibility audit
node packages/cli/dist/main.js audit bundle.json

# Dark pattern scan with regulatory citations
node packages/cli/dist/main.js compliance-report bundle.json

Option C: MCP Server (plug into Claude or any MCP client)

node packages/mcp-server/dist/cli.js

Every extracted action becomes an MCP tool with risk metadata. Claude sees structured entities and policy-evaluated actions instead of raw HTML.
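One way to picture an action turning into an MCP tool is the mapping below. It is a sketch under assumptions: the `toTool` function and `ExtractedAction` shape are invented here, and the choice to mark `read` and `soft_interaction` as read-only is a guess. The descriptor fields (`name`, `description`, `inputSchema`, and the `readOnlyHint` / `destructiveHint` annotations) follow the MCP tool definition, but how SiteSitter actually populates them is not documented in this README.

```typescript
type RiskClass = "read" | "soft_interaction" | "mutation" | "high_consequence";

// Hypothetical input shape, modeled on the Page IR actions.
interface ExtractedAction {
  type: string;
  riskClass: RiskClass;
}

// Subset of an MCP tool definition used for this illustration.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown> };
  annotations: { readOnlyHint: boolean; destructiveHint: boolean };
}

function toTool(action: ExtractedAction): ToolDescriptor {
  // Assumption: the two lowest risk classes do not change site state.
  const readOnly = action.riskClass === "read" || action.riskClass === "soft_interaction";
  return {
    name: action.type,
    description: `Perform "${action.type}" (risk: ${action.riskClass})`,
    inputSchema: { type: "object", properties: {} },
    annotations: {
      readOnlyHint: readOnly,
      destructiveHint: action.riskClass === "high_consequence",
    },
  };
}

console.log(toTool({ type: "purchase", riskClass: "high_consequence" }).annotations);
// { readOnlyHint: false, destructiveHint: true }
```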

Option D: Docker (self-hosted)

# One-command start with persistent data
docker compose up -d

# Server is running at http://localhost:3838
curl http://localhost:3838/health

Data persists in a Docker volume. Set SPIRO_AUTH_TOKEN in your environment for token-based auth.

Option E: HTTP API (integrate with any agent)

# Compile a page
curl -X POST http://localhost:3838/compile \
  -H "Content-Type: application/json" \
  -d @observation-bundle.json

# Evaluate an action against policy
curl -X POST http://localhost:3838/evaluate \
  -H "Content-Type: application/json" \
  -d '{"action": {"actionType": "purchase", "riskClass": "high_consequence"}, "context": {"url": "https://shop.example.com"}}'

# Get receipts
curl http://localhost:3838/receipts

90+ API endpoints. Full reference in CLAUDE.md.
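For agents written in TypeScript, the `/evaluate` call above might be wrapped as follows. This is a sketch, not an official client: `evaluateAction`, `EvaluateBody`, and `FetchLike` are names made up for this example, the request body mirrors the curl command, and the response is left as `unknown` because its shape is not documented in this README. Authentication is omitted because the header format is not specified here.

```typescript
// Request body copied from the curl example above.
interface EvaluateBody {
  action: { actionType: string; riskClass: string };
  context: { url: string };
}

// Minimal fetch-compatible signature so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function evaluateAction(
  baseUrl: string,
  body: EvaluateBody,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/evaluate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`evaluate failed with HTTP ${res.status}`);
  // The response schema is not documented here, so callers must narrow it.
  return res.json();
}
```

With the server running locally, a call would look like `evaluateAction("http://localhost:3838", { action: { actionType: "purchase", riskClass: "high_consequence" }, context: { url: "https://shop.example.com" } }, fetch)`. Passing the fetch implementation as a parameter keeps the function testable without a live server.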

Why This Doesn't Exist Yet

Layer	Without SiteSitter	With SiteSitter
What agents see	Raw HTML, adversarial CSS, dark patterns	Structured entities, classified actions, risk levels
Policy enforcement	None — agents act freely	Constitutional rules evaluated before every action
Dark pattern defense	None — agents are more susceptible than humans	8-category detection with FTC/EU DSA citations
Audit trail	None — browsers have no accountability	Immutable receipt chain with Merkle proofs
Memory	None — every session starts blank	Entity fingerprinting, cross-site matching, temporal recall
Approval gates	None — no distinction between read and delete	Risk-classified actions with human-in-the-loop for high consequence

Existing tools solve fragments: ad blockers filter, accessibility tools audit, browser automation acts. Nothing governs the full loop from observation to action to proof.

Architecture

Browser Extension (capture)
       │
       ▼ ObservationBundle (DOM + AX tree + screenshots)
┌──────────────────────┐
│   Compiler (8 stages) │ ── adapters, site families, dark pattern detection
└──────────┬───────────┘
           ▼ Page IR (entities, actions, regions, provenance)
┌──────────────────────┐
│   Policy Engine       │ ── 26 constitutional rules, risk classification
└──────────┬───────────┘
           ▼ PolicyEvaluation (allow / escalate / block)
┌──────────────────────┐
│   Runtime Server      │ ── HTTP API, SQLite, approval queue, SSE, auth
└──────────┬───────────┘
           ▼ Receipt (content-addressed, Merkle tree, W3C VC)
┌──────────────────────┐
│   Replay / Eval       │ ── benchmark, trace, diff, remix views
└──────────────────────┘

Packages

Package	What it does
`@sitesitter/web-ir`	Core types — Page, Entity, Action, Region, Provenance, State
`@sitesitter/compiler`	8-stage pipeline: classify, extract, detect dark patterns, validate
`@sitesitter/policy`	Constitutional browsing rules, risk evaluation, approval gates
`@sitesitter/receipts`	Immutable receipt chain, Merkle tree, W3C VCs, GDPR erasure
`@sitesitter/runtime`	HTTP server, SQLite, SSE, search, memory, reputation, federation
`@sitesitter/replay`	Record/replay, benchmarks, traces, differential testing
`@sitesitter/adapters`	Site-specific extraction, 20 built-in adapters, federated registry
`@sitesitter/mcp-server`	MCP bridge for Claude and compatible clients
`@sitesitter/cli`	33+ headless commands for compilation, audit, and analysis
`@sitesitter/playwright`	Playwright integration — compile pages in E2E tests
`@sitesitter/remix`	12 alternative HTML views (readable, spreadsheet, voice-nav, etc.)
Chrome extension	MV3 — capture, inspect, approve, investigate
Firefox extension	MV2 — cross-browser parity

Key Capabilities

Compilation — Any webpage to structured IR. No per-site code. 16-stage universal extraction as fallback. Site family grammars for common patterns (e-commerce, news, social, forums, SaaS, healthcare, finance, academic).

Dark Patterns — 8 categories (confirmshaming, urgency, preselection, sneaking, obstruction, trick wording, social proof, misdirection). Regulatory references (FTC Act, EU DSA Art. 25, GDPR, CCPA). Evidence packages for filing complaints.

Streaming — MutationObserver-based real-time capture. Incremental compilation. Server-Sent Events push updates to clients. Works with SPAs and infinite scroll.

Memory — Entity fingerprinting across sites and time. Ebbinghaus-inspired forgetting engine. Deja vu detection ("you saw this product 3 weeks ago for $47 less"). Workflow memory for repeated tasks.

Federation — Federated dark pattern observatory with Ed25519-signed reports. Peer-to-peer receipt witnesses. Consensus health scoring across instances.

Governance — Self-writing policies that learn from your decisions. Consent auto-handler. Web reputation scoring (8 dimensions). All learned rules are user-overridable.

Compliance — EU AI Act audit logging. GDPR receipt erasure. WCAG accessibility audits. FINRA/SOX/EDRM evidence exports. Berkeley Protocol case files.

Security — Prompt injection sanitizer for LLM pipelines. Ed25519 adapter signing with enforced verification. AES-256-GCM credential vault. Configurable CORS. Rate limiting. Circuit breakers.
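For readers unfamiliar with AES-256-GCM, the sketch below shows the authenticated-encryption round trip using Node's built-in `crypto` module. It is not SiteSitter's vault implementation: the `seal` and `unseal` names and the `Sealed` shape are assumptions, and key storage is out of scope. It also illustrates the general idea behind erasure via key deletion: once the key is gone, the ciphertext can no longer be opened.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical container for one encrypted value.
interface Sealed {
  iv: Buffer; // 96-bit nonce, must never repeat for the same key
  tag: Buffer; // GCM authentication tag
  ciphertext: Buffer;
}

function seal(key: Buffer, plaintext: string): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function unseal(key: Buffer, sealed: Sealed): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  // final() throws if the key is wrong or the data was modified.
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

const vaultKey = randomBytes(32); // 256-bit key
const sealedSecret = seal(vaultKey, "example-credential");

console.log(unseal(vaultKey, sealedSecret)); // example-credential

try {
  unseal(randomBytes(32), sealedSecret); // a different key stands in for a deleted one
} catch {
  console.log("cannot decrypt without the original key");
}
```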

Security Model

Read-only by default. Mutation is opt-in.
High-consequence actions always require human approval.
Websites are treated as adversarial.
Inference is separated from execution.
Provenance on every claim. No silent escalations.
Constitutional rules enforced below the prompt level.
SiteSitter does not "solve prompt injection." It contains, measures, and governs around it.

Development

pnpm install          # install dependencies
pnpm build            # build all packages
pnpm test             # 1,003 tests
pnpm typecheck        # 19 typecheck tasks
pnpm lint             # ESLint
pnpm format           # Prettier

Requires Node.js >= 20 and pnpm >= 9.

Contributing

We welcome contributions! See CONTRIBUTING.md for setup instructions and PR process.

For architecture documentation and API reference, see CLAUDE.md.

License

SiteSitter is licensed under the Apache License 2.0 (LICENSE-APACHE).

All packages, including the runtime server, are Apache 2.0. You can freely use, modify, and distribute SiteSitter for any purpose.

