Every page.
Every format.
One claw.
One-command setup
MCP + CLI
Give your AI agents web data with a single command. Auto-detects your tools and configures everything.
Learn more
npx create-webclaw
Works with Claude Code, Cursor, Windsurf, Codex, OpenCode, and more
Every page.
Every defense.
Fast by default. Smart when needed.
118ms average for static pages. Multi-layer rendering pipeline for JS-heavy sites. You don't configure anything — the engine picks the fastest path automatically.
Best-in-class bot protection.
Challenge pages, CAPTCHAs, browser fingerprinting — handled transparently. No manual cookies, no config. Your requests just work, even on the hardest sites.
Agentic scraping.
Give a goal, get structured data. The AI agent reasons about page content, clicks buttons, navigates, and extracts exactly what you asked for. Powered by the best available models.
Every format, every extraction.
Markdown, JSON, plain text, LLM-optimized. Schema-based extraction, prompt-based extraction, summarization, brand identity, content diffing. 14 endpoints, one API key.
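As an illustration of schema-based extraction, a request body might be shaped like this; the field names, endpoint shape, and schema keys are assumptions for the sketch, not the documented API:

```python
import json

# Hypothetical request body for schema-based extraction.
# All keys below are illustrative assumptions, not the real API shape.
payload = {
    "url": "https://example.com/product/123",
    "formats": ["markdown", "json"],
    "schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["name", "price"],
    },
}

body = json.dumps(payload)  # what would be POSTed along with your API key
```

You define the fields once; the extractor is responsible for filling them from whatever page you point it at.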
Built for AI agents.
MCP server with 8 tools for Claude, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP client. REST API for everything else. Web search, batch processing, crawling, sitemap discovery.
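For MCP clients that read a JSON server config (as Claude Desktop and several editors do), registering the server might look roughly like this; the package name and environment variable are assumptions, not documented values:

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "npx",
      "args": ["-y", "webclaw-mcp"],
      "env": { "WEBCLAW_API_KEY": "your-key-here" }
    }
  }
}
```

The `npx create-webclaw` setup command writes an entry like this for you; the fragment is only to show what ends up in your client config.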
Documents, screenshots, mobile.
Auto-detects PDFs, DOCX, XLSX, CSV. Take full-page screenshots. Mobile emulation for cleaner layouts. Browser actions — click, type, scroll, wait — before extraction.
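The pre-extraction actions listed above might be expressed as an ordered payload. In this sketch the action names (click, type, scroll, wait) come from the feature list, but the exact shape of each entry is an assumption:

```python
# Hypothetical "actions" list run in order before extraction.
# Keys per entry are illustrative, not the documented request schema.
actions = [
    {"type": "click", "selector": "#accept-cookies"},
    {"type": "type", "selector": "input[name=q]", "text": "laptops"},
    {"type": "scroll", "direction": "down", "amount": 2},
    {"type": "wait", "milliseconds": 500},
]

request = {
    "url": "https://example.com/search",
    "actions": actions,
    "formats": ["markdown"],
}
```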
Firecrawl compatible.
Drop-in /v2 endpoints. Change your base URL, keep your existing SDK code. Same API shape, better extraction quality, faster response times.
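As a sketch of the drop-in swap, assuming a Firecrawl-style client that is configured by base URL (the class name, hostname, and parameter names here are illustrative, not a real SDK):

```python
# Minimal sketch: a client where migrating means changing one string.
class ScrapeClient:
    def __init__(self, api_key: str, base_url: str = "https://api.firecrawl.dev"):
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")

    def scrape_endpoint(self) -> str:
        # Same /v2 path shape on either backend.
        return f"{self.base_url}/v2/scrape"

# Point existing code at webclaw by swapping only the base URL
# (the hostname below is a placeholder, not a documented endpoint).
client = ScrapeClient(api_key="...", base_url="https://api.webclaw.example")
```

The request and response bodies keep the same shape, so the rest of your code stays untouched.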
Deep content recovery.
Embedded JSON, structured data, server-rendered payloads — extracted even when the visible DOM is empty. Multiple fallback strategies ensure nothing gets missed. If the content exists, webclaw finds it.
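One of those fallback strategies can be sketched in a few lines: pulling JSON-LD out of script tags when the visible DOM carries no text. This is illustrative only; the real engine layers several such strategies:

```python
import json
import re

def recover_embedded_json(html: str) -> list[dict]:
    """Extract structured data from <script type="application/ld+json">
    blocks, even when the rendered DOM is otherwise empty."""
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    found = []
    for block in pattern.findall(html):
        try:
            found.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed blocks, keep scanning
    return found

page = (
    '<html><body><div id="app"></div>'
    '<script type="application/ld+json">{"@type": "Product", "name": "Claw"}</script>'
    "</body></html>"
)
data = recover_embedded_json(page)
```

Here the `<div id="app">` is empty, yet the product data is still sitting in the page source.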
FROM THE BLOG
Latest posts
Mar 31, 2026
Extract structured data from any webpage
You don't always need the full page. Sometimes you need three fields from a product listing. Here's how to pull exactly the data you want from any URL.
Mar 27, 2026
Build a RAG pipeline with live web data
Most RAG tutorials stop at "upload a PDF." Real apps need live web data. Here's how to build a pipeline that fetches, extracts, and indexes pages.
Mar 24, 2026
MCP and Web Scraping for AI Agents
Your AI agent can reason, write code, and hold conversations. But it can't read a webpage. MCP fixes that. Here's how to set it up.
Mar 20, 2026
HTML to Markdown for LLMs
A standard webpage is 50,000 tokens of HTML. The content you need is 800. Here's how to stop paying for the other 49,200.
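The arithmetic above rests on a simple mechanic: most of a page's tokens are markup, not content. A crude stdlib sketch of the idea (real converters do far more, like preserving headings and links as Markdown):

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Keep visible text; drop tags, scripts, and styles."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    p = TextOnly()
    p.feed(html)
    return " ".join(p.chunks)

page = (
    "<html><head><style>body{margin:0}</style></head>"
    "<body><nav>Home | About</nav><p>The content you need.</p>"
    '<script>var tracking = "..." ;</script></body></html>'
)
stripped = visible_text(page)
# Rough proxy for token count: whitespace-split word count.
saving = len(page.split()) - len(stripped.split())
```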
One credit.
One page.
No hidden multipliers. No per-feature charges. Pick a plan, start extracting.
Unlimited pages. Unlimited research. 200 concurrent requests. Single-tenant on your cloud, your proxies, your rules. Dedicated Slack channel + SLA.
Self-host forever. AGPL-3.0 license. CLI + server + MCP server. No limits on your hardware.
1 CREDIT = 1 PAGE, ALWAYS · NO HIDDEN MULTIPLIERS · OPEN SOURCE
Common questions
FAQ
What is Webclaw?
Webclaw is a web extraction toolkit that converts any website into clean, structured data. It supports multiple output formats — Markdown, JSON, HTML, plain text, and an LLM-optimized format that strips noise and reduces token count by up to 67%.
How is it so fast?
For static pages, Webclaw uses raw HTTP requests with TLS fingerprint impersonation instead of spinning up a headless browser: sub-200ms response times, zero browser overhead, and no Selenium or Playwright dependency. Intelligent content extraction and readability scoring deliver the same results, and JS-heavy pages fall through to the rendering pipeline automatically.
Is there a free plan?
Yes. The Starter plan is completely free — 500 pages per month, 5 output formats, sitemap discovery, and full API access. No credit card required. You can upgrade anytime if you need higher limits or advanced features like LLM extraction.
Can I self-host Webclaw?
Absolutely. Webclaw is open source under the AGPL-3.0 license. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available for quick setup.
What output formats are supported?
Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.
Does Webclaw work with MCP clients?
Webclaw ships a dedicated MCP (Model Context Protocol) server binary that exposes 8 tools — scrape, crawl, map, batch, extract, summarize, diff, and brand. It works with any MCP-compatible client, such as Claude Desktop, Claude Code, Cursor, Windsurf, OpenCode, Codex, or Antigravity, over stdio transport.
What happens to my data?
Your extracted content is never stored or logged on our servers. Requests are processed in real time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.
How does LLM-powered extraction work?
Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It tries a local Ollama model first, then falls back to cloud providers.
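The Ollama-first chain can be sketched as a simple fallback loop. The provider stubs and call shapes below are illustrative, not webclaw's internals:

```python
from typing import Callable

def extract_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order (local first, cloud after);
    return (provider_name, result) from the first that succeeds."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ConnectionError as exc:
            last_error = exc  # provider unreachable: try the next one
    raise RuntimeError("all providers failed") from last_error

# Stub providers standing in for real clients.
def local_ollama(prompt: str) -> str:
    raise ConnectionError("ollama not running")

def cloud_llm(prompt: str) -> str:
    return '{"title": "Example"}'

used, result = extract_with_fallback(
    "extract the title",
    [("ollama", local_ollama), ("cloud", cloud_llm)],
)
```

With the local model unavailable, the chain transparently lands on the cloud provider and the caller never sees the difference.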
Ready to build?
Start extracting.
Free tier. No credit card. Deploy in under a minute — or self-host forever. Open source.