Every page.
Every format.
One claw.
One-command setup
MCP + CLI
Give your AI agents web data with a single command. Auto-detects your tools and configures everything.
Learn more
npx create-webclaw
Works with Claude Code, Cursor, Windsurf, Codex, OpenCode, and more
Every page.
Every defense.
Fast by default. Smart when needed.
118ms average for static pages. Multi-layer rendering pipeline for JS-heavy sites. You don't configure anything — the engine picks the fastest path automatically.
Best-in-class bot protection.
Challenge pages, CAPTCHAs, browser fingerprinting — handled transparently. No manual cookies, no config. Your requests just work, even on the hardest sites.
Agentic scraping.
Give a goal, get structured data. The AI agent reasons about page content, clicks buttons, navigates, and extracts exactly what you asked for. Powered by the best available models.
Every format, every extraction.
Markdown, JSON, plain text, LLM-optimized. Schema-based extraction, prompt-based extraction, summarization, brand identity, content diffing. 14 endpoints, one API key.
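As an illustration of schema-based extraction, a request body might be shaped like this; the field names, endpoint shape, and schema keys are assumptions for the sketch, not the documented API:

```python
import json

# Hypothetical request body for schema-based extraction.
# All keys below are illustrative assumptions, not the real API shape.
payload = {
    "url": "https://example.com/product/123",
    "formats": ["markdown", "json"],
    "schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["name", "price"],
    },
}

body = json.dumps(payload)  # what would be POSTed along with your API key
```

You define the fields once; the extractor is responsible for filling them from whatever page you point it at.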
Built for AI agents.
MCP server with 8 tools for Claude, Cursor, Windsurf, OpenCode, Codex, Antigravity, and any MCP client. REST API for everything else. Web search, batch processing, crawling, sitemap discovery.
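For MCP clients that read a JSON server config (as Claude Desktop and several editors do), registering the server might look roughly like this; the package name and environment variable are assumptions, not documented values:

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "npx",
      "args": ["-y", "webclaw-mcp"],
      "env": { "WEBCLAW_API_KEY": "your-key-here" }
    }
  }
}
```

The `npx create-webclaw` setup command writes an entry like this for you; the fragment is only to show what ends up in your client config.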
Documents, screenshots, mobile.
Auto-detects PDFs, DOCX, XLSX, CSV. Take full-page screenshots. Mobile emulation for cleaner layouts. Browser actions — click, type, scroll, wait — before extraction.
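The pre-extraction actions listed above might be expressed as an ordered payload. In this sketch the action names (click, type, scroll, wait) come from the feature list, but the exact shape of each entry is an assumption:

```python
# Hypothetical "actions" list run in order before extraction.
# Keys per entry are illustrative, not the documented request schema.
actions = [
    {"type": "click", "selector": "#accept-cookies"},
    {"type": "type", "selector": "input[name=q]", "text": "laptops"},
    {"type": "scroll", "direction": "down", "amount": 2},
    {"type": "wait", "milliseconds": 500},
]

request = {
    "url": "https://example.com/search",
    "actions": actions,
    "formats": ["markdown"],
}
```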
Firecrawl compatible.
Drop-in /v2 endpoints. Change your base URL, keep your existing SDK code. Same API shape, better extraction quality, faster response times.
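As a sketch of the drop-in swap, assuming a Firecrawl-style client that is configured by base URL (the class name, hostname, and parameter names here are illustrative, not a real SDK):

```python
# Minimal sketch: a client where migrating means changing one string.
class ScrapeClient:
    def __init__(self, api_key: str, base_url: str = "https://api.firecrawl.dev"):
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")

    def scrape_endpoint(self) -> str:
        # Same /v2 path shape on either backend.
        return f"{self.base_url}/v2/scrape"

# Point existing code at webclaw by swapping only the base URL
# (the hostname below is a placeholder, not a documented endpoint).
client = ScrapeClient(api_key="...", base_url="https://api.webclaw.example")
```

The request and response bodies keep the same shape, so the rest of your code stays untouched.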
Deep content recovery.
Embedded JSON, structured data, server-rendered payloads — extracted even when the visible DOM is empty. Multiple fallback strategies ensure nothing gets missed. If the content exists, webclaw finds it.
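One of those fallback strategies can be sketched in a few lines: pulling JSON-LD out of script tags when the visible DOM carries no text. This is illustrative only; the real engine layers several such strategies:

```python
import json
import re

def recover_embedded_json(html: str) -> list[dict]:
    """Extract structured data from <script type="application/ld+json">
    blocks, even when the rendered DOM is otherwise empty."""
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    found = []
    for block in pattern.findall(html):
        try:
            found.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed blocks, keep scanning
    return found

page = (
    '<html><body><div id="app"></div>'
    '<script type="application/ld+json">{"@type": "Product", "name": "Claw"}</script>'
    "</body></html>"
)
data = recover_embedded_json(page)
```

Here the `<div id="app">` is empty, yet the product data is still sitting in the page source.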
FROM THE BLOG
Latest posts
Mar 31, 2026
Extract structured data from any webpage
You don't always need the full page. Sometimes you need three fields from a product listing. Here's how to pull exactly the data you want from any URL.
Mar 27, 2026
Build a RAG pipeline with live web data
Most RAG tutorials stop at "upload a PDF." Real apps need live web data. Here's how to build a pipeline that fetches, extracts, and indexes pages.
Mar 24, 2026
MCP and Web Scraping for AI Agents
Your AI agent can reason, write code, and hold conversations. But it can't read a webpage. MCP fixes that. Here's how to set it up.
Mar 20, 2026
HTML to Markdown for LLMs
A standard webpage is 50,000 tokens of HTML. The content you need is 800. Here's how to stop paying for the other 49,200.
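The arithmetic above rests on a simple mechanic: most of a page's tokens are markup, not content. A crude stdlib sketch of the idea (real converters do far more, like preserving headings and links as Markdown):

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Keep visible text; drop tags, scripts, and styles."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    p = TextOnly()
    p.feed(html)
    return " ".join(p.chunks)

page = (
    "<html><head><style>body{margin:0}</style></head>"
    "<body><nav>Home | About</nav><p>The content you need.</p>"
    '<script>var tracking = "..." ;</script></body></html>'
)
stripped = visible_text(page)
# Rough proxy for token count: whitespace-split word count.
saving = len(page.split()) - len(stripped.split())
```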
One credit.
One page.
No hidden multipliers. No per-feature charges. Pick a plan, start extracting.
Unlimited pages. Unlimited research. 200 concurrent requests. Single-tenant on your cloud, your proxies, your rules. Dedicated Slack channel + SLA.
Self-host forever. AGPL-3.0 license. CLI + server + MCP server. No limits on your hardware.
1 CREDIT = 1 PAGE, ALWAYS · NO HIDDEN MULTIPLIERS · OPEN SOURCE
Common questions
FAQ
What is Webclaw?
Webclaw is a web extraction toolkit that converts any website into clean, structured data. It supports multiple output formats — Markdown, JSON, HTML, plain text, and an LLM-optimized format that strips noise and reduces token count by up to 67%.
How is it so fast?
For static pages, Webclaw uses raw HTTP requests with TLS fingerprint impersonation instead of spinning up a headless browser: sub-200ms response times, zero browser overhead, and no Selenium or Playwright dependency. Intelligent content extraction and readability scoring deliver the same results, and JS-heavy pages fall through to the rendering pipeline automatically.
Is there a free plan?
Yes. The Starter plan is completely free — 500 pages per month, 5 output formats, sitemap discovery, and full API access. No credit card required. You can upgrade anytime if you need higher limits or advanced features like LLM extraction.
Can I self-host Webclaw?
Absolutely. Webclaw is open source under the AGPL-3.0 license. You can run the CLI, REST API server, or MCP server on your own infrastructure. Docker images and one-line deploy scripts are available for quick setup.
What output formats are supported?
Six formats: Markdown (clean readable text), JSON (structured with metadata), HTML (sanitized), plain text, LLM-optimized (stripped of noise for AI consumption), and raw HTML. The LLM format runs a 9-step optimization pipeline to minimize token usage.
Does Webclaw work with MCP clients?
Webclaw ships a dedicated MCP (Model Context Protocol) server binary that exposes 8 tools — scrape, crawl, map, batch, extract, summarize, diff, and brand. It works with any MCP-compatible client, such as Claude Desktop, Claude Code, Cursor, Windsurf, OpenCode, Codex, or Antigravity, over stdio transport.
What happens to my data?
Your extracted content is never stored or logged on our servers. Requests are processed in real time and the response is returned directly to you. If you use LLM features, content is sent to the AI provider for processing but is not retained. For full control, self-host the entire stack.
How does LLM-powered extraction work?
Webclaw can use language models to extract structured JSON from pages using a schema you define, answer questions about page content with prompt-based extraction, or generate summaries. It tries a local Ollama model first, then falls back to cloud providers.
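The Ollama-first chain can be sketched as a simple fallback loop. The provider stubs and call shapes below are illustrative, not webclaw's internals:

```python
from typing import Callable

def extract_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order (local first, cloud after);
    return (provider_name, result) from the first that succeeds."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ConnectionError as exc:
            last_error = exc  # provider unreachable: try the next one
    raise RuntimeError("all providers failed") from last_error

# Stub providers standing in for real clients.
def local_ollama(prompt: str) -> str:
    raise ConnectionError("ollama not running")

def cloud_llm(prompt: str) -> str:
    return '{"title": "Example"}'

used, result = extract_with_fallback(
    "extract the title",
    [("ollama", local_ollama), ("cloud", cloud_llm)],
)
```

With the local model unavailable, the chain transparently lands on the cloud provider and the caller never sees the difference.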
Ready to build?
Start extracting.
Free tier. No credit card. Deploy in under a minute — or self-host forever. Open source.