HeadlessX is a self-hosted scraping platform with a web dashboard, protected API, queue-backed workflows, and a remote MCP endpoint.
Current live surfaces:
- Website scraping: scrape, crawl, map, content extraction, screenshots
- Google AI Search
- Tavily
- Exa
- YouTube
- Queue jobs, logs, API keys, proxy management, and config management
- Remote MCP over `/mcp`
- Simplified the dashboard around one global browser/runtime model
- Added Tavily, Exa, and YouTube workspaces
- Added queued crawl and job flows with Redis + worker support
- Added remote MCP secured with normal dashboard-created API keys
- Added setup and API guides aligned with the current route tree
| Scraper | Description | Status |
|---|---|---|
| Google Maps | Extract business listings, reviews, categories, ratings, contact details, opening hours, and location metadata from Google Maps search results. | Planned |
| Twitter / X | Capture profiles, posts, engagement metrics, media, hashtags, and conversation threads from public X pages. | Planned |
| LinkedIn | Extract public company and profile data, role details, locations, website links, and business metadata from LinkedIn surfaces. | Planned |
| Instagram | Collect public profile data, captions, post metadata, media links, reels references, and engagement signals. | Planned |
| Amazon | Extract product listings, seller data, pricing, ratings, reviews, availability, and catalog metadata from Amazon pages. | Planned |
| Facebook | Capture public page data, posts, about fields, links, follower counts, and engagement metadata from Facebook pages. | Planned |
| Reddit | Extract subreddit, post, comment, author, score, flair, and discussion metadata from Reddit threads and listings. | Planned |
| ThomasNet Suppliers Real-Time Scraper | Extract 70+ ThomasNet supplier fields including emails, phone numbers, company data, products, locations, certifications, and more. | Planned |
| TLS Appointment Booker | Automate TLS appointment availability checks and booking workflows with support for high-frequency monitoring and retry-safe session handling. | Planned |
| GlobalSpec Suppliers Scraper | Extract 200,000+ industrial supplier profiles from GlobalSpec Engineering360 with contact data, business type, product catalogs, specs, and datasheets. | Planned |
| ImportYeti Scraper | Extract supplier profiles, shipment records, and trade data from ImportYeti with 60+ fields including HS codes, shipping lanes, carriers, bills of lading, trading partners, and contact info. | Planned |
| MakersRow Scraper | Extract 11,600+ US manufacturer profiles from MakersRow with email, phone, address, website, GPS coordinates, capabilities, ratings, gallery images, and business hours. | Planned |
| Surface | Description | Status |
|---|---|---|
| Web AI Agent (/web) | Interactive AI agent workspace inside the dashboard that can use all HeadlessX playground tools and scrapers, including Website, Google AI Search, Tavily, Exa, YouTube, and related workflow actions. | Planned |
You can add the HeadlessX CLI skill to AI coding agents such as Cursor, Claude Code, Warp, Windsurf, OpenCode, OpenClaw, Antigravity, and similar tools that support the skills installer flow.
```bash
npx skills add https://github.com/saifyxpro/HeadlessX --skill cli
```

This installs the HeadlessX CLI skill from this repository so the agent can use the published `headlessx` command and follow the packaged usage guidance.
- Node.js 22+
- pnpm 9+
- PostgreSQL
- Redis
- Python/uv for `yt-engine`
- Go for the HTML-to-Markdown sidecar
Recommended for most developers:
- PostgreSQL: Supabase or Docker
- Redis: Docker
- App runtime: `pnpm dev` or `mise run dev`
This keeps infrastructure simple while still running the app locally.
- Clone and install:

```bash
git clone https://github.com/saifyxpro/HeadlessX.git
cd HeadlessX
pnpm install
```

- Create root `.env` from the full example:

```bash
cp .env.example .env
```

Current root `.env.example`:
```bash
# HeadlessX v2.1.0
# Local development environment

# Database
DATABASE_URL="postgresql://postgres.xxxxx:[email protected]:5432/postgres"

# API server
PORT=8000
HOST=0.0.0.0
NODE_ENV=development

# Required security
# Used by the Next.js dashboard server to authenticate against the API.
DASHBOARD_INTERNAL_API_KEY=replace-with-a-long-random-string
# Used to encrypt stored credentials at rest.
CREDENTIAL_ENCRYPTION_KEY=replace-with-a-different-long-random-string

# Queue and Redis
REDIS_URL=redis://localhost:6379

# Search providers
TAVILY_API_KEY=
EXA_API_KEY=

# Local engines
YT_ENGINE_URL=http://localhost:8090
YT_ENGINE_PORT=8090
YT_ENGINE_TIMEOUT_MS=45000
YT_ENGINE_TEMP_DIR=./tmp/yt-engine
YT_ENGINE_JOB_TTL_HOURS=12
HTML_TO_MARKDOWN_SERVICE_URL=http://localhost:8081
HTML_TO_MARKDOWN_PORT=8081
HTML_TO_MARKDOWN_TIMEOUT_MS=60000

# Browser and stealth defaults are managed in the dashboard settings UI.

# Web dashboard
WEB_PORT=3000
NEXT_PUBLIC_API_URL=http://localhost:8000
# Set this for Docker or custom internal networking.
# INTERNAL_API_URL=http://localhost:8000
# Set this only when the dashboard is hosted on a custom origin.
# FRONTEND_URL=https://dashboard.example.com
```

If you are using Docker instead of local services, start from the complete Docker env too:
```bash
cp infra/docker/.env.example infra/docker/.env
```

- Prepare services:

```bash
pnpm db:push
pnpm camoufox:fetch
```

- Start the workspace:

```bash
pnpm dev
```

This starts:
- web
- api
- worker
- HTML-to-Markdown service
- yt-engine
Important:

- `pnpm dev` does not provision PostgreSQL or Redis
- Website Crawl requires both Redis and the worker
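As a quick preflight before `pnpm dev`, you can verify that the infrastructure variables from the `.env` example are set. This is a hypothetical helper, not part of HeadlessX; the variable names come from the root `.env.example` above:

```typescript
// Hypothetical preflight helper: report which required env vars are unset.
// The variable names are taken from the root .env example in this guide.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  const required = [
    "DATABASE_URL",
    "REDIS_URL",
    "DASHBOARD_INTERNAL_API_KEY",
    "CREDENTIAL_ENCRYPTION_KEY",
  ];
  return required.filter((name) => !env[name]);
}

const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.error(`Missing before pnpm dev: ${missing.join(", ")}`);
}
```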
For the current Docker path:
```bash
cp infra/docker/.env.example infra/docker/.env
cd infra/docker
docker compose --profile all up --build -d
```

Important notes:

- use `--profile all`
- partial profile runs are not currently reliable because of `depends_on` relationships
- full Docker now includes `yt-engine`, so YouTube works inside the same compose stack
See docs/setup-guide.md for the full matrix:
- no-Docker setup
- mixed local setup
- full Docker setup
- MCP client configuration
All non-health backend routes are protected with the `x-api-key` header.
Core backend surfaces:
- `GET /api/health`
- `GET/PATCH /api/config`
- `GET /api/dashboard/stats`
- `GET /api/logs`
- `GET/POST/PATCH/DELETE /api/keys`
- proxy CRUD under `/api/proxies`
- website operator routes under `/api/operators/website/*`
- Google AI Search routes under `/api/operators/google/ai-search/*`
- Tavily routes under `/api/operators/tavily/*`
- Exa routes under `/api/operators/exa/*`
- YouTube routes under `/api/operators/youtube/*`
- queue job routes under `/api/jobs/*`
- remote MCP endpoint at `/mcp`
See the full route reference in docs/api-endpoints.md.
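A minimal sketch of how a client might attach the key when calling these routes. The `buildApiRequest` helper and the default base URL are assumptions for illustration; only the `x-api-key` header name and the route paths come from this guide:

```typescript
// Sketch: build an authenticated request for a protected HeadlessX route.
// buildApiRequest is a hypothetical helper; x-api-key is the documented
// auth header, and the default base URL matches the local setup above.
function buildApiRequest(
  path: string,
  apiKey: string,
  base: string = "http://localhost:8000",
): Request {
  return new Request(new URL(path, base), {
    headers: { "x-api-key": apiKey },
  });
}

// Usage (assuming the API is running locally):
//   const res = await fetch(buildApiRequest("/api/health", "hx_your_key"));
const req = buildApiRequest("/api/dashboard/stats", "hx_your_key");
```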
HeadlessX exposes a remote MCP endpoint from the API:
`http://localhost:8000/mcp`
Use a normal API key created from the dashboard API Keys page.
Do not use DASHBOARD_INTERNAL_API_KEY for MCP clients.
Example client config:
```json
{
  "mcpServers": {
    "headlessx": {
      "transport": "http",
      "url": "http://localhost:8000/mcp",
      "headers": {
        "x-api-key": "hx_your_dashboard_created_key"
      }
    }
  }
}
```

Repository layout:

```
apps/
  api/                    Express API + worker + MCP
  web/                    Next.js dashboard
  yt-engine/              Python YouTube engine
  go-html-to-md-service/  Go HTML-to-Markdown sidecar
docs/
  setup-guide.md
  api-endpoints.md
infra/docker/
```
| Package | Description | Status |
|---|---|---|
| @headlessx-cli/core | Published CLI package for HeadlessX operators, jobs, and search workflows. Command: `headlessx` | Available |
| HeadlessX Agent Skills | Installable agent skill pack from this repository for Cursor, Claude Code, Warp, Windsurf, OpenCode, OpenClaw, Antigravity, and similar tools. | Available |
| Package | Description | Status |
|---|---|---|
| headfox | HeadlessX-maintained Firefox-based anti-detect browser engine that will power the platform's next-generation browser runtime. | Planned |
| headfox-js | TypeScript package for launching, managing, and integrating Headfox in Node.js automation and scraping flows. | Planned |
- The dashboard uses the internal dashboard key for server-side internal requests
- MCP uses normal user-created API keys, not the dashboard internal key
- Queue-backed features degrade or report as unavailable when Redis is missing
- Docker support now covers the full runtime stack, including yt-engine
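The two required secrets in the root `.env` (`DASHBOARD_INTERNAL_API_KEY` and `CREDENTIAL_ENCRYPTION_KEY`) can be generated with Node's `crypto` module. This is a sketch under the assumption that any sufficiently long random string is acceptable; the guide only asks for "a long random string":

```typescript
// Generate values for the two required secrets in the root .env.
// Assumption: 32 random bytes (64 hex characters) is a reasonable default.
import { randomBytes } from "node:crypto";

const internalKey = randomBytes(32).toString("hex");
const encryptionKey = randomBytes(32).toString("hex");

console.log(`DASHBOARD_INTERNAL_API_KEY=${internalKey}`);
console.log(`CREDENTIAL_ENCRYPTION_KEY=${encryptionKey}`);
```

Remember that the two values must differ, per the placeholders in `.env.example`.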
See CONTRIBUTING.md for the current contribution workflow, local setup expectations, pull request guidance, and commit message conventions.
MIT