Cinder is a high-performance, self-hosted web scraping API built with Go. It turns any website into LLM-ready Markdown and is designed as a drop-in alternative to Firecrawl.
Why Cinder? It is heavily optimized for low-memory, serverless, and "hobby tier" environments through intelligent browser process management and a unified "monolith" architecture.
- **Fast & Efficient:** Reuses a single Chrome process with lightweight tabs, avoiding the heavy startup cost of spawning a browser per request.
- **Monolith Mode:** Runs the API and async worker in a single binary/container. Perfect for services like Railway or Leapcell where you pay per active container.
- **Async Queues:** Redis-backed job queue (Asynq) for handling heavy scrape jobs without blocking HTTP clients.
- **LLM Ready:** Converts complex HTML/SPAs into clean, structured Markdown using `html-to-markdown/v2`.
- **Evasion:** Automatic User-Agent rotation and undetected headless flags.
- Go 1.25+ (for local development)
- Redis (required for `/crawl` endpoints, optional for simple `/scrape`)
- Chromium (installed automatically in Docker or on Linux systems)
- Memory:
- Minimum: 512MB (basic scraping only, no JS rendering)
- Recommended: 2GB (comfortable for dynamic scraping + async queue)
- Hobby Tier (4GB): Perfect for production use
- CPU: 1+ cores (single core works, multiple cores improve concurrency)
- Disk: 50MB (binary + dependencies)
```bash
# Clone
git clone https://github.com/Michael-Obele/cinder.git
cd cinder

# Install dependencies
go mod download

# Create .env (optional, uses defaults)
cat > .env << 'EOF'
PORT=8080
SERVER_MODE=debug
LOG_LEVEL=info
# REDIS_URL=redis://localhost:6379 # Optional, for async crawling
EOF

# Run (Monolith Mode)
go run ./cmd/api
```

Visit http://localhost:8080. A 404 response is expected; the API lives at `/v1/scrape`, `/v1/crawl`, etc.

```bash
# Test synchronous scrape
curl -X POST http://localhost:8080/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "mode": "static"}'
# Should return markdown content in ~500ms
```

```bash
# Build
docker build -t cinder .

# Run with environment variables
docker run -p 8080:8080 \
  -e PORT=8080 \
  -e SERVER_MODE=release \
  cinder

# With Redis for async crawling
docker run -p 8080:8080 \
  -e REDIS_URL=redis://host.docker.internal:6379 \
  cinder
```

- Dockerfile support: ✅ Native
- Environment: Set `SERVER_MODE=release`
- Memory: Hobby Tier (512MB) recommended
- Why: 4GB RAM + Unlimited concurrent requests (pay per compute minutes)
- Cost: ~$5-15/month for moderate traffic
- Setup: Push Docker image, set env vars
- Note: Monolith Mode perfectly fits the resource constraints
- Use as a serverless function (requires API refactor for edge runtime)
- Not recommended due to Chromium size (~400MB)
- Requires AWS Lambda Container Images
- Cold starts ~10-15s (browser startup)
- Reserve concurrency for faster starts
All endpoints are prefixed with `/v1/`.
Best for: Single pages, fast turnaround needed.
POST /v1/scrape
Request:
```bash
curl -X POST http://localhost:8080/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "mode": "smart"
  }'
```

Parameters:
- `url` (required): Valid HTTP(S) URL to scrape
- `mode` (optional): Scraping strategy
  - `smart` (default): Auto-detect static vs dynamic
  - `static`: Use Colly (fast, lightweight)
  - `dynamic`: Use Chromedp (handles JavaScript)
Response (200 OK):
```json
{
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is established to be used for examples...",
  "html": "<!DOCTYPE html>\n<html>\n...",
  "metadata": {
    "scraped_at": "2026-01-20T10:30:00Z",
    "engine": "chromedp"
  }
}
```

Best for: Large sites, depth crawling, fire-and-forget jobs.
POST /v1/crawl
Request:
```bash
curl -X POST http://localhost:8080/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog",
    "render": false
  }'
```

Parameters:
- `url` (required): Root URL to start crawling
- `render` (optional): Force dynamic rendering (default: `false`)
Response (202 Accepted):
```json
{
  "id": "asynq:task:uuid-here",
  "url": "https://example.com/blog",
  "render": false
}
```

Check job progress and results.
GET /v1/crawl/:id
Request:
```bash
curl http://localhost:8080/v1/crawl/asynq:task:uuid-here
```

Response (200 OK):
```json
{
  "id": "asynq:task:uuid-here",
  "queue": "default",
  "state": "completed",
  "max_retry": 3,
  "retried": 0,
  "payload": "{\"url\":\"https://example.com/blog\",\"render\":false}",
  "result": "{\"urls_scraped\": 15, ...}"
}
```

States: `pending`, `active`, `completed`, `failed`, `retry`
Search the web and return results.
POST /v1/search
Requires: `BRAVE_SEARCH_API_KEY` environment variable
Request:
```bash
curl -X POST http://localhost:8080/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "golang web scraping"}'
```

| Mode | Engine | Speed | JS Support | Best For |
|---|---|---|---|---|
| `static` | Colly | Fast | No | Traditional HTML sites |
| `dynamic` | Chromedp | Slow | Yes | React, Vue, SPAs |
| `smart` | Auto-detect | Medium | Sometimes | Most sites (default) |
Smart Mode Algorithm:
- Attempts static scrape first (~200ms)
- Falls back to dynamic if content is minimal or fails
| Variable | Default | Required | Description |
|---|---|---|---|
| `PORT` | `8080` | No | HTTP server port |
| `SERVER_MODE` | `debug` | No | Server mode: `debug`, `release`, `test` |
| `LOG_LEVEL` | `info` | No | Log level: `debug`, `info`, `warn`, `error` |
| `REDIS_URL` | (none) | Conditional* | Redis connection URL (e.g., `redis://localhost:6379`) |
| `REDIS_HOST` | (none) | Conditional* | Redis host (alternative to `REDIS_URL`) |
| `REDIS_PORT` | `6379` | Conditional* | Redis port |
| `REDIS_PASSWORD` | (none) | Conditional* | Redis password |
| `BRAVE_SEARCH_API_KEY` | (none) | No | API key for Brave Search endpoint |
| `DISABLE_WORKER` | `false` | No | Set to `true` to disable embedded worker (microservices mode) |

Note: *Redis is required for `/v1/crawl` endpoints. Without it, they return `503 Service Unavailable`.
Cinder employs a Monolithic Architecture with Embedded Worker pattern, optimized for serverless and hobby-tier deployments where minimizing resource usage and cold-start times is critical.
```
+-----------------+     +-------------------+     +------------------+
|    HTTP API     |     |   Queue Worker    |     |     Scraper      |
|  (Gin Router)   |---->|      (Asynq)      |---->|     Service      |
|                 |     |                   |     |                  |
| * /v1/scrape    |     | * Task Processing |     | * Mode Selection |
| * /v1/crawl     |     | * Retry Logic     |     | * Caching        |
| * /v1/search    |     | * Result Storage  |     | * Result Format  |
+-----------------+     +-------------------+     +------------------+
         |                       |                        |
         +-----------------------+------------------------+
                                 |
                                 v
                   +------------------------+
                   |      Browser Pool      |
                   |       (Chromedp)       |
                   |                        |
                   | * Shared Allocator     |
                   | * Tab Management       |
                   | * Memory Optimization  |
                   +------------------------+
```
Synchronous Flow (`/v1/scrape`):

```
Client Request -> Gin Router -> Scrape Handler -> Scraper Service
      |               |               |                 |
Validate URL  ->  Select Mode  ->  Check Cache  ->  Execute Scrape
      |               |               |                 |
Return JSON  <-  Format Result <-  Store Cache  <-  Browser/Colly
```

Asynchronous Flow (`/v1/crawl`):

```
Client Request -> Gin Router -> Crawl Handler -> Redis Queue
      |               |               |               |
Validate URL  ->  Create Task  ->  Enqueue Job ->  Return Job ID
                                                      |
      +-----------------------------------------------+
      |
      v
Embedded Worker Process
      |
Task Processor -> Scraper Service
      |
Result Storage -> Client Polls Status
```
Problem Solved: Traditional scraping spawns a new Chrome process per request (~500ms startup + 300MB RAM), making it unsuitable for concurrent workloads.
Cinder's Solution:
- Singleton Allocator: One Chromium process per container instance
- Tab Pooling: Each scrape request creates a lightweight tab (`chromedp.NewContext`)
- Memory Efficiency: ~200-300MB total for browser + API server
- Concurrency: 10 concurrent tabs (configurable via `internal/worker/server.go`)
Performance Impact:
- Latency: ~200ms static, ~1-3s dynamic (vs 2-5s with process spawning)
- Throughput: 3-5 requests/second on 2GB instances
- Resource Usage: 70% less memory than traditional approaches
Horizontal Scaling:
- Stateless Design: API instances can be scaled independently
- Shared Redis: Queue coordination across multiple workers
- Load Balancing: Standard HTTP load balancers work out-of-the-box
Vertical Scaling:
- Memory: 4GB recommended for production (handles browser + concurrent requests)
- CPU: 1-2 cores sufficient (I/O bound, not CPU bound)
- Storage: Minimal disk usage (logs + optional cache)
Reliability Features:
- Graceful Degradation: Falls back to static scraping if dynamic fails
- Circuit Breaker: Redis unavailability doesn't crash the API
- Health Checks: Browser process monitoring (planned for Phase 5)
- Result Caching: Redis-backed response caching reduces duplicate work
Why Monolith Mode?
- Serverless Optimization: Single process minimizes cold-start overhead
- Resource Efficiency: No inter-service communication overhead
- Hobby-Tier Friendly: Fits within free tier limits (Leapcell 4GB RAM)
- Simplicity: Easier deployment and debugging
Why Asynq over Custom Queue?
- Battle-Tested: Production-ready Redis-backed queue
- Observability: Built-in metrics and monitoring
- Reliability: Automatic retries, dead letter queues, task scheduling
- Ecosystem: Active maintenance and community support
Why Smart Mode Default?
- User-Friendly: Works for most sites without configuration
- Cost-Effective: Tries fast static scraping first
- Fallback Safety: Gracefully degrades to dynamic rendering
See `plan/architecture.md` for deeper technical details and design rationale.
Typical latencies on a 2GB instance with hot browser:
| Operation | Time | Notes |
|---|---|---|
| Static scrape (Colly) | 200-500ms | Simple HTML parsing |
| Dynamic scrape (Chromedp) | 1-3s | With JS rendering |
| Browser cold start | ~1-2s | One-time on app startup |
| Queue job enqueue | 5-10ms | Redis write |
| Queue job processing | 1-5s | Depends on site complexity |
Throughput:
- Concurrent requests: 10 (configurable in worker config)
- QPS (queries per second): ~3-5 on medium instances (site-dependent)
Problem: Container is killed after ~1-2 hours
- Cause: Chrome memory leak after N pages
- Solution:
  - Increase container memory (switch to a 2GB+ tier)
  - Reduce concurrent workers (lower `Concurrency` in `internal/worker/server.go`)
  - Enable browser restart after N requests (planned for Phase 5)
Problem: `POST /v1/crawl` returns `503 Service Unavailable`
- Cause: `REDIS_URL` not set or invalid
- Solution: Set `REDIS_URL=redis://localhost:6379` or equivalent
- Workaround: Use synchronous `/v1/scrape` instead
Problem: Markdown is mostly empty for modern sites
- Cause: Site not fully hydrated before HTML capture
- Solution:
  - Try `mode=dynamic` explicitly
  - Increase the page load timeout (future feature)
  - Check browser console logs with `LOG_LEVEL=debug`
Problem: Requests taking >5s
- Cause:
  - Colly/Chromedp waiting for a slow site
  - Cold browser start (first request)
  - Browser memory fragmentation
- Solution:
  - Use `mode=static` for fast sites
  - Warm up the browser: `curl http://localhost:8080/v1/scrape -d '{"url":"https://example.com","mode":"static"}'`
  - Increase container memory
| Phase | Goal | Status |
|---|---|---|
| Phase 1 | Static Scraping (Colly) | ✅ Done |
| Phase 2 | Dynamic Scraping (Chromedp) | ✅ Done |
| Phase 3 | Async Queue (Asynq + Redis) | ✅ Done |
| Phase 4 | Performance Tuning (Monolith) | ✅ Done |
| Phase 5 | Hardening & Testing | 🚧 In Progress |
Current Focus:
- Adding a comprehensive Unit & Integration Test Suite (Currently 0% coverage).
- Implementing "Smart Wait" heuristics for slower SPAs.
- Adding a "Browser Health Check" to kill/restart Chrome after N scrapes.
Contributions are welcome! This project is in active development and priorities are:
1. Unit & Integration Tests (currently 0% coverage)
   - `internal/domain/scraper_test.go`
   - `internal/api/handlers/scrape_test.go`
   - `internal/scraper/chromedp_test.go`
2. Smart Waiting Strategies for SPAs
   - Network idle detection
   - Configurable wait conditions
   - Better heuristics for "page ready"
3. Browser Health Check
   - Restart browser after N requests to prevent memory leaks
   - Automatic OOM recovery
How to Contribute:
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/amazing-feature`)
3. Add tests for your changes
4. Commit your Changes (`git commit -m 'Add amazing feature'`)
5. Push to the Branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
Code Standards:
- Use `go fmt` for formatting
- Add structured logging via `pkg/logger`
- Include error handling (avoid silent failures)
- Test your code locally: `go test ./...`
Distributed under the MIT License. See LICENSE for more information.