Amazon Roundup Scout (RoundupForge) — a Grimfaste tool that automates Amazon product research for roundup articles at scale.
Paste up to 10,000 keywords, and RoundupForge searches Amazon across 21 country marketplaces, collects product ASINs, and delivers organized results — ready for article creation with tools like ZimmWriter.
RoundupForge is a server-side web application built with Next.js 16 and TypeScript. It runs as a self-hosted service on macOS or Linux, using SQLite for local development and PostgreSQL for multi-worker production deployments.
The application is designed for headless batch processing — users submit keyword lists, and the system scrapes Amazon search results and product pages in the background using a pool of scraping API providers. All processing runs server-side with progress tracked in the database, so users can close the browser and return later.
RoundupForge is part of the Grimfaste platform and serves as the data collection layer for DojoClaw, the AI-powered article generation and publishing system.
| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 16 (App Router) | Server-side rendering, API routes, React UI |
| Language | TypeScript (strict mode) | Type safety across frontend and backend |
| Styling | Tailwind CSS 4 | Utility-first CSS framework |
| ORM | Prisma 7 | Database abstraction with migrations |
| Database | SQLite (dev) / PostgreSQL (prod) | Data persistence, job state, settings |
| HTML Parsing | Cheerio | Server-side DOM extraction from scraped pages |
| Scraping | Multi-provider pool | ScrapeOwl, ScraperAPI, ScrapingBee, ZenRows, DataForSEO |
| Concurrency | p-limit | Keyword-level parallel processing (1-50 concurrent) |
| Job Queue | Custom sequential queue | globalThis singleton with DB-backed state |
| LLM Integration | OpenAI-compatible API | Relevance filter for product scoring |
| Encryption | AES-256-GCM | Secrets encrypted at rest in database |
| Testing | Vitest | Unit tests for parsers, scrapers, services |
| Google Sheets | googleapis npm | Keyword import and result export |
| Real-time Updates | Server-Sent Events (SSE) | Live progress streaming with polling fallback |
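The concurrency row above (p-limit, 1-50 concurrent keywords) can be sketched as follows. This is a minimal self-contained stand-in for the limiter pattern, not the app's actual runner; in the real project the `p-limit` package provides `limit`, and `createLimit` here is a hypothetical equivalent for illustration.

```typescript
// Minimal concurrency limiter in the spirit of p-limit (illustrative stand-in).
function createLimit(concurrency: number) {
  let active = 0;
  const waiters: (() => void)[] = [];
  return async function limit<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= concurrency) {
      // Park this task until a running one finishes.
      await new Promise<void>((resolve) => waiters.push(resolve));
    }
    active++;
    try {
      return await fn();
    } finally {
      active--;
      waiters.shift()?.(); // wake the next parked task, if any
    }
  };
}

async function main() {
  const limit = createLimit(2); // e.g. "max concurrency = 2"
  let running = 0;
  let peak = 0;
  const keywords = ["a", "b", "c", "d", "e"];
  const results = await Promise.all(
    keywords.map((k) =>
      limit(async () => {
        running++;
        peak = Math.max(peak, running);
        await new Promise((r) => setTimeout(r, 10)); // simulated scrape
        running--;
        return k.toUpperCase();
      })
    )
  );
  console.log(results.join(","), "peak:", peak); // → A,B,C,D,E peak: 2
}
main();
```

With the real package the usage is the same shape: `const limit = pLimit(25)` and `keywords.map((k) => limit(() => processKeyword(k)))`.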
┌─────────────────────────────────────────────────────────────┐
│ Browser (React UI) │
│ ├── Home — keyword input, Google Sheets, batch config │
│ ├── Dashboard — analytics, credit usage, failure patterns │
│ ├── Projects — progress, products, export, relevance │
│ ├── Profiles — scrape profiles per Amazon marketplace │
│ └── Settings — scrapers, LLM, Google Sheets, auth │
└──────────────────────┬──────────────────────────────────────┘
│ HTTP / SSE
┌──────────────────────▼──────────────────────────────────────┐
│ Next.js API Routes │
│ ├── /api/projects — CRUD, run, stop, export │
│ ├── /api/queue — queue status, recovery │
│ ├── /api/bulk-queue — multi-tab Google Sheets queue │
│ ├── /api/dashboard — aggregated analytics │
│ ├── /api/profiles — scrape profile management │
│ ├── /api/settings — scrapers, LLM, Google, general │
│ ├── /api/sheets — keyword load, result sync │
│ ├── /api/system/status — health check, diagnostics │
│ └── /api/auth/session — optional admin authentication │
└──────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────────┐
│ Backend Services │
│ ├── Queue Processor — sequential project execution │
│ ├── Runner — keyword processing with retries │
│ ├── Scraper Pool — primary + fallback adapters │
│ ├── Plugin Registry — extensible scraper registration │
│ ├── Product Cache — ASIN dedup across projects │
│ ├── Lifecycle Hooks — preScrape, postScrape, onFailure │
│ ├── Relevance Filter — LLM-based product scoring │
│ ├── Settings Service — encrypted DB-backed config │
│ ├── Job Run Service — durable job tracking + heartbeat │
│ └── Failure Summary — error categorization (10 types) │
└──────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────────┐
│ Data Layer │
│ ├── Project, KeywordResult, Product — core scrape data │
│ ├── JobRun — durable job state │
│ ├── AppSetting — encrypted settings │
│ ├── ExportSnapshot — export versioning │
│ ├── ScrapeProfile — per-domain config │
│ └── LlmProvider — LLM routing │
│ │
│ SQLite (local dev) ──or──▶ PostgreSQL (multi-worker prod) │
└─────────────────────────────────────────────────────────────┘
Keywords (paste / Google Sheets / bulk queue)
│
▼
Queue Project (status: queued → running)
│
▼
Build Amazon search URLs (domain from scrape profile)
│
▼
Fetch search results via scraper pool
(ScrapeOwl → ScraperAPI → ScrapingBee → ZenRows → DataForSEO)
│
▼
Extract product links + ASINs (dedupe, randomize count)
│
▼
Check ASIN cache ──▶ cached? reuse ──▶ not cached? scrape
│
▼
Fast mode: done ─── Full mode: visit each product page
│ extract title, bullets,
│ description, specs, reviews
▼
Store in database, track credits, update progress via SSE
│
▼
Auto-retry failed keywords (exponential backoff)
│
▼
Queue: advance to next project
│
▼
Export: Roundup packs / CSV / JSON / Google Sheets
│
▼
Optional: LLM relevance filter (per-keyword scoring)
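The "Extract product links + ASINs" step above can be illustrated with a small parser. The regex and helper name are assumptions for this sketch, not the app's actual extraction logic (which uses Cheerio on the search-result DOM).

```typescript
// Illustrative sketch: pulling an ASIN out of an Amazon product URL.
// ASINs are 10-character alphanumeric IDs that appear after /dp/ or /gp/product/.
function extractAsin(url: string): string | null {
  const m = url.match(/\/(?:dp|gp\/product)\/([A-Z0-9]{10})(?:[/?]|$)/);
  return m ? m[1] : null;
}

// Dedupe ASINs across a page of search-result links.
function uniqueAsins(urls: string[]): string[] {
  return [...new Set(urls.map(extractAsin).filter((a): a is string => a !== null))];
}

console.log(extractAsin("https://www.amazon.com/dp/B08N5WRWNW?ref=x")); // → B08N5WRWNW
console.log(extractAsin("https://www.amazon.com/s?k=espresso"));        // → null
```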
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ M3 Ultra │ │ M2 Ultra │ │ Mac Mini 1 │ │ Mac Mini 2 │
│ Worker │ │ Master Node │ │ Worker │ │ Worker │
│ │ │ │ │ │ │ │
│ RoundupForge │ │ RoundupForge │ │ RoundupForge │ │ RoundupForge │
│ DojoClaw │ │ DojoClaw │ │ DojoClaw │ │ DojoClaw │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │ │
└─────────────────┼─────────────────┼─────────────────┘
│
┌────────▼────────┐
│ PostgreSQL │
│ (M2 Ultra) │
│ │
│ Shared DB: │
│ - RF tables │
│ - DC tables │
│ - ASIN cache │
└─────────────────┘
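The shared ASIN cache in the diagram above lets any worker reuse products another worker already scraped. A toy sketch of that check-then-scrape decision (a `Map` stands in for the shared Product table; names are illustrative):

```typescript
// Illustrative cross-project ASIN cache: reuse if present, otherwise scrape.
const productCache = new Map<string, { title: string }>();

function resolveProduct(asin: string): { cached: boolean } {
  if (productCache.has(asin)) {
    return { cached: true }; // cache hit — no API call, no credits spent
  }
  productCache.set(asin, { title: "(scraped)" }); // simulate scrape + store
  return { cached: false };
}

console.log(resolveProduct("B08N5WRWNW").cached); // → false (first sight: scrape)
console.log(resolveProduct("B08N5WRWNW").cached); // → true  (second sight: reuse)
```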
- Batch keyword processing — paste or load up to 10,000 keywords at once
- 21 Amazon marketplaces — US, UK, DE, FR, IT, ES, CA, AU, JP, IN, BR, MX, NL, SE, PL, BE, SG, SA, AE, TR, EG
- Two scraping modes — Fast (1 API call per keyword) or Full (1 + N per keyword)
- Randomized product counts — set a range (e.g., 7–15) for natural-looking roundups
- Multi-scraper pool — ScrapeOwl primary with automatic failover to 4 other providers
- Scraper plugin registry — extensible adapter system for new scraping backends
- Exponential backoff — retries with jitter and `Retry-After` header support
- Typed error classification — RateLimitError, BlockedError, TimeoutError, AuthError, ParseError
- Sequential project queue — projects run one at a time, auto-advance on completion
- Bulk queue from Google Sheets — queue all sheet tabs as separate projects in one click
- Global max concurrency — configurable cap (1-50) applied across all projects
- Retry/Resume bypasses queue — runs immediately in parallel with queued projects
- Durable job runs — JobRun model with heartbeat tracking survives server restarts
- Graceful shutdown — SIGTERM/SIGINT handlers for clean process termination
- Queue recovery — orphaned "running" projects auto-recovered on restart
- Relevance filter — LLM-based product scoring per keyword (manual trigger)
- Conservative prompt — only drops wrong-category items (accessories, toys, unrelated)
- Per-keyword progress — live filtering progress with error resilience
- Multiple LLM providers — OpenAI, Claude, OpenRouter, Ollama, LM Studio
- Roundup export — ZimmWriter-compatible format, auto-split into packs of 100
- "Save All in One File" — combine all packs into a single download
- CSV and JSON export — full structured data with exclusion filtering
- Google Sheets sync — load keywords from and push results back to Sheets
- Export versioning — snapshot records with content hash for audit trail
- Dashboard — projects, keywords, products, credits, success rate, daily stats
- Failure patterns — 10-category error summarization on dashboard
- Credit tracking — ScrapeOwl credits tracked per project
- Browser notifications — desktop alerts on project completion/failure
- SSE progress — real-time streaming with polling fallback
- System status API — database, queue, integrations health check
- Optional admin auth — APP_ADMIN_TOKEN for deployment protection
- Encrypted secrets — AES-256-GCM for API keys stored in database
- Persisted settings — DB-backed config with environment variable fallback
- Masked API keys — secrets never exposed to browser
- Amazon marketplace dropdown — quick profile creation for any supported country
- Profile validation — domain, selector, and affiliate code validation before use
- Test-scrape preview — test a profile against a single URL before saving
- CSS selector config — title, image, feature bullets, description, reviews
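The backoff and error-classification bullets above can be sketched together. This is a hedged illustration of the pattern (exponential growth, full jitter, `Retry-After` override), not the app's actual retry code; `backoffDelay` and the constructor shape of `RateLimitError` are assumptions.

```typescript
// One of the typed errors listed above; retryAfterMs carries a provider's
// Retry-After hint when present (field name is illustrative).
class RateLimitError extends Error {
  constructor(public retryAfterMs?: number) {
    super("rate limited");
  }
}

// Exponential backoff with full jitter. A Retry-After hint, when the provider
// sends one, overrides the computed delay.
function backoffDelay(
  attempt: number,
  err?: unknown,
  baseMs = 500,
  capMs = 30_000
): number {
  if (err instanceof RateLimitError && err.retryAfterMs !== undefined) {
    return err.retryAfterMs;
  }
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling); // full jitter: uniform in [0, ceiling)
}

console.log(backoffDelay(0, new RateLimitError(1234))); // → 1234
```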
- Node.js 18+
- npm
- A ScrapeOwl API key
```bash
git clone https://github.com/MeyerThorsten/grimfaste-roundupforge.git
cd grimfaste-roundupforge
npm install
cp .env.example .env   # add your SCRAPEOWL_API_KEY
npx prisma db push
npx prisma generate
npm run dev
```

Open http://localhost:3000.
- Go to Settings and add your ScrapeOwl API key
- Paste keywords or load them from Google Sheets
- Select Fast mode (default) for ASIN collection
- Click Run Batch — project is queued and starts automatically
- Watch progress with live updates
- Click Export Roundup for ZimmWriter-ready output
- 1 API call per keyword — fetches only the Amazon search results page
- Extracts: ASIN, title, image URL, product URL, affiliate URL
- Speed: ~3,600 keywords/hour at 25 concurrent requests
- Cost: 1 ScrapeOwl credit per keyword
- 1 + N API calls per keyword — fetches search page + each product page
- Extracts: everything from Fast mode, plus feature bullets, description, specs, reviews
- Speed: depends on products per keyword and concurrency
- Cost: 1 + N ScrapeOwl credits per keyword
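The credit math for the two modes above reduces to a one-liner. This estimator is a back-of-envelope illustration, not part of the app's API:

```typescript
// Fast mode: 1 credit per keyword. Full mode: 1 (search page) + N (one per
// product page). avgProducts approximates N across keywords.
function estimateCredits(
  keywords: number,
  mode: "fast" | "full",
  avgProducts = 10
): number {
  return mode === "fast" ? keywords : keywords * (1 + avgProducts);
}

console.log(estimateCredits(1000, "fast"));     // → 1000
console.log(estimateCredits(1000, "full", 10)); // → 11000
```

So a 10,000-keyword batch in Full mode with ~10 products per keyword costs roughly 110,000 credits versus 10,000 in Fast mode.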
| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | `file:./dev.db` (SQLite) or `postgresql://...` |
| `SCRAPEOWL_API_KEY` | Yes | ScrapeOwl API key |
| `APP_ADMIN_TOKEN` | No | Admin auth token for deployment protection |
| `APP_SETTINGS_MASTER_KEY` | No | Encryption key for secrets (auto-generated if not set) |
| `GOOGLE_SERVICE_ACCOUNT_JSON` | No | Google Cloud service account JSON |
| `GOOGLE_SHEET_ID` | No | Default Google Sheet spreadsheet ID |
All scraper keys, LLM providers, and settings are configurable from Settings in the app.
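The AES-256-GCM encryption of secrets at rest can be sketched with Node's built-in `crypto` module. This is a minimal illustration of the scheme named above; the app's actual key derivation from `APP_SETTINGS_MASTER_KEY` and its stored record layout are assumptions not shown here.

```typescript
import crypto from "node:crypto";

// Encrypt a secret with AES-256-GCM: random 12-byte IV, auth tag for
// integrity. Token layout (iv.tag.ciphertext, base64) is illustrative.
function encryptSecret(plain: string, key: Buffer): string {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const enc = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), enc].map((b) => b.toString("base64")).join(".");
}

function decryptSecret(token: string, key: Buffer): string {
  const [iv, tag, enc] = token.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // tamper detection: final() throws on mismatch
  return Buffer.concat([decipher.update(enc), decipher.final()]).toString("utf8");
}

const key = crypto.randomBytes(32); // in the app, derived from the master key
const token = encryptSecret("sk-example-not-real", key);
console.log(decryptSecret(token, key)); // → sk-example-not-real
```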
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/projects` | List all projects |
| POST | `/api/projects` | Create and auto-queue project |
| GET | `/api/projects/[id]` | Get project with keywords + products |
| PATCH | `/api/projects/[id]` | Update project name |
| POST | `/api/projects/[id]/run` | Retry/resume (bypasses queue) |
| POST | `/api/projects/[id]/stop` | Stop running or dequeue |
| GET | `/api/projects/[id]/export?format=json\|csv\|roundup` | Export results |
| GET | `/api/projects/[id]/progress` | SSE progress stream |
| POST | `/api/projects/[id]/relevance` | Run relevance filter |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/queue` | Queue status (running + queued projects) |
| POST | `/api/bulk-queue` | Queue all Google Sheets tabs as projects |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/dashboard` | Aggregated analytics and stats |
| GET | `/api/system/status` | Health check and diagnostics |
| GET | `/api/scrapers` | Active scraper summary + plan limits |
| Method | Endpoint | Description |
|---|---|---|
| GET/POST | `/api/settings/general` | Retry count, max concurrency |
| GET/POST | `/api/settings/scrapers` | Scraper keys, plans, toggles |
| GET/POST | `/api/settings/google` | Google Sheets configuration |
| GET/POST/DELETE | `/api/settings/llm` | LLM provider management |
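Creating and auto-queuing a project from a script could look like the sketch below. The request body fields (`name`, `keywords`, `mode`) are assumptions inferred from the UI, not a documented schema, and `ROUNDUPFORGE_URL` is a hypothetical variable for this example.

```typescript
// Build a POST /api/projects request (Node 18+ global Request/fetch).
const base = process.env.ROUNDUPFORGE_URL ?? "http://localhost:3000";

const body = {
  name: "best espresso machines",                       // assumed field
  keywords: ["best espresso machine", "best budget espresso machine"],
  mode: "fast",                                          // assumed field
};

const req = new Request(`${base}/api/projects`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});

console.log(req.method, new URL(req.url).pathname); // → POST /api/projects
// To actually send it against a running instance:
//   const res = await fetch(req);
//   const project = await res.json();
```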
grimfaste-roundupforge/
├── prisma/
│ └── schema.prisma # Database schema (8 models)
├── src/
│ ├── app/
│ │ ├── layout.tsx # Root layout with nav
│ │ ├── page.tsx # Home — keywords, Sheets, batch config
│ │ ├── dashboard/page.tsx # Analytics dashboard
│ │ ├── profiles/page.tsx # Scrape profile editor
│ │ ├── projects/[id]/page.tsx # Results — progress, products, export
│ │ ├── settings/page.tsx # All settings management
│ │ ├── components/ # Shared UI components
│ │ └── api/ # REST API routes
│ ├── lib/
│ │ ├── prisma.ts # Prisma client singleton
│ │ ├── services/ # Project, product, settings, job-run services
│ │ ├── scraping/ # Adapter interface, 5 providers, pool, registry
│ │ ├── sheets/ # Google Sheets service
│ │ ├── jobs/ # Queue processor, runner, cancellation
│ │ ├── hooks/ # Scrape lifecycle hooks
│ │ ├── observability/ # Failure categorization
│ │ ├── auth/ # Admin authentication
│ │ ├── settings/ # Crypto, scraper config
│ │ ├── export/ # CSV + Roundup serializers
│ │ ├── llm/ # LLM provider abstraction
│ │ └── parsing/ # Keyword input parser
│ └── types/index.ts # TypeScript interfaces
├── docs/
│ ├── design/DESIGN.md # Architecture document
│ └── roadmap/ # Phase planning documents
├── middleware.ts # Auth middleware
├── vitest.config.ts # Test configuration
└── package.json
```bash
npm run dev           # Start dev server (port 3000)
npm run test          # Run vitest tests
npx tsc --noEmit      # Type check
npm run build         # Production build
npx prisma db push    # Push schema changes
npx prisma generate   # Regenerate Prisma client
npx prisma studio     # Browse database
```

RoundupForge is built and maintained by Grimfaste — the analytics command center for publishers managing hundreds of WordPress sites.
RoundupForge serves as the data collection layer in the Grimfaste platform, feeding product data to DojoClaw for AI-powered article generation and multi-site publishing.
RoundupForge is licensed under the GNU Affero General Public License v3.0.