RoundupForge by Grimfaste

Amazon Roundup Scout — a Grimfaste tool that automates Amazon product research for roundup articles at scale.

Paste up to 10,000 keywords, and RoundupForge searches Amazon across 21 country marketplaces, collects product ASINs, and delivers organized results — ready for article creation with tools like ZimmWriter.


Technology Statement

RoundupForge is a server-side web application built with Next.js 16 and TypeScript. It runs as a self-hosted service on macOS or Linux, using SQLite for local development and PostgreSQL for multi-worker production deployments.

The application is designed for headless batch processing — users submit keyword lists, and the system scrapes Amazon search results and product pages in the background using a pool of scraping API providers. All processing runs server-side with progress tracked in the database, so users can close the browser and return later.

RoundupForge is part of the Grimfaste platform and serves as the data collection layer for DojoClaw, the AI-powered article generation and publishing system.


Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 16 (App Router) | Server-side rendering, API routes, React UI |
| Language | TypeScript (strict mode) | Type safety across frontend and backend |
| Styling | Tailwind CSS 4 | Utility-first CSS framework |
| ORM | Prisma 7 | Database abstraction with migrations |
| Database | SQLite (dev) / PostgreSQL (prod) | Data persistence, job state, settings |
| HTML Parsing | Cheerio | Server-side DOM extraction from scraped pages |
| Scraping | Multi-provider pool | ScrapeOwl, ScraperAPI, ScrapingBee, ZenRows, DataForSEO |
| Concurrency | p-limit | Keyword-level parallel processing (1–50 concurrent) |
| Job Queue | Custom sequential queue | globalThis singleton with DB-backed state |
| LLM Integration | OpenAI-compatible API | Relevance filter for product scoring |
| Encryption | AES-256-GCM | Secrets encrypted at rest in the database |
| Testing | Vitest | Unit tests for parsers, scrapers, services |
| Google Sheets | googleapis (npm) | Keyword import and result export |
| Real-time Updates | Server-Sent Events (SSE) | Live progress streaming with polling fallback |
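
The "globalThis singleton" noted for the job queue is a common Next.js pattern: stashing the instance on globalThis keeps a single object alive across dev-server hot reloads. A hypothetical sketch (names are illustrative, not RoundupForge's actual identifiers):

```typescript
// Queue state kept on globalThis so dev-mode hot reloads reuse one
// instance instead of creating a fresh queue per reload.
type QueueState = { runningProjectId: string | null; queuedIds: string[] };

const globalForQueue = globalThis as unknown as { __rfQueue?: QueueState };

function getQueue(): QueueState {
  if (!globalForQueue.__rfQueue) {
    globalForQueue.__rfQueue = { runningProjectId: null, queuedIds: [] };
  }
  return globalForQueue.__rfQueue;
}
```

The same trick is typically applied to the Prisma client singleton to avoid exhausting database connections in development.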

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Browser (React UI)                                         │
│  ├── Home — keyword input, Google Sheets, batch config      │
│  ├── Dashboard — analytics, credit usage, failure patterns  │
│  ├── Projects — progress, products, export, relevance       │
│  ├── Profiles — scrape profiles per Amazon marketplace      │
│  └── Settings — scrapers, LLM, Google Sheets, auth          │
└──────────────────────┬──────────────────────────────────────┘
                       │ HTTP / SSE
┌──────────────────────▼──────────────────────────────────────┐
│  Next.js API Routes                                         │
│  ├── /api/projects      — CRUD, run, stop, export           │
│  ├── /api/queue         — queue status, recovery            │
│  ├── /api/bulk-queue    — multi-tab Google Sheets queue     │
│  ├── /api/dashboard     — aggregated analytics              │
│  ├── /api/profiles      — scrape profile management         │
│  ├── /api/settings      — scrapers, LLM, Google, general    │
│  ├── /api/sheets        — keyword load, result sync         │
│  ├── /api/system/status — health check, diagnostics         │
│  └── /api/auth/session  — optional admin authentication     │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  Backend Services                                           │
│  ├── Queue Processor    — sequential project execution      │
│  ├── Runner             — keyword processing with retries   │
│  ├── Scraper Pool       — primary + fallback adapters       │
│  ├── Plugin Registry    — extensible scraper registration   │
│  ├── Product Cache      — ASIN dedup across projects        │
│  ├── Lifecycle Hooks    — preScrape, postScrape, onFailure  │
│  ├── Relevance Filter   — LLM-based product scoring         │
│  ├── Settings Service   — encrypted DB-backed config        │
│  ├── Job Run Service    — durable job tracking + heartbeat  │
│  └── Failure Summary    — error categorization (10 types)   │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  Data Layer                                                 │
│  ├── Project, KeywordResult, Product  — core scrape data    │
│  ├── JobRun                           — durable job state   │
│  ├── AppSetting                       — encrypted settings  │
│  ├── ExportSnapshot                   — export versioning   │
│  ├── ScrapeProfile                    — per-domain config   │
│  └── LlmProvider                      — LLM routing         │
│                                                             │
│  SQLite (local dev) ──or──▶ PostgreSQL (multi-worker prod)  │
└─────────────────────────────────────────────────────────────┘

Data Flow

Keywords (paste / Google Sheets / bulk queue)
        │
        ▼
  Queue Project (status: queued → running)
        │
        ▼
  Build Amazon search URLs (domain from scrape profile)
        │
        ▼
  Fetch search results via scraper pool
  (ScrapeOwl → ScraperAPI → ScrapingBee → ZenRows → DataForSEO)
        │
        ▼
  Extract product links + ASINs (dedupe, randomize count)
        │
        ▼
  Check ASIN cache ──▶ cached? reuse ──▶ not cached? scrape
        │
        ▼
  Fast mode: done ─── Full mode: visit each product page
        │                          extract title, bullets,
        │                          description, specs, reviews
        ▼
  Store in database, track credits, update progress via SSE
        │
        ▼
  Auto-retry failed keywords (exponential backoff)
        │
        ▼
  Queue: advance to next project
        │
        ▼
  Export: Roundup packs / CSV / JSON / Google Sheets
        │
        ▼
  Optional: LLM relevance filter (per-keyword scoring)
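
The auto-retry step above combines exponential backoff, jitter, and retry-after support. A sketch of that delay calculation, with illustrative numbers (1 s base, 60 s cap, full jitter) rather than RoundupForge's actual defaults:

```typescript
// Compute the wait before retrying a failed keyword. A provider's
// Retry-After hint takes precedence; otherwise back off exponentially
// with full jitter to avoid thundering-herd retries.
function backoffDelayMs(attempt: number, retryAfterMs?: number): number {
  if (retryAfterMs !== undefined) return retryAfterMs; // provider hint wins
  const capMs = 60_000;
  const expMs = Math.min(capMs, 1_000 * 2 ** attempt); // 1s, 2s, 4s, ...
  return Math.floor(Math.random() * expMs); // full jitter
}
```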

Multi-Worker Architecture (Production)

┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│  M3 Ultra    │  │  M2 Ultra    │  │  Mac Mini 1  │  │  Mac Mini 2  │
│  Worker      │  │  Master Node │  │  Worker      │  │  Worker      │
│              │  │              │  │              │  │              │
│ RoundupForge │  │ RoundupForge │  │ RoundupForge │  │ RoundupForge │
│ DojoClaw     │  │ DojoClaw     │  │ DojoClaw     │  │ DojoClaw     │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │                 │
       └─────────────────┼─────────────────┼─────────────────┘
                         │
                ┌────────▼────────┐
                │  PostgreSQL     │
                │  (M2 Ultra)     │
                │                 │
                │  Shared DB:     │
                │  - RF tables    │
                │  - DC tables    │
                │  - ASIN cache   │
                └─────────────────┘

Features

Scraping & Data Collection

  • Batch keyword processing — paste or load up to 10,000 keywords at once
  • 21 Amazon marketplaces — US, UK, DE, FR, IT, ES, CA, AU, JP, IN, BR, MX, NL, SE, PL, BE, SG, SA, AE, TR, EG
  • Two scraping modes — Fast (1 API call per keyword) or Full (1 + N per keyword)
  • Randomized product counts — set a range (e.g., 7–15) for natural-looking roundups
  • Multi-scraper pool — ScrapeOwl primary with automatic failover to 4 other providers
  • Scraper plugin registry — extensible adapter system for new scraping backends
  • Exponential backoff — retries with jitter and retry-after header support
  • Typed error classification — RateLimitError, BlockedError, TimeoutError, AuthError, ParseError
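
The typed error classes listed above might look roughly like the following; the actual fields in RoundupForge may differ:

```typescript
// Base class carries whether the failure is worth retrying; subclasses
// set their own name so error logs and summaries stay readable.
class ScrapeError extends Error {
  constructor(message: string, readonly retryable: boolean) {
    super(message);
    this.name = new.target.name; // "RateLimitError", "ParseError", ...
  }
}

class RateLimitError extends ScrapeError {
  constructor(readonly retryAfterMs?: number) {
    super("rate limited by provider", true); // transient, retry later
  }
}

class ParseError extends ScrapeError {
  constructor(message: string) {
    super(message, false); // bad markup will not improve on retry
  }
}
```

Classifying errors this way lets the retry loop make one decision (`retryable`) while the failure summary still groups by concrete type.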

Queue & Job Management

  • Sequential project queue — projects run one at a time, auto-advance on completion
  • Bulk queue from Google Sheets — queue all sheet tabs as separate projects in one click
  • Global max concurrency — configurable cap (1-50) applied across all projects
  • Retry/Resume bypasses queue — runs immediately in parallel with queued projects
  • Durable job runs — JobRun model with heartbeat tracking survives server restarts
  • Graceful shutdown — SIGTERM/SIGINT handlers for clean process termination
  • Queue recovery — orphaned "running" projects auto-recovered on restart
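
The queue-recovery idea can be sketched as a pure function: any project still marked "running" whose JobRun heartbeat has gone stale is treated as orphaned and re-queued on restart. Field names and the staleness threshold below are illustrative, not the real schema:

```typescript
// A JobRun-like record: the heartbeat timestamp is refreshed while the
// worker is alive, so a stale heartbeat implies a crashed or killed run.
interface JobRunLike {
  projectId: string;
  status: "queued" | "running" | "done" | "failed";
  lastHeartbeatMs: number;
}

function findOrphanedProjects(
  runs: JobRunLike[],
  nowMs: number,
  staleAfterMs = 60_000
): string[] {
  return runs
    .filter((r) => r.status === "running" && nowMs - r.lastHeartbeatMs > staleAfterMs)
    .map((r) => r.projectId);
}
```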

LLM & Filtering

  • Relevance filter — LLM-based product scoring per keyword (manual trigger)
  • Conservative prompt — only drops wrong-category items (accessories, toys, unrelated)
  • Per-keyword progress — live filtering progress with error resilience
  • Multiple LLM providers — OpenAI, Claude, OpenRouter, Ollama, LM Studio

Export & Integration

  • Roundup export — ZimmWriter-compatible format, auto-split into packs of 100
  • "Save All in One File" — combine all packs into a single download
  • CSV and JSON export — full structured data with exclusion filtering
  • Google Sheets sync — load keywords from and push results back to Sheets
  • Export versioning — snapshot records with content hash for audit trail
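
The "packs of 100" split in the roundup export reduces to a simple chunking step, sketched here (function name is illustrative):

```typescript
// Split export rows into fixed-size packs; the last pack holds the
// remainder. Each pack becomes one downloadable file.
function splitIntoPacks<T>(items: T[], size = 100): T[][] {
  const packs: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    packs.push(items.slice(i, i + size));
  }
  return packs;
}
```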

Monitoring & Analytics

  • Dashboard — projects, keywords, products, credits, success rate, daily stats
  • Failure patterns — 10-category error summarization on dashboard
  • Credit tracking — ScrapeOwl credits tracked per project
  • Browser notifications — desktop alerts on project completion/failure
  • SSE progress — real-time streaming with polling fallback
  • System status API — database, queue, integrations health check
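
Behind the SSE progress stream is the standard event-stream wire format: each event is framed as `event:`/`data:` lines terminated by a blank line. A minimal framing helper (the event name and payload shape are illustrative):

```typescript
// Serialize one server-sent event. Browsers' EventSource splits the
// stream on the blank line and dispatches by the "event:" field.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```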

Security & Settings

  • Optional admin auth — APP_ADMIN_TOKEN for deployment protection
  • Encrypted secrets — AES-256-GCM for API keys stored in database
  • Persisted settings — DB-backed config with environment variable fallback
  • Masked API keys — secrets never exposed to browser
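
The AES-256-GCM encryption at rest can be sketched with Node's built-in crypto module. This is a minimal version assuming a 32-byte master key; RoundupForge's actual key derivation and payload layout may differ:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Encrypt a secret for storage: a fresh 96-bit nonce per message, with
// nonce, GCM auth tag, and ciphertext packed into one base64 string.
function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce recommended for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), ct].map((b) => b.toString("base64")).join(".");
}

function decrypt(payload: string, key: Buffer): string {
  const [iv, tag, ct] = payload.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // tampered ciphertext fails at final()
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

GCM's auth tag means a flipped bit anywhere in the stored payload makes decryption throw instead of returning garbage.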

Scrape Profiles

  • Amazon marketplace dropdown — quick profile creation for any supported country
  • Profile validation — domain, selector, and affiliate code validation before use
  • Test-scrape preview — test a profile against a single URL before saving
  • CSS selector config — title, image, feature bullets, description, reviews

Quick Start

Prerequisites

  • Node.js and npm
  • A ScrapeOwl API key (other scraper providers are optional)

Installation

git clone https://github.com/MeyerThorsten/grimfaste-roundupforge.git
cd grimfaste-roundupforge
npm install
cp .env.example .env       # add your SCRAPEOWL_API_KEY
npx prisma db push
npx prisma generate
npm run dev

Open http://localhost:3000.

First Run

  1. Go to Settings and add your ScrapeOwl API key
  2. Paste keywords or load them from Google Sheets
  3. Select Fast mode (default) for ASIN collection
  4. Click Run Batch — project is queued and starts automatically
  5. Watch progress with live updates
  6. Click Export Roundup for ZimmWriter-ready output

Scraping Modes

Fast Mode (default)

  • 1 API call per keyword — fetches only the Amazon search results page
  • Extracts: ASIN, title, image URL, product URL, affiliate URL
  • Speed: ~3,600 keywords/hour at 25 concurrent requests
  • Cost: 1 ScrapeOwl credit per keyword

Full Mode

  • 1 + N API calls per keyword — fetches search page + each product page
  • Extracts: everything from Fast mode, plus feature bullets, description, specs, reviews
  • Speed: depends on products per keyword and concurrency
  • Cost: 1 + N ScrapeOwl credits per keyword
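
Both modes fan keywords out under the p-limit concurrency cap from the tech stack. The dependency-free sketch below shows the underlying idea: never more than `max` keyword tasks in flight at once (RoundupForge itself uses the p-limit package, not this code):

```typescript
// Minimal promise limiter: callers beyond `max` wait in a FIFO queue
// and are released one at a time as in-flight tasks finish.
function createLimiter(max: number) {
  let active = 0;
  const waiting: Array<() => void> = [];
  return async function run<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= max) await new Promise<void>((res) => waiting.push(res));
    active++;
    try {
      return await fn();
    } finally {
      active--;
      waiting.shift()?.(); // wake the next waiter, if any
    }
  };
}
```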

Environment Variables

| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | file:./dev.db (SQLite) or postgresql://... |
| SCRAPEOWL_API_KEY | Yes | ScrapeOwl API key |
| APP_ADMIN_TOKEN | No | Admin auth token for deployment protection |
| APP_SETTINGS_MASTER_KEY | No | Encryption key for secrets (auto-generated if not set) |
| GOOGLE_SERVICE_ACCOUNT_JSON | No | Google Cloud service account JSON |
| GOOGLE_SHEET_ID | No | Default Google Sheet spreadsheet ID |

All scraper keys, LLM providers, and settings are configurable from Settings in the app.


API Reference

Projects

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/projects | List all projects |
| POST | /api/projects | Create and auto-queue project |
| GET | /api/projects/[id] | Get project with keywords + products |
| PATCH | /api/projects/[id] | Update project name |
| POST | /api/projects/[id]/run | Retry/resume (bypasses queue) |
| POST | /api/projects/[id]/stop | Stop running or dequeue |
| GET | /api/projects/[id]/export?format=json\|csv\|roundup | Export results |
| GET | /api/projects/[id]/progress | SSE progress stream |
| POST | /api/projects/[id]/relevance | Run relevance filter |

Queue & Bulk

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/queue | Queue status (running + queued projects) |
| POST | /api/bulk-queue | Queue all Google Sheets tabs as projects |

System

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/dashboard | Aggregated analytics and stats |
| GET | /api/system/status | Health check and diagnostics |
| GET | /api/scrapers | Active scraper summary + plan limits |

Settings

| Method | Endpoint | Description |
|---|---|---|
| GET/POST | /api/settings/general | Retry count, max concurrency |
| GET/POST | /api/settings/scrapers | Scraper keys, plans, toggles |
| GET/POST | /api/settings/google | Google Sheets configuration |
| GET/POST/DELETE | /api/settings/llm | LLM provider management |

Project Structure

grimfaste-roundupforge/
├── prisma/
│   └── schema.prisma                 # Database schema (8 models)
├── src/
│   ├── app/
│   │   ├── layout.tsx                # Root layout with nav
│   │   ├── page.tsx                  # Home — keywords, Sheets, batch config
│   │   ├── dashboard/page.tsx        # Analytics dashboard
│   │   ├── profiles/page.tsx         # Scrape profile editor
│   │   ├── projects/[id]/page.tsx    # Results — progress, products, export
│   │   ├── settings/page.tsx         # All settings management
│   │   ├── components/               # Shared UI components
│   │   └── api/                      # REST API routes
│   ├── lib/
│   │   ├── prisma.ts                 # Prisma client singleton
│   │   ├── services/                 # Project, product, settings, job-run services
│   │   ├── scraping/                 # Adapter interface, 5 providers, pool, registry
│   │   ├── sheets/                   # Google Sheets service
│   │   ├── jobs/                     # Queue processor, runner, cancellation
│   │   ├── hooks/                    # Scrape lifecycle hooks
│   │   ├── observability/            # Failure categorization
│   │   ├── auth/                     # Admin authentication
│   │   ├── settings/                 # Crypto, scraper config
│   │   ├── export/                   # CSV + Roundup serializers
│   │   ├── llm/                      # LLM provider abstraction
│   │   └── parsing/                  # Keyword input parser
│   └── types/index.ts                # TypeScript interfaces
├── docs/
│   ├── design/DESIGN.md              # Architecture document
│   └── roadmap/                      # Phase planning documents
├── middleware.ts                     # Auth middleware
├── vitest.config.ts                  # Test configuration
└── package.json

Development

npm run dev              # Start dev server (port 3000)
npm run test             # Run vitest tests
npx tsc --noEmit         # Type check
npm run build            # Production build
npx prisma db push       # Push schema changes
npx prisma generate      # Regenerate Prisma client
npx prisma studio        # Browse database

About

RoundupForge is built and maintained by Grimfaste — the analytics command center for publishers managing hundreds of WordPress sites.

RoundupForge serves as the data collection layer in the Grimfaste platform, feeding product data to DojoClaw for AI-powered article generation and multi-site publishing.

Learn more at grimfaste.com


License

RoundupForge is licensed under the GNU Affero General Public License v3.0.
