AshbyHQ Job Scraper

A production-grade system that scrapes public AshbyHQ job listings across 53+ companies, persists them in Neon PostgreSQL, tracks changes over time, and surfaces actionable opportunities through a Next.js frontend with intelligent scoring.

Features

53+ Companies — OpenAI, Notion, Ramp, Deel, Snowflake, Cursor, Harvey, Cohere, and more
Self-Service Company Addition — Paste any Ashby URL in the frontend to add and scrape a new company in real time
Automated Scraping — Polls Ashby's public job board API on a configurable cron schedule
Change Detection — Tracks new, updated, and removed postings via content hashing
Intelligent Scoring — Ranks jobs by keyword relevance, location, remote preference, department, and freshness
Cloud Database — Neon PostgreSQL with connection pooling (queryable from anywhere)
Web Frontend — Next.js 16 App Router with filtering, pagination, and per-browser apply/ignore tracking (localStorage)
Stealth-Aware — Jittered scheduling, randomized delays, browser-like headers
Reports — CLI summaries and daily Markdown reports with apply links

Architecture

┌─────────────────────────────────────────────────────────────┐
│ Backend (Node.js)                                           │
│                                                             │
│ Scheduler ─→ Registry + DB ─→ Fetch ─→ Normalize ─→ PgSQL  │
│                                          │                  │
│                                   Diff Engine               │
│                                          │                  │
│                               Intelligence ─→ Notifications │
└──────────────────────────────┬──────────────────────────────┘
                               │
                        Neon PostgreSQL
                               │
┌──────────────────────────────┴──────────────────────────────┐
│ Frontend (Next.js)                                          │
│                                                             │
│ Server Components ─→ PostgreSQL queries ─→ Scored job feed  │
│ /add page ─→ POST /api/companies ─→ Live scrape + insert   │
│ Status tracking ─→ localStorage (per-browser, not DB)       │
└─────────────────────────────────────────────────────────────┘

Quick Start

Backend (Scraper)

# Install dependencies
npm install

# Configure environment (set DATABASE_URL to your Neon connection string)
cp .env.example .env

# Run a single scrape cycle
node index.js run

# Start the scheduled scraper
node index.js start

Frontend (Web UI)

cd web
npm install

# Configure environment
echo "DATABASE_URL=your_neon_connection_string" > .env.local

# Start the dev server
npx next dev -p 3000

Adding Companies

Via the Web UI (recommended)

Navigate to /add in the frontend. Paste an Ashby job board URL (e.g. https://jobs.ashbyhq.com/stripe) or just the slug (stripe). The system validates the board against Ashby's API, scrapes all open positions, and adds them to the database in real time. Future cron runs pick up the new company automatically.

Via the CLI

node index.js add <slug> -n "Company Name"
node index.js run   # scrape immediately

Commands

Command	Description
`node index.js run`	Run a single scrape cycle across all due companies
`node index.js start`	Start the cron-based scheduler
`node index.js report`	Generate a Markdown report from existing data
`node index.js add <slug>`	Add a company to the source registry

Configuration

Copy .env.example to .env and set:

DATABASE_URL — Neon PostgreSQL connection string
CRON_SCHEDULE — Cron expression for scrape frequency (default: every 12h)
MIN_RELEVANCE_SCORE — Minimum score threshold for surfaced jobs
LOG_LEVEL — Logging verbosity (debug/info/warn/error)

Intelligence rules (keyword weights, preferred locations, etc.) are in src/config/rules.json.

Tech Stack

Layer	Technology
Backend runtime	Node.js (CommonJS)
Database	Neon PostgreSQL (connection pooling, SSL)
HTTP client	Axios with exponential backoff
Scheduling	node-cron with jitter
Frontend	Next.js 16 (App Router), TypeScript, Tailwind CSS
CLI	Commander with chalk output

Project Structure

├── index.js                CLI entry point
├── src/
│   ├── config/             Environment + rules.json
│   ├── scheduler/          Cron + pipeline orchestration
│   ├── sources/            Company registry (53+ companies)
│   ├── fetch/              Axios client with retries + stealth headers
│   ├── normalize/          Schema mapping + content hashing
│   ├── store/              PostgreSQL persistence (async pg)
│   ├── diff/               Change detection (NEW/UPDATED/REMOVED)
│   ├── intelligence/       Multi-signal scoring engine
│   ├── notify/             CLI output + Markdown reports
│   └── utils/              Logger, hash, delay helpers
├── web/
│   ├── app/                Next.js pages (feed, detail, applied, ignored, add)
│   │   └── api/companies/  POST to add + scrape, GET to list companies
│   ├── components/         UI components (filters, job list, add company form)
│   └── lib/                Database queries, scoring, types, localStorage status
├── reports/                Generated Markdown reports
└── INTERVIEW_GUIDE.md      End-to-end project walkthrough

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/workflows		.github/workflows
scripts		scripts
src		src
web		web
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
index.js		index.js
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AshbyHQ Job Scraper

Features

Architecture

Quick Start

Backend (Scraper)

Frontend (Web UI)

Adding Companies

Via the Web UI (recommended)

Via the CLI

Commands

Configuration

Tech Stack

Project Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

AshbyHQ Job Scraper

Features

Architecture

Quick Start

Backend (Scraper)

Frontend (Web UI)

Adding Companies

Via the Web UI (recommended)

Via the CLI

Commands

Configuration

Tech Stack

Project Structure

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

Packages