LeetCode Discussion Crawler + MCP Server

Crawl LeetCode interview discussion forums, extract structured intelligence with Claude AI, and expose the data as an MCP (Model Context Protocol) server for Claude Desktop and Claude Code.

What it does

Crawls LeetCode discussion forums filtered by company tag (google, meta, amazon, etc.)
Classifies posts by type (leetcode, discuss_link, discussion)
Groups and deduplicates discussions by post title + company across pages
Optionally enriches posts with AI-extracted fields: problem_name, leetcode_url, interview_stage, difficulty, reported_outcome
Saves to CSV (per-company cache or monthly output)
Exposes 4 tools over stdio MCP transport for use with Claude Desktop / Claude Code
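
The grouping and deduplication step above can be pictured with a minimal sketch. It is not the code in crawler.py: the function name `dedupe_posts`, the `title` and `company` dictionary keys, and the case/whitespace normalization are all assumptions made for illustration.

```python
from typing import Iterable


def dedupe_posts(posts: Iterable[dict]) -> list[dict]:
    """Keep the first post seen for each (title, company) pair."""
    seen: set[tuple[str, str]] = set()
    unique: list[dict] = []
    for post in posts:
        # Assumption: ignore case and surrounding whitespace when comparing keys.
        key = (post["title"].strip().lower(), post["company"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


# Two crawled pages; the same post shows up on both with cosmetic differences.
pages = [
    [{"title": "Google L4 onsite", "company": "google"}],
    [
        {"title": "google l4 onsite ", "company": "Google"},
        {"title": "Google phone screen", "company": "google"},
    ],
]
flat = [post for page in pages for post in page]
print(len(dedupe_posts(flat)))  # 2
```

Keeping the first occurrence preserves crawl order, which is usually newest-first on a forum listing.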

Installation

git clone https://github.com/louisfghbvc/mcp-leetcode-crawler.git
cd mcp-leetcode-crawler
pip install -r requirements.txt

Usage

CLI

# Crawl 10 pages of Google interview discussions
python cli.py --company google --pages 10

# Filter to posts from the last 30 days
python cli.py --company google --pages 10 --since 30d

# Crawl and enrich with AI extraction (requires ANTHROPIC_API_KEY)
python cli.py --company google --pages 10 --enrich

# Other companies
python cli.py --company meta --pages 5 --since 7d

Available CLI options:

--company: Company tag (default: google)
--pages: Pages to crawl, 15 posts/page (default: 10)
--output: Output CSV path (default: leetcode_interview_questions.csv)
--output-dir: Directory for monthly CSVs (default: output)
--since: Time window filter — Nd (days) or Nw (weeks), e.g. --since 14d
--enrich: AI-extract structured fields from each post (requires ANTHROPIC_API_KEY)
--verbose: Debug logging
--category: Discussion category (default: interview-question)
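
As an illustration of the `--since` format (Nd for days, Nw for weeks), the sketch below turns such a value into a cutoff timestamp. The helper names `parse_since` and `cutoff` are hypothetical and are not taken from cli.py; the actual parsing may differ.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# One or more digits followed by a unit: d (days) or w (weeks).
_SINCE_RE = re.compile(r"(\d+)([dw])")


def parse_since(value: str) -> timedelta:
    """Convert a window such as '14d' or '2w' into a timedelta."""
    match = _SINCE_RE.fullmatch(value.strip().lower())
    if match is None:
        raise ValueError(f"expected Nd or Nw, got {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(days=amount) if unit == "d" else timedelta(weeks=amount)


def cutoff(value: str, now: Optional[datetime] = None) -> datetime:
    """Return the oldest timestamp that still falls inside the window."""
    now = now or datetime.now(timezone.utc)
    return now - parse_since(value)


print(parse_since("14d") == parse_since("2w"))  # True
```

Posts with a creation time earlier than `cutoff("30d")` would then be dropped from the result.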

MCP Server

Run the server directly, or register it with Claude Desktop:

# Run directly
python mcp_server.py

# Or register with Claude Desktop
mcp install mcp_server.py

The server exposes 4 tools:

| Tool | Description |
| --- | --- |
| `search_discussions(company, days)` | Filter cached posts by company and time window |
| `get_hot_problems(company, limit)` | Top LeetCode problems ranked by discussion frequency |
| `get_thread(post_id)` | Real-time single-thread fetch (no cache) |
| `refresh(company, num_pages, days, enrich)` | Crawl and rebuild on-disk cache |

Cache lives in ./cache/{company}_questions.csv. Run refresh once to populate before using search_discussions or get_hot_problems.
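
To make "ranked by discussion frequency" concrete, here is a rough sketch of how rows from a cache CSV could be counted. It assumes the ranking keys on the `problem_name` column and that empty values are skipped; the function name `hot_problems` and the sample rows are invented for illustration, and the real `get_hot_problems` tool may rank differently.

```python
import csv
import io
from collections import Counter


def hot_problems(rows: list[dict], limit: int = 10) -> list[tuple[str, int]]:
    """Count how often each problem is mentioned and return the top entries."""
    counts = Counter(
        row["problem_name"] for row in rows if row.get("problem_name")
    )
    return counts.most_common(limit)


# Stand-in for ./cache/google_questions.csv; the columns shown are assumptions.
sample_csv = """title,company,problem_name
Post A,google,Two Sum
Post B,google,LRU Cache
Post C,google,Two Sum
Post D,google,
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(hot_problems(rows, limit=2))  # [('Two Sum', 2), ('LRU Cache', 1)]
```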

AI Enrichment

When --enrich (CLI) or enrich=True (MCP refresh) is used, each post gets 5 new fields:

| Field | Values |
| --- | --- |
| `problem_name` | LeetCode problem name, or `null` |
| `leetcode_url` | Full problem URL, or `null` |
| `interview_stage` | `phone screen` / `online assessment` / `onsite` / `final round` / `null` |
| `difficulty` | `easy` / `medium` / `hard` / `null` |
| `reported_outcome` | `passed` / `failed` / `no mention` / `null` (null = extraction failed, will retry) |

Cost: ~$0.001/post with claude-haiku-4-5-20251001. Already-enriched posts are skipped on re-run.
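
The skip-on-re-run behaviour and the cost figure above can be combined into a quick estimate. This is a sketch under stated assumptions, not the logic in extractor.py: it treats a missing or `None` `reported_outcome` as "still needs enrichment", and the names `needs_enrichment` and `estimate_cost` are hypothetical.

```python
COST_PER_POST_USD = 0.001  # approximate per-post figure quoted above


def needs_enrichment(post: dict) -> bool:
    """Assumption: a missing or None reported_outcome means 'not enriched yet'."""
    return post.get("reported_outcome") is None


def estimate_cost(posts: list[dict]) -> float:
    """Rough cost in USD of enriching only the posts that still need it."""
    pending = sum(1 for post in posts if needs_enrichment(post))
    return pending * COST_PER_POST_USD


posts = [
    {"title": "A", "reported_outcome": "passed"},      # already enriched
    {"title": "B", "reported_outcome": None},          # extraction failed earlier
    {"title": "C"},                                    # never enriched
    {"title": "D", "reported_outcome": "no mention"},  # enriched, nothing reported
]
print(f"${estimate_cost(posts):.3f}")  # $0.002
```

By the same arithmetic, a first run over 10 pages at 15 posts per page covers 150 posts, or roughly $0.15.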

Project Structure

mcp-leetcode-crawler/
├── crawler.py          # Core crawler: fetch, deduplicate, filter, classify posts
├── extractor.py        # AI extraction layer: Claude Haiku → structured fields
├── mcp_server.py       # MCP server: 4 tools over stdio transport
├── cli.py              # Command-line interface
├── requirements.txt    # Python dependencies
├── test_crawler.py     # Crawler unit tests
├── test_mcp_server.py  # MCP server unit tests (22 tests)
├── test_extractor.py   # Extractor unit tests (12 tests)
├── test_cli.py         # CLI integration tests (2 tests)
└── cache/              # Per-company CSV cache (git-ignored)

Requirements

Python 3.12+
ANTHROPIC_API_KEY env var (only for --enrich on the CLI or enrich=True in the MCP refresh tool)

License

MIT
