"RepoScout doesn't guess, It knows!"
An agentic RAG system that queries 85K+ Python packages, 2M+ dependency signals, and 390K+ download data points to deliver data-driven open source intelligence — not opinions, not guesses, but real metrics from real data.
RepoScout is not a chatbot with package opinions. It's an autonomous research agent backed by a custom PostgreSQL data layer (4 tables, 600K+ rows) that converts natural language queries into validated SQL, combines results with semantic vector search, augments with live API metadata, and synthesizes actionable recommendations grounded in real adoption metrics.
See RepoScout in action -> Watch Demo
| Question | ChatGPT | GitHub Search | RepoScout |
|---|---|---|---|
| "How many projects use FastAPI?" | Guesses from training data | Can't answer | Exact count from indexed data |
| "Is library X actively maintained?" | Outdated info | Manual checking | Computed health score (0-100) |
| "What's growing fastest in Python AI?" | Generic list | Can't analyze trends | YoY growth from real dependency data |
| "Show me download trends" | No data | No data | 6 months of daily download stats |
| "Compare implementation patterns" | General knowledge | Keyword file matches | AI-powered source code analysis |
| "Packages with 50K+ stars but <100 deps" | Can't query structured data | Can't filter | NL-to-SQL against a real database |
RepoScout implements a 5-model agentic RAG pipeline where each model handles a specialized stage:
User Query
|
v
+----------------------------------------------------------------+
| GUARDRAILS LAYER |
| Content moderation (Mistral) + intent classification |
| Blocks harmful/off-topic queries before they reach the LLM |
+-----------------------------+----------------------------------+
| safe + classified
v
+----------------------------------------------------------------+
| AGENTIC ORCHESTRATOR |
| |
| LLM-driven function calling — autonomously decides which |
| tools to call and in what order. Iterates until it has |
| enough data to synthesize a grounded answer. |
| |
| +-------------+ +--------------+ +-------------------+ |
| | Semantic | | Package | | NL-to-SQL | |
| | Search | | Intel | | Engine | |
| | (Vector DB) | | (DB + PyPI) | | (validated SQL) | |
| +-------------+ +--------------+ +-------------------+ |
| +-------------+ +--------------+ +-------------------+ |
| | Code | | Compare | | Dependency | |
| | Analysis | | Packages | | Lookup | |
| | (GitHub + | | (multi-pkg) | | (reverse deps) | |
| | Devstral) | | | | | |
| +-------------+ +--------------+ +-------------------+ |
+-----------------------------+----------------------------------+
|
v
+----------------------------------------------------------------+
| SYNTHESIS + STREAMING |
| |
| Data-grounded analysis with confidence labels |
| Streamed to frontend via SSE (Server-Sent Events) |
| Post-response verification ensures no hallucinated stats |
+-----------------------------+----------------------------------+
|
v
+----------------------------------------------------------------+
| DATA LAYER |
| |
| PostgreSQL ···· 4 tables, 600K+ rows (packages, metadata, |
| downloads, AI-enriched profiles) |
| NL-to-SQL target for analytical queries |
| Vector DB ····· 85K semantic search embeddings |
| Live APIs ····· PyPI + GitHub (fresh data per request) |
+----------------------------------------------------------------+
|
v
+----------------------------------------------------------------+
| PRESENTATION |
| |
| FastAPI REST API → Next.js + shadcn/ui frontend |
| SSE streaming: progress → metadata cards → AI analysis |
| PDF export: client-ready benchmarking reports |
+----------------------------------------------------------------+
Unlike simple RAG (retrieve → generate), RepoScout's orchestrator autonomously decides its research strategy:
- Multi-step reasoning — searches for packages, fetches detailed stats for the most promising ones, then compares them, all without human intervention
- Dynamic tool selection — the LLM chooses which of 6 tools to call based on intermediate results
- NL-to-SQL — analytical queries are converted to validated, safe SQL executed against a real database
- Iterative refinement — up to 8 iterations of tool calling before final synthesis
- Concurrent execution — independent tool calls run in parallel via asyncio.gather
- Self-correcting — if a SQL query fails, the LLM fixes the syntax and retries automatically
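The concurrent-execution point can be sketched with asyncio.gather; the tool functions below are simplified stand-ins for RepoScout's real tools, not its actual implementation:

```python
import asyncio

# Hypothetical tool functions standing in for RepoScout's real tools.
async def semantic_search(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return ["fastapi", "flask"]

async def package_intel(name: str) -> dict:
    await asyncio.sleep(0.01)
    return {"name": name, "stars": 70000}

async def run_tools_concurrently(query: str) -> list:
    # Independent tool calls are awaited together, not sequentially,
    # so total latency is roughly the max of the calls, not their sum.
    return await asyncio.gather(
        semantic_search(query),
        package_intel("fastapi"),
    )

results = asyncio.run(run_tools_concurrently("web frameworks"))
```

Because both coroutines sleep concurrently, the pair completes in one sleep interval rather than two.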
RepoScout uses a cost-optimized hybrid pipeline combining Mistral and OpenAI models, each chosen for cost-performance tradeoff at that stage:
| Stage | Model | Role |
|---|---|---|
| Moderation | Mistral Moderation | Safety filter — blocks harmful and off-topic queries |
| Classification | Ministral 8B | Fast intent routing (explore, compare, analytical, reject) |
| Orchestration | GPT-4o-mini | Agentic function calling with 6 tools, up to 8 iterations |
| Code Analysis | Devstral | Fetches GitHub source and extracts implementation patterns |
| Synthesis | GPT-4o-mini | Data-grounded analysis, streamed via SSE |
| Embeddings | OpenAI text-embedding-3-small | 85K+ package vectors for semantic search |
| Enrichment | GPT-4o-mini (Batch API) | Structured intelligence profiles for 27K packages |
For analytical queries, RepoScout converts natural language directly to validated PostgreSQL queries:
User: "packages with over 50K stars but less than 100 dependents"
|
v
LLM generates validated SQL
|
v
+--------------------+
| Schema Validation | Pre-execution safety checks
| + Auto-correction | Fixes common LLM mistakes
+--------------------+
|
v
+--------------------+
| PostgreSQL Engine | Execute read-only query
+--------------------+
|
v
+--------------------+
| Error? LLM fixes | Self-correcting retry loop
| and retries |
+--------------------+
Safety guardrails: Only SELECT allowed (INSERT/UPDATE/DELETE/DROP blocked), schema validation against known tables and columns, auto-correction of common LLM mistakes, query timeout, row cap.
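A minimal sketch of what such pre-execution validation might look like. The table names come from the schema described in this README; the regexes, error messages, and row-cap value are illustrative, not RepoScout's actual code:

```python
import re

ALLOWED_TABLES = {"packages", "pypi_metadata", "download_stats", "enriched_profiles"}
BLOCKED = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|GRANT)\b", re.I)
MAX_ROWS = 500  # illustrative cap

def validate_sql(sql: str) -> str:
    """Reject anything but a read-only SELECT against known tables,
    then enforce a row cap. Raises ValueError on violation."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    if BLOCKED.search(stripped):
        raise ValueError("write/DDL keyword detected")
    # Every table referenced after FROM/JOIN must exist in the schema.
    for table in re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", stripped, re.I):
        if table.lower() not in ALLOWED_TABLES:
            raise ValueError(f"unknown table: {table}")
    # Append a row cap if the LLM forgot one (a common mistake).
    if not re.search(r"\bLIMIT\s+\d+\b", stripped, re.I):
        stripped += f" LIMIT {MAX_ROWS}"
    return stripped
```

A failed validation is what triggers the self-correcting retry loop: the error message goes back to the LLM, which emits a corrected query.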
The entire structured data layer runs on PostgreSQL (Supabase) — a custom schema designed for both LLM-generated SQL queries and direct API lookups. All tables are joined via foreign keys with normalized package names, enabling cross-table JOINs in the NL-to-SQL pipeline.
+-------------------------------+
| packages (85K rows) |
| Source: deps.dev (BigQuery) |
| |
| Central fact table: |
| Package identity, GitHub |
| stats, dependency counts, |
| YoY growth metrics |
+---------------+---------------+
| FK: package_id
+-----------+-----------+-------------------+
| | |
v v v
+----------------+ +----------------+ +-------------------+
| pypi_metadata | | download_stats | | enriched_profiles |
| (85K rows) | | (390K rows) | | (27K rows) |
| | | | | |
| Summary, keys, | | Daily download | | AI-generated |
| classifiers, | | counts over | | structured intel: |
| versions, | | 6 months from | | use cases, tags, |
| dependencies, | | pypistats.org | | alternatives, |
| release dates | | | | maturity signals |
+----------------+ +----------------+ +-------------------+
The LLM generates SQL against this schema — filtering, aggregation, ranking, and cross-table JOINs across all 4 tables. Schema validation ensures only known tables and columns are queried, with auto-correction for common LLM mistakes.
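To make the cross-table JOIN concrete, here is a miniature in-memory mock of two of the tables in SQLite (the column names are hypothetical), running the kind of query the NL-to-SQL stage might emit for the "50K+ stars but fewer than 100 dependents" example:

```python
import sqlite3

# In-memory miniature of the schema, with hypothetical column names.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE packages (package_id INTEGER PRIMARY KEY, name TEXT,
                           stars INTEGER, dependents INTEGER);
    CREATE TABLE enriched_profiles (package_id INTEGER, use_case TEXT);
    INSERT INTO packages VALUES (1, 'fastapi', 70000, 9000),
                                (2, 'tinylib', 60000, 40);
    INSERT INTO enriched_profiles VALUES (1, 'web APIs'), (2, 'niche tooling');
""")

# Representative LLM-generated SQL: filter + JOIN across tables.
rows = con.execute("""
    SELECT p.name, p.stars, e.use_case
    FROM packages p
    JOIN enriched_profiles e ON e.package_id = p.package_id
    WHERE p.stars > 50000 AND p.dependents < 100
""").fetchall()
# rows -> [('tinylib', 60000, 'niche tooling')]
```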
+-----------------------------------------------------+
| Vector Index (Qdrant Cloud) |
| 85K semantic search vectors |
| |
| Cosine similarity search over package embeddings |
| Enriched payload for filtering + re-ranking |
+-----------------------------------------------------+
Hybrid retrieval: Three layers working together — semantic search (Qdrant) for conceptual queries, structured PostgreSQL queries for keyword + growth-based ranking, and NL-to-SQL for analytical queries with specific filters. Results are blended with growth-aware re-ranking to surface genuinely trending packages.
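A simplified sketch of growth-aware blending. The weights, growth clamp, and minimum-dependents traction floor are illustrative placeholders, not RepoScout's actual values:

```python
def blend_score(semantic_sim: float, yoy_growth: float,
                dependents: int, min_dependents: int = 50) -> float:
    """Blend cosine similarity with adoption growth.
    Weights and the traction floor are illustrative."""
    if dependents < min_dependents:      # filter out packages without traction
        return 0.0
    growth = min(yoy_growth, 5.0) / 5.0  # clamp extreme growth, scale to 0..1
    return 0.7 * semantic_sim + 0.3 * growth

candidates = [
    ("steady-lib", blend_score(0.90, 0.1, 4000)),
    ("rising-lib", blend_score(0.80, 3.0, 600)),
    ("hype-lib",   blend_score(0.95, 9.0, 10)),  # filtered: too few dependents
]
ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
```

The traction floor is what keeps a hyped but barely-used package from outranking a genuinely adopted one, even with a higher similarity score.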
"How do Python projects handle rate limiting?"
Semantic search across 85K+ packages → adoption metrics retrieval → source code analysis from GitHub → data-grounded synthesis with citations.
"FastAPI vs Django vs Flask"
Side-by-side comparison with dependents count, YoY growth, maintainer activity, version frequency, stars, health scores, and code snippets.
"Top 10 packages by growth with at least 500 dependents"
Natural language converted to SQL — filtering, aggregation, ranking, and cross-table JOINs. Results grounded in real database queries, not LLM training data.
"What are the fastest growing AI libraries in Python?"
Growth-aware retrieval ranked by real adoption velocity — not just stars or hype. Filters for packages with genuine traction before ranking.
Daily download charts normalized to percentage growth — packages of different sizes become visually comparable on the same axis.
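The normalization step can be sketched as rebasing each series to percentage change from its first day:

```python
def normalize_pct(series: list[int]) -> list[float]:
    """Rebase a daily download series to % change from day one,
    so large and small packages share one axis."""
    base = series[0]
    return [round(100.0 * (v - base) / base, 1) for v in series]

# A 260M-downloads package and a 2K-downloads package, same growth shape:
big = normalize_pct([1_000_000, 1_050_000, 1_100_000])  # -> [0.0, 5.0, 10.0]
small = normalize_pct([2_000, 2_100, 2_200])            # -> [0.0, 5.0, 10.0]
```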
Every package scored 0-100 across four weighted dimensions:
- Adoption — real-world usage from 2M+ dependency signals
- Maintenance — recency of releases and active development
- Community — GitHub engagement signals
- Maturity — release history as a stability proxy
Score bands: Healthy (80-100) | Moderate (60-79) | Caution (0-59)
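A minimal sketch of a weighted score with band assignment. The per-dimension weights here are placeholders, not RepoScout's actual values; each dimension is assumed pre-normalized to 0-100:

```python
# Band floors from the README: Healthy 80-100, Moderate 60-79, Caution 0-59.
BANDS = [(80, "Healthy"), (60, "Moderate"), (0, "Caution")]

def health_score(adoption: float, maintenance: float,
                 community: float, maturity: float) -> tuple[int, str]:
    """Weighted 0-100 score; weights are illustrative placeholders."""
    score = round(adoption * 0.4
                  + maintenance * 0.3
                  + community * 0.2
                  + maturity * 0.1)
    band = next(label for floor, label in BANDS if score >= floor)
    return score, band
```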
One-click client-ready reports with:
- Comparison table with all package metrics
- Key metrics summary (most adopted, healthiest, fastest growing)
- AI-generated analysis and recommendations
- Data confidence badges
- Professional branding and formatting
SSE-based streaming architecture:
- Phase 1: Progress events as tools execute in real-time
- Phase 2: Metadata (package cards, stats) sent immediately — UI renders before analysis
- Phase 3: Analysis text streams token-by-token for typewriter effect
- Phase 4: Download chart data fetched in background, sent when ready
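The SSE wire format behind these phases is simple: each frame is an event name, a data line, and a blank-line terminator. A sketch with illustrative event names (not necessarily the ones RepoScout emits):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data line,
    and a blank line terminating the frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# The four phases, in the order the frontend receives them:
frames = [
    sse_event("progress", {"tool": "semantic_search", "status": "running"}),
    sse_event("metadata", {"package": "fastapi", "stars": 70000}),
    sse_event("token", {"text": "FastAPI leads "}),
    sse_event("chart", {"downloads": [1200, 1350, 1500]}),
]
```

On the browser side, an EventSource listener per event name dispatches each frame to the matching UI component.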
Built-in observability and evaluation dashboard at /dashboard:
- Overview — KPI cards (total queries, avg latency, cache hit rate, satisfaction), latency percentile trends (p50/p95/p99), pipeline stage breakdown
- Traces — Full waterfall breakdown for every query: per-stage timing, individual tool call durations, iteration count
- Feedback — Thumbs up/down satisfaction tracking with daily trends
50-query golden test set measuring retrieval quality (Recall@K, MRR), response quality (LLM-as-judge on 5 criteria), classification accuracy, and safety guardrails.
| Metric | Score |
|---|---|
| Classification Accuracy | 100% |
| Retrieval Recall@K | 58.7% |
| Must-Mention Pass Rate | 82.5% |
| Judge: Relevance | 4.90/5 |
| Judge: Accuracy | 4.92/5 |
| Judge: Hallucination | 4.97/5 |
| Avg Latency (explore) | ~18s |
| Avg Latency (compare) | ~12s |
| Avg Latency (reject) | ~1.5s |
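For reference, the two retrieval metrics in the table above can be computed like this (the package names are made-up examples):

```python
def recall_at_k(expected: set[str], retrieved: list[str], k: int = 10) -> float:
    """Fraction of expected packages appearing in the top-k results."""
    hits = expected & set(retrieved[:k])
    return len(hits) / len(expected)

def mrr(expected: set[str], retrieved: list[str]) -> float:
    """Reciprocal rank of the first expected package (0 if none found)."""
    for rank, pkg in enumerate(retrieved, start=1):
        if pkg in expected:
            return 1.0 / rank
    return 0.0

retrieved = ["requests", "httpx", "aiohttp"]
r = recall_at_k({"httpx", "urllib3"}, retrieved, k=3)  # 1 of 2 found -> 0.5
m = mrr({"httpx", "urllib3"}, retrieved)               # first hit at rank 2 -> 0.5
```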
Integrated Braintrust for versioned experiment tracking and automated regression detection across pipeline changes:
- Versioned Datasets — 50 golden queries stored as a Braintrust Dataset with expected outputs, enabling reproducible evaluation
- 9 automated scorers — 6 custom retrieval metrics (Recall@K, Precision@K, MRR, classification accuracy, must-mention, min-packages) + 3 LLM-as-judge scorers (Factuality, AnswerRelevancy, PackageRecommendationQuality)
- Experiment comparison — automatic diffing between runs surfaces per-query regressions after code changes
- Baseline import — existing offline eval results imported as baseline for comparison against live experiment runs
```shell
# Run evaluation with experiment tracking
python -m scripts.braintrust_eval run --experiment "after-reranking-fix"

# Import existing eval results as baseline
python -m scripts.braintrust_eval import-baseline eval_report.json
```

Two-tier caching for instant repeat queries:
- L1: In-memory LRU — fuzzy word matching, instant replay
- L2: PostgreSQL — SHA-256 hash lookup, 24-hour TTL, persists across deploys
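A sketch of the L1 tier, assuming SHA-256 keying over a lowercased, stripped query; the fuzzy word matching and the PostgreSQL L2 tier are omitted:

```python
import hashlib
import time
from collections import OrderedDict

class QueryCache:
    """In-memory LRU keyed by a SHA-256 hash of the normalized query.
    Illustrative sketch: fuzzy matching and the L2 tier are omitted."""

    def __init__(self, max_size: int = 128, ttl_seconds: float = 86_400):
        self.store: OrderedDict[str, tuple[float, str]] = OrderedDict()
        self.max_size = max_size
        self.ttl = ttl_seconds  # 24-hour TTL by default

    @staticmethod
    def key(query: str) -> str:
        return hashlib.sha256(query.lower().strip().encode()).hexdigest()

    def get(self, query: str):
        k = self.key(query)
        if k not in self.store:
            return None
        ts, value = self.store[k]
        if time.time() - ts > self.ttl:     # expired: evict and miss
            del self.store[k]
            return None
        self.store.move_to_end(k)           # mark as recently used
        return value

    def put(self, query: str, value: str):
        self.store[self.key(query)] = (time.time(), value)
        if len(self.store) > self.max_size:
            self.store.popitem(last=False)  # evict least recently used
```

Normalizing before hashing means trivially different phrasings ("FastAPI vs Flask" vs "fastapi vs flask") hit the same cache entry.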
"Make me a phishing site" → Rejected
Multi-layer guardrail system:
- Content Moderation — safety filter blocking harmful, toxic, or unsafe content
- Intent Classification — routes queries into valid modes or rejects off-topic
- Data Confidence Labels — every response tagged as `direct`, `partial`, or `insufficient`
- Post-Response Verification — checks all cited package names and statistics against the database
- SQL Safety — write operations blocked, schema validation, timeout, row cap
- Safe Fallbacks — "Insufficient data" responses when packages aren't in the database, never hallucinated stats
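The post-response verification and confidence-label steps can be sketched as checking every cited name against the indexed set and downgrading the label on a miss; the package set and label rules here are illustrative:

```python
# Stands in for a lookup against the real packages table.
KNOWN_PACKAGES = {"fastapi", "django", "flask"}

def verify_citations(cited: list[str]) -> tuple[list[str], str]:
    """Flag cited package names absent from the database and
    downgrade the confidence label accordingly."""
    unknown = [p for p in cited if p.lower() not in KNOWN_PACKAGES]
    label = "direct" if not unknown else "partial"
    return unknown, label

unknown, label = verify_citations(["fastapi", "webzap"])
# unknown -> ["webzap"], label -> "partial"
```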
| Dataset | Scale | Source |
|---|---|---|
| Packages indexed | 85K+ | deps.dev (BigQuery) |
| PyPI metadata | 85K+ | PyPI JSON API |
| AI-enriched profiles | 27K+ | GPT-4o-mini (Batch API) |
| Download data points | 390K+ | pypistats.org |
| Dependency relationships | 584K+ | PyPI metadata |
| Aggregate dependent signals | 2.1M+ | deps.dev |
| Semantic search vectors | 85K+ | OpenAI embeddings |
Production REST API powering the Next.js frontend via SSE streaming and JSON endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/search/stream | Main query — SSE stream with real-time progress, metadata, and token-by-token analysis |
| POST | /api/search | Synchronous query — full agent pipeline, returns JSON |
| GET | /api/package/{name} | Package detail — stats, health score, code snippet |
| GET | /api/compare | Side-by-side multi-package comparison |
| GET | /api/health/{name} | Health check with risk flags |
| GET | /api/downloads | Monthly download trends for charting |
| POST | /api/report/pdf | PDF comparison report with AI analysis |
| GET | /api/stats | Dataset statistics |
| GET | /api/search/quick | Fast semantic search (no agent, low latency) |
| GET | /api/dependents/{name} | Reverse dependency lookup |
| POST | /api/feedback | Submit thumbs up/down rating |
| GET | /api/dashboard/overview | KPI summary (latency, cache rate, errors) |
| GET | /api/dashboard/traces | Query traces with stage timings |
| GET | /api/dashboard/stages | Per-stage timing breakdown |
| GET | /api/dashboard/feedback | Feedback summary and trends |
reposcout/
│
├── backend/
│ ├── api/ # FastAPI REST endpoints + SSE streaming
│ ├── agents/ # Multi-model agentic pipeline (orchestrator, analysis, synthesis)
│ ├── services/ # Health scoring, vector search, live data enrichment
│ └── models/ # Pydantic request/response schemas
│
├── frontend/
│ ├── pages/ # Next.js search interface
│ ├── components/ # Stats, comparisons, charts, health rings, AI analysis
│ └── lib/ # API client (SSE + REST)
│
├── data-pipeline/
│ ├── ingestion/ # PyPI, GitHub, deps.dev data collection
│ └── embeddings/ # Vector embedding generation + indexing
│
└── docs/ # Architecture diagrams + screenshots
| Module | What It Does |
|---|---|
| Agentic Orchestrator | Multi-model pipeline with autonomous tool calling and iterative reasoning |
| NL-to-SQL Engine | Converts natural language to validated PostgreSQL queries with safety guardrails |
| Semantic Search | Vector similarity search across 85K+ package embeddings with hybrid re-ranking |
| Package Intelligence | Stats aggregation, health scoring, and live metadata enrichment |
| Code Analysis | GitHub source fetching + AI-powered pattern extraction |
| Data Pipeline | ETL from PyPI, deps.dev, pypistats.org into PostgreSQL + Qdrant |
| Streaming Layer | SSE-based real-time progress, metadata, and token-by-token analysis |
| PDF Reports | Client-ready benchmarking output with comparison tables and AI analysis |
| Layer | Technology |
|---|---|
| Frontend | Next.js, shadcn/ui, Recharts, Tailwind CSS |
| Backend | Python, FastAPI, SSE streaming |
| Database | PostgreSQL (Supabase) — 4 tables, 600K+ rows, NL-to-SQL target |
| Vector Search | Qdrant Cloud |
| AI Models | Mistral (moderation, classification, code analysis) + OpenAI (orchestration, synthesis, embeddings) |
| PDF Generation | fpdf2 |
| Deployment | Render (backend) + Vercel (frontend) |
Try this query: "What are the fastest growing AI libraries in Python?"
RepoScout surfaces real-time shifts in the AI ecosystem:
- openai-agents grew 22,000% this year — it barely existed 12 months ago
- google-genai up 2,400%, pydantic-ai up 691%, anthropic nearly tripled
- openai still leads adoption with 9,000+ dependents and 260M+ monthly downloads
Data you won't find on any blog post or newsletter. Just Ask RepoScout!
- Expand dataset coverage — Integrate historical dependency graph data (Libraries.io) to add version-level dependency resolution, historical maintainer activity, and repository-level metadata for 4M+ projects. Would significantly deepen the intelligence layer beyond the current 85K-package snapshot.
- Automated data refresh pipeline — Scheduled job (cron / GitHub Actions) that periodically re-fetches PyPI metadata, download statistics, and dependency counts so RepoScout always reflects the latest state of the Python ecosystem — not a point-in-time snapshot.
- Evaluation and benchmarks — DONE. 50-query golden test set with Recall@K, MRR, LLM-as-judge (5 criteria), and a full observability dashboard with per-stage tracing. See Evaluation Dashboard and Offline Evaluation Framework.
RepoScout stands on the shoulders of incredible open-source projects and platforms. Grateful to the teams behind each of these:
| Technology | What it does | How RepoScout uses it |
|---|---|---|
| OpenAI | AI research lab | GPT-4o-mini powers orchestration, synthesis, and batch enrichment; text-embedding-3-small generates 85K vectors |
| Mistral AI | French AI lab building open-weight and commercial LLMs | 3 models: moderation (free), classification (Ministral 8B), and code analysis (Devstral) |
| Supabase | Open-source Firebase alternative (hosted PostgreSQL) | Production database — 85K packages, metadata, 390K download stats, 27K enriched profiles |
| Qdrant | Open-source vector database for similarity search | Hosts 85K package embeddings on Qdrant Cloud for semantic package discovery |
| DuckDB | In-process analytical database (like SQLite for analytics) | Local data store with full READMEs for code snippet extraction and offline scripts |
| Google deps.dev | Open source dependency intelligence by Google | Primary data source for package dependency graphs and adoption metrics via BigQuery |
| PyPI | The Python Package Index — official package repository | READMEs, keywords, classifiers, version history, and dependency lists for 85K packages |
| pypistats.org | Community-run PyPI download statistics API | 6 months of daily download data powering trend charts |
| Next.js | React framework for production web apps | Frontend with SSE streaming, real-time UI updates |
| FastAPI | Modern Python web framework for building APIs | Backend serving SSE streams, REST endpoints, and orchestrating the agent pipeline |
(C) 2026 Charusmita Dhiman. All Rights Reserved.