"RepoScout doesn't guess, It knows!"
An agentic RAG system that queries 85K+ Python packages, 2M+ dependency signals, and 390K+ download data points to deliver data-driven open source intelligence — not opinions, not guesses, but real metrics from real data.
RepoScout is not a chatbot with package opinions. It's an autonomous research agent backed by a custom PostgreSQL data layer (4 tables, 600K+ rows) that converts natural language queries into validated SQL, combines results with semantic vector search, augments with live API metadata, and synthesizes actionable recommendations grounded in real adoption metrics.
See RepoScout in action -> Watch Demo
| Question | ChatGPT | GitHub Search | RepoScout |
|---|---|---|---|
| "How many projects use FastAPI?" | Guesses from training data | Can't answer | Exact count from indexed data |
| "Is library X actively maintained?" | Outdated info | Manual checking | Computed health score (0-100) |
| "What's growing fastest in Python AI?" | Generic list | Can't analyze trends | YoY growth from real dependency data |
| "Show me download trends" | No data | No data | 6 months of daily download stats |
| "Compare implementation patterns" | General knowledge | Keyword file matches | AI-powered source code analysis |
| "Packages with 50K+ stars but <100 deps" | Can't query structured data | Can't filter | NL-to-SQL against a real database |
RepoScout implements a 5-model agentic RAG pipeline where each model handles a specialized stage:
User Query
|
v
+----------------------------------------------------------------+
| GUARDRAILS LAYER |
| Content moderation (Mistral) + intent classification |
| Blocks harmful/off-topic queries before they reach the LLM |
+-----------------------------+----------------------------------+
| safe + classified
v
+----------------------------------------------------------------+
| AGENTIC ORCHESTRATOR |
| |
| LLM-driven function calling — autonomously decides which |
| tools to call and in what order. Iterates until it has |
| enough data to synthesize a grounded answer. |
| |
| +-------------+ +--------------+ +-------------------+ |
| | Semantic | | Package | | NL-to-SQL | |
| | Search | | Intel | | Engine | |
| | (Vector DB) | | (DB + PyPI) | | (validated SQL) | |
| +-------------+ +--------------+ +-------------------+ |
| +-------------+ +--------------+ +-------------------+ |
| | Code | | Compare | | Dependency | |
| | Analysis | | Packages | | Lookup | |
| | (GitHub + | | (multi-pkg) | | (reverse deps) | |
| | Devstral) | | | | | |
| +-------------+ +--------------+ +-------------------+ |
+-----------------------------+----------------------------------+
|
v
+----------------------------------------------------------------+
| SYNTHESIS + STREAMING |
| |
| Data-grounded analysis with confidence labels |
| Streamed to frontend via SSE (Server-Sent Events) |
| Post-response verification ensures no hallucinated stats |
+-----------------------------+----------------------------------+
|
v
+----------------------------------------------------------------+
| DATA LAYER |
| |
| PostgreSQL ···· 4 tables, 600K+ rows (packages, metadata, |
| downloads, AI-enriched profiles) |
| NL-to-SQL target for analytical queries |
| Vector DB ····· 85K semantic search embeddings |
| Live APIs ····· PyPI + GitHub (fresh data per request) |
+----------------------------------------------------------------+
|
v
+----------------------------------------------------------------+
| PRESENTATION |
| |
| FastAPI REST API → Next.js + shadcn/ui frontend |
| SSE streaming: progress → metadata cards → AI analysis |
| PDF export: client-ready benchmarking reports |
+----------------------------------------------------------------+
Unlike simple RAG (retrieve → generate), RepoScout's orchestrator autonomously decides its research strategy:
- Multi-step reasoning — searches for packages, fetches detailed stats for the most promising ones, then compares them, all without human intervention
- Dynamic tool selection — the LLM chooses which of 6 tools to call based on intermediate results
- NL-to-SQL — analytical queries are converted to validated, safe SQL executed against a real database
- Iterative refinement — up to 8 iterations of tool calling before final synthesis
- Concurrent execution — independent tool calls run in parallel via asyncio.gather
- Self-correcting — if a SQL query fails, the LLM fixes the syntax and retries automatically
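The concurrent-execution point can be sketched with asyncio.gather; the tool functions below are simplified stand-ins for RepoScout's real tools, not its actual implementation:

```python
import asyncio

# Hypothetical tool functions standing in for RepoScout's real tools.
async def semantic_search(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return ["fastapi", "flask"]

async def package_intel(name: str) -> dict:
    await asyncio.sleep(0.01)
    return {"name": name, "stars": 70000}

async def run_tools_concurrently(query: str) -> list:
    # Independent tool calls are awaited together, not sequentially,
    # so total latency is roughly the max of the calls, not their sum.
    return await asyncio.gather(
        semantic_search(query),
        package_intel("fastapi"),
    )

results = asyncio.run(run_tools_concurrently("web frameworks"))
```

Because both coroutines sleep concurrently, the pair completes in one sleep interval rather than two.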
RepoScout uses a cost-optimized hybrid pipeline combining Mistral and OpenAI models, each chosen for cost-performance tradeoff at that stage:
| Stage | Model | Role |
|---|---|---|
| Moderation | Mistral Moderation | Safety filter — blocks harmful and off-topic queries |
| Classification | Ministral 8B | Fast intent routing (explore, compare, analytical, reject) |
| Orchestration | GPT-4o-mini | Agentic function calling with 6 tools, up to 8 iterations |
| Code Analysis | Devstral | Fetches GitHub source and extracts implementation patterns |
| Synthesis | GPT-4o-mini | Data-grounded analysis, streamed via SSE |
| Embeddings | OpenAI text-embedding-3-small | 85K+ package vectors for semantic search |
| Enrichment | GPT-4o-mini (Batch API) | Structured intelligence profiles for 27K packages |
For analytical queries, RepoScout converts natural language directly to validated PostgreSQL queries:
User: "packages with over 50K stars but less than 100 dependents"
|
v
LLM generates validated SQL
|
v
+--------------------+
| Schema Validation | Pre-execution safety checks
| + Auto-correction | Fixes common LLM mistakes
+--------------------+
|
v
+--------------------+
| PostgreSQL Engine | Execute read-only query
+--------------------+
|
v
+--------------------+
| Error? LLM fixes | Self-correcting retry loop
| and retries |
+--------------------+
Safety guardrails: Only SELECT allowed (INSERT/UPDATE/DELETE/DROP blocked), schema validation against known tables and columns, auto-correction of common LLM mistakes, query timeout, row cap.
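A minimal sketch of what such pre-execution validation might look like. The table names come from the schema described in this README; the regexes, error messages, and row-cap value are illustrative, not RepoScout's actual code:

```python
import re

ALLOWED_TABLES = {"packages", "pypi_metadata", "download_stats", "enriched_profiles"}
BLOCKED = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|GRANT)\b", re.I)
MAX_ROWS = 500  # illustrative cap

def validate_sql(sql: str) -> str:
    """Reject anything but a read-only SELECT against known tables,
    then enforce a row cap. Raises ValueError on violation."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    if BLOCKED.search(stripped):
        raise ValueError("write/DDL keyword detected")
    # Every table referenced after FROM/JOIN must exist in the schema.
    for table in re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", stripped, re.I):
        if table.lower() not in ALLOWED_TABLES:
            raise ValueError(f"unknown table: {table}")
    # Append a row cap if the LLM forgot one (a common mistake).
    if not re.search(r"\bLIMIT\s+\d+\b", stripped, re.I):
        stripped += f" LIMIT {MAX_ROWS}"
    return stripped
```

A failed validation is what triggers the self-correcting retry loop: the error message goes back to the LLM, which emits a corrected query.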
The entire structured data layer runs on PostgreSQL (Supabase) — a custom schema designed for both LLM-generated SQL queries and direct API lookups. All tables are joined via foreign keys with normalized package names, enabling cross-table JOINs in the NL-to-SQL pipeline.
+-------------------------------+
| packages (85K rows) |
| Source: deps.dev (BigQuery) |
| |
| Central fact table: |
| Package identity, GitHub |
| stats, dependency counts, |
| YoY growth metrics |
+---------------+---------------+
| FK: package_id
+-----------+-----------+-------------------+
| | |
v v v
+----------------+ +----------------+ +-------------------+
| pypi_metadata | | download_stats | | enriched_profiles |
| (85K rows) | | (390K rows) | | (27K rows) |
| | | | | |
| Summary, keys, | | Daily download | | AI-generated |
| classifiers, | | counts over | | structured intel: |
| versions, | | 6 months from | | use cases, tags, |
| dependencies, | | pypistats.org | | alternatives, |
| release dates | | | | maturity signals |
+----------------+ +----------------+ +-------------------+
The LLM generates SQL against this schema — filtering, aggregation, ranking, and cross-table JOINs across all 4 tables. Schema validation ensures only known tables and columns are queried, with auto-correction for common LLM mistakes.
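To make the cross-table JOIN concrete, here is a miniature in-memory mock of two of the tables in SQLite (the column names are hypothetical), running the kind of query the NL-to-SQL stage might emit for the "50K+ stars but fewer than 100 dependents" example:

```python
import sqlite3

# In-memory miniature of the schema, with hypothetical column names.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE packages (package_id INTEGER PRIMARY KEY, name TEXT,
                           stars INTEGER, dependents INTEGER);
    CREATE TABLE enriched_profiles (package_id INTEGER, use_case TEXT);
    INSERT INTO packages VALUES (1, 'fastapi', 70000, 9000),
                                (2, 'tinylib', 60000, 40);
    INSERT INTO enriched_profiles VALUES (1, 'web APIs'), (2, 'niche tooling');
""")

# Representative LLM-generated SQL: filter + JOIN across tables.
rows = con.execute("""
    SELECT p.name, p.stars, e.use_case
    FROM packages p
    JOIN enriched_profiles e ON e.package_id = p.package_id
    WHERE p.stars > 50000 AND p.dependents < 100
""").fetchall()
# rows -> [('tinylib', 60000, 'niche tooling')]
```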
+-----------------------------------------------------+
| Vector Index (Qdrant Cloud) |
| 85K semantic search vectors |
| |
| Cosine similarity search over package embeddings |
| Enriched payload for filtering + re-ranking |
+-----------------------------------------------------+
Hybrid retrieval: Three layers working together — semantic search (Qdrant) for conceptual queries, structured PostgreSQL queries for keyword + growth-based ranking, and NL-to-SQL for analytical queries with specific filters. Results are blended with growth-aware re-ranking to surface genuinely trending packages.
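A simplified sketch of growth-aware blending. The weights, growth clamp, and minimum-dependents traction floor are illustrative placeholders, not RepoScout's actual values:

```python
def blend_score(semantic_sim: float, yoy_growth: float,
                dependents: int, min_dependents: int = 50) -> float:
    """Blend cosine similarity with adoption growth.
    Weights and the traction floor are illustrative."""
    if dependents < min_dependents:      # filter out packages without traction
        return 0.0
    growth = min(yoy_growth, 5.0) / 5.0  # clamp extreme growth, scale to 0..1
    return 0.7 * semantic_sim + 0.3 * growth

candidates = [
    ("steady-lib", blend_score(0.90, 0.1, 4000)),
    ("rising-lib", blend_score(0.80, 3.0, 600)),
    ("hype-lib",   blend_score(0.95, 9.0, 10)),  # filtered: too few dependents
]
ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
```

The traction floor is what keeps a hyped but barely-used package from outranking a genuinely adopted one, even with a higher similarity score.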
"How do Python projects handle rate limiting?"
Semantic search across 85K+ packages → adoption metrics retrieval → source code analysis from GitHub → data-grounded synthesis with citations.
"FastAPI vs Django vs Flask"
Side-by-side comparison with dependents count, YoY growth, maintainer activity, version frequency, stars, health scores, and code snippets.
"Top 10 packages by growth with at least 500 dependents"
Natural language converted to SQL — filtering, aggregation, ranking, and cross-table JOINs. Results grounded in real database queries, not LLM training data.
"What are the fastest growing AI libraries in Python?"
Growth-aware retrieval ranked by real adoption velocity — not just stars or hype. Filters for packages with genuine traction before ranking.
Daily download charts normalized to percentage growth — packages of different sizes become visually comparable on the same axis.
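The normalization step can be sketched as rebasing each series to percentage change from its first day:

```python
def normalize_pct(series: list[int]) -> list[float]:
    """Rebase a daily download series to % change from day one,
    so large and small packages share one axis."""
    base = series[0]
    return [round(100.0 * (v - base) / base, 1) for v in series]

# A 260M-downloads package and a 2K-downloads package, same growth shape:
big = normalize_pct([1_000_000, 1_050_000, 1_100_000])  # -> [0.0, 5.0, 10.0]
small = normalize_pct([2_000, 2_100, 2_200])            # -> [0.0, 5.0, 10.0]
```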
Every package scored 0-100 across four weighted dimensions:
- Adoption — real-world usage from 2M+ dependency signals
- Maintenance — recency of releases and active development
- Community — GitHub engagement signals
- Maturity — release history as a stability proxy
Score bands: Healthy (80-100) | Moderate (60-79) | Caution (0-59)
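A minimal sketch of a weighted score with band assignment. The per-dimension weights here are placeholders, not RepoScout's actual values; each dimension is assumed pre-normalized to 0-100:

```python
# Band floors from the README: Healthy 80-100, Moderate 60-79, Caution 0-59.
BANDS = [(80, "Healthy"), (60, "Moderate"), (0, "Caution")]

def health_score(adoption: float, maintenance: float,
                 community: float, maturity: float) -> tuple[int, str]:
    """Weighted 0-100 score; weights are illustrative placeholders."""
    score = round(adoption * 0.4
                  + maintenance * 0.3
                  + community * 0.2
                  + maturity * 0.1)
    band = next(label for floor, label in BANDS if score >= floor)
    return score, band
```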
One-click client-ready reports with:
- Comparison table with all package metrics
- Key metrics summary (most adopted, healthiest, fastest growing)
- AI-generated analysis and recommendations
- Data confidence badges
- Professional branding and formatting
SSE-based streaming architecture:
- Phase 1: Progress events as tools execute in real-time
- Phase 2: Metadata (package cards, stats) sent immediately — UI renders before analysis
- Phase 3: Analysis text streams token-by-token for typewriter effect
- Phase 4: Download chart data fetched in background, sent when ready
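The SSE wire format behind these phases is simple: each frame is an event name, a data line, and a blank-line terminator. A sketch with illustrative event names (not necessarily the ones RepoScout emits):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data line,
    and a blank line terminating the frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# The four phases, in the order the frontend receives them:
frames = [
    sse_event("progress", {"tool": "semantic_search", "status": "running"}),
    sse_event("metadata", {"package": "fastapi", "stars": 70000}),
    sse_event("token", {"text": "FastAPI leads "}),
    sse_event("chart", {"downloads": [1200, 1350, 1500]}),
]
```

On the browser side, an EventSource listener per event name dispatches each frame to the matching UI component.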
Built-in observability and evaluation dashboard at /dashboard:
- Overview — KPI cards (total queries, avg latency, cache hit rate, satisfaction), latency percentile trends (p50/p95/p99), pipeline stage breakdown
- Traces — Full waterfall breakdown for every query: per-stage timing, individual tool call durations, iteration count
- Feedback — Thumbs up/down satisfaction tracking with daily trends
50-query golden test set measuring retrieval quality (Recall@K, MRR), response quality (LLM-as-judge on 5 criteria), classification accuracy, and safety guardrails.
| Metric | Score |
|---|---|
| Classification Accuracy | 100% |
| Retrieval Recall@K | 58.7% |
| Must-Mention Pass Rate | 82.5% |
| Judge: Relevance | 4.90/5 |
| Judge: Accuracy | 4.92/5 |
| Judge: Hallucination | 4.97/5 |
| Avg Latency (explore) | ~18s |
| Avg Latency (compare) | ~12s |
| Avg Latency (reject) | ~1.5s |
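For reference, the two retrieval metrics in the table above can be computed like this (the package names are made-up examples):

```python
def recall_at_k(expected: set[str], retrieved: list[str], k: int = 10) -> float:
    """Fraction of expected packages appearing in the top-k results."""
    hits = expected & set(retrieved[:k])
    return len(hits) / len(expected)

def mrr(expected: set[str], retrieved: list[str]) -> float:
    """Reciprocal rank of the first expected package (0 if none found)."""
    for rank, pkg in enumerate(retrieved, start=1):
        if pkg in expected:
            return 1.0 / rank
    return 0.0

retrieved = ["requests", "httpx", "aiohttp"]
r = recall_at_k({"httpx", "urllib3"}, retrieved, k=3)  # 1 of 2 found -> 0.5
m = mrr({"httpx", "urllib3"}, retrieved)               # first hit at rank 2 -> 0.5
```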
Integrated Braintrust for versioned experiment tracking and automated regression detection across pipeline changes:
- Versioned Datasets — 50 golden queries stored as a Braintrust Dataset with expected outputs, enabling reproducible evaluation
- 9 automated scorers — 6 custom retrieval metrics (Recall@K, Precision@K, MRR, classification accuracy, must-mention, min-packages) + 3 LLM-as-judge scorers (Factuality, AnswerRelevancy, PackageRecommendationQuality)
- Experiment comparison — automatic diffing between runs surfaces per-query regressions after code changes
- Baseline import — existing offline eval results imported as baseline for comparison against live experiment runs
```shell
# Run evaluation with experiment tracking
python -m scripts.braintrust_eval run --experiment "after-reranking-fix"

# Import existing eval results as baseline
python -m scripts.braintrust_eval import-baseline eval_report.json
```

Two-tier caching for instant repeat queries:
- L1: In-memory LRU — fuzzy word matching, instant replay
- L2: PostgreSQL — SHA-256 hash lookup, 24-hour TTL, persists across deploys
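A sketch of the L1 tier, assuming SHA-256 keying over a lowercased, stripped query; the fuzzy word matching and the PostgreSQL L2 tier are omitted:

```python
import hashlib
import time
from collections import OrderedDict

class QueryCache:
    """In-memory LRU keyed by a SHA-256 hash of the normalized query.
    Illustrative sketch: fuzzy matching and the L2 tier are omitted."""

    def __init__(self, max_size: int = 128, ttl_seconds: float = 86_400):
        self.store: OrderedDict[str, tuple[float, str]] = OrderedDict()
        self.max_size = max_size
        self.ttl = ttl_seconds  # 24-hour TTL by default

    @staticmethod
    def key(query: str) -> str:
        return hashlib.sha256(query.lower().strip().encode()).hexdigest()

    def get(self, query: str):
        k = self.key(query)
        if k not in self.store:
            return None
        ts, value = self.store[k]
        if time.time() - ts > self.ttl:     # expired: evict and miss
            del self.store[k]
            return None
        self.store.move_to_end(k)           # mark as recently used
        return value

    def put(self, query: str, value: str):
        self.store[self.key(query)] = (time.time(), value)
        if len(self.store) > self.max_size:
            self.store.popitem(last=False)  # evict least recently used
```

Normalizing before hashing means trivially different phrasings ("FastAPI vs Flask" vs "fastapi vs flask") hit the same cache entry.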
"Make me a phishing site" → Rejected
Multi-layer guardrail system:
- Content Moderation — safety filter blocking harmful, toxic, or unsafe content
- Intent Classification — routes queries into valid modes or rejects off-topic
- Data Confidence Labels — every response tagged as `direct`, `partial`, or `insufficient`
- Post-Response Verification — checks all cited package names and statistics against the database
- SQL Safety — write operations blocked, schema validation, timeout, row cap
- Safe Fallbacks — "Insufficient data" responses when packages aren't in the database, never hallucinated stats
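The post-response verification and confidence-label steps can be sketched as checking every cited name against the indexed set and downgrading the label on a miss; the package set and label rules here are illustrative:

```python
# Stands in for a lookup against the real packages table.
KNOWN_PACKAGES = {"fastapi", "django", "flask"}

def verify_citations(cited: list[str]) -> tuple[list[str], str]:
    """Flag cited package names absent from the database and
    downgrade the confidence label accordingly."""
    unknown = [p for p in cited if p.lower() not in KNOWN_PACKAGES]
    label = "direct" if not unknown else "partial"
    return unknown, label

unknown, label = verify_citations(["fastapi", "webzap"])
# unknown -> ["webzap"], label -> "partial"
```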
| Dataset | Scale | Source |
|---|---|---|
| Packages indexed | 85K+ | deps.dev (BigQuery) |
| PyPI metadata | 85K+ | PyPI JSON API |
| AI-enriched profiles | 27K+ | GPT-4o-mini (Batch API) |
| Download data points | 390K+ | pypistats.org |
| Dependency relationships | 584K+ | PyPI metadata |
| Aggregate dependent signals | 2.1M+ | deps.dev |
| Semantic search vectors | 85K+ | OpenAI embeddings |
Production REST API powering the Next.js frontend via SSE streaming and JSON endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/search/stream | Main query — SSE stream with real-time progress, metadata, and token-by-token analysis |
| POST | /api/search | Synchronous query — full agent pipeline, returns JSON |
| GET | /api/package/{name} | Package detail — stats, health score, code snippet |
| GET | /api/compare | Side-by-side multi-package comparison |
| GET | /api/health/{name} | Health check with risk flags |
| GET | /api/downloads | Monthly download trends for charting |
| POST | /api/report/pdf | PDF comparison report with AI analysis |
| GET | /api/stats | Dataset statistics |
| GET | /api/search/quick | Fast semantic search (no agent, low latency) |
| GET | /api/dependents/{name} | Reverse dependency lookup |
| POST | /api/feedback | Submit thumbs up/down rating |
| GET | /api/dashboard/overview | KPI summary (latency, cache rate, errors) |
| GET | /api/dashboard/traces | Query traces with stage timings |
| GET | /api/dashboard/stages | Per-stage timing breakdown |
| GET | /api/dashboard/feedback | Feedback summary and trends |
reposcout/
│
├── backend/
│ ├── api/ # FastAPI REST endpoints + SSE streaming
│ ├── agents/ # Multi-model agentic pipeline (orchestrator, analysis, synthesis)
│ ├── services/ # Health scoring, vector search, live data enrichment
│ └── models/ # Pydantic request/response schemas
│
├── frontend/
│ ├── pages/ # Next.js search interface
│ ├── components/ # Stats, comparisons, charts, health rings, AI analysis
│ └── lib/ # API client (SSE + REST)
│
├── data-pipeline/
│ ├── ingestion/ # PyPI, GitHub, deps.dev data collection
│ └── embeddings/ # Vector embedding generation + indexing
│
└── docs/ # Architecture diagrams + screenshots
| Module | What It Does |
|---|---|
| Agentic Orchestrator | Multi-model pipeline with autonomous tool calling and iterative reasoning |
| NL-to-SQL Engine | Converts natural language to validated PostgreSQL queries with safety guardrails |
| Semantic Search | Vector similarity search across 85K+ package embeddings with hybrid re-ranking |
| Package Intelligence | Stats aggregation, health scoring, and live metadata enrichment |
| Code Analysis | GitHub source fetching + AI-powered pattern extraction |
| Data Pipeline | ETL from PyPI, deps.dev, pypistats.org into PostgreSQL + Qdrant |
| Streaming Layer | SSE-based real-time progress, metadata, and token-by-token analysis |
| PDF Reports | Client-ready benchmarking output with comparison tables and AI analysis |
| Layer | Technology |
|---|---|
| Frontend | Next.js, shadcn/ui, Recharts, Tailwind CSS |
| Backend | Python, FastAPI, SSE streaming |
| Database | PostgreSQL (Supabase) — 4 tables, 600K+ rows, NL-to-SQL target |
| Vector Search | Qdrant Cloud |
| AI Models | Mistral (moderation, classification, code analysis) + OpenAI (orchestration, synthesis, embeddings) |
| PDF Generation | fpdf2 |
| Deployment | Render (backend) + Vercel (frontend) |
Try this query: "What are the fastest growing AI libraries in Python?"
RepoScout surfaces real-time shifts in the AI ecosystem:
- openai-agents grew 22,000% this year — it barely existed 12 months ago
- google-genai up 2,400%, pydantic-ai up 691%, anthropic nearly tripled
- openai still leads adoption with 9,000+ dependents and 260M+ monthly downloads
Data you won't find on any blog post or newsletter. Just Ask RepoScout!
- Expand dataset coverage — Integrate historical dependency graph data (Libraries.io) to add version-level dependency resolution, historical maintainer activity, and repository-level metadata for 4M+ projects. Would significantly deepen the intelligence layer beyond the current 85K-package snapshot.
- Automated data refresh pipeline — Scheduled job (cron / GitHub Actions) that periodically re-fetches PyPI metadata, download statistics, and dependency counts so RepoScout always reflects the latest state of the Python ecosystem — not a point-in-time snapshot.
- Evaluation and benchmarks — DONE. 50-query golden test set with Recall@K, MRR, LLM-as-judge (5 criteria), and a full observability dashboard with per-stage tracing. See Evaluation Dashboard and Offline Evaluation Framework.
RepoScout stands on the shoulders of incredible open-source projects and platforms. Grateful to the teams behind each of these:
| Technology | What it does | How RepoScout uses it |
|---|---|---|
| OpenAI | AI research lab | GPT-4o-mini powers orchestration, synthesis, and batch enrichment; text-embedding-3-small generates 85K vectors |
| Mistral AI | French AI lab building open-weight and commercial LLMs | 3 models: moderation (free), classification (Ministral 8B), and code analysis (Devstral) |
| Supabase | Open-source Firebase alternative (hosted PostgreSQL) | Production database — 85K packages, metadata, 390K download stats, 27K enriched profiles |
| Qdrant | Open-source vector database for similarity search | Hosts 85K package embeddings on Qdrant Cloud for semantic package discovery |
| DuckDB | In-process analytical database (like SQLite for analytics) | Local data store with full READMEs for code snippet extraction and offline scripts |
| Google deps.dev | Open source dependency intelligence by Google | Primary data source for package dependency graphs and adoption metrics via BigQuery |
| PyPI | The Python Package Index — official package repository | READMEs, keywords, classifiers, version history, and dependency lists for 85K packages |
| pypistats.org | Community-run PyPI download statistics API | 6 months of daily download data powering trend charts |
| Next.js | React framework for production web apps | Frontend with SSE streaming, real-time UI updates |
| FastAPI | Modern Python web framework for building APIs | Backend serving SSE streams, REST endpoints, and orchestrating the agent pipeline |
(C) 2026 Charusmita Dhiman. All Rights Reserved.