
RepoScout — AI-Powered Open Source Intelligence Engine

"RepoScout doesn't guess, It knows!"

Try it live → reposcout.app

An agentic RAG system that queries 85K+ Python packages, 2M+ dependency signals, and 390K+ download data points to deliver data-driven open source intelligence — not opinions, not guesses, but real metrics from real data.

RepoScout is not a chatbot with package opinions. It's an autonomous research agent backed by a custom PostgreSQL data layer (4 tables, 600K+ rows) that converts natural language queries into validated SQL, combines results with semantic vector search, augments with live API metadata, and synthesizes actionable recommendations grounded in real adoption metrics.


Demo

See RepoScout in action → Watch Demo


The Problem

| Question | ChatGPT | GitHub Search | RepoScout |
|---|---|---|---|
| "How many projects use FastAPI?" | Guesses from training data | Can't answer | Exact count from indexed data |
| "Is library X actively maintained?" | Outdated info | Manual checking | Computed health score (0-100) |
| "What's growing fastest in Python AI?" | Generic list | Can't analyze trends | YoY growth from real dependency data |
| "Show me download trends" | No data | No data | 6 months of daily download stats |
| "Compare implementation patterns" | General knowledge | Keyword file matches | AI-powered source code analysis |
| "Packages with 50K+ stars but <100 deps" | Can't query structured data | Can't filter | NL-to-SQL against a real database |

Architecture

RepoScout implements a 5-model agentic RAG pipeline where each model handles a specialized stage:

                         User Query
                             |
                             v
+----------------------------------------------------------------+
|                    GUARDRAILS LAYER                             |
|  Content moderation (Mistral) + intent classification          |
|  Blocks harmful/off-topic queries before they reach the LLM    |
+-----------------------------+----------------------------------+
                              |  safe + classified
                              v
+----------------------------------------------------------------+
|                   AGENTIC ORCHESTRATOR                          |
|                                                                |
|  LLM-driven function calling — autonomously decides which      |
|  tools to call and in what order. Iterates until it has         |
|  enough data to synthesize a grounded answer.                  |
|                                                                |
|  +-------------+  +--------------+  +-------------------+      |
|  |  Semantic   |  |   Package    |  |   NL-to-SQL       |      |
|  |  Search     |  |    Intel     |  |   Engine          |      |
|  | (Vector DB) |  | (DB + PyPI)  |  | (validated SQL)   |      |
|  +-------------+  +--------------+  +-------------------+      |
|  +-------------+  +--------------+  +-------------------+      |
|  |    Code     |  |   Compare    |  |   Dependency      |      |
|  |  Analysis   |  |   Packages   |  |    Lookup         |      |
|  |  (GitHub +  |  | (multi-pkg)  |  | (reverse deps)    |      |
|  |  Devstral)  |  |              |  |                   |      |
|  +-------------+  +--------------+  +-------------------+      |
+-----------------------------+----------------------------------+
                              |
                              v
+----------------------------------------------------------------+
|                    SYNTHESIS + STREAMING                        |
|                                                                |
|  Data-grounded analysis with confidence labels                 |
|  Streamed to frontend via SSE (Server-Sent Events)             |
|  Post-response verification ensures no hallucinated stats      |
+-----------------------------+----------------------------------+
                              |
                              v
+----------------------------------------------------------------+
|                      DATA LAYER                                |
|                                                                |
|  PostgreSQL ···· 4 tables, 600K+ rows (packages, metadata,     |
|                  downloads, AI-enriched profiles)               |
|                  NL-to-SQL target for analytical queries        |
|  Vector DB ····· 85K semantic search embeddings                |
|  Live APIs ····· PyPI + GitHub (fresh data per request)        |
+----------------------------------------------------------------+
                              |
                              v
+----------------------------------------------------------------+
|                     PRESENTATION                               |
|                                                                |
|  FastAPI REST API → Next.js + shadcn/ui frontend               |
|  SSE streaming: progress → metadata cards → AI analysis        |
|  PDF export: client-ready benchmarking reports                 |
+----------------------------------------------------------------+

What Makes It "Agentic"

Unlike simple RAG (retrieve → generate), RepoScout's orchestrator autonomously decides its research strategy:

  • Multi-step reasoning — searches for packages, fetches detailed stats for the most promising ones, then compares them, all without human intervention
  • Dynamic tool selection — the LLM chooses which of 6 tools to call based on intermediate results
  • NL-to-SQL — analytical queries are converted to validated, safe SQL executed against a real database
  • Iterative refinement — up to 8 iterations of tool calling before final synthesis
  • Concurrent execution — independent tool calls run in parallel via asyncio.gather
  • Self-correcting — if a SQL query fails, the LLM fixes the syntax and retries automatically
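The loop described above can be sketched as follows. This is a minimal illustration, not the production orchestrator: `llm` and `tools` are hypothetical stand-ins for the real function-calling interfaces, which the README does not publish.

```python
import asyncio

MAX_ITERATIONS = 8  # matches the iteration cap described above

async def run_agent(llm, tools, query):
    """Minimal agentic loop: the LLM picks tools until it can answer."""
    messages = [{"role": "user", "content": query}]
    for _ in range(MAX_ITERATIONS):
        decision = await llm(messages, tools=list(tools))
        if decision["type"] == "answer":  # enough data gathered
            return decision["content"]
        # Independent tool calls run concurrently via asyncio.gather.
        results = await asyncio.gather(
            *(tools[c["name"]](**c["args"]) for c in decision["calls"])
        )
        for call, result in zip(decision["calls"], results):
            messages.append(
                {"role": "tool", "name": call["name"], "content": result}
            )
    return "Insufficient data"  # safe fallback after the iteration cap
```

The key property is that tool selection is not hard-coded: each iteration the LLM sees the accumulated tool results and decides the next step itself.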

Multi-Model Pipeline

RepoScout uses a cost-optimized hybrid pipeline combining Mistral and OpenAI models, each chosen for cost-performance tradeoff at that stage:

| Stage | Model | Role |
|---|---|---|
| Moderation | Mistral Moderation | Safety filter — blocks harmful and off-topic queries |
| Classification | Ministral 8B | Fast intent routing (explore, compare, analytical, reject) |
| Orchestration | GPT-4o-mini | Agentic function calling with 6 tools, up to 8 iterations |
| Code Analysis | Devstral | Fetches GitHub source and extracts implementation patterns |
| Synthesis | GPT-4o-mini | Data-grounded analysis, streamed via SSE |
| Embeddings | OpenAI text-embedding-3-small | 85K+ package vectors for semantic search |
| Enrichment | GPT-4o-mini (Batch API) | Structured intelligence profiles for 27K packages |

NL-to-SQL Pipeline

For analytical queries, RepoScout converts natural language directly to validated PostgreSQL queries:

User: "packages with over 50K stars but less than 100 dependents"
                    |
                    v
       LLM generates validated SQL
                    |
                    v
         +--------------------+
         | Schema Validation  |  Pre-execution safety checks
         | + Auto-correction  |  Fixes common LLM mistakes
         +--------------------+
                    |
                    v
         +--------------------+
         | PostgreSQL Engine  |  Execute read-only query
         +--------------------+
                    |
                    v
         +--------------------+
         | Error? LLM fixes   |  Self-correcting retry loop
         | and retries        |
         +--------------------+

Safety guardrails: Only SELECT allowed (INSERT/UPDATE/DELETE/DROP blocked), schema validation against known tables and columns, auto-correction of common LLM mistakes, query timeout, row cap.
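A minimal sketch of those pre-execution checks, assuming the four tables from the data-layer section. The row cap and regex-based table extraction are illustrative simplifications; the real engine also validates columns and auto-corrects common LLM mistakes.

```python
import re

ALLOWED_TABLES = {"packages", "pypi_metadata", "download_stats",
                  "enriched_profiles"}  # schema from the Data Layer section
BLOCKED = re.compile(r"\b(insert|update|delete|drop|alter|truncate)\b", re.I)
ROW_CAP = 100  # illustrative cap, not the production value

def validate_sql(sql: str) -> str:
    """Reject non-SELECT statements and unknown tables; enforce a row cap."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    if BLOCKED.search(stripped):
        raise ValueError("write/DDL keywords are blocked")
    # Crude table extraction: every identifier after FROM or JOIN.
    for table in re.findall(r"\b(?:from|join)\s+(\w+)", stripped, re.I):
        if table not in ALLOWED_TABLES:
            raise ValueError(f"unknown table: {table}")
    if not re.search(r"\blimit\b", stripped, re.I):
        stripped += f" LIMIT {ROW_CAP}"  # auto-append the row cap
    return stripped
```

Anything that fails validation is sent back to the LLM for correction rather than executed.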


PostgreSQL Data Layer

The entire structured data layer runs on PostgreSQL (Supabase) — a custom schema designed for both LLM-generated SQL queries and direct API lookups. All tables are joined via foreign keys with normalized package names, enabling cross-table JOINs in the NL-to-SQL pipeline.

+-------------------------------+
|     packages (85K rows)       |
|  Source: deps.dev (BigQuery)  |
|                               |
|  Central fact table:          |
|  Package identity, GitHub     |
|  stats, dependency counts,    |
|  YoY growth metrics           |
+---------------+---------------+
                |  FK: package_id
    +-----------+-----------+-------------------+
    |                       |                   |
    v                       v                   v
+----------------+  +----------------+  +-------------------+
| pypi_metadata  |  | download_stats |  | enriched_profiles |
| (85K rows)     |  | (390K rows)    |  | (27K rows)        |
|                |  |                |  |                   |
| Summary, keys, |  | Daily download |  | AI-generated      |
| classifiers,   |  | counts over    |  | structured intel: |
| versions,      |  | 6 months from  |  | use cases, tags,  |
| dependencies,  |  | pypistats.org  |  | alternatives,     |
| release dates  |  |                |  | maturity signals  |
+----------------+  +----------------+  +-------------------+

The LLM generates SQL against this schema — filtering, aggregation, ranking, and cross-table JOINs across all 4 tables. Schema validation ensures only known tables and columns are queried, with auto-correction for common LLM mistakes.

+-----------------------------------------------------+
|          Vector Index (Qdrant Cloud)                 |
|          85K semantic search vectors                 |
|                                                     |
|  Cosine similarity search over package embeddings   |
|  Enriched payload for filtering + re-ranking        |
+-----------------------------------------------------+

Hybrid retrieval: Three layers working together — semantic search (Qdrant) for conceptual queries, structured PostgreSQL queries for keyword + growth-based ranking, and NL-to-SQL for analytical queries with specific filters. Results are blended with growth-aware re-ranking to surface genuinely trending packages.
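One way to picture the growth-aware blending step: merge hits from the retrieval layers by package name, then boost each score by (capped) YoY growth before ranking. The weight, cap, and hit shape here are assumptions for illustration, not the production re-ranker.

```python
def blend_results(semantic, structured, growth_weight=0.3):
    """Merge semantic and structured hits, boosting by YoY growth.

    Each hit is a dict with `name`, `score` (0-1), and `yoy_growth`
    (fractional, e.g. 0.5 = +50%). All weights are illustrative.
    """
    merged = {}
    for hit in semantic + structured:
        prev = merged.get(hit["name"])
        if prev is None or hit["score"] > prev["score"]:
            merged[hit["name"]] = hit  # keep the best score per package

    def rank_key(hit):
        growth_boost = min(hit.get("yoy_growth", 0.0), 2.0)  # cap outliers
        return hit["score"] + growth_weight * growth_boost

    return sorted(merged.values(), key=rank_key, reverse=True)
```

Capping the growth term keeps brand-new packages with four-digit growth percentages from drowning out established, relevant results.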


Core Features

Explore: "How does the world solve X?"

"How do Python projects handle rate limiting?"

Semantic search across 85K+ packages → adoption metrics retrieval → source code analysis from GitHub → data-grounded synthesis with citations.

Compare: "Should I use X or Y?"

"FastAPI vs Django vs Flask"

Side-by-side comparison with dependents count, YoY growth, maintainer activity, version frequency, stars, health scores, and code snippets.

Analytical: "Show me specific data slices"

"Top 10 packages by growth with at least 500 dependents"

Natural language converted to SQL — filtering, aggregation, ranking, and cross-table JOINs. Results grounded in real database queries, not LLM training data.

Trending: "What's growing fastest?"

"What are the fastest growing AI libraries in Python?"

Growth-aware retrieval ranked by real adoption velocity — not just stars or hype. Filters for packages with genuine traction before ranking.

Download Trends

Daily download charts normalized to percentage growth — packages of different sizes become visually comparable on the same axis.
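The normalization is simple: express every day's downloads as percent change from the first day of the window. A minimal sketch (the baseline choice is an assumption; the production chart may anchor differently):

```python
def normalize_to_growth(daily_downloads):
    """Convert absolute daily downloads to % growth from the first day,
    so packages of very different sizes share one axis."""
    if not daily_downloads or daily_downloads[0] == 0:
        return []  # no valid baseline to normalize against
    base = daily_downloads[0]
    return [round((d - base) / base * 100, 2) for d in daily_downloads]
```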

Health Score

Every package scored 0-100 across four weighted dimensions:

  • Adoption — real-world usage from 2M+ dependency signals
  • Maintenance — recency of releases and active development
  • Community — GitHub engagement signals
  • Maturity — release history as a stability proxy

Score bands: Healthy (80-100) | Moderate (60-79) | Caution (0-59)
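The scoring shape can be sketched as a weighted sum over the four dimensions. The weights below are assumptions for illustration (the README does not publish the production ones); only the 0-100 range and the bands are taken from the text above.

```python
# Illustrative weights; not the production values.
WEIGHTS = {"adoption": 0.35, "maintenance": 0.30,
           "community": 0.20, "maturity": 0.15}

def health_score(signals: dict) -> int:
    """Weighted 0-100 score; `signals` maps each dimension to a 0-100
    sub-score (missing dimensions count as 0)."""
    return round(sum(WEIGHTS[dim] * signals.get(dim, 0) for dim in WEIGHTS))

def health_band(score: int) -> str:
    """Map a score to the bands listed above."""
    if score >= 80:
        return "Healthy"
    if score >= 60:
        return "Moderate"
    return "Caution"
```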

PDF Export

One-click client-ready reports with:

  • Comparison table with all package metrics
  • Key metrics summary (most adopted, healthiest, fastest growing)
  • AI-generated analysis and recommendations
  • Data confidence badges
  • Professional branding and formatting

Real-Time Streaming

SSE-based streaming architecture:

  • Phase 1: Progress events as tools execute in real-time
  • Phase 2: Metadata (package cards, stats) sent immediately — UI renders before analysis
  • Phase 3: Analysis text streams token-by-token for typewriter effect
  • Phase 4: Download chart data fetched in background, sent when ready

Evaluation Dashboard

Built-in observability and evaluation dashboard at /dashboard:

  • Overview — KPI cards (total queries, avg latency, cache hit rate, satisfaction), latency percentile trends (p50/p95/p99), pipeline stage breakdown
  • Traces — Full waterfall breakdown for every query: per-stage timing, individual tool call durations, iteration count
  • Feedback — Thumbs up/down satisfaction tracking with daily trends

Offline Evaluation Framework

50-query golden test set measuring retrieval quality (Recall@K, MRR), response quality (LLM-as-judge on 5 criteria), classification accuracy, and safety guardrails.

| Metric | Score |
|---|---|
| Classification Accuracy | 100% |
| Retrieval Recall@K | 58.7% |
| Must-Mention Pass Rate | 82.5% |
| Judge: Relevance | 4.90/5 |
| Judge: Accuracy | 4.92/5 |
| Judge: Hallucination | 4.97/5 |
| Avg Latency (explore) | ~18s |
| Avg Latency (compare) | ~12s |
| Avg Latency (reject) | ~1.5s |
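For reference, the two retrieval metrics are standard and straightforward to compute per query (this is the textbook definition, not RepoScout's internal code):

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant items appearing in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant item (0 if none found)."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0
```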

Dashboard Overview

Dashboard Traces

Braintrust Experiment Tracking

Integrated Braintrust for versioned experiment tracking and automated regression detection across pipeline changes:

  • Versioned Datasets — 50 golden queries stored as a Braintrust Dataset with expected outputs, enabling reproducible evaluation
  • 9 automated scorers — 6 custom retrieval metrics (Recall@K, Precision@K, MRR, classification accuracy, must-mention, min-packages) + 3 LLM-as-judge scorers (Factuality, AnswerRelevancy, PackageRecommendationQuality)
  • Experiment comparison — automatic diffing between runs surfaces per-query regressions after code changes
  • Baseline import — existing offline eval results imported as baseline for comparison against live experiment runs

```bash
# Run evaluation with experiment tracking
python -m scripts.braintrust_eval run --experiment "after-reranking-fix"

# Import existing eval results as baseline
python -m scripts.braintrust_eval import-baseline eval_report.json
```

Persistent Caching (L1 + L2)

Two-tier caching for instant repeat queries:

  • L1: In-memory LRU — fuzzy word matching, instant replay
  • L2: PostgreSQL — SHA-256 hash lookup, 24-hour TTL, persists across deploys
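The two tiers fit together roughly like this. In this sketch a dict stands in for the PostgreSQL table, and L1 matching is reduced to case/whitespace normalization rather than the fuzzy word matching described above:

```python
import hashlib
import time
from collections import OrderedDict

class TwoTierCache:
    """L1 in-memory LRU + L2 keyed by SHA-256 with a TTL (sketch)."""

    def __init__(self, l1_size=128, ttl_seconds=24 * 3600):
        self.l1 = OrderedDict()
        self.l1_size = l1_size
        self.l2 = {}  # stand-in for the PostgreSQL cache table
        self.ttl = ttl_seconds

    @staticmethod
    def _key(query: str) -> str:
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query):
        key = self._key(query)
        if key in self.l1:                 # L1: instant replay
            self.l1.move_to_end(key)
            return self.l1[key]
        entry = self.l2.get(key)           # L2: hash lookup + TTL check
        if entry and time.time() - entry["ts"] < self.ttl:
            self.l1[key] = entry["value"]  # promote to L1
            return entry["value"]
        return None

    def put(self, query, value):
        key = self._key(query)
        self.l1[key] = value
        if len(self.l1) > self.l1_size:
            self.l1.popitem(last=False)    # evict least-recently-used
        self.l2[key] = {"value": value, "ts": time.time()}
```

Because L2 is keyed by a content hash with a TTL, a redeploy wipes only L1; repeat queries within 24 hours still hit the persistent tier.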

Guardrails & AI Safety

"Make me a phishing site"Rejected

Multi-layer guardrail system:

  1. Content Moderation — safety filter blocking harmful, toxic, or unsafe content
  2. Intent Classification — routes queries into valid modes or rejects off-topic
  3. Data Confidence Labels — every response tagged as direct, partial, or insufficient
  4. Post-Response Verification — checks all cited package names and statistics against database
  5. SQL Safety — write operations blocked, schema validation, timeout, row cap
  6. Safe Fallbacks — "Insufficient data" responses when packages aren't in the database, never hallucinated stats

Data Scale

| Dataset | Scale | Source |
|---|---|---|
| Packages indexed | 85K+ | deps.dev (BigQuery) |
| PyPI metadata | 85K+ | PyPI JSON API |
| AI-enriched profiles | 27K+ | GPT-4o-mini (Batch API) |
| Download data points | 390K+ | pypistats.org |
| Dependency relationships | 584K+ | PyPI metadata |
| Aggregate dependent signals | 2.1M+ | deps.dev |
| Semantic search vectors | 85K+ | OpenAI embeddings |

FastAPI REST API

Production REST API powering the Next.js frontend via SSE streaming and JSON endpoints:

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/search/stream | Main query — SSE stream with real-time progress, metadata, and token-by-token analysis |
| POST | /api/search | Synchronous query — full agent pipeline, returns JSON |
| GET | /api/package/{name} | Package detail — stats, health score, code snippet |
| GET | /api/compare | Side-by-side multi-package comparison |
| GET | /api/health/{name} | Health check with risk flags |
| GET | /api/downloads | Monthly download trends for charting |
| POST | /api/report/pdf | PDF comparison report with AI analysis |
| GET | /api/stats | Dataset statistics |
| GET | /api/search/quick | Fast semantic search (no agent, low latency) |
| GET | /api/dependents/{name} | Reverse dependency lookup |
| POST | /api/feedback | Submit thumbs up/down rating |
| GET | /api/dashboard/overview | KPI summary (latency, cache rate, errors) |
| GET | /api/dashboard/traces | Query traces with stage timings |
| GET | /api/dashboard/stages | Per-stage timing breakdown |
| GET | /api/dashboard/feedback | Feedback summary and trends |

System Components

reposcout/
│
├── backend/
│   ├── api/              # FastAPI REST endpoints + SSE streaming
│   ├── agents/           # Multi-model agentic pipeline (orchestrator, analysis, synthesis)
│   ├── services/         # Health scoring, vector search, live data enrichment
│   └── models/           # Pydantic request/response schemas
│
├── frontend/
│   ├── pages/            # Next.js search interface
│   ├── components/       # Stats, comparisons, charts, health rings, AI analysis
│   └── lib/              # API client (SSE + REST)
│
├── data-pipeline/
│   ├── ingestion/        # PyPI, GitHub, deps.dev data collection
│   └── embeddings/       # Vector embedding generation + indexing
│
└── docs/                 # Architecture diagrams + screenshots

| Module | What It Does |
|---|---|
| Agentic Orchestrator | Multi-model pipeline with autonomous tool calling and iterative reasoning |
| NL-to-SQL Engine | Converts natural language to validated PostgreSQL queries with safety guardrails |
| Semantic Search | Vector similarity search across 85K+ package embeddings with hybrid re-ranking |
| Package Intelligence | Stats aggregation, health scoring, and live metadata enrichment |
| Code Analysis | GitHub source fetching + AI-powered pattern extraction |
| Data Pipeline | ETL from PyPI, deps.dev, pypistats.org into PostgreSQL + Qdrant |
| Streaming Layer | SSE-based real-time progress, metadata, and token-by-token analysis |
| PDF Reports | Client-ready benchmarking output with comparison tables and AI analysis |

Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js, shadcn/ui, Recharts, Tailwind CSS |
| Backend | Python, FastAPI, SSE streaming |
| Database | PostgreSQL (Supabase) — 4 tables, 600K+ rows, NL-to-SQL target |
| Vector Search | Qdrant Cloud |
| AI Models | Mistral (moderation, classification, code analysis) + OpenAI (orchestration, synthesis, embeddings) |
| PDF Generation | fpdf2 |
| Deployment | Render (backend) + Vercel (frontend) |

Stay Ahead in the AI Game

Try this query: "What are the fastest growing AI libraries in Python?"

RepoScout surfaces real-time shifts in the AI ecosystem:

  • openai-agents grew 22,000% this year — it barely existed 12 months ago
  • google-genai up 2,400%, pydantic-ai up 691%, anthropic nearly tripled
  • openai still leads adoption with 9,000+ dependents and 260M+ monthly downloads

Data you won't find in any blog post or newsletter. Just ask RepoScout!


Screenshots

Search Interface & Suggestion Cards

Search Interface

Stats Banner & Comparison Table

Stats and Comparison

Charts — Adoption & Download Trends

Charts and Download Trends

AI-Powered Analysis

AI Analysis

Guardrails in Action

Rejected Query


Future Roadmap

  • Expand dataset coverage — Integrate historical dependency graph data (Libraries.io) to add version-level dependency resolution, historical maintainer activity, and repository-level metadata for 4M+ projects. Would significantly deepen the intelligence layer beyond the current 85K-package snapshot.
  • Automated data refresh pipeline — Scheduled job (cron / GitHub Actions) that periodically re-fetches PyPI metadata, download statistics, and dependency counts so RepoScout always reflects the latest state of the Python ecosystem — not a point-in-time snapshot.
  • Evaluation and benchmarks: DONE. 50-query golden test set with Recall@K, MRR, LLM-as-judge (5 criteria), and full observability dashboard with per-stage tracing. See Evaluation Dashboard and Offline Evaluation Framework.

Built With & Acknowledgments

RepoScout stands on the shoulders of incredible open-source projects and platforms. Grateful to the teams behind each of these:

| Technology | What it does | How RepoScout uses it |
|---|---|---|
| OpenAI | AI research lab | GPT-4o-mini powers orchestration, synthesis, and batch enrichment; text-embedding-3-small generates 85K vectors |
| Mistral AI | French AI lab building open-weight and commercial LLMs | 3 models: moderation (free), classification (Ministral 8B), and code analysis (Devstral) |
| Supabase | Open-source Firebase alternative (hosted PostgreSQL) | Production database — 85K packages, metadata, 390K download stats, 27K enriched profiles |
| Qdrant | Open-source vector database for similarity search | Hosts 85K package embeddings on Qdrant Cloud for semantic package discovery |
| DuckDB | In-process analytical database (like SQLite for analytics) | Local data store with full READMEs for code snippet extraction and offline scripts |
| Google deps.dev | Open source dependency intelligence by Google | Primary data source for package dependency graphs and adoption metrics via BigQuery |
| PyPI | The Python Package Index — official package repository | READMEs, keywords, classifiers, version history, and dependency lists for 85K packages |
| pypistats.org | Community-run PyPI download statistics API | 6 months of daily download data powering trend charts |
| Next.js | React framework for production web apps | Frontend with SSE streaming, real-time UI updates |
| FastAPI | Modern Python web framework for building APIs | Backend serving SSE streams, REST endpoints, and orchestrating the agent pipeline |

License

(C) 2026 Charusmita Dhiman. All Rights Reserved.
