🚀 Adaptive-RAG – Intelligent Retrieval-Augmented Generation System


A self-correcting Retrieval-Augmented Generation system that dynamically routes, grades, and rewrites queries for accurate, grounded answers – with a real-time streaming chat UI.

✨ Features • 🏗️ Architecture • 🛠️ Tech Stack • 🚀 Quick Start • 🐳 Docker • ☁️ Deployment • 📂 Project Structure


🎯 Overview

Adaptive-RAG is a production-grade RAG pipeline built with LangChain and LangGraph that intelligently decides how to retrieve information before generating an answer.

Instead of blindly retrieving documents, the system:

  • Routes queries to a local FAISS vectorstore for domain-specific questions
  • Escalates to a human reviewer when the query is out-of-scope or needs expert judgment
  • Grades retrieved documents for relevance before generating
  • Rewrites queries when retrieval quality is poor, then retries
  • Validates answers for hallucinations and grounding before returning

This results in answers that are more accurate, better grounded, and context-aware.


💡 Why I Built This

The Problem: Traditional RAG systems blindly retrieve from a vectorstore even when:

  • The question is out of domain
  • Knowledge is stale or missing
  • Retrieval quality is poor

This leads to hallucinations or incomplete answers.

My Goal: Build a self-correcting RAG pipeline that can:

  • Decide where to retrieve from
  • Judge how good the retrieval is
  • Improve itself by rewriting queries when needed
  • Know when to escalate rather than guess

Key Learning: LangGraph is ideal for building conditional, feedback-driven RAG workflows instead of linear chains.


✨ Key Features

  • Adaptive Routing – Automatically routes between vectorstore retrieval and human escalation based on query type
  • Retrieval Grading – LLM-based grader evaluates whether retrieved documents are relevant
  • Hallucination Detection – Grades generated answers against retrieved context for factual grounding
  • Query Rewriting – Reformulates weak or ambiguous questions to improve retrieval quality
  • Human Escalation – Out-of-scope queries are escalated and logged to a reviewer queue
  • Real-time Streaming – SSE-based token-by-token streaming with status updates (routing, retrieving, grading, generating)
  • Chat Persistence – Full conversation history stored in Supabase (PostgreSQL) with in-memory fallback
  • Session Management – LLM-generated chat titles, per-session message history, sidebar navigation
  • Graph-based Control Flow – LangGraph manages explicit state, conditional edges, and retry loops
  • Fully Dockerized – Backend and frontend images published to Docker Hub with CI/CD on every push

๐Ÿ—๏ธ Architecture

LangGraph Workflow

User Query
   │
   ▼
┌─────────────────┐
│  route_question │  ← llama-3.1-8b-instant decides: vectorstore or human_escalation
└─────────────────┘
   │
   ├──► human_escalation   ← Query logged, graceful escalation message returned
   │
   └──► retrieve (FAISS)
            │
            ▼
       grade_documents     ← Each doc scored relevant / not relevant
            │
            ├── All relevant → generate
            │                     │
            │                     ▼
            │              grade_generation   ← Hallucination + answer quality check
            │                     │
            │                     ├── useful → ✅ Return answer
            │                     ├── not useful → generate (retry)
            │                     └── not supported → generate (retry)
            │
            └── None relevant
                     │
                     ├── retrieval_attempts < 1 → transform_query → retrieve (retry)
                     └── retrieval_attempts ≥ 1 → human_escalation
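
In plain Python, the branching above amounts to the following loop. This is only a sketch with injected stand-ins for the LLM calls; the actual pipeline expresses each step as a LangGraph node connected by conditional edges rather than a hand-written loop:

```python
def run_pipeline(question, *, route, retrieve, grade_doc, generate,
                 is_grounded, answers_question, rewrite,
                 max_rewrites=1, max_generations=2):
    """Approximate the Adaptive-RAG control flow with injected callables.

    Each callable stands in for an LLM-backed graph node (router, grader,
    generator, ...). The generation cap is a simplification: the diagram
    retries generation, while this sketch escalates after max_generations.
    """
    if route(question) == "human_escalation":
        return {"status": "escalated"}

    rewrites = 0
    while True:
        # grade_documents: keep only docs the grader marks relevant
        docs = [d for d in retrieve(question) if grade_doc(question, d)]
        if docs:
            break
        if rewrites >= max_rewrites:          # retrieval_attempts exhausted
            return {"status": "escalated"}
        question = rewrite(question)          # transform_query, then retry
        rewrites += 1

    for _ in range(max_generations):
        answer = generate(question, docs)
        # grade_generation: hallucination + answer-quality checks
        if is_grounded(answer, docs) and answers_question(answer, question):
            return {"status": "answered", "answer": answer}
    return {"status": "escalated"}            # no grounded answer produced
```

Because every decision point is an injected callable, the same skeleton can be exercised with deterministic stubs in tests before wiring in real LLM calls.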

System Architecture

┌─────────────────────┐         ┌──────────────────────────┐
│   Next.js Frontend  │ ──SSE──►│   FastAPI Backend        │
│  (Vercel)           │◄────────│  (Render)                │
└─────────────────────┘         │                          │
                                │  ┌────────────────────┐  │
                                │  │  LangGraph RAG App │  │
                                │  │  (lazy-loaded)     │  │
                                │  └────────────────────┘  │
                                │           │              │
                                │    ┌──────┴──────┐       │
                                │    ▼             ▼       │
                                │  FAISS       Groq LLM    │
                                │  (HF API     (llama-3)   │
                                │  embeddings)             │
                                └──────────┬───────────────┘
                                           │
                                    ┌──────▼──────┐
                                    │  Supabase   │
                                    │ (PostgreSQL)│
                                    └─────────────┘

๐Ÿ› ๏ธ Tech Stack

| Component | Technology |
|---|---|
| Orchestration | LangGraph ≥ 0.2 |
| RAG Framework | LangChain ≥ 0.3 |
| LLM | Groq – llama-3.1-8b-instant |
| Embedding Model | sentence-transformers/all-MiniLM-L6-v2 (via HuggingFace Inference API – no local PyTorch) |
| Vector Store | FAISS (CPU, persisted to disk) |
| Backend | FastAPI + Uvicorn (streaming via SSE) |
| Frontend | Next.js 16 (React, TypeScript, Tailwind CSS) |
| Chat Storage | Supabase (PostgreSQL) + in-memory fallback |
| Containerization | Docker + Docker Compose |
| CI/CD | GitHub Actions → Docker Hub |
| Deployment | Render (backend), Vercel (frontend) |
| Python Version | 3.11.11 |

🔬 How It Works

1. Routing

The route_question node uses llama-3.1-8b-instant with structured output to decide:

  • vectorstore – domain questions about AI agents, prompt engineering, adversarial attacks
  • human_escalation – off-topic, policy-sensitive, or time-sensitive queries
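
The routing contract can be pictured as a small schema the router LLM must fill via structured output. The sketch below is illustrative, not the repo's actual code: `RouteQuery` mirrors the two allowed labels, and the keyword check is merely a deterministic stand-in for the real LLM call:

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative domain vocabulary; the real router is an LLM, not a keyword list.
DOMAIN_TERMS = {"agent", "agents", "prompt", "adversarial"}


@dataclass
class RouteQuery:
    # Constraining the field to exactly two labels is what makes
    # structured output safe to branch on in the graph.
    datasource: Literal["vectorstore", "human_escalation"]


def route_question(question: str) -> RouteQuery:
    """Stand-in for the llama-3.1-8b-instant router with structured output."""
    words = set(question.lower().split())
    if words & DOMAIN_TERMS:
        return RouteQuery(datasource="vectorstore")
    return RouteQuery(datasource="human_escalation")
```

The key design point survives the simplification: downstream nodes branch on a closed enum, never on free-form LLM text.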

2. Retrieval

  • FAISS vectorstore is built from web URLs + local PDFs on startup
  • Embeddings are generated via HuggingFace Inference API (all-MiniLM-L6-v2) – no PyTorch required
  • The index is cached to disk and reused across restarts
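
The cache-on-disk behavior can be sketched as a load-or-build helper. Names like `INDEX_DIR` and the callable parameters are assumptions for illustration, not the repo's actual API:

```python
from pathlib import Path

INDEX_DIR = Path("src/data/faiss_index")   # cache location from the project layout


def load_or_build_index(build_fn, save_fn, load_fn, index_dir=INDEX_DIR):
    """Reuse the on-disk FAISS index when present; otherwise build and cache it.

    build_fn embeds web URLs + local PDFs (slow, API-bound); save_fn/load_fn
    stand in for the vectorstore's persistence calls.
    """
    if index_dir.exists():
        return load_fn(index_dir)          # fast path on restart
    index = build_fn()
    index_dir.mkdir(parents=True, exist_ok=True)
    save_fn(index, index_dir)
    return index
```

The point of the pattern: the expensive embedding pass runs once, and every later restart pays only the cost of loading the serialized index.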

3. Document Grading

Each retrieved document is scored yes/no for relevance to the question. Irrelevant documents are filtered out.
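
A minimal version of this filter, with `grader` standing in for the LLM call that returns the binary yes/no score:

```python
def filter_relevant(question, documents, grader):
    """Keep only documents the grader scores 'yes' for relevance."""
    kept = []
    for doc in documents:
        # grader(question, doc) stands in for an LLM relevance judgment
        if grader(question, doc).strip().lower() == "yes":
            kept.append(doc)
    return kept
```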

4. Generation

Relevant context + chat history are passed to the RAG chain (llama-3.1-8b-instant) for answer generation.

5. Answer Validation

The generated answer is checked for:

  • Hallucinations – is it grounded in the retrieved documents?
  • Adequacy – does it actually address the question?

If either check fails, the system retries or escalates.

6. Streaming

The backend streams status updates and answer tokens via Server-Sent Events (SSE). The frontend renders tokens word-by-word as they arrive.
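
The SSE wire format itself is simple: each frame is an `event:` line, a `data:` line carrying a JSON payload, and a terminating blank line. A sketch of such a generator (the event names here are illustrative; backend.py defines the ones the frontend actually listens for):

```python
import json


def sse_events(statuses, tokens):
    """Yield SSE frames: pipeline status updates first, then answer tokens.

    Each frame follows the text/event-stream format: 'event:' line,
    'data:' line, and a blank line as the frame terminator.
    """
    for stage in statuses:                      # e.g. routing, retrieving, grading
        yield f"event: status\ndata: {json.dumps({'stage': stage})}\n\n"
    for token in tokens:                        # answer streamed token by token
        yield f"event: token\ndata: {json.dumps({'token': token})}\n\n"
    yield "event: done\ndata: {}\n\n"           # tells the client to stop reading
```

Streaming a generator like this through FastAPI's `StreamingResponse` with `media_type="text/event-stream"` is the standard way to drive an `EventSource` client.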


🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Groq API key (free)
  • HuggingFace token (free, for embeddings API)
  • Supabase project (optional – falls back to in-memory storage)

1. Clone & Setup

git clone https://github.com/vivek34561/Adaptive-RAG.git
cd Adaptive-RAG

2. Create .env

GROQ_API_KEY=your_groq_api_key
HF_TOKEN=your_huggingface_token
TAVILY_API_KEY=your_tavily_key          # optional
LANGCHAIN_API_KEY=your_langsmith_key    # optional, for tracing
DATABASE_URL=your_supabase_postgres_url # optional, falls back to in-memory

3. Install Backend Dependencies

pip install -r requirements.txt

4. Run Backend

uvicorn backend:app --reload --port 8000

On startup, the server:

  1. Binds to port immediately (no startup delay)
  2. Kicks off a background warmup – builds/loads the FAISS index
  3. Logs ---WARMUP: Done. Backend ready.--- when ready
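
The bind-first, warm-up-later pattern can be sketched with a daemon thread (the names `app_state` and `build_app` are illustrative, not taken from backend.py):

```python
import threading


def start_warmup(app_state, build_app):
    """Start serving immediately; load heavy state on a background thread.

    The server binds its port right away, and requests arriving before
    app_state["ready"] flips can be answered with a 'warming up' response.
    """
    def _warm():
        app_state["rag_app"] = build_app()   # build/load FAISS + compile graph
        app_state["ready"] = True
        print("---WARMUP: Done. Backend ready.---")

    thread = threading.Thread(target=_warm, daemon=True)
    thread.start()
    return thread                            # returned so callers/tests can join
```

This is why the health endpoint responds within seconds even though index construction can take much longer on a cold start.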

5. Run Frontend

cd frontend
npm install
npm run dev

Set NEXT_PUBLIC_API_BASE_URL=http://localhost:8000 in frontend/.env.local.

Open http://localhost:3000.


๐Ÿณ Docker

Pre-built images are available on Docker Hub and updated automatically on every push to main.

Pull & Run

# Backend (FastAPI on port 7860)
docker pull vivek3242/adaptive-rag-backend:latest
docker run -p 7860:7860 \
  -e GROQ_API_KEY=your_key \
  -e HF_TOKEN=your_token \
  vivek3242/adaptive-rag-backend:latest

# Frontend (Next.js on port 3000)
docker pull vivek3242/adaptive-rag-frontend:latest
docker run -p 3000:3000 vivek3242/adaptive-rag-frontend:latest

Run Full Stack Locally with Docker Compose

# Clone the repo and add your .env file, then:
docker compose up --build

| Service | URL |
|---|---|
| Frontend | http://localhost:3000 |
| Backend | http://localhost:7860 |
| Health | http://localhost:7860/health |

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f

# Stop everything
docker compose down

Docker Hub Repositories

| Image | Link |
|---|---|
| Backend | vivek3242/adaptive-rag-backend |
| Frontend | vivek3242/adaptive-rag-frontend |

Images are tagged with both :latest and :<commit-sha> for easy rollbacks.


โ˜๏ธ Deployment

Backend → Render

  1. Push code to GitHub
  2. Create a new Web Service on Render
  3. Set Start Command: uvicorn backend:app --host 0.0.0.0 --port $PORT
  4. Set Python Version: 3.11.11 (via runtime.txt)
  5. Add all environment variables under Environment:

| Key | Value |
|---|---|
| GROQ_API_KEY | your key |
| HF_TOKEN | your key |
| DATABASE_URL | your Supabase URL |
| TAVILY_API_KEY | your key (optional) |

โš ๏ธ Free tier note: Render free tier spins services down after 15 min of inactivity. Cold start takes ~2 min.

Frontend → Vercel

  1. Connect your GitHub repo to Vercel
  2. Set Root Directory to frontend
  3. Add environment variable:
| Key | Value |
|---|---|
| NEXT_PUBLIC_API_BASE_URL | https://your-backend.onrender.com |

CI/CD – GitHub Actions → Docker Hub

On every push to main, the workflow automatically:

  1. Builds the backend Docker image → pushes vivek3242/adaptive-rag-backend:latest
  2. Builds the frontend Docker image → pushes vivek3242/adaptive-rag-frontend:latest

Required GitHub Secrets:

| Secret | Value |
|---|---|
| DOCKERHUB_TOKEN | Docker Hub access token (Read & Write) |
| NEXT_PUBLIC_API_BASE_URL | Your Render backend URL |

📂 Project Structure

Adaptive-RAG/
├── backend.py                        # FastAPI app – all API endpoints + SSE streaming
├── requirements.txt                  # Python dependencies (no PyTorch!)
├── runtime.txt                       # Python 3.11.11 for Render
├── Dockerfile                        # Backend Docker image (port 7860)
├── docker-compose.yml                # Full-stack local dev (backend + frontend)
├── .dockerignore                     # Excludes venvs, secrets from backend image
├── .env                              # Local secrets (not committed)
│
├── src/
│   ├── graphs/
│   │   └── graph_builder.py          # FAISS index builder + HuggingFace API embeddings
│   ├── llms/
│   │   └── llm.py                    # RAG prompt template + Groq LLM chain
│   ├── nodes/
│   │   └── node_implementation.py    # All graph nodes: route, retrieve, grade, generate, escalate
│   ├── states/
│   │   └── state.py                  # LangGraph state schema + compiled app
│   ├── storage/
│   │   └── chat_store.py             # Supabase session/message persistence
│   └── data/
│       └── faiss_index/              # Vectorstore cache (auto-created at runtime)
│
├── frontend/
│   ├── Dockerfile                    # Frontend Docker image (multi-stage, Next.js standalone)
│   ├── .dockerignore                 # Excludes node_modules from build context
│   ├── src/
│   │   ├── app/                      # Next.js App Router pages
│   │   └── components/ui/
│   │       └── animated-ai-chat.tsx  # Main chat UI with sidebar + streaming
│   ├── .env.local                    # Frontend env (NEXT_PUBLIC_API_BASE_URL)
│   └── package.json
│
├── documents/                        # Drop PDFs here to add to the knowledge base
└── .github/workflows/main.yaml       # CI/CD – builds & pushes Docker images to Docker Hub

📊 What Makes This Project Stand Out

| Aspect | Detail |
|---|---|
| Self-correcting | Not a linear chain – the graph retries, rewrites, and escalates |
| Streaming UX | Real-time status + token streaming via SSE |
| Production patterns | Lazy loading, startup warmup, in-memory fallback, graceful error handling |
| No local PyTorch | Embeddings use HuggingFace Inference API – lightweight, deployable on free tier |
| Persistent history | Supabase-backed chat sessions with automatic LLM-generated titles |
| Fully Dockerized | Multi-stage builds, Docker Hub CI/CD, docker-compose for local dev |

This is the kind of RAG system used in enterprise knowledge assistants, AI support bots, and research copilots.


🔮 Future Improvements

  • Multi-document upload via UI
  • Tool-augmented RAG (calculator, code interpreter)
  • Evaluation dashboard with RAGAS metrics
  • Confidence-based answer refusal
  • Support for multiple knowledge domains with separate vectorstores

๐Ÿ‘จโ€๐Ÿ’ป Author

Vivek Kumar Gupta
AI Engineering Student | GenAI & Agentic Systems Builder


📄 License

MIT License © 2025 Vivek Kumar Gupta
