A self-correcting Retrieval-Augmented Generation system that dynamically routes, grades, and rewrites queries for accurate, grounded answers, with a real-time streaming chat UI.
✨ Features • 🏗️ Architecture • 🛠️ Tech Stack • 🚀 Quick Start • 🐳 Docker • ☁️ Deployment • 📂 Project Structure
Adaptive-RAG is a production-grade RAG pipeline built with LangChain and LangGraph that intelligently decides how to retrieve information before generating an answer.
Instead of blindly retrieving documents, the system:
- Routes queries to a local FAISS vectorstore for domain-specific questions
- Escalates to a human reviewer when the query is out-of-scope or needs expert judgment
- Grades retrieved documents for relevance before generating
- Rewrites queries when retrieval quality is poor, then retries
- Validates answers for hallucinations and grounding before returning
This results in more accurate, less hallucinated, and context-aware answers.
**The Problem:** Traditional RAG systems blindly retrieve from a vectorstore even when:
- The question is out of domain
- Knowledge is stale or missing
- Retrieval quality is poor
This leads to hallucinations or incomplete answers.
**My Goal:** Build a self-correcting RAG pipeline that can:
- Decide where to retrieve from
- Judge how good the retrieval is
- Improve itself by rewriting queries when needed
- Know when to escalate rather than guess
**Key Learning:** LangGraph is ideal for building conditional, feedback-driven RAG workflows instead of linear chains.
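The conditional, feedback-driven flow described above can be sketched as a plain Python loop. This is illustrative only: the real project wires these steps as LangGraph nodes with conditional edges, and every callable name below is a hypothetical stand-in for a graph node.

```python
# Illustrative control-flow sketch of the self-correcting pipeline.
# All callables are hypothetical stand-ins for the real graph nodes.

def answer(question, route, retrieve, grade_doc, generate, grade_gen,
           rewrite, escalate, max_retrieval_attempts=1, max_gen_retries=3):
    # 1. Route: domain question vs. human escalation
    if route(question) == "human_escalation":
        return escalate(question)

    # 2. Retrieve and grade; rewrite the query and retry on poor retrieval
    attempts = 0
    while True:
        docs = [d for d in retrieve(question) if grade_doc(question, d)]
        if docs:
            break
        if attempts >= max_retrieval_attempts:
            return escalate(question)          # know when to give up
        question = rewrite(question)           # transform_query, then retry
        attempts += 1

    # 3. Generate and validate; retry on hallucination or weak answers
    for _ in range(max_gen_retries):
        draft = generate(question, docs)
        if grade_gen(question, docs, draft) == "useful":
            return draft                       # grounded and on-topic
    return escalate(question)
```

The point of the sketch is the shape of the loop: every stage can send the flow backwards (rewrite, regenerate) or sideways (escalate) instead of marching straight to an answer.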
## ✨ Features

- **Adaptive Routing** – Automatically routes between vectorstore retrieval and human escalation based on query type
- **Retrieval Grading** – An LLM-based grader evaluates whether retrieved documents are relevant
- **Hallucination Detection** – Grades generated answers against retrieved context for factual grounding
- **Query Rewriting** – Reformulates weak or ambiguous questions to improve retrieval quality
- **Human Escalation** – Out-of-scope queries are escalated and logged to a reviewer queue
- **Real-time Streaming** – SSE-based token-by-token streaming with status updates (routing, retrieving, grading, generating)
- **Chat Persistence** – Full conversation history stored in Supabase (PostgreSQL) with an in-memory fallback
- **Session Management** – LLM-generated chat titles, per-session message history, sidebar navigation
- **Graph-based Control Flow** – LangGraph manages explicit state, conditional edges, and retry loops
- **Fully Dockerized** – Backend and frontend images published to Docker Hub with CI/CD on every push
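The explicit state that flows between graph nodes can be as small as a typed dictionary. A hedged sketch follows; the field names are illustrative, and the project's actual schema lives in `src/states/state.py`:

```python
from typing import List, TypedDict

class GraphState(TypedDict):
    """Illustrative LangGraph-style state carried between nodes."""
    question: str            # possibly rewritten along the way
    generation: str          # latest draft answer
    documents: List[str]     # retrieved, relevance-filtered context
    retrieval_attempts: int  # guards the rewrite/retry loop
```

Because every node reads and returns this one structure, conditional edges can branch on any field (e.g. `retrieval_attempts`) without hidden side channels.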
## 🏗️ Architecture

```
User Query
     │
     ▼
┌─────────────────┐
│ route_question  │ ◄── llama-3.1-8b-instant decides: vectorstore or human_escalation
└─────────────────┘
     │
     ├──► human_escalation → Query logged, graceful escalation message returned
     │
     └──► retrieve (FAISS)
              │
              ▼
        grade_documents ◄── Each doc scored relevant / not relevant
              │
              ├── All relevant → generate
              │        │
              │        ▼
              │   grade_generation ◄── Hallucination + answer quality check
              │        │
              │        ├── useful → ✅ Return answer
              │        ├── not useful → generate (retry)
              │        └── not supported → generate (retry)
              │
              └── None relevant
                       │
                       ├── retrieval_attempts < 1 → transform_query → retrieve (retry)
                       └── retrieval_attempts ≥ 1 → human_escalation
```
```
┌─────────────────────┐         ┌────────────────────────────┐
│  Next.js Frontend   │◄──SSE──►│      FastAPI Backend       │
│      (Vercel)       │         │         (Render)           │
└─────────────────────┘         │  ┌──────────────────────┐  │
                                │  │  LangGraph RAG App   │  │
                                │  │    (lazy-loaded)     │  │
                                │  └──────────────────────┘  │
                                │         │                  │
                                │    ┌────┴─────┐            │
                                │    ▼          ▼            │
                                │  FAISS     Groq LLM        │
                                │  (HF API   (llama-3)       │
                                │   embeddings)              │
                                └───────────┬────────────────┘
                                            │
                                     ┌──────▼──────┐
                                     │  Supabase   │
                                     │ (PostgreSQL)│
                                     └─────────────┘
```
## 🛠️ Tech Stack

| Component | Technology |
|---|---|
| Orchestration | LangGraph ≥ 0.2 |
| RAG Framework | LangChain ≥ 0.3 |
| LLM | Groq – llama-3.1-8b-instant |
| Embedding Model | sentence-transformers/all-MiniLM-L6-v2 (via HuggingFace Inference API – no local PyTorch) |
| Vector Store | FAISS (CPU, persisted to disk) |
| Backend | FastAPI + Uvicorn (streaming via SSE) |
| Frontend | Next.js 16 (React, TypeScript, Tailwind CSS) |
| Chat Storage | Supabase (PostgreSQL) + in-memory fallback |
| Containerization | Docker + Docker Compose |
| CI/CD | GitHub Actions → Docker Hub |
| Deployment | Render (backend), Vercel (frontend) |
| Python Version | 3.11.11 |
The `route_question` node uses llama-3.1-8b-instant with structured output to decide between:

- `vectorstore` → domain questions about AI agents, prompt engineering, adversarial attacks
- `human_escalation` → off-topic, policy-sensitive, or time-sensitive queries
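A structured-output router constrains the model to a fixed schema rather than free text. The sketch below shows the shape of that decision, with a trivial keyword check standing in for the LLM; the schema name, fields, and domain terms are all illustrative, not the project's actual code:

```python
# Hedged sketch of the routing decision. In the real pipeline an LLM
# returns this structure via structured output; a keyword check stands
# in for the model here so the shape of the decision is clear.
from dataclasses import dataclass
from typing import Literal

@dataclass
class RouteQuery:
    datasource: Literal["vectorstore", "human_escalation"]

DOMAIN_TERMS = ("agent", "prompt", "adversarial")  # illustrative only

def route_question(question: str) -> RouteQuery:
    q = question.lower()
    if any(term in q for term in DOMAIN_TERMS):
        return RouteQuery(datasource="vectorstore")
    return RouteQuery(datasource="human_escalation")
```

Constraining the output to two literal values means downstream code can branch on `datasource` without parsing prose.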
- The FAISS vectorstore is built from web URLs + local PDFs on startup
- Embeddings are generated via the HuggingFace Inference API (`all-MiniLM-L6-v2`) – no PyTorch required
- The index is cached to disk and reused across restarts
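The cache-and-reuse behaviour can be sketched as a small load-or-build helper. The callables and directory layout below are hypothetical; the project's actual index is persisted under `src/data/faiss_index/`:

```python
# Sketch of the index cache-or-build pattern. load_fn, build_fn, and
# save_fn are hypothetical stand-ins for the real FAISS helpers.
import os

def load_or_build_index(index_dir, load_fn, build_fn, save_fn):
    """Reuse a persisted index when present; otherwise build and cache it."""
    if os.path.isdir(index_dir) and os.listdir(index_dir):
        return load_fn(index_dir)          # warm start: reuse cached index
    index = build_fn()                     # cold start: embed documents
    os.makedirs(index_dir, exist_ok=True)
    save_fn(index, index_dir)
    return index
```

On a fresh deploy the expensive embedding pass runs once; every restart after that is a cheap disk load.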
Each retrieved document is scored yes/no for relevance to the question. Irrelevant documents are filtered out.
Relevant context + chat history are passed to the RAG chain (llama-3.1-8b-instant) for answer generation.
The generated answer is checked for:
- **Hallucinations** – is it grounded in the retrieved documents?
- **Adequacy** – does it actually address the question?
If either check fails, the system retries or escalates.
The backend streams status updates and answer tokens via Server-Sent Events (SSE). The frontend renders tokens word-by-word as they arrive.
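An SSE stream is just framed plain text: an optional `event:` line, a `data:` line, and a blank line. A minimal sketch of how status updates and tokens might be framed follows; the event names are illustrative, not necessarily the backend's exact ones:

```python
# Minimal SSE framing sketch. Event names ("status", "token") are
# illustrative assumptions, not the backend's documented protocol.
import json

def sse_frame(event: str, payload: dict) -> str:
    """Format one Server-Sent Events message."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

# e.g. one pipeline status update followed by one streamed token:
# sse_frame("status", {"stage": "retrieving"})
# sse_frame("token", {"text": "Hello"})
```

Because each frame is self-delimiting (the blank line), the frontend can parse status updates and answer tokens off the same connection as they arrive.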
## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Node.js 18+
- Groq API key (free)
- HuggingFace token (free, for embeddings API)
- Supabase project (optional โ falls back to in-memory storage)
```bash
git clone https://github.com/vivek34561/Adaptive-RAG.git
cd Adaptive-RAG
```

Create a `.env` file in the project root:

```env
GROQ_API_KEY=your_groq_api_key
HF_TOKEN=your_huggingface_token
TAVILY_API_KEY=your_tavily_key           # optional
LANGCHAIN_API_KEY=your_langsmith_key     # optional, for tracing
DATABASE_URL=your_supabase_postgres_url  # optional, falls back to in-memory
```

Install dependencies and start the backend:

```bash
pip install -r requirements.txt
uvicorn backend:app --reload --port 8000
```

On startup, the server:

- Binds to its port immediately (no startup delay)
- Kicks off a background warmup that builds/loads the FAISS index
- Logs `---WARMUP: Done. Backend ready.---` when ready
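The bind-now, warm-up-later pattern can be sketched with a background thread and a readiness flag. This is illustrative; the real warmup runs inside the FastAPI app's startup path, and the names below are assumptions:

```python
# Sketch of background warmup: the server binds immediately, the heavy
# index build runs in a daemon thread, and a flag signals readiness.
import threading

ready = threading.Event()

def warmup(build_index):
    build_index()                 # e.g. build or load the FAISS index
    ready.set()                   # flip the readiness flag
    print("---WARMUP: Done. Backend ready.---")

def start_warmup(build_index):
    t = threading.Thread(target=warmup, args=(build_index,), daemon=True)
    t.start()
    return t
```

Requests that arrive before `ready.is_set()` can be answered with a "warming up" status instead of blocking or failing.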
```bash
cd frontend
npm install
npm run dev
```

Set `NEXT_PUBLIC_API_BASE_URL=http://localhost:8000` in `frontend/.env.local`, then open http://localhost:3000.
## 🐳 Docker

Pre-built images are available on Docker Hub and updated automatically on every push to `main`.
```bash
# Backend (FastAPI on port 7860)
docker pull vivek3242/adaptive-rag-backend:latest
docker run -p 7860:7860 \
  -e GROQ_API_KEY=your_key \
  -e HF_TOKEN=your_token \
  vivek3242/adaptive-rag-backend:latest

# Frontend (Next.js on port 3000)
docker pull vivek3242/adaptive-rag-frontend:latest
docker run -p 3000:3000 vivek3242/adaptive-rag-frontend:latest
```

Or clone the repo, add your `.env` file, then:

```bash
docker compose up --build
```

| Service | URL |
|---|---|
| Frontend | http://localhost:3000 |
| Backend | http://localhost:7860 |
| Health | http://localhost:7860/health |
```bash
# Run in background
docker compose up -d --build

# View logs
docker compose logs -f

# Stop everything
docker compose down
```

| Image | Link |
|---|---|
| Backend | vivek3242/adaptive-rag-backend |
| Frontend | vivek3242/adaptive-rag-frontend |

Images are tagged with both `:latest` and `:<commit-sha>` for easy rollbacks.
## ☁️ Deployment

### Backend (Render)

- Push code to GitHub
- Create a new Web Service on Render
- Set Start Command: `uvicorn backend:app --host 0.0.0.0 --port $PORT`
- Set Python Version: `3.11.11` (via `runtime.txt`)
- Add all environment variables under Environment:

| Key | Value |
|---|---|
| `GROQ_API_KEY` | your key |
| `HF_TOKEN` | your key |
| `DATABASE_URL` | your Supabase URL |
| `TAVILY_API_KEY` | your key (optional) |
⚠️ **Free tier note:** Render's free tier spins services down after 15 minutes of inactivity; a cold start takes ~2 minutes.
### Frontend (Vercel)

- Connect your GitHub repo to Vercel
- Set Root Directory to `frontend`
- Add environment variable:

| Key | Value |
|---|---|
| `NEXT_PUBLIC_API_BASE_URL` | https://your-backend.onrender.com |
### CI/CD (GitHub Actions)

On every push to `main`, the workflow automatically:

- Builds the backend Docker image → pushes `vivek3242/adaptive-rag-backend:latest`
- Builds the frontend Docker image → pushes `vivek3242/adaptive-rag-frontend:latest`

Required GitHub Secrets:

| Secret | Value |
|---|---|
| `DOCKERHUB_TOKEN` | Docker Hub access token (Read & Write) |
| `NEXT_PUBLIC_API_BASE_URL` | Your Render backend URL |
## 📂 Project Structure

```
Adaptive-RAG/
├── backend.py                 # FastAPI app – all API endpoints + SSE streaming
├── requirements.txt           # Python dependencies (no PyTorch!)
├── runtime.txt                # Python 3.11.11 for Render
├── Dockerfile                 # Backend Docker image (port 7860)
├── docker-compose.yml         # Full-stack local dev (backend + frontend)
├── .dockerignore              # Excludes venvs, secrets from backend image
├── .env                       # Local secrets (not committed)
│
├── src/
│   ├── graphs/
│   │   └── graph_builder.py   # FAISS index builder + HuggingFace API embeddings
│   ├── llms/
│   │   └── llm.py             # RAG prompt template + Groq LLM chain
│   ├── nodes/
│   │   └── node_implementation.py  # All graph nodes: route, retrieve, grade, generate, escalate
│   ├── states/
│   │   └── state.py           # LangGraph state schema + compiled app
│   ├── storage/
│   │   └── chat_store.py      # Supabase session/message persistence
│   └── data/
│       └── faiss_index/       # Vectorstore cache (auto-created at runtime)
│
├── frontend/
│   ├── Dockerfile             # Frontend Docker image (multi-stage, Next.js standalone)
│   ├── .dockerignore          # Excludes node_modules from build context
│   ├── src/
│   │   ├── app/               # Next.js App Router pages
│   │   └── components/ui/
│   │       └── animated-ai-chat.tsx  # Main chat UI with sidebar + streaming
│   ├── .env.local             # Frontend env (NEXT_PUBLIC_API_BASE_URL)
│   └── package.json
│
├── documents/                 # Drop PDFs here to add to the knowledge base
└── .github/workflows/main.yaml  # CI/CD – builds & pushes Docker images to Docker Hub
```
| Aspect | Detail |
|---|---|
| Self-correcting | Not a linear chain – the graph retries, rewrites, and escalates |
| Streaming UX | Real-time status + token streaming via SSE |
| Production patterns | Lazy loading, startup warmup, in-memory fallback, graceful error handling |
| No local PyTorch | Embeddings use the HuggingFace Inference API – lightweight, deployable on a free tier |
| Persistent history | Supabase-backed chat sessions with automatic LLM-generated titles |
| Fully Dockerized | Multi-stage builds, Docker Hub CI/CD, docker-compose for local dev |
This is the kind of RAG system used in enterprise knowledge assistants, AI support bots, and research copilots.
- Multi-document upload via UI
- Tool-augmented RAG (calculator, code interpreter)
- Evaluation dashboard with RAGAS metrics
- Confidence-based answer refusal
- Support for multiple knowledge domains with separate vectorstores
Vivek Kumar Gupta
AI Engineering Student | GenAI & Agentic Systems Builder
- GitHub: github.com/vivek34561
- LinkedIn: linkedin.com/in/vivek-gupta-0400452b6
- Portfolio: resume-sepia-seven.vercel.app
MIT License © 2025 Vivek Kumar Gupta