Production-ready LangGraph multi-agent orchestration project with a supervisor routing agent, FastAPI backend, Streamlit frontend, PostgreSQL persistence, local RAG (ChromaDB), knowledge graph retrieval, and MCP tool integration.
If you are searching for a LangGraph multi-agent orchestration example, supervisor agent architecture, or a production multi-agent AI template, this repository is built for that exact use case.
Keywords: langgraph, multi-agent orchestration, supervisor agent, agent routing, fastapi, streamlit, rag, knowledge graph, mcp, genai.
- Multi-agent assistants with a supervisor/router pattern
- Agentic RAG systems with hybrid retrieval (vector + BM25 + reranking)
- Real-time chat services using FastAPI + Streamlit + SSE streaming
- Tool-using agents via MCP (`web_search`, `calculator`)
- 12-agent LangGraph orchestration graph
- FastAPI service with `/invoke` and `/stream`
- Operational endpoints: `/metrics`, `/healthz`, `/readyz`
- Streamlit chat UI
- Hybrid supervisor routing (`intent_router_agent`: rules first, LLM fallback for low-signal cases)
- LlamaGuard moderation (`safety_agent`)
- Web retrieval with recency preferences + relevance filtering
- Local RAG with ChromaDB + hybrid retrieval (vector + BM25 + reranking), SQLite fallback
- Knowledge graph reasoning agent (NetworkX) for relationship-style local queries
- TTL caching for web search and local RAG retrieval
- Optional Redis-backed caching (with in-memory fallback)
- MCP tool bridge for `web_search` and `calculator` (with local fallback)
- Graph-native HITL for recency web-news queries (`web_hitl_gate_agent`)
- Streamlit HITL buttons (`Approve`/`Reject`) with plain-text fallback (`approve`, `reject: reason`)
- Backend HITL decision mapper (service-side pending cache per `user_id` + `thread_id`)
- HITL decision audit trail persisted per user/thread (`hitl_events`)
- Monitoring stack with Prometheus + Grafana
- Dual-layer persistence:
  - LangGraph PostgreSQL checkpointer (graph state continuity)
  - Conversation Store (PostgreSQL with SQLite fallback)
- Per-user conversation history in UI (load latest thread + switch from sidebar)
- Evaluation agent (heuristic quality audit)
- Per-user authentication (user ID + password) with bearer tokens
- `safety_agent`
- `intent_router_agent`
- `clarification_agent`
- `query_rewriter_agent`
- `recency_guard_agent`
- `web_hitl_gate_agent`
- `web_search_agent`
- `knowledge_graph_agent`
- `rag_agent`
- `math_agent`
- `response_agent`
- `evaluation_agent`
```mermaid
%%{init: {"flowchart":{"nodeSpacing":45,"rankSpacing":65},"themeVariables":{"fontSize":"16px"}}}%%
flowchart TD
U[User in Streamlit] --> AUTH[Login or Register]
AUTH --> HIST[Load conversation history]
HIST --> Q[User sends message]
Q --> API[FastAPI invoke or stream]
API --> STOREH[Store human message]
API --> HMAP[HITL mapper for plain approve or reject]
HMAP --> CFG[Build graph config model + thread_id]
CFG --> LG[LangGraph research_assistant]
CFG --> CHK[Postgres checkpoints]
LG --> SA[safety_agent]
SA --> IR[intent_router_agent]
IR -->|clarify| CA[clarification_agent]
CA --> EVA[evaluation_agent]
IR -->|rewrite| QR[query_rewriter_agent]
QR --> IR
IR -->|math| MA[math_agent]
MA --> RESP[response_agent]
IR -->|rag| RA[rag_agent]
RA --> RESP
IR -->|kg| KG[knowledge_graph_agent]
KG --> RA
IR -->|general| RESP
IR -->|web or hybrid| RG[recency_guard_agent]
RG --> WH[web_hitl_gate_agent]
WH -->|awaiting| WAIT[Wait for decision]
WAIT --> BTN[UI buttons Approve or Reject]
BTN --> API
WH -->|approved web| WS[web_search_agent]
WS -->|web route| RESP
WH -->|approved hybrid| WS
WS -->|hybrid route| RA
WH -->|rejected| REJ[Reject follow-up message]
REJ --> EVA
RESP --> EVA
EVA --> END[Graph end]
END --> APIRESP[Return answer to Streamlit]
APIRESP --> STOREA[Store AI message]
STOREA --> CST[conversation_store]
END --> HITLDB[Store HITL decision]
HITLDB --> HITL[hitl_events]
API --> METRICS["/healthz | /readyz | /metrics"]
METRICS --> PROM[Prometheus]
PROM --> GRAF[Grafana]
```
```mermaid
%%{init: {"flowchart":{"nodeSpacing":40,"rankSpacing":55},"themeVariables":{"fontSize":"16px"}}}%%
flowchart LR
subgraph K8S["Optional Local Kubernetes Deployment"]
direction LR
KUI[streamlit-app Deployment + Service]
KAGENT[agent-service Deployment + Service]
KPROM[prometheus Deployment + Service]
KGRAF[grafana Deployment + Service]
end
KUI --> KAGENT
KAGENT --> KPROM
KPROM --> KGRAF
```
- `clarification_agent` asks a follow-up question and ends the current run.
- The next user message starts a new run and is routed again.
- On login, the UI fetches `GET /store/threads`, auto-loads the latest thread, and allows switching thread history.
- For recency/news prompts, graph-level HITL runs in `web_hitl_gate_agent`.
- The assistant returns a preview; Streamlit shows `Approve`/`Reject` buttons (typing `approve` or `reject: <reason>` also works).
- The service maps plain approve/reject into an internal HITL control payload using pending context keyed by `user_id` + `thread_id`.
- HITL approve/reject is persisted in `hitl_events` automatically from graph state (`/invoke` and `/stream`).
- `local:` prefix routes to `rag_agent`, or to `knowledge_graph_agent` for relationship questions.
- `recency_guard_agent` applies recency as a preference (falls back to the most recent relevant results).
- Mermaid graph edges remain stable for retrieval internals; new relationship reasoning runs in `knowledge_graph_agent`.
- Cache checks happen inside `web_search_agent`, `rag_agent`, and graph-rag retrieval functions (Redis/in-memory fallback).
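The pending-decision mapping described above can be sketched as follows. This is an illustrative model, not the actual internals of `service/service.py`; the `PENDING` cache and the payload shape are assumptions:

```python
# Sketch of the service-side HITL decision mapper. Names and payload shape
# are illustrative, not the actual internals of service/service.py.
PENDING = {}  # (user_id, thread_id) -> pending web-search context


def map_decision(user_id, thread_id, text):
    """Map a plain 'approve' / 'reject: reason' message to an internal
    HITL control payload, using the pending context for this user+thread."""
    key = (user_id, thread_id)
    context = PENDING.get(key)
    if context is None:
        return None  # no HITL gate awaiting a decision -> treat as a normal message
    lowered = text.strip().lower()
    if lowered == "approve":
        PENDING.pop(key)
        return {"hitl": "approved", "query": context["query"]}
    if lowered.startswith("reject"):
        PENDING.pop(key)
        _, _, reason = text.partition(":")
        return {"hitl": "rejected", "reason": reason.strip() or "no reason given"}
    return None  # anything else is a normal message; the gate keeps waiting
```

Keying by `user_id` + `thread_id` is what lets a plain `approve` resolve the right pending web search even when several users or threads are waiting at the gate.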
- `agent/research_assistant.py` - LangGraph orchestration, agents, routing logic
- `agent/tools.py` - web search + filtering logic
- `agent/local_rag.py` - local RAG (ChromaDB + SQLite fallback)
- `agent/graph_rag.py` - dedicated graph RAG Chroma retrieval store
- `agent/knowledge_graph.py` - NetworkX relationship extraction over graph RAG evidence
- `agent/llama_guard.py` - moderation logic
- `service/service.py` - FastAPI service + endpoints + checkpointer/store wiring
- `service/persistence_store.py` - conversation Store layer (Postgres/SQLite)
- `streamlit_app.py` - Streamlit UI
- `monitoring/prometheus.yml` - Prometheus scrape config for `/metrics`
- `scripts/ingestion/ingest_local_rag_pdfs.py` - PDF ingestion to local RAG ChromaDB
- `scripts/ingestion/ingest_graph_rag_pdfs.py` - separate PDF ingestion for knowledge graph store
- `scripts/ingestion/generate_synthetic_graph_pdfs.py` - synthetic graph PDFs for KG testing
- `k8s/` - local Kubernetes manifests (app + monitoring stack)
- `docs/architecture/agent_runtime_flow.md` - runtime flow explainer (human-readable)
- `docs/architecture/agent_runtime_flow.mmd` - raw Mermaid source for the same flow
`CI` workflow (`.github/workflows/ci.yml`)
- triggers on `push` and `pull_request` to `main`
- runs Python tests: `service/test_service.py` and `schema/test_schema.py`
- verifies both Docker images build successfully
`CD` workflow (`.github/workflows/cd-release.yml`)
- triggers on version tags (`v*`) or manual dispatch
- builds and publishes Docker images to Docker Hub:
  - `<namespace>/multi-agent-orchestration-service:<version>`
  - `<namespace>/multi-agent-orchestration-app:<version>`
- required repo secrets: `DOCKERHUB_USERNAME`, `DOCKERHUB_TOKEN` (Docker Hub access token)
- optional repo variable: `DOCKERHUB_NAMESPACE` (if omitted, the username is used as the namespace)
```
git tag v1.0.0
git push origin v1.0.0
```

- `POST /auth/register` - register user and receive access token
- `POST /auth/login` - login user and receive access token
- `POST /invoke` - non-streaming chat response
- `POST /stream` - streaming chat response
- `POST /web_search/preview` - optional/manual preview endpoint for external clients (the Streamlit default flow does not require this)
- `POST /hitl/web_decision` - optional/manual HITL audit event record endpoint
- `GET /hitl/web_decisions` - list the authenticated user's HITL audit records
- `GET /store/threads` - list the authenticated user's recent conversation threads
- `GET /store/{thread_id}` - inspect persisted conversation records
- `POST /feedback` - user feedback/rating
- `GET /healthz` - liveness probe endpoint
- `GET /readyz` - readiness probe endpoint
- `GET /metrics` - Prometheus scrape endpoint
When `ENABLE_USER_AUTH=true`, all non-auth endpoints require an `Authorization: Bearer <access_token>` header.
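For illustration, here is one way HMAC-signed bearer tokens with a TTL can be issued and verified. The service's actual token format is not documented here, so treat this only as a sketch of the general mechanism behind `USER_AUTH_SECRET` and `USER_AUTH_TOKEN_TTL_SECONDS`:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change_me_to_a_long_random_secret"  # e.g. USER_AUTH_SECRET
TTL = 86400  # e.g. USER_AUTH_TOKEN_TTL_SECONDS


def issue_token(user_id, now=None):
    """Return an HMAC-SHA256-signed token: base64(payload).hexsig."""
    payload = json.dumps({"sub": user_id, "exp": (now or time.time()) + TTL})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_token(token, now=None):
    """Return the user_id if the signature and expiry check out, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or wrong secret
    payload = json.loads(base64.urlsafe_b64decode(body))
    if (now or time.time()) > payload["exp"]:
        return None  # expired
    return payload["sub"]
```

The constant-time `hmac.compare_digest` check matters here: a plain `==` comparison would leak timing information about the signature.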
Bring up the full stack:

```
docker compose up -d --build
```

Access:
- FastAPI: `http://localhost:8000`
- Streamlit: `http://localhost:8501`
- Prometheus: `http://localhost:9090`
- Grafana: `http://localhost:3001` (default login `admin`/`admin`)
The Prometheus target for `agent_service` should show as UP at `http://localhost:9090/targets`.
- Request Rate: `sum(rate(http_requests_total[5m]))`
- Error Rate (5xx): `sum(rate(http_requests_total{status_code=~"5.."}[15m])) / clamp_min(sum(rate(http_requests_total[15m])), 1e-9)`
- P95 Latency: `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[15m])))`
- Traffic by endpoint: `sum by (path) (rate(http_requests_total[5m]))`
- Status code distribution: `sum by (status_code) (rate(http_requests_total[5m]))`
Beginner-friendly Kubernetes manifests are available in `k8s/` with step-by-step instructions:
- See `k8s/README.md`
- Includes app + Streamlit + Prometheus + Grafana manifests
- Includes health/readiness probes
- Supports one-command apply via `kubectl apply -k k8s`
- Checkpointer (LangGraph, PostgreSQL)
  - Stores graph execution/checkpoint state by `thread_id`
  - Used for workflow state continuity/resume
- Conversation Store (PostgreSQL, SQLite fallback)
  - Stores durable human/AI messages + metadata
  - Used for history/debugging/audit (`/store/threads`, `/store/{thread_id}`)
- Checkpointer tables: `checkpoints`, `checkpoint_writes`, `checkpoint_blobs`
- Conversation store tables: `conversation_store`, `users`, `hitl_events`
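As an illustrative sketch of the SQLite fallback path for the Conversation Store (the real schema and API in `service/persistence_store.py` may differ):

```python
import sqlite3

# Illustrative sketch of the Conversation Store's SQLite fallback path;
# the real column layout in service/persistence_store.py may differ.


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS conversation_store (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               thread_id TEXT NOT NULL,
               role TEXT NOT NULL,          -- 'human' or 'ai'
               content TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def add_message(conn, user_id, thread_id, role, content):
    conn.execute(
        "INSERT INTO conversation_store (user_id, thread_id, role, content) "
        "VALUES (?, ?, ?, ?)",
        (user_id, thread_id, role, content),
    )
    conn.commit()


def thread_messages(conn, user_id, thread_id):
    """Return (role, content) pairs for one user's thread, in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM conversation_store "
        "WHERE user_id = ? AND thread_id = ? ORDER BY id",
        (user_id, thread_id),
    )
    return list(rows)
```

Filtering on both `user_id` and `thread_id` mirrors how `/store/{thread_id}` only exposes the authenticated user's own records.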
`intent_router_agent` uses a hybrid strategy:
- Deterministic rules first (high precision, low latency)
- Optional LLM classifier fallback for low-signal queries (confidence-gated)
- Safe fallback to `general` if classifier confidence is low/unavailable
Primary deterministic rules:
- `local:` prefix -> `rag` (or `kg` when relation intent is detected)
- relationship reasoning + local context -> `kg`
- ambiguous (`help me`, `this`, `that`) -> `clarify`
- vague but rewritable (`news`, `latest news`) -> `rewrite`
- math-like query -> `math`
- web keywords -> `web`
- local/project keywords -> `rag`
- both web + local keywords -> `hybrid`
- fallback -> `general`
Router debug metadata written into state: `route_confidence`, `route_reason`
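Under these rules, the deterministic first pass could look roughly like the following sketch. The keyword sets and exact precedence are illustrative, not the ones in `agent/research_assistant.py`:

```python
import re

# Sketch of the deterministic first pass; rule order mirrors the list above.
# Keyword sets are illustrative stand-ins for the real ones.
WEB_WORDS = {"news", "today", "weather", "stock"}
LOCAL_WORDS = {"pdf", "document", "project", "repo"}
RELATION_WORDS = {"related", "connected", "relationship", "depends"}


def route(query):
    q = query.strip().lower()
    words = set(re.findall(r"[a-z]+", q))
    if q.startswith("local:"):
        return "kg" if words & RELATION_WORDS else "rag"
    if words & RELATION_WORDS and words & LOCAL_WORDS:
        return "kg"
    if q in {"help me", "this", "that"}:
        return "clarify"
    if q in {"news", "latest news"}:
        return "rewrite"
    if re.fullmatch(r"[\d\s+\-*/().^%]+", q) or q.startswith("calculate"):
        return "math"
    web = bool(words & WEB_WORDS)
    local = bool(words & LOCAL_WORDS)
    if web and local:
        return "hybrid"
    if web:
        return "web"
    if local:
        return "rag"
    return "general"  # low-signal -> the LLM classifier fallback would run here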
```
git clone https://github.com/Theepankumargandhi/Agent-Orchestration.git
cd Agent-Orchestration
```

Set at least the following environment variables (adjust values to your machine):
```
OPENAI_API_KEY=...
GROQ_API_KEY=...

# FastAPI service port (your current setup uses 8080)
PORT=8080
API_BASE_URL=http://localhost:8080

# LangGraph checkpointer (PostgreSQL)
POSTGRES_CHECKPOINT_URI=postgresql://postgres:password@localhost:5432/agentdb
CHECKPOINT_FALLBACK_SQLITE=true
CHECKPOINT_DB_PATH=data/checkpoints/checkpoints.db

# Conversation Store (defaults to checkpoint URI if omitted)
POSTGRES_STORE_URI=postgresql://postgres:password@localhost:5432/agentdb
STORE_FALLBACK_SQLITE=true
STORE_DB_PATH=data/store/store.db
STORE_NAMESPACE=default

# RAG / ChromaDB
USE_CHROMA_RAG=true
CHROMA_PERSIST_DIR=data/chroma_db
CHROMA_COLLECTION_NAME=local_pdf_docs
RAG_PDF_DIR=rag_docs
LOCAL_RAG_DB_PATH=data/rag/local_rag.db
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
RAG_CHUNK_SIZE=1000
RAG_CHUNK_OVERLAP=150

# Optional hybrid router (rules + LLM fallback)
HYBRID_ROUTER_ENABLE=true
HYBRID_ROUTER_MIN_CONFIDENCE=0.75

# Optional hybrid RAG retrieval + reranking (Chroma path)
RAG_VECTOR_TOP_K=8
RAG_BM25_TOP_K=8
RAG_RERANK_TOP_K=6
RAG_RRF_K=60
RAG_ENABLE_LLM_RERANKER=true

# Graph RAG (separate store for knowledge_graph_agent)
GRAPH_RAG_ENABLED=true
GRAPH_RAG_PDF_DIR=graph_rag_docs
GRAPH_CHROMA_PERSIST_DIR=data/graph_chroma_db
GRAPH_CHROMA_COLLECTION_NAME=graph_pdf_docs
GRAPH_RAG_CHUNK_SIZE=1000
GRAPH_RAG_CHUNK_OVERLAP=150
GRAPH_RAG_CACHE_TTL_SECONDS=600

# Optional TTL caches (latency/cost optimization)
WEB_CACHE_TTL_SECONDS=300
RAG_CACHE_TTL_SECONDS=600

# Optional Redis cache backend (shared cache across processes/containers)
CACHE_USE_REDIS=true
REDIS_URL=redis://localhost:6379/0

# Optional API auth
AUTH_SECRET=

# Per-user auth (enabled by default)
ENABLE_USER_AUTH=true
USER_AUTH_SECRET=change_me_to_a_long_random_secret
USER_AUTH_TOKEN_TTL_SECONDS=86400
PASSWORD_HASH_ITERATIONS=210000

# Optional MCP tools (web_search + calculator)
MCP_TOOLS_ENABLED=true
MCP_TOOL_SERVER_COMMAND=.venv\Scripts\python.exe
MCP_TOOL_SERVER_SCRIPT=agent\mcp_tool_server.py
MCP_TOOL_SERVER_ARGS=
MCP_BRIDGE_COMMAND=.venv\Scripts\python.exe
MCP_BRIDGE_SCRIPT=agent\mcp_bridge_client.py
MCP_CALL_TIMEOUT_SECONDS=20
```

Install dependencies and start the service:

```
pip install -r requirements.txt
python run_service.py
```

If your main runtime is `llm_env`, install MCP only in the separate `.venv` used by `MCP_BRIDGE_COMMAND` / `MCP_TOOL_SERVER_COMMAND`:
```
.venv\Scripts\python.exe -m pip install mcp==1.12.4
```

If `.venv\Scripts\python.exe` shows `No Python at ...`, recreate `.venv` first:
```
python -m venv .venv
.venv\Scripts\python.exe -m pip install -r requirements.txt
.venv\Scripts\python.exe -m pip install mcp==1.12.4
```

Run the Streamlit UI:

```
streamlit run streamlit_app.py
```

On first launch, use the sidebar Register flow (user ID + password), then sign in.
- Put PDFs in `rag_docs/`
- Run: `python scripts/ingestion/ingest_local_rag_pdfs.py --pdf-dir rag_docs --reset`
- Ask a forced local query: `local: summarize the uploaded pdfs`
- Put relationship-focused PDFs in `graph_rag_docs/`
- Run: `python scripts/ingestion/ingest_graph_rag_pdfs.py --pdf-dir graph_rag_docs --reset`
- Ask a relationship query: `local: how is streamlit connected to fastapi in this project`
- `scripts/ingestion/ingest_local_rag_pdfs.py` updates Local RAG only (`rag_docs` -> `chroma_db`).
- `scripts/ingestion/ingest_graph_rag_pdfs.py` updates Graph RAG / KG only (`graph_rag_docs` -> `graph_chroma_db`).
- `knowledge_graph_agent` uses Graph RAG retrieval, then forwards to `rag_agent` for extra grounding before `response_agent`.
When ChromaDB is enabled, local retrieval uses a hybrid pipeline inside `agent/local_rag.py`:
- Chroma vector retrieval (semantic)
- BM25-style lexical ranking over local Chroma corpus
- Reciprocal Rank Fusion (RRF)
- Optional LLM reranker (falls back to heuristic reranker)
This improves local retrieval precision for paraphrases and keyword-heavy queries.
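Reciprocal Rank Fusion itself is compact enough to show directly. This sketch merges any number of ranked document-id lists using the same `k` as the `RAG_RRF_K` setting:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by several retrievers rise to the top.
    k=60 matches the RAG_RRF_K default above.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing a vector ranking `["a", "b", "c"]` with a BM25 ranking `["b", "d", "a"]` promotes `b` to the top because both retrievers rank it highly, even though neither put it first by a wide margin.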
- Web search TTL cache (`agent/tools.py`)
  - caches `perform_web_search(...)` results by query/recency/relevance key
  - default TTL: `300s`
- Local RAG TTL cache (`agent/local_rag.py`)
  - caches `search_local_knowledge(...)` results by query/limit/backend key
  - default TTL: `600s`
  - cache is cleared automatically after PDF ingestion/reset
- Redis support (optional)
  - if `REDIS_URL` is configured and reachable, web/RAG caches use Redis (`setex`)
  - if Redis is unavailable, the system falls back to a local in-memory TTL cache automatically
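The in-memory fallback layer amounts to a small TTL cache. This sketch shows only that layer; the real code would try Redis (`setex`/`get`) first when `REDIS_URL` is set:

```python
import time


class TTLCache:
    """In-memory TTL cache mirroring the fallback behavior described above.

    Illustrative sketch only: the real caches in agent/tools.py and
    agent/local_rag.py additionally try Redis first when configured.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazy eviction on read
            return None
        return value
```

Using `time.monotonic()` rather than `time.time()` keeps expiry correct even if the wall clock jumps.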
- UI cache visibility
  - the response footer shows the source path when available (for example: `Sources: web via mcp` or `Sources: web via memory cache`)
  - live retrievals may omit the source footer when no source metadata is present
- `web_search_agent` calls an MCP stdio tool server first for web search.
- `math_agent` calls an MCP stdio calculator tool first.
- The main app env (`llm_env`) does not need the `mcp` package anymore.
- The MCP client calls a sidecar bridge script (`agent/mcp_bridge_client.py`) using `MCP_BRIDGE_COMMAND`.
- In `llm_env`, point both `MCP_BRIDGE_COMMAND` and `MCP_TOOL_SERVER_COMMAND` to a separate `.venv` Python where `mcp` is installed.
- If MCP is disabled/unavailable, both agents automatically fall back to existing local logic.
- Defaults:
  - `MCP_TOOL_SERVER_SCRIPT` -> `agent/mcp_tool_server.py`
  - `MCP_BRIDGE_SCRIPT` -> `agent/mcp_bridge_client.py`
Open in a browser (replace with your sidebar thread ID and correct backend port):

```
http://localhost:8080/store/threads?limit=30
http://localhost:8080/store/<thread_id>?limit=50
```
- `conversation_store` -> human/AI messages + metadata
- `checkpoints`, `checkpoint_writes`, `checkpoint_blobs` -> LangGraph checkpoint internals
- `evaluation_agent` is heuristic (not factual verification)
- `clarification_agent` ends the current run; the final answer comes on the next user turn
- the hybrid router still depends on classifier confidence thresholds and can misroute low-signal queries
- RAG quality still depends on chunking, embeddings, ingestion quality, and reranker behavior