Upload any document (PDF, image, DOCX, XLSX, PowerPoint, CSV, text, or web URL) and ask questions in plain English. Fully local, private, and powered by Mistral via Ollama, with optional OpenAI support.
| Feature | Description |
|---|---|
| Thread-Safe Vector Store | Fixed FAISS race conditions with proper locking |
| Reciprocal Rank Fusion (RRF) | Accurate hybrid scoring between FAISS and BM25 instead of zero-insertion |
| OpenAI Provider Support | Switchable LLM backend: Ollama (default) or OpenAI via `LLM_PROVIDER` |
| Local Analysis Engine | Intelligent offline answers using keyword extraction and entity analysis |
| Semantic Search | Dedicated search endpoint returning ranked passages without LLM overhead |
| Batch Q&A | Submit multiple questions at once and export results as CSV |
| Query Analytics Dashboard | Track query frequency, response times, popular documents, and failure rates |
| Export / Import Index | Download and restore the complete FAISS index as a `.zip` for backup/migration |
| Document Versioning | Track upload history with version numbers and diff metadata |
| Scheduled Web Crawling | Background thread auto-refreshes web URL sources on a configurable interval |
| Knowledge Graph Canvas | Interactive canvas-based entity graph visualization in the browser |
| Mobile Sidebar Toggle | Responsive design with a collapsible sidebar overlay for mobile screens |
| Markdown Rendering | Full markdown rendering in chat (tables, code, headers) via marked.js |
| Async Ingestion | Non-blocking document processing with task status tracking |
| Security Hardening | Windows reserved names, hidden file protection, WebSocket input limits |
| Lazy Module Loading | Heavy dependencies loaded only when needed for faster cold starts |
See the full CHANGELOG for details.
RAG (Retrieval-Augmented Generation) lets an AI answer questions about your specific documents. Instead of guessing from training data, it:
- Reads and indexes your documents (text extraction + semantic embedding).
- Finds the most relevant passages when you ask a question (FAISS + BM25 + Reciprocal Rank Fusion).
- Feeds those passages to an LLM (Mistral 7B via Ollama, or GPT via OpenAI) as context.
- Writes a grounded answer with citations to your actual documents.
Demo Mode: When Ollama is offline, the system gracefully falls back to heuristic-based answers using keyword extraction, entity analysis, and structured data parsing; no LLM required.
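The Reciprocal Rank Fusion step that merges the FAISS and BM25 rankings can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual implementation; the function name and the conventional `k=60` constant are assumptions:

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    so items ranked highly by either FAISS or BM25 float to the top.
    """
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" is ranked well by both retrievers, so it wins overall.
faiss_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
print(rrf_fuse([faiss_hits, bm25_hits]))  # → ['b', 'a', 'd', 'c']
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize FAISS distances against BM25 scores, which is what makes it a robust fusion choice here.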
```text
+----------------------------------------------------+
|        Frontend (HTML/CSS/JS + marked.js)          |
|  Chat · Search · Batch · KG · Analytics · Upload   |
+-------------------------+--------------------------+
                          |
+-------------------------+--------------------------+
|    FastAPI REST + WebSocket Server (Lifespan)      |
|     40+ endpoints · Async ops · Rate limiting      |
+-------------------------+--------------------------+
                          |
+---------+----------+----------+----------+----------+----------+
| Stage 1 | Stage 2  | Stage 3  | Stage 4  | Stage 5  | Features |
|  LOAD   |  CHUNK   |  EMBED   | RETRIEVE |  ANSWER  |   v3.1   |
+---------+----------+----------+----------+----------+----------+
| PyMuPDF | Sentence | MiniLM   | Hybrid   | Ollama / | Versions |
| Paddle  | Unicode  | FAISS    | FAISS+   | OpenAI   | BatchQA  |
| OCR     | Table-   | IDMap    | BM25+RRF | LCEL     | Search   |
| openpyxl| Atomic   | Auto-IVF | Cross-   | Stream   | Analytics|
| pptx    | Dedup    | Export   | Encoder  | Demo     | KG       |
| BS4     | SHA-256  | Thread   | Rerank   | History  | Crawl    |
+---------+----------+----------+----------+----------+----------+
```
```text
Document → OCR/Parse → Sentence Chunk → Embed (384D) → FAISS Index (IDMap)
                                                            ↓
Question → Embed → FAISS Search (top 20) → BM25 + RRF → Rerank (top 5)
                                                            ↓
                 LLM (Ollama/OpenAI/Demo) → Answer + Sources
                                                            ↓
                 Cache → Session → Analytics → WebSocket
```
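The sentence-chunk-and-dedup stage of this flow can be sketched as follows. This is a simplified illustration assuming a plain regex sentence splitter and SHA-256 content hashing; the real `semantic_chunker` is Unicode-aware and more involved:

```python
import hashlib
import re

def chunk_text(text, chunk_size=512):
    """Greedily pack sentences into chunks of at most chunk_size characters,
    skipping chunks whose SHA-256 hash has already been seen (dedup)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, seen = [], set()
    current = ""

    def flush(buf):
        digest = hashlib.sha256(buf.encode("utf-8")).hexdigest()
        if buf and digest not in seen:
            seen.add(digest)
            chunks.append(buf)

    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > chunk_size:
            flush(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    flush(current)
    return chunks

text = "First sentence. Second sentence. First sentence. Second sentence."
print(chunk_text(text, chunk_size=30))
# → ['First sentence.', 'Second sentence.']  (repeats deduplicated)
```

Hashing whole chunks means only exact duplicates are dropped; near-duplicates would need fuzzier fingerprinting.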
```bash
# Start everything (app + Ollama)
docker-compose up -d

# Pull the LLM model (first time only)
docker exec rag-ollama ollama pull mistral

# Open the UI
open http://localhost:8000
```

Prerequisites:
- Python 3.10+
- Ollama installed and running (optional; Demo Mode works without it)
```bash
# 1. Clone & install
git clone https://github.com/kulkarnishub377/Document-AI---RAG-Pipeline.git
cd Document-AI---RAG-Pipeline
pip install -r requirements.txt

# 2. Pull the LLM model (optional)
ollama pull mistral

# 3. Configure (optional)
cp .env.example .env
# Edit .env to customize settings (LLM provider, CORS, etc.)

# 4. Run the application
python run.py
```

Open http://localhost:8000 in your browser.
```bash
cp .env.example .env
# Edit .env:
#   LLM_PROVIDER=openai
#   OPENAI_API_KEY=sk-your-key-here
#   OPENAI_MODEL=gpt-3.5-turbo
pip install langchain-openai
python run.py
```

```text
DocuAI Studio/
├── api/
│   └── app.py                  # FastAPI REST + WebSocket server (40+ endpoints)
├── chunking/
│   └── semantic_chunker.py     # Unicode-aware sentence chunking with dedup
├── embedding/
│   └── vector_store.py         # FAISS + BM25 hybrid index (IDMap, RRF, auto-IVF)
├── features/                   # Feature modules
│   ├── knowledge_graph.py      # Entity extraction + relationship mapping
│   ├── collaboration.py        # WebSocket real-time multi-user Q&A
│   ├── pdf_annotator.py        # PDF highlighting with source passages
│   ├── comparator.py           # Document comparison analysis
│   ├── evaluation.py           # RAGAS-inspired evaluation metrics
│   └── query_analytics.py      # v3.1 query analytics tracker
├── frontend/
│   ├── index.html              # UI with Search, Batch, KG, Analytics views
│   ├── css/style.css           # Premium dark/light glassmorphic theme
│   └── js/app.js               # Client logic (markdown, streaming, mobile)
├── ingestion/
│   └── document_loader.py      # Multi-format loader (PDF/Excel/PPTX/CSV/Image/Web)
├── llm/
│   └── prompt_chains.py        # Ollama/OpenAI chains + streaming + demo mode
├── retrieval/
│   └── reranker.py             # Cross-encoder reranking
├── utils/
│   ├── cache.py                # LRU query cache with TTL
│   ├── rate_limiter.py         # Sliding-window rate limiter
│   ├── sessions.py             # SQLite persistent chat sessions
│   └── exceptions.py           # Custom exception hierarchy
├── tests/
│   ├── conftest.py             # Shared test fixtures
│   ├── test_api.py             # 30+ API endpoint tests
│   ├── test_chunker.py         # Chunking unit tests
│   ├── test_config.py          # Config override tests
│   ├── test_pipeline.py        # Pipeline integration tests
│   ├── test_reranker.py        # Reranker unit tests
│   └── test_vector_store.py    # Vector store unit tests
├── config.py                   # Central configuration (env-driven, 40+ settings)
├── pipeline.py                 # Pipeline orchestrator (batch, versioning, export)
├── run.py                      # Entry point
├── Dockerfile                  # Container build (with healthcheck)
├── docker-compose.yml          # Full stack: app + Ollama
├── requirements.txt            # Python dependencies (pinned)
├── CHANGELOG.md                # Version history
├── .env.example                # Configuration template
└── README.md                   # This file
```
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Serve frontend UI |
| GET | `/health` | Health check with version |
| GET | `/status` | Index stats + Ollama status |
| GET | `/analytics` | Storage, cache, document breakdown |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ingest` | Upload a single document (PDF/Excel/PPTX/Image/DOCX/CSV/TXT) |
| POST | `/ingest/url` | Ingest a web URL |
| POST | `/ingest/async` | Non-blocking ingestion with task ID |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/query` | Ask a question (sync, with source filtering & chat history) |
| POST | `/query-stream` | Ask a question (streaming SSE) |
| POST | `/query/batch` | Ask multiple questions at once |
| POST | `/search` | Semantic search (no LLM) |
| POST | `/summarize` | Summarize documents by topic |
| POST | `/extract` | Extract structured fields as JSON |
| POST | `/table-query` | Ask about tables |
| POST | `/compare` | Compare two documents |
| POST | `/annotate` | Q&A with highlighted PDF export |
| POST | `/evaluate` | Run RAGAS evaluation on a query |
| Method | Endpoint | Description |
|---|---|---|
| POST | `/sessions` | Create a new chat session |
| GET | `/sessions` | List recent sessions |
| GET | `/sessions/{id}` | Get session details |
| GET | `/sessions/{id}/messages` | Get messages in a session |
| DELETE | `/sessions/{id}` | Delete a session |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/knowledge-graph` | Full graph data (nodes + edges) |
| GET | `/knowledge-graph/search` | Search entities by type |
| POST | `/knowledge-graph/reset` | Clear the knowledge graph |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/documents` | List all indexed documents |
| DELETE | `/document/(unknown)` | Delete a specific document |
| POST | `/clear` | Clear the entire index |
| GET | `/versions` | Get version history for all docs |
| GET | `/versions/(unknown)` | Get version history for a document |
| GET | `/export` | Download index as zip |
| POST | `/import` | Upload and restore index from zip |
| GET | `/download/(unknown)` | Download an uploaded file |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/crawl/urls` | List scheduled crawl URLs |
| POST | `/crawl/add` | Add a URL to the crawl schedule |
| POST | `/crawl/remove` | Remove a URL from the schedule |
| POST | `/crawl/run` | Manually trigger a crawl |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/query-analytics` | Query frequency & response stats |
| POST | `/query-analytics/clear` | Clear analytics data |
| GET | `/evaluate/dashboard` | RAGAS evaluation dashboard |
| GET | `/evaluate/history` | Evaluation history log |
| POST | `/evaluate/clear` | Clear evaluation history |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/cache/stats` | Cache statistics |
| POST | `/cache/clear` | Clear query cache |
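The query cache behind these endpoints is an LRU cache with TTL (see `utils/cache.py` in the project layout). The idea can be sketched as follows; this is an illustration of the technique, not the project's actual implementation:

```python
import time
from collections import OrderedDict

class TTLCache:
    """LRU cache whose entries also expire after ttl_secs."""

    def __init__(self, max_size=128, ttl_secs=300.0):
        self.max_size = max_size
        self.ttl_secs = ttl_secs
        self._data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        if key not in self._data:
            return None
        value, stored_at = self._data[key]
        if now - stored_at >= self.ttl_secs:
            del self._data[key]        # expired
            return None
        self._data.move_to_end(key)    # mark as recently used
        return value

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._data[key] = (value, now)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

cache = TTLCache(max_size=2, ttl_secs=10.0)
cache.put("q1", "answer1", now=0.0)
print(cache.get("q1", now=5.0))    # → answer1 (hit within TTL)
print(cache.get("q1", now=15.0))   # → None (expired)
```

The explicit `now` parameter is just for deterministic examples; in real use the monotonic clock is read internally.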
| Method | Endpoint | Description |
|---|---|---|
| GET | `/tasks/{id}` | Get async task status |
```bash
# 1. Ingest a PDF
curl -X POST http://localhost:8000/ingest -F "[email protected]"

# 2. Ask a question
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the total amount on the invoice?"}'

# Batch Q&A
curl -X POST http://localhost:8000/query/batch \
  -H "Content-Type: application/json" \
  -d '{"questions": ["What is the total?", "Who signed?", "When is the due date?"]}'

# Semantic search (no LLM)
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "payment terms", "top_k": 10}'

# Export the index
curl -o backup.zip http://localhost:8000/export

# Import the index
curl -X POST http://localhost:8000/import -F "[email protected]"
```

All settings can be configured via environment variables or a .env file:
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | `ollama` or `openai` |
| `OLLAMA_MODEL` | `mistral` | LLM model name |
| `OLLAMA_VISION_MODEL` | `llava` | Vision model for image extraction |
| `OPENAI_API_KEY` | (none) | Required when `LLM_PROVIDER=openai` |
| `OPENAI_MODEL` | `gpt-3.5-turbo` | OpenAI model to use |
| `LLM_TIMEOUT_SECS` | `120` | LLM request timeout in seconds |
| `EMBED_MODEL_NAME` | `all-MiniLM-L6-v2` | Embedding model |
| `CHUNK_SIZE` | `512` | Chunk size in characters |
| `RETRIEVAL_TOP_K` | `20` | FAISS candidates to retrieve |
| `RERANKER_TOP_K` | `5` | Final results after reranking |
| `IVF_THRESHOLD` | `50000` | Auto-IVF upgrade threshold |
| `ENABLE_GPU` | `auto` | GPU mode: `auto`, `true`, `false` |
| `MULTILINGUAL_MODE` | `false` | Auto-detect document language |
| `MAX_FILE_SIZE_MB` | `50` | Max upload size |
| `CORS_ORIGINS` | `*` | Configurable CORS origins |
| `CACHE_ENABLED` | `true` | Enable query caching |
| `RATE_LIMIT_ENABLED` | `true` | Enable rate limiting |
| `KNOWLEDGE_GRAPH_ENABLED` | `true` | Enable KG extraction |
| `CRAWL_ENABLED` | `false` | Enable scheduled web crawling |
| `CRAWL_INTERVAL_MINS` | `1440` | Crawl interval (default: 24h) |
| `QUERY_ANALYTICS_ENABLED` | `true` | Enable query analytics tracking |
| `DOC_VERSIONING_ENABLED` | `true` | Enable document versioning |
| `WS_ENABLED` | `true` | Enable WebSocket collaboration |
| `LOG_FORMAT` | `text` | Log format: `text` or `json` |
See .env.example for the full list.
Tip: Improve answer quality with stronger models like `qwen2.5:14b` or `llama3.1:8b` for `OLLAMA_MODEL`, and `llava:13b` for `OLLAMA_VISION_MODEL`.
- Multi-format document support (PDF, Image, DOCX, Excel, PPTX, CSV) ✓
- Persistent conversation sessions with SQLite ✓
- Knowledge graph extraction ✓
- Document comparison ✓
- PDF annotation export ✓
- Real-time WebSocket collaboration ✓
- Query caching with TTL ✓
- API rate limiting ✓
- Evaluation dashboard using RAGAS metrics ✓
- Configurable LLM providers (Ollama / OpenAI) ✓
- Batch Q&A with CSV export ✓
- Query analytics dashboard ✓
- Document versioning ✓
- Export / Import index ✓
- Scheduled web crawling ✓
- Semantic search endpoint ✓
- Async ingestion with task tracking ✓
- User authentication for multi-user deployments
- Webhook support for document change notifications
- OCR confidence metrics
**Ollama connection refused**
Make sure Ollama is running: `ollama serve`
Check that it responds: `curl http://localhost:11434/api/tags`
If Ollama is unavailable, the app works in Demo Mode with heuristic answers.

**PaddleOCR first run is slow**
It downloads ~45 MB of model weights on the first OCR call. This is normal; subsequent runs are fast.
**Out of memory during query**
Switch to a smaller LLM: set `OLLAMA_MODEL=llama3.2:3b` in `.env`, or reduce `RETRIEVAL_TOP_K=10` to process fewer candidates.
**FAISS index not found error**
Ingest at least one document before querying:
`curl -X POST http://localhost:8000/ingest -F "[email protected]"`
**Rate limit exceeded**
Increase `RATE_LIMIT_REQUESTS` and `RATE_LIMIT_WINDOW` in `.env`, or set `RATE_LIMIT_ENABLED=false`.
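The `RATE_LIMIT_REQUESTS` / `RATE_LIMIT_WINDOW` pair describes a sliding window: at most N requests per rolling window of W seconds. A minimal sketch of that idea (illustrative only, not the project's `rate_limiter.py`):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_requests` per rolling `window_secs` seconds."""

    def __init__(self, max_requests, window_secs):
        self.max_requests = max_requests
        self.window_secs = window_secs
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_secs:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(max_requests=2, window_secs=60)
print(limiter.allow(now=0.0), limiter.allow(now=1.0), limiter.allow(now=2.0))
# → True True False  (third request within the window is refused)
print(limiter.allow(now=61.0))
# → True  (the first request has expired out of the window)
```

Unlike a fixed-window counter, the sliding window never admits a burst of 2×N requests straddling a window boundary.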
**OpenAI API errors**
Check that your `OPENAI_API_KEY` is valid and has credit. Set `LLM_PROVIDER=ollama` to fall back to local mode.
```bash
# Run all tests
python -m pytest tests/ -v

# Run a specific test file
python -m pytest tests/test_api.py -v

# Run with coverage
python -m pytest tests/ --cov=. --cov-report=term-missing
```

- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Write tests for your changes
- Run `python -m pytest tests/ -v` to make sure everything passes
- Submit a Pull Request
See CONTRIBUTING.md for detailed guidelines.
MIT License; see LICENSE for details.




