A production-focused Retrieval-Augmented Generation (RAG) system for PDF ingestion and grounded Q&A.
This repository includes:
- A root application optimized for fast iteration with Streamlit, FastAPI, and Inngest workflow orchestration.
- A packaged implementation under rag-engine with modular API, core services, and tests.
The goal is practical document intelligence, not notebook demos:
- Multi-provider LLM and embedding support: OpenAI, Gemini, Claude, Ollama, and local fallback paths.
- Durable workflow mode using Inngest step functions.
- Local-first resilience with graceful fallback when remote dependencies are unavailable.
- Source-aware answers with retrieval-backed context.
```text
PDF Upload -> Chunking -> Embeddings -> Qdrant -> Top-K Retrieval -> LLM Answer
     |                                                                   |
     +----------------------- Inngest Step Functions --------------------+
```
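The chunking stage of the pipeline can be sketched as a fixed-size splitter with overlap (a minimal illustration only; the actual chunk size, overlap, and splitting strategy in data_loader.py may differ):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Overlap preserves context across chunk boundaries so retrieval
    does not lose sentences that straddle a cut point.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```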
Core design patterns:
- Deterministic IDs for idempotent re-ingestion.
- Provider abstraction via environment configuration.
- Graceful degradation for vector store and model services.
- Preflight diagnostics in UI for developer-friendly troubleshooting.
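Deterministic IDs can be derived from stable inputs such as the document's content hash and the chunk index, so re-ingesting the same PDF overwrites existing points instead of duplicating them. A sketch of the general technique (the exact ID scheme in this code base may differ):

```python
import hashlib
import uuid

def chunk_point_id(pdf_bytes: bytes, chunk_index: int) -> str:
    """Derive a stable UUID for a chunk from the file hash and chunk index.

    Re-running ingestion on unchanged input yields identical IDs, so
    upserts into the vector store are idempotent.
    """
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{digest}:{chunk_index}"))
```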
```text
.
├── main.py              # FastAPI + Inngest functions (root app)
├── streamlit_app.py     # Streamlit UI with local fallback + preflight checks
├── data_loader.py       # PDF loading, chunking, embeddings
├── vector_db.py         # Qdrant integration with local fallback
├── custom_types.py      # Pydantic models
├── qdrant_storage/      # Embedded/local Qdrant data path
├── uploads/             # Uploaded PDFs
├── doc.md               # Extended technical walkthrough
└── rag-engine/          # Packaged production structure
    ├── src/rag_engine/
    ├── tests/
    ├── docker/
    └── pyproject.toml
```
```powershell
python -m venv .venv
Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e .
python -m streamlit run streamlit_app.py
```

Open: http://127.0.0.1:8501
This mode is enough to ingest and query with local fallback behavior.
Run each service in a separate terminal from repository root.
Terminal 1 (FastAPI):

```powershell
$env:INNGEST_DEV='1'
python -m uvicorn main:app --host 127.0.0.1 --port 8000
```

Terminal 2 (Inngest Dev Server):

```powershell
.\.tools\inngest\inngest.exe dev -u http://127.0.0.1:8000/api/inngest --no-discovery
```

Terminal 3 (Streamlit):

```powershell
python -m streamlit run streamlit_app.py
```

Expected endpoints:
- Streamlit: http://127.0.0.1:8501
- FastAPI Inngest endpoint: http://127.0.0.1:8000/api/inngest
- Inngest Dev Server: http://127.0.0.1:8288
The root app reads configuration from .env.
Important variables:
- LLM_PROVIDER: openai | gemini | claude | ollama | local
- EMBED_PROVIDER: openai | gemini | ollama | local
- OLLAMA_BASE_URL
- OLLAMA_MODEL
- OLLAMA_EMBED_MODEL
- INNGEST_ENABLED: true | false
- INNGEST_DEV: 1 | 0
- INNGEST_API_BASE: usually http://127.0.0.1:8288/v1
- INNGEST_EVENT_API_BASE: usually http://127.0.0.1:8288
- QDRANT_PATH: local embedded storage path
- EMBED_DIM: should match embedding model output dimensions
Note: Keep secrets such as API keys in .env and never commit them.
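A minimal `.env` for fully local operation might look like the following (values are illustrative; adjust model names and EMBED_DIM to your actual embedding model):

```ini
LLM_PROVIDER=ollama
EMBED_PROVIDER=ollama
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=llama3.1
OLLAMA_EMBED_MODEL=nomic-embed-text
EMBED_DIM=768
INNGEST_ENABLED=true
INNGEST_DEV=1
INNGEST_API_BASE=http://127.0.0.1:8288/v1
INNGEST_EVENT_API_BASE=http://127.0.0.1:8288
QDRANT_PATH=./qdrant_storage
```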
If you want the package-based API/UI implementation:
```powershell
cd rag-engine
python -m pip install -e .[dev]
python -m uvicorn rag_engine.api.app:app --reload --port 8000
```

Optional commands (from rag-engine):
```powershell
pytest
ruff check src tests
```

From the rag-engine directory:
```powershell
docker compose -f docker/docker-compose.yml up --build
```

This launches Qdrant and the packaged API service.
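The compose setup wires roughly the following (an illustrative sketch only, not the actual contents of docker/docker-compose.yml; the Dockerfile path and environment variable name are hypothetical):

```yaml
# Illustrative sketch -- see docker/docker-compose.yml for the real file.
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
  api:
    build:
      context: ..
      dockerfile: docker/Dockerfile   # hypothetical path
    environment:
      QDRANT_URL: http://qdrant:6333  # hypothetical variable name
    ports:
      - "8000:8000"
    depends_on:
      - qdrant
```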
- Verify FastAPI is running on port 8000.
- Verify Inngest Dev Server is running on port 8288.
- Confirm .env has INNGEST_API_BASE and INNGEST_EVENT_API_BASE set to localhost endpoints.
- The app will attempt local embedded fallback using QDRANT_PATH.
- Ensure the process can write to qdrant_storage.
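The fallback pattern can be illustrated as follows (a sketch of the general technique, not the exact logic in vector_db.py; the in-memory store and helper names are ours):

```python
import math

class InMemoryVectorStore:
    """Naive in-memory store used when Qdrant is unavailable."""

    def __init__(self):
        self.points = {}  # id -> (vector, payload)

    def upsert(self, point_id, vector, payload):
        self.points[point_id] = (vector, payload)

    def search(self, query, top_k=5):
        """Return (score, payload) pairs ranked by cosine similarity."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0
        scored = sorted(
            ((cosine(query, vec), payload) for vec, payload in self.points.values()),
            key=lambda item: item[0],
            reverse=True,
        )
        return scored[:top_k]

def make_store(qdrant_path="./qdrant_storage"):
    """Prefer embedded Qdrant; degrade to the in-memory store on failure."""
    try:
        from qdrant_client import QdrantClient  # optional dependency
        return QdrantClient(path=qdrant_path)
    except Exception:
        return InMemoryVectorStore()
```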
- Check provider keys and model names in .env.
- For Ollama, confirm OLLAMA_BASE_URL and model availability.
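A quick reachability check for the Ollama endpoint can be done with the standard library; `/api/tags` is Ollama's endpoint for listing locally pulled models (the helper function name here is ours):

```python
import json
import urllib.request

def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        return "models" in data
    except Exception:
        return False
```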
- Do not commit .env files with real secrets.
- Use local model providers for sensitive workloads when data residency is required.
- Restrict uploaded document handling to trusted sources and controlled environments.
- Multi-provider runtime routing for both embeddings and generation.
- Local fallback mode for degraded infrastructure conditions.
- Deterministic ingestion IDs for idempotency.
- Source-aware answers to improve auditability and trust.
Add your preferred license file (MIT, Apache-2.0, etc.) at repository root.
Built for real-world RAG operations by Adil Shamim.