This project demonstrates an end-to-end Retrieval-Augmented Generation (RAG) workflow using your own documents, Chroma vector storage, Gemini embeddings, and a local Docker-served LLM for offline-style inference.
It is organized as a series of practical notebooks so you can learn each stage step by step, then run question answering against your private knowledge base.
- Build a RAG pipeline from scratch using your own data.
- Ingest and chunk both plain text and PDF documents.
- Generate embeddings with `models/gemini-embedding-2-preview`.
- Persist vectors in a local Chroma database (`chroma_db/`).
- Perform similarity search and retrieval over your indexed data.
- Run an interactive retrieval + answer workflow.
- Switch generation from cloud to a local Docker Model Runner endpoint using an OpenAI-compatible API (`http://localhost:12434/engines/v1`).
- Your source documents remain in your own environment.
- The vector database is stored locally and is reusable across notebooks.
- Local model serving reduces dependency on external generation APIs for inference.
- The notebook flow makes experimentation and learning easier.
```
rag-implementation-with-own-data/
├── README.md
├── pyproject.toml
├── requirements.txt
├── main.py
├── src/
│   ├── 1.0_RAG_With_Own_Text.ipynb
│   ├── 2.0_RAG_With_PDF.ipynb
│   ├── 3.0_Simalirity_Search_VectorDB.ipynb
│   └── 4.0_Docker_Model_Runner.ipynb
└── chroma_db/
    └── ... (persisted vector index)
```
- Python
- LangChain (`langchain`, `langchain-classic`, `langchain-community`)
- ChromaDB (`chromadb`, `langchain-chroma`)
- Google GenAI (`google-genai`, `langchain-google-genai`) for the embedding and cloud LLM notebooks
- OpenAI-compatible client (`langchain-openai`) for the local Docker model endpoint
- PyPDF for PDF ingestion
- Jupyter notebooks for step-by-step implementation
Follow the notebooks in order:
- Loaded environment configuration.
- Initialized Gemini chat model for early testing.
- Created `Document` objects from custom text data.
- Split text into chunks using `RecursiveCharacterTextSplitter`.
- Embedded chunks using `GoogleGenerativeAIEmbeddings`.
- Saved embeddings to local Chroma (`../chroma_db`).
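The splitting step above can be sketched conceptually. This is a simplified, standard-library stand-in for what `RecursiveCharacterTextSplitter` does (fixed-size character chunks with overlap), not the LangChain implementation itself; the chunk sizes are made-up examples:

```python
def split_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Naive character-based chunking with overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    step = chunk_size - chunk_overlap  # advance less than chunk_size so chunks overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

chunks = split_text("your own text data " * 20, chunk_size=100, chunk_overlap=20)
```

Overlap matters because a fact that straddles a chunk boundary would otherwise be split across two chunks and retrieved incompletely.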
- Loaded PDFs with `PyPDFDirectoryLoader`.
- Split documents into retrieval-ready chunks.
- Embedded and stored chunks in the same Chroma database.
- Built retrieval + generation chain with a prompt template.
- Queried the indexed PDF knowledge through RAG.
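The "retrieval + generation chain with a prompt template" step boils down to stuffing retrieved chunks into a template before calling the model. A minimal sketch of that assembly is below; the template wording is an assumption, not the exact prompt used in the notebook:

```python
# Illustrative prompt assembly for a stuff-style RAG chain.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)

def build_prompt(retrieved_chunks: list[str], question: str) -> str:
    # Join retrieved chunks with a visible separator so the model can
    # distinguish one source passage from the next.
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(["Chunk about RAG.", "Chunk about Chroma."], "What is RAG?")
```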
- Reopened the persisted Chroma vector store.
- Performed similarity-based retrieval.
- Connected retriever with a Gemini chat model.
- Added interactive query handling via widgets for easier testing.
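Under the hood, similarity-based retrieval ranks stored vectors by closeness to the query embedding. A toy sketch of that ranking with cosine similarity is below; the vectors and document names are invented for illustration, since real embeddings come from the Gemini model and live in Chroma:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pretend vector store: doc id -> embedding (toy 3-d vectors).
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}

def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query, highest first.
    ranked = sorted(store, key=lambda doc: cosine(query_vec, store[doc]), reverse=True)
    return ranked[:k]

results = top_k([1.0, 0.0, 0.0], k=2)
```

Chroma performs this ranking efficiently over the persisted index; the retriever simply returns the top-k documents as context.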
- Reused the same embedding model and same Chroma store.
- Replaced the cloud generation model with a local Docker-served model:
  - Model: `ai/qwen3:0.6B-F16`
  - Base URL: `http://localhost:12434/engines/v1`
- Built retrieval chain exactly as before.
- Answered questions using your indexed data with a locally served model backend.
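Because the local runner speaks the OpenAI-compatible protocol, switching backends only changes the base URL and model name. The sketch below builds (but does not send) a chat-completion request body for the local endpoint; the exact system message is an assumption:

```python
import json

BASE_URL = "http://localhost:12434/engines/v1"

def chat_request(question: str, context: str) -> tuple[str, str]:
    """Return (url, json_body) for an OpenAI-style /chat/completions call."""
    body = {
        "model": "ai/qwen3:0.6B-F16",
        "messages": [
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body)

url, body = chat_request("What is RAG?", "RAG combines retrieval with generation.")
```

In the notebook this detail is handled by `langchain-openai`, so the retrieval chain code stays identical across cloud and local backends.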
git clone <your-repo-url>
```bash
cd rag-implementation-with-own-data
python -m venv .venv
source .venv/bin/activate
```

Use either the `requirements.txt` or the `pyproject.toml` workflow:

```bash
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```
GEMINI_API_KEY=your_gemini_api_key
```

Even when you use a local Docker-served LLM for generation, the current notebooks still use Gemini embeddings for vector creation and loading.

Start Jupyter and execute the notebooks in this order:

1. `src/1.0_RAG_With_Own_Text.ipynb`
2. `src/2.0_RAG_With_PDF.ipynb`
3. `src/3.0_Simalirity_Search_VectorDB.ipynb`
4. `src/4.0_Docker_Model_Runner.ipynb`

```bash
jupyter notebook
```

To launch the Streamlit app:

```bash
uv run streamlit run src/app.py
```

Then open http://localhost:8501.
For the local model notebook (4.0_Docker_Model_Runner.ipynb) to work, ensure:
- A Docker-based model runner is running locally.
- It exposes an OpenAI-compatible endpoint at `http://localhost:12434/engines/v1`.
- The selected model (`ai/qwen3:0.6B-F16`) is available in that runner.
If the service is not running, retrieval chain setup will succeed, but answer generation calls will fail.
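A quick way to catch that failure mode up front is to probe the runner before building the chain. This is a best-effort sketch: the `/models` path follows the OpenAI-compatible convention and is an assumption about your runner, so adjust it if yours differs:

```python
import urllib.request
import urllib.error

def runner_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the model runner answers at its models endpoint."""
    try:
        # OpenAI-compatible servers typically list models at {base_url}/models.
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: treat all as "not up".
        return False
```

Call `runner_is_up("http://localhost:12434/engines/v1")` at the top of the notebook and skip generation cells if it returns `False`.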
- Your vector index is stored locally in `chroma_db/`.
- Only data you load gets indexed.
- Keep API keys private and never commit `.env`.
- Review model and API usage policies before production use.
- Main implementation is notebook-based (not yet packaged as production modules).
- Embeddings currently depend on the Gemini API.
- Local model quality and latency depend on your hardware/model size.
- The repository does not include Docker compose files; it assumes a running local model runner endpoint.
- Add a scripted CLI pipeline (`ingest`, `query`, `evaluate`).
- Add a fully local embedding option for offline operation.
- Add Docker Compose for one-command local startup.
- Add automated tests and benchmark notebook outputs.
This project shows how to:
- Build a local vector knowledge base from your own text/PDF files.
- Retrieve relevant context with similarity search.
- Generate final answers with either cloud or local models.
- Use Docker-served local models while keeping the same retrieval architecture.