A production-ready Retrieval-Augmented Generation (RAG) chatbot system that allows you to chat with your documents using local LLMs. Upload files, YouTube videos, or web pages, and have intelligent conversations with the content.
- Features
- Architecture
- Tech Stack
- Installation
- Quick Start
- API Reference
- Project Structure
- Usage Examples
- Testing
- Troubleshooting
- Performance Tips
- Contributing
## Features

- **Multi-Source Document Upload** - PDF, TXT, Markdown, YouTube, web pages
- **Intelligent Conversation** - Context-aware with conversation history
- **Semantic Search** - Vector-based retrieval with source attribution
- **Automatic Pipeline** - One-call upload to storage
- **Session Management** - Multiple concurrent conversations
- **Modular Architecture** - Clean separation of concerns
- **RESTful API** - Easy frontend integration
- **Local LLM** - Privacy-focused with Ollama
- **Monitoring** - Built-in stats and health checks
- **Error Handling** - Comprehensive error management
## Architecture

```text
Client → Flask API → FileManager/ChatSession → DocumentProcessor/LLMManager
                                             → VectorStore → Ollama
```

Data Flow:

- Upload: Document → Extract → Chunk → Embed → Store
- Chat: Question → Retrieve Context → LLM → Answer with Sources
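The chat half of this flow can be sketched with a toy in-memory index. The bag-of-words cosine similarity below is a stand-in for the real nomic-embed-text embeddings and ChromaDB store, chosen only so the sketch runs with no dependencies; the function names are illustrative, not the project's actual API:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for nomic-embed-text)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def retrieve(question, chunks, k=5):
    """Chat step: embed the question and return the k most similar chunks."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Python is a general-purpose programming language.",
    "The Eiffel Tower is in Paris.",
    "Flask is a lightweight Python web framework.",
]
print(retrieve("Which web framework does Python use?", chunks, k=1))
# → ['Flask is a lightweight Python web framework.']
```

The retrieved chunks are then stuffed into the LLM prompt along with the question, which is what produces the "Answer with Sources" step above.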
## Tech Stack

| Component | Technology |
|---|---|
| Backend | Flask 3.0+ |
| LLM | Ollama (llama3.2) |
| Embeddings | nomic-embed-text |
| Vector DB | ChromaDB |
| Framework | LangChain |
| PDF Processing | PyPDF2 |
| Audio | OpenAI Whisper |
| Video | yt-dlp |
## Installation

Prerequisites:

- Python 3.8+
- Ollama
- FFmpeg (for YouTube support)
```bash
# 1. Install Ollama
# macOS: brew install ollama
# Linux: curl -fsSL https://ollama.ai/install.sh | sh
# Windows: download from https://ollama.com/download

# 2. Start Ollama and pull the models
ollama serve
ollama pull llama3.2
ollama pull nomic-embed-text

# 3. Install FFmpeg
# macOS: brew install ffmpeg
# Linux: sudo apt-get install ffmpeg

# 4. Clone and set up the project
git clone https://github.com/Rnamrata/llm-chatbot.git
cd llm-chatbot
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# 5. Create directories
mkdir -p uploads/media chroma_db
```

requirements.txt:

```text
flask==3.0.0
flask-cors==4.0.0
langchain==0.1.0
langchain-community==0.0.10
chromadb==0.4.22
openai-whisper==20231117
yt-dlp==2023.12.30
PyPDF2==3.0.1
requests==2.31.0
numpy==1.24.3
```

## Quick Start

```bash
# Terminal 1: start Ollama
ollama serve

# Terminal 2: start the Flask server
cd llm-chatbot
source .venv/bin/activate
python main.py

# Terminal 3: run the tests
python test_system.py
```

The server will be available at: `http://localhost:5001`
## API Reference

Base URL: `http://localhost:5001`
### Upload Endpoints

| Endpoint | Method | Body | Description |
|---|---|---|---|
| `/upload/file` | POST | `file: <file>` | Upload PDF/TXT/MD |
| `/upload/youtube` | POST | `{"url": "..."}` | Transcribe YouTube video |
| `/upload/web` | POST | `{"url": "..."}` | Scrape web page |
### Chat Endpoints

| Endpoint | Method | Body | Description |
|---|---|---|---|
| `/chat/new` | POST | - | Create new session |
| `/chat` | POST | `{"question": "...", "session_id": "..."}` | Send message |
| `/chat/history/{id}` | GET | - | Get conversation history |
| `/chat/session/{id}` | GET | - | Get session info |
| `/chat/clear/{id}` | DELETE | - | Clear session |
| `/chat/sessions` | GET | - | List all sessions |
| `/chat/cleanup` | POST | `{"inactive_hours": 24}` | Remove old sessions |
### Monitoring Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/stats` | GET | Database statistics |
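A small client-side probe of `/health` is handy for scripting or cron-based monitoring. This sketch uses only the standard library so it runs without extra dependencies; it assumes only that a healthy server answers `/health` with a 2xx status (the exact response body is not relied on):

```python
from urllib.request import urlopen
from urllib.error import URLError

def check_health(base_url="http://localhost:5001", timeout=5):
    """Return True if the /health endpoint answers with a 2xx status."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return 200 <= resp.getcode() < 300
    except (URLError, OSError):
        return False

if __name__ == "__main__":
    print("healthy" if check_health() else "server not reachable")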
### Response Formats

Upload success:

```json
{
  "success": true,
  "chunks_created": 45,
  "filename": "document.pdf"
}
```

Chat response:

```json
{
  "success": true,
  "answer": "Machine learning is...",
  "sources": [...],
  "num_sources": 5,
  "session_id": "abc-123"
}
```

## Project Structure

```text
llm-chatbot/
├── main.py                            # Flask app entry point
├── requirements.txt                   # Dependencies
├── README.md                          # Documentation
│
├── src/modules/
│   ├── file_manager.py                # Upload & processing
│   ├── document_processor.py          # Text extraction & chunking
│   ├── vector_store_and_embedding.py  # Vector operations
│   ├── llm_manager.py                 # LLM management
│   └── chat_session.py                # Session management
│
├── test/
│   └── test_system.py                 # Test suite
│
├── uploads/                           # Uploaded files
│   └── media/                         # YouTube audio
│
└── chroma_db/                         # Vector database
```
## Usage Examples

Python client:

```python
import requests

BASE_URL = 'http://localhost:5001'

# Upload a PDF
with open('document.pdf', 'rb') as f:
    requests.post(f'{BASE_URL}/upload/file', files={'file': f})

# Create a session
response = requests.post(f'{BASE_URL}/chat/new')
session_id = response.json()['session_id']

# Ask a question
response = requests.post(f'{BASE_URL}/chat', json={
    'question': 'What is this document about?',
    'session_id': session_id
})
print(response.json()['answer'])

# Upload a YouTube video
requests.post(f'{BASE_URL}/upload/youtube',
              json={'url': 'https://www.youtube.com/watch?v=...'})

# Chat with the transcription
response = requests.post(f'{BASE_URL}/chat',
                         json={'question': 'Summarize the video',
                               'session_id': session_id})
```

cURL:

```bash
# Upload a web page
curl -X POST http://localhost:5001/upload/web \
  -H "Content-Type: application/json" \
  -d '{"url": "https://en.wikipedia.org/wiki/Python_(programming_language)"}'

# Chat
curl -X POST http://localhost:5001/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is Python?"}'

# Get stats
curl http://localhost:5001/stats
```

## Testing

```bash
python test_system.py
```

You can also use Postman, cURL, or Python requests to test the endpoints; see Usage Examples.
## Troubleshooting

| Issue | Solution |
|---|---|
| "Ollama call failed 404" | Run `ollama pull llama3.2` and `ollama pull nomic-embed-text` |
| "Connection refused" | Ensure both Ollama (`ollama serve`) and Flask (`python main.py`) are running |
| "415 Unsupported Media Type" | Add header: `-H "Content-Type: application/json"` |
| "FFmpeg not found" | Install FFmpeg: `brew install ffmpeg` (macOS) or `apt-get install ffmpeg` (Linux) |
| Slow responses | Use a smaller model (`phi3`), reduce chunk size, or decrease the `k` parameter |
| Out of memory | Clean up sessions: POST to `/chat/cleanup` |
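For the "Connection refused" case, a quick way to see which side is down is to probe both ports with a plain TCP check. This is a standalone diagnostic sketch, not part of the project; it assumes the defaults of 5001 for Flask (as configured above) and 11434 for `ollama serve`:

```python
import socket

def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in [("Flask (python main.py)", 5001),
                   ("Ollama (ollama serve)", 11434)]:
    state = "up" if is_listening("localhost", port) else "DOWN"
    print(f"{name} on port {port}: {state}")
```

A plain socket check only tells you the process is listening; if both ports are up but requests still fail, check the Flask logs for errors from the Ollama API itself.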
## Performance Tips

1. **Choose the right model**
   - Fast: `phi3`
   - Balanced: `llama3.2:1b`
   - Best: `llama3.2` (default)

2. **Optimize chunk size**

   ```python
   # In document_processor.py
   chunk_size=500,  # Smaller = faster
   chunk_overlap=50
   ```

3. **Reduce retrieved chunks**

   ```json
   {"question": "...", "k": 3}
   ```

   The default is `k=5`.

4. **Clean up regularly**

   ```bash
   # Clean up old sessions daily
   curl -X POST http://localhost:5001/chat/cleanup \
     -H "Content-Type: application/json" \
     -d '{"inactive_hours": 24}'
   ```

## Roadmap

Planned Features:
- User authentication & authorization
- Database persistence (PostgreSQL)
- Frontend UI (React/Vue)
- More file formats (DOCX, PPTX, CSV)
- Streaming responses
- Document update/deletion
- Export conversations
- Docker containerization
- Analytics dashboard
## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add feature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

Code Style:

- Follow PEP 8
- Add docstrings
- Include type hints
- Write tests
## Author

Namrata Roy
- GitHub: @Rnamrata
- Email: [email protected]
## Acknowledgments

- LangChain - RAG framework
- Ollama - Local LLM
- ChromaDB - Vector database
- Flask - Web framework
- Whisper - Transcription
Built using Python, LangChain, and Ollama