Your personal research assistant that remembers everything you've ever read.
Docify is an open-source, local-first AI application that lets you upload any resource (PDFs, URLs, documents, images, code), ask questions about them, and receive cited, grounded answers—all while keeping your data completely private.
- 🔒 Privacy-First: All processing happens locally (embeddings, LLM, storage)
- 🧠 Smart Deduplication: Content-based fingerprinting prevents duplicate processing
- 📚 Multi-Format Support: PDF, URL, Word, Excel, Markdown, images (OCR), code, and more
- 💬 Cited Answers: Every response includes citations to source documents
- 🔍 Hybrid Search: Combines semantic (vector) and keyword (BM25) search
- 🤖 Local LLM: Runs Mistral 7B via Ollama (optional cloud LLM support)
- 🌐 Workspace Model: Personal, team, or hybrid collaboration
- 🚀 One-Command Setup: Docker Compose orchestration
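The deduplication feature rests on content-based fingerprinting: identical content hashes to the same key, so a re-upload is detected before any expensive parsing or embedding runs. As a minimal sketch of the idea (Docify's actual normalization rules may differ), hashing normalized text gives a stable fingerprint:

```python
import hashlib

def content_fingerprint(text: str) -> str:
    """Hash normalized content so the same document uploaded twice
    (with trivial whitespace/case differences) maps to one fingerprint."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two uploads that differ only in formatting collide on purpose:
a = content_fingerprint("Retrieval-Augmented  Generation\n(RAG)")
b = content_fingerprint("retrieval-augmented generation (rag)")
assert a == b  # duplicate detected; the second upload can be skipped
```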
Docify's RAG pipeline integrates 11 core services:
- Resource Ingestion - Upload, parse, deduplicate
- Chunking - Semantic boundary preservation
- Embeddings (Async) - Vector generation via Celery
- Query Expansion - Better recall with variants
- Hybrid Search - Semantic + keyword (BM25)
- Re-Ranking - 5-factor scoring + conflict detection
- Context Assembly - Token budget management
- Prompt Engineering - Anti-hallucination prompts
- LLM Service - Ollama/OpenAI/Anthropic support
- Citation Verification - Verify claims against sources
- Message Generation - Full pipeline orchestration
See ARCHITECTURE.md for complete technical details.
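The hybrid search and re-ranking stages merge two independently ranked result lists, one from vector similarity and one from BM25. Docify's actual 5-factor scoring lives in the re-ranking service; purely as an illustration of how two rankings can be fused, here is reciprocal rank fusion (RRF), a common technique (the function, IDs, and constant below are illustrative, not Docify's code):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs.

    A chunk scores 1/(k + rank + 1) per list it appears in; k dampens
    the influence of any single list's top positions."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["c3", "c1", "c7"]   # vector-similarity order
keyword  = ["c1", "c9", "c3"]   # BM25 order
print(reciprocal_rank_fusion([semantic, keyword]))  # → ['c1', 'c3', 'c9', 'c7']
```

Note how `c1` wins overall despite topping neither list: appearing high in both rankings beats appearing first in only one.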
- Docker & Docker Compose
- 8GB RAM minimum (16GB recommended)
- 20GB disk space (for models and data)
macOS / Linux:
```bash
# Clone the repository
git clone https://github.com/keshavashiya/docify.git
cd docify

# Run the setup script (handles everything!)
./scripts/setup.sh
```

Windows (PowerShell):
```powershell
# Clone the repository
git clone https://github.com/keshavashiya/docify.git
cd docify

# Run the setup script (handles everything!)
.\scripts\setup.ps1
```

That's it! The setup script will:
- ✅ Check prerequisites (Docker, memory, disk space)
- ✅ Create environment configuration
- ✅ Start all Docker services
- ✅ Initialize the database with pgvector
- ✅ Download AI models (~4GB, may take 10-15 min)
- ✅ Verify everything is working
Options:
```bash
# macOS / Linux
./scripts/setup.sh --skip-models   # Skip model download (faster setup)
./scripts/setup.sh --reset         # Reset everything and start fresh
./scripts/setup.sh --help          # Show all options
```

```powershell
# Windows (PowerShell)
.\scripts\setup.ps1 -SkipModels    # Skip model download
.\scripts\setup.ps1 -Reset         # Reset everything
.\scripts\setup.ps1 -Help          # Show all options
```

macOS / Linux:
```bash
./scripts/start.sh            # Start Docify (quick start for daily use)
./scripts/start.sh --logs     # Start and follow logs
./scripts/start.sh --stop     # Stop all services
./scripts/start.sh --status   # Show service status
```

Windows (PowerShell):
```powershell
.\scripts\start.ps1           # Start Docify
.\scripts\start.ps1 -Logs     # Start and follow logs
.\scripts\start.ps1 -Stop     # Stop all services
.\scripts\start.ps1 -Status   # Show service status
```

- Frontend: http://localhost:3000
- API Docs & Testing: http://localhost:8000/docs
- Health Endpoint: http://localhost:8000/api/health
```bash
# Check if all containers are running
docker-compose ps

# Test API health
curl http://localhost:8000/api/health

# Monitor system resources
docker stats docify-ollama docify-backend

# View logs
docker-compose logs -f backend
docker-compose logs -f celery-worker
```

📋 Manual Setup (Advanced Users)
If you prefer to run each step manually:
```bash
# Clone and enter directory
git clone https://github.com/keshavashiya/docify.git
cd docify

# Copy environment configuration
cp .env.example .env

# Start all services
docker-compose up -d --build

# Wait for services to be healthy (~2-3 minutes)
docker-compose ps

# Initialize database (one-time setup)
docker-compose exec postgres psql -U docify -d docify -c "CREATE EXTENSION IF NOT EXISTS vector"
docker-compose exec backend alembic upgrade head

# Download optimized models (one-time, ~4GB total)
docker-compose exec ollama ollama pull mistral:7b-instruct-q4_0
docker-compose exec ollama ollama pull all-minilm:22m

# Restart services with models loaded
docker-compose restart backend celery-worker
```

Backend development:

```bash
cd backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start the development server (requires running docker-compose services)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Frontend development:

```bash
cd frontend

# Install dependencies
npm install

# Start the development server
npm run dev
```

Backend
- FastAPI (Python 3.10+)
- PostgreSQL 15+ with pgvector
- Celery + Redis (async tasks)
- Ollama (local LLM: mistral:7b-instruct-q4_0, all-minilm:22m)
- sentence-transformers (embeddings); optional OpenAI/Anthropic support
Frontend
- React 18+ with TypeScript
- Vite, Tailwind CSS
- React Query, Zustand
Infrastructure
- Docker & Docker Compose
- Alembic (database migrations)
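pgvector does the heavy lifting for semantic search: chunk embeddings live in a `vector` column and are ranked with the cosine-distance operator (`embedding <=> query`). Purely to illustrate what that operator computes (this is not Docify's code, and the table/column names in the SQL comment are hypothetical), the same ranking in plain Python:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's `<=>` operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# SQL equivalent (hypothetical schema):
#   SELECT id FROM chunks ORDER BY embedding <=> %(query_vec)s LIMIT 5;
query = [1.0, 0.0]
chunks = {"c1": [0.9, 0.1], "c2": [0.0, 1.0], "c3": [0.7, 0.7]}
nearest = min(chunks, key=lambda cid: cosine_distance(query, chunks[cid]))
print(nearest)  # → c1
```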
Upload a resource:

```bash
curl -X POST "http://localhost:8000/api/resources/upload" \
  -F "file=@research_paper.pdf" \
  -F "workspace_id=<your-workspace-id>"
```

Search:

```bash
curl -X POST "http://localhost:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG?", "workspace_id": "<id>"}'
```

Send a message:

```bash
curl -X POST "http://localhost:8000/api/conversations/<id>/messages" \
  -H "Content-Type: application/json" \
  -d '{"content": "Explain the main findings", "role": "user"}'
```
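The same calls work from any HTTP client. As a small stdlib-only Python sketch of the search endpoint (the payload fields mirror the curl example; the helper name and structure are illustrative, not part of Docify):

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def search_request(query: str, workspace_id: str) -> urllib.request.Request:
    """Build the POST /api/search request shown in the curl example."""
    body = json.dumps({"query": query, "workspace_id": workspace_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = search_request("What is RAG?", "<id>")
    with urllib.request.urlopen(req) as resp:  # requires Docify to be running
        print(json.load(resp))
```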
```bash
# Start all services
docker-compose up -d

# View logs (all services)
docker-compose logs -f

# View logs for a specific service
docker-compose logs -f backend
docker-compose logs -f celery-worker

# Stop all services
docker-compose down

# Stop and remove data (WARNING: deletes all data)
docker-compose down -v

# Restart a specific service
docker-compose restart backend
```

If you get "port already in use" errors:
```bash
# PostgreSQL: Docify uses 5433 (standard is 5432)
# Redis: Docify uses 6380 (standard is 6379)
# Backend: Docify uses 8000
# Frontend: Docify uses 3000
# Ollama: Docify uses 11434

# Check what's using a port (macOS/Linux)
lsof -i :8000

# Kill the process (if needed)
kill -9 <PID>
```

Use the built-in API documentation:
- Open http://localhost:8000/docs in your browser
- Try requests directly in Swagger UI
- All endpoints are documented with request/response schemas
Alternatively, use curl:
```bash
# Health check
curl http://localhost:8000/api/health

# List workspaces
curl http://localhost:8000/api/workspaces

# Create workspace
curl -X POST http://localhost:8000/api/workspaces \
  -H "Content-Type: application/json" \
  -d '{"name":"My Workspace","workspace_type":"personal"}'
```

MIT License - see LICENSE file for details
- Built with FastAPI
- Powered by Ollama
- Vector search by pgvector
- Embeddings by sentence-transformers
Made with ❤️ for researchers, students, and knowledge workers
