Document intelligence platform for managing company knowledge using PageIndex (vectorless RAG).
- Documents: PDF, Word (.docx), Excel (.xlsx), PowerPoint (.pptx)
- Interface: Chat + Search
- Access: Single company, multi-user
- Auth: Username/password, all users can view/edit/add
- Hosting: Replit (primary) or cloud alternative
- Backend: Python (FastAPI)
- Frontend: React or simple HTML/JS
- RAG Engine: PageIndex (vectorless, reasoning-based)
- Database: SQLite (dev) → PostgreSQL (prod)
- Auth: Simple JWT or session-based
- File Processing:
  - PDF: PyMuPDF / pdfplumber
  - Word: python-docx
  - Excel: openpyxl
  - PowerPoint: python-pptx
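One way the per-format libraries listed above could be wired together is a small dispatcher keyed on file extension. This is a minimal sketch, not the project's actual `document_processor.py`: the function names (`extract_pdf`, `pick_extractor`, and so on) are hypothetical, and the third-party imports are deferred so that only the library for the file type in hand needs to be installed.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_pdf(path: str) -> str:
    import fitz  # PyMuPDF; imported lazily
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_docx(path: str) -> str:
    from docx import Document  # python-docx
    return "\n".join(p.text for p in Document(path).paragraphs)


def extract_xlsx(path: str) -> str:
    from openpyxl import load_workbook
    wb = load_workbook(path, read_only=True, data_only=True)
    lines = []
    for ws in wb.worksheets:
        for row in ws.iter_rows(values_only=True):
            cells = [str(c) for c in row if c is not None]
            if cells:  # skip fully empty rows
                lines.append("\t".join(cells))
    wb.close()
    return "\n".join(lines)


def extract_pptx(path: str) -> str:
    from pptx import Presentation  # python-pptx
    lines = []
    for slide in Presentation(path).slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                lines.append(shape.text_frame.text)
    return "\n".join(lines)


# Hypothetical mapping from extension to extractor.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
    ".xlsx": extract_xlsx,
    ".pptx": extract_pptx,
}


def pick_extractor(filename: str) -> Callable[[str], str]:
    """Return the extractor for a filename, matching the extension case-insensitively."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or filename}") from None


def extract_text(path: str) -> str:
    """Extract plain text from a supported document."""
    return pick_extractor(path)(path)
```

Keeping the mapping in one dictionary means a new format would only need one extractor function and one entry.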
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Frontend   │────▶│   FastAPI   │────▶│  PageIndex  │
│ (React/JS)  │     │   Backend   │     │   Engine    │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    ▼             ▼
               ┌──────────┐  ┌──────────┐
               │ Database │  │  Files   │
               │ (Users,  │  │ Storage  │
               │  Docs)   │  │          │
               └──────────┘  └──────────┘
- MVP — Upload docs, build index, chat query
- Search — Add search bar alongside chat
- Auth — User login/registration
- Polish — UI improvements, error handling
- Project created
- Full PageIndex integration (native library, not approximation)
- Document upload/processing (PDF, Word, Excel, PowerPoint)
- Chat interface
- Two-stage search (optimized for 4000+ documents)
- User authentication (JWT)
- Deployment to Replit
- Production hardening
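To make the JWT item above concrete, here is a sketch of issuing and checking an HS256-signed token using only the standard library. It is an illustration of the mechanism, not the project's `auth.py`: `create_token` and `verify_token` are hypothetical names, and a real deployment would more likely rely on a maintained JWT library than on hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(username: str, secret: str, ttl_seconds: int = 3600,
                 now: Optional[float] = None) -> str:
    """Build a signed token carrying the username and an expiry time."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Optional[str]:
    """Return the username if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        # The algorithm is fixed to HS256 here; the header's "alg" is never trusted.
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except (ValueError, UnicodeError):
        return None
    current = time.time() if now is None else now
    if payload.get("exp", 0) <= current:
        return None
    return payload.get("sub")
```

The optional `now` parameter exists only so expiry can be checked deterministically in tests.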
Designed for 4000+ documents:
- Two-stage search: filters documents first, then deep-searches relevant ones
- Async processing pipeline
- Efficient tree structures (no vector DB overhead)
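The two-stage idea above can be sketched in a few lines. This is an illustration under stated assumptions, not PageIndex's API: plain keyword overlap stands in for the reasoning-based tree search, and the `Doc` structure, `stage_one`, `stage_two`, and the `top_k`/`top_n` parameters are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Doc:
    doc_id: str
    summary: str  # short description, cheap to scan
    sections: Dict[str, str] = field(default_factory=dict)  # title -> body text


def _terms(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def stage_one(query: str, docs: List[Doc], top_k: int = 20) -> List[Doc]:
    """Cheap filter: keep the documents whose summary shares terms with the query."""
    q = _terms(query)
    scored = [(len(q & _terms(d.summary)), d) for d in docs]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def stage_two(query: str, candidates: List[Doc], top_n: int = 5) -> List[Tuple[str, str]]:
    """Deeper pass: score every section, but only inside the shortlisted documents."""
    q = _terms(query)
    hits = []
    for d in candidates:
        for title, body in d.sections.items():
            score = len(q & _terms(title + " " + body))
            if score:
                hits.append((score, d.doc_id, title))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(doc_id, title) for _, doc_id, title in hits[:top_n]]


def two_stage_search(query: str, docs: List[Doc],
                     top_k: int = 20, top_n: int = 5) -> List[Tuple[str, str]]:
    return stage_two(query, stage_one(query, docs, top_k), top_n)
```

The point of the split is cost: the first stage touches only one short string per document, so the expensive second stage runs over at most `top_k` documents rather than the whole corpus.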
cd app
# Create virtual environment
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
# Install dependencies
pip install -r requirements.txt
# Configure
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
# Run
python main.py
app/
├── main.py # FastAPI application
├── models.py # Database models
├── auth.py # Authentication
├── config.py # Settings
├── document_processor.py # Document extraction & indexing
├── search.py # RAG search logic
├── requirements.txt # Dependencies
├── .env.example # Config template
└── static/
    └── index.html # Frontend UI
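As an illustration of what `config.py` might contain, the sketch below reads settings from the environment and fails early when `OPENAI_API_KEY` is missing. Only `OPENAI_API_KEY` comes from the setup steps above; the `Settings` class, `load_settings`, the `DATABASE_URL` variable, and its SQLite default are assumptions. The sketch also assumes the values from `.env` have already been loaded into the process environment by some other means.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    database_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    # Hypothetical variable: SQLite by default for development,
    # overridden with a PostgreSQL URL in production.
    database_url = env.get("DATABASE_URL", "sqlite:///./app.db")
    return Settings(openai_api_key=api_key, database_url=database_url)
```

Passing the mapping in as a parameter keeps the function easy to test without touching the real environment.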
Created: 2026-02-05