An intelligent document semantic search and Q&A engine built with FastAPI, React, and Sentence Transformers. This application allows users to upload documents in multiple formats and perform semantic searches or ask natural language questions to find relevant content based on meaning rather than exact keyword matching.
- Multi-Mode Engine (v2):
  - Fast Mode: Uses `all-MiniLM-L6-v2` (384d) for rapid embedding and `ms-marco-MiniLM-L-6-v2` for efficient reranking. Ideal for standard hardware and quick queries.
  - Pro Mode: Leverages `ibm-granite/granite-embedding-english-r2` (1024d) and `granite-reranker` for professional-grade accuracy and high-fidelity results.
- Smart Q&A Mode (v2): Ask direct questions of your documents. The system uses a specialized extractive QA model (`deepset/tinyroberta-squad2`) to pinpoint exact answers and provide citations.
- Multi-format Support: Upload and process PDF, DOCX, TXT, MD, and CSV files.
- Hybrid Search Engine: Combines efficient vector retrieval (Bi-Encoder) with high-precision reranking (Cross-Encoder).
- Smart Highlighting: Pinpoints the exact sentence within a chunk that answers the query.
- FAISS Indexing: Fast and efficient similarity search using Facebook AI Similarity Search.
- Index Management: Tracks which mode was used to build each index, allowing seamless switching between different model configurations.
- Modern UI: React-based 3-column layout with optimized UX, mode selection, and persistent search history.
This application employs a state-of-the-art Retrieve & Rerank pipeline to deliver high-quality search results. Here is the breakdown for computer science practitioners:
Raw text extraction from PDFs often results in "hard wraps" (newlines in the middle of sentences). We implement a heuristic text cleaner (backend/main.py) that uses lookahead strategies to detect hyphenation and join broken lines, ensuring the embedding model receives coherent sentences rather than fragmented tokens.
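A minimal sketch of such a cleaner, assuming a regex-based heuristic (the actual logic in `backend/main.py` may differ):

```python
import re

def clean_extracted_text(raw: str) -> str:
    """Heuristically undo hard wraps in text extracted from a PDF."""
    # Join a word hyphenated across a line break: "seman-\ntic" -> "semantic"
    text = re.sub(r"-\n(?=[a-z])", "", raw)
    # Replace a lone newline inside a sentence with a space, but keep
    # blank lines (paragraph boundaries) intact.
    text = re.sub(r"(?<![.!?:\n])\n(?!\n)", " ", text)
    return text
```

The lookahead `(?=[a-z])` only de-hyphenates when the next line starts in lowercase, so genuine hyphenated compounds at line starts are left alone.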
We use a Bi-Encoder architecture to map both documents and queries into a shared vector space.
- Fast Mode: Uses `all-MiniLM-L6-v2` (384 dimensions).
- Pro Mode: Uses `ibm-granite/granite-embedding-english-r2` (1024 dimensions).
- Indexing: Document chunks are embedded and stored in a FAISS (Facebook AI Similarity Search) index for fast nearest neighbor lookup.
- Retrieval: For a user query $Q$, we retrieve the top $N$ candidates (where $N = K \times 5$) based on cosine similarity. This stage prioritizes Recall, ensuring the relevant content is likely in the candidate pool.
The top candidates from Stage 1 are passed to a Cross-Encoder.
- Fast Mode: Uses `cross-encoder/ms-marco-MiniLM-L-6-v2`. We apply a Sigmoid function to the raw logits to normalize scores to a [0, 1] probability range.
- Pro Mode: Uses `ibm-granite/granite-embedding-reranker-english-r2`.
- Mechanism: Unlike Bi-Encoders, which process inputs independently ($f(A) \cdot f(B)$), a Cross-Encoder processes the pair simultaneously ($f(A, B)$). This allows the model's self-attention layers to attend to the interaction between specific query tokens and document tokens, capturing nuanced semantic relationships that vector dot products miss.
- Result: We re-score the candidates and sort them to return the final top $K$ results. This stage prioritizes Precision.
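The normalization and final selection, sketched with made-up logits (in the app they would come from the cross-encoder's `predict` call on query-document pairs):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Map raw logits into a [0, 1] probability range."""
    return 1.0 / (1.0 + np.exp(-x))

# Made-up raw reranker logits for three candidate chunks
logits = np.array([4.2, -1.3, 0.7])
probs = sigmoid(logits)

k = 2
top_k = np.argsort(-probs)[:k]   # indices of the K highest-scoring candidates
print(top_k)                     # -> [0 2]
```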
Optional: when Q&A Mode is enabled:
- The system first performs the Hybrid Search (Stage 1 & 2) to retrieve the top 3 most relevant chunks.
- These chunks are concatenated to form a context window.
- A lightweight extractive QA model (`deepset/tinyroberta-squad2`) processes the `(Question, Context)` pair.
- The model identifies the start and end tokens of the answer span within the text.
- The system returns the specific answer string along with citations (source file and section).
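The span-selection step above can be mimicked with made-up token logits (the real scores come from the QA model's start/end heads, invoked via the transformers question-answering pipeline):

```python
import numpy as np

# Token-level scores as an extractive QA head would emit them (made up here)
tokens = ["FAISS", "was", "built", "by", "Facebook", "AI", "Research"]
start_logits = np.array([0.1, 0.0, 0.2, 0.1, 3.0, 0.4, 0.3])
end_logits   = np.array([0.0, 0.1, 0.1, 0.2, 0.5, 1.0, 3.5])

start = int(np.argmax(start_logits))              # best span start
end = start + int(np.argmax(end_logits[start:]))  # best end at or after start
answer = " ".join(tokens[start:end + 1])
print(answer)                                     # -> Facebook AI Research
```

Constraining the end index to fall at or after the start is what keeps the extracted span well-formed.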
For standard search results, we perform Sentence-Level Granularity Analysis:
- The retrieval unit (chunk) is often 500+ characters.
- We split the chunk into sentences and compute the similarity of each sentence against the query.
- The highest-scoring sentence is highlighted as the "Golden Snippet," directing the user's attention immediately to the answer.
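The steps above can be sketched with stand-in 2-d vectors (in the app, each sentence would be embedded by the active bi-encoder):

```python
import re
import numpy as np

def split_sentences(chunk: str) -> list[str]:
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]

def golden_snippet(query_vec, sent_vecs, sentences):
    """Return the sentence whose embedding is most similar to the query."""
    sims = [
        float(query_vec @ v / (np.linalg.norm(query_vec) * np.linalg.norm(v)))
        for v in sent_vecs
    ]
    return sentences[int(np.argmax(sims))]

chunk = "FAISS performs fast vector search. Bananas are yellow."
sentences = split_sentences(chunk)
vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # stand-in embeddings
query = np.array([0.9, 0.1])
print(golden_snippet(query, vecs, sentences))
```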
```mermaid
graph TD
    %% Styles
    classDef input fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    classDef process fill:#fff9c4,stroke:#fbc02d,stroke-width:2px;
    classDef storage fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
    classDef decision fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px;
    classDef output fill:#ffebee,stroke:#c62828,stroke-width:2px;

    %% Nodes
    UserDocs["📄 User Documents<br/>(PDF, DOCX, TXT, MD)"]:::input
    DocProc["⚙️ DocumentProcessor<br/>Clean & Chunk Text"]:::process
    ModeSelect{"🛠️ Select Mode<br/>Fast / Pro?"}:::decision

    subgraph "Indexing Phase"
        EmbedModel["🧠 Embedding Model<br/>(MiniLM / Granite)"]:::process
        FAISS[("🗄️ FAISS Index<br/>Vector Store")]:::storage
        Cache["💾 Disk Cache<br/>(Pickle + Metadata)"]:::storage
    end

    UserQuery["🔍 User Query / Question"]:::input
    QueryEmbed["🧠 Encode Query"]:::process
    ANN["⚡ FAISS ANN Search<br/>Retrieve Top N Candidates"]:::process

    subgraph "Reranking Phase"
        CrossEnc["⚖️ Cross-Encoder<br/>(MS-MARCO / Granite)"]:::process
        ScoreNorm{"📉 Fast Mode?"}:::decision
        Sigmoid["Math: Sigmoid(x)"]:::process
        TopK["🏆 Select Top K Results"]:::process
    end

    QAMode{"❓ Q&A Mode?"}:::decision

    subgraph "Q&A Extraction"
        Context["📝 Build Context Window<br/>(Top 3 Chunks)"]:::process
        QAModel["🤖 QA Pipeline<br/>(TinyRoberta)"]:::process
        Answer["💬 Extracted Answer<br/>+ Citations"]:::output
    end

    SearchResults["📑 Ranked Snippets<br/>+ Smart Highlights"]:::output

    %% Flow
    UserDocs --> DocProc
    DocProc --> ModeSelect
    ModeSelect -->|Fast| EmbedModel
    ModeSelect -->|Pro| EmbedModel
    EmbedModel --> FAISS
    FAISS --> Cache
    UserQuery --> QueryEmbed
    QueryEmbed --> ANN
    FAISS -.-> ANN
    ANN --> CrossEnc
    CrossEnc --> ScoreNorm
    ScoreNorm -->|Yes| Sigmoid
    Sigmoid --> TopK
    ScoreNorm -->|No| TopK
    TopK --> QAMode
    QAMode -->|Yes| Context
    Context --> QAModel
    QAModel --> Answer
    QAMode -->|No| SearchResults
```
- Backend: FastAPI (Python) with Sentence Transformers, FAISS, and HuggingFace Pipelines.
- Frontend: React with Tailwind CSS.
- Proxy: Vite development server proxies API requests to backend.
- Python 3.10+
- Node.js 18+ and npm
- uv package manager (optional but recommended)
- Run the setup script:

  ```
  setup.bat
  ```

- Install Python dependencies:

  ```
  uv sync    # or: pip install -r requirements.txt
  ```

- Install Node.js dependencies:

  ```
  cd frontend
  npm install
  ```
- Start the backend server:

  ```
  run_backend.bat
  ```

- In a new terminal, start the frontend:

  ```
  run_frontend.bat
  ```

- Open your browser to `http://localhost:5173`.
- Start the backend server:

  ```
  ./run_backend.sh
  ```

- In a new terminal, start the frontend:

  ```
  ./run_frontend.sh
  ```

- Open your browser to `http://localhost:5173`.
- Select Mode: Toggle between Fast Mode and Pro Mode in the header.
- Upload Documents: Click "Choose Files" to upload one or more documents.
- Process Documents: Click "Process Documents" to create embeddings for the selected mode.
- Search: Enter your query in the search box.
- Ask Questions: Enable Q&A Mode to receive direct answers with citations instead of just snippets.
- Manage Indexes: Select different cached indexes from the sidebar. The UI indicates which mode each index was built with.
- Search History: View, pin, or remove previous searches.
- `GET /` - Root endpoint
- `POST /upload/` - Upload files
- `POST /process/` - Process documents and build index (supports `mode` parameter)
- `POST /search/` - Perform semantic search (supports `mode` parameter)
- `POST /answer/` - Perform extractive Q&A (supports `mode` parameter)
- `GET /cache/` - Get cached index information (includes mode info)
- `POST /load_cache/` - Load a specific cached index
- `GET /history/` - Get search history
- `POST /history/pin/` - Pin/unpin a search query
- `POST /history/remove/` - Remove a search from history
- `DELETE /cache/clear/` - Clear all cached indexes
- `GET /health/` - Health check
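A hypothetical client call against `POST /search/`. The payload field names (`query`, `mode`, `top_k`) are assumptions, not the confirmed schema; check the request models in `backend/main.py`:

```python
import json
from urllib import request

# Hypothetical payload; field names are assumptions about the backend schema.
payload = {"query": "vector databases", "mode": "fast", "top_k": 5}
req = request.Request(
    "http://localhost:8000/search/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With the backend running on port 8000:
#   with request.urlopen(req) as resp:
#       print(json.load(resp))
```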
doc-semantic-search/
├── backend/
│ └── main.py # FastAPI backend with multi-mode logic
├── frontend/
│ ├── src/
│ │ └── App.jsx # Main React component
│ ├── package.json # Frontend dependencies
│ ├── vite.config.js # Vite configuration
│ └── index.html # Main HTML file
├── cache/ # Cached FAISS indexes
├── temp_uploads/ # Temporary uploaded files
├── setup.bat # Windows setup script
├── run_backend.bat # Windows backend runner
├── run_frontend.bat # Windows frontend runner
├── setup.sh # Linux setup script
├── run_backend.sh # Linux backend runner
├── run_frontend.sh # Linux frontend runner
├── pyproject.toml # Python dependencies
└── README.md # This file
Backend:

- `FastAPI`: Web framework
- `sentence-transformers`: For semantic embeddings and cross-encoders
- `transformers`: For Q&A pipeline
- `faiss-cpu`: For similarity search
- `PyPDF2`, `python-docx`: File text extraction

Frontend:

- `React 18`: UI library
- `Vite`: Build tool
- `Tailwind CSS`: Styling
- `Axios`: HTTP client
- `@heroicons/react`: Icons
- Scores > 100%: If you see this in Fast Mode, ensure you have the latest backend update, which applies Sigmoid normalization to the `ms-marco` logits.
- CORS Errors: Ensure both the backend (port 8000) and frontend (port 5173) are running.
- Model Downloads: First run of any mode will download models (~1GB+ for Pro mode). Ensure you have internet access.
GNU GENERAL PUBLIC LICENSE V3.0