Portfolio project · Privacy-first document Q&A with local RAG
A responsive web application where users upload PDFs and CSV files and ask natural-language questions about their content. Answers are grounded only in the files uploaded during the current temporary session — nothing is stored permanently and no data ever leaves your machine.
Users often need to query private documents without sending them to a cloud service. SafeQueryAI runs entirely on your local machine: upload, ask, get answers, then clear — powered by a local LLM through Ollama, with no database and no external API calls.
Full documentation is available at https://jyoshiro.github.io/SafeQueryAI/
Quick links:
- Home
- Business Overview
- Getting Started
- Architecture
- Security & Privacy
- API Reference
- Frontend Guide
- Testing
- Deployment
- Roadmap
- FAQ
- Upload — PDF or CSV files are saved to a temporary session folder.
- Extract — Text is extracted from the file immediately after upload.
- Chunk & Embed — The extracted text is split into overlapping chunks and each chunk is embedded using the local Ollama embedding model (`nomic-embed-text`).
- Ask — When a question is submitted, it is embedded and the most similar chunks are retrieved via cosine similarity from the in-memory vector store.
- Generate — The retrieved chunks are sent as context to the local Ollama generation model (`llama3.2`), which produces a grounded answer.
- Clear — When the session ends (manually or after 60 minutes of inactivity), all files, chunks, and embeddings are deleted.
If Ollama is offline when a file is uploaded, the system falls back to keyword matching so the application remains usable.
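The chunking and retrieval steps above can be sketched as follows. This is an illustrative TypeScript version, not the backend's actual C# implementation; `chunkText`, `Chunk`, and `topK` are hypothetical names.

```typescript
// Split text into overlapping chunks (assumes size > overlap).
function chunkText(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the question embedding.
function topK(chunks: Chunk[], question: number[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) =>
      cosineSimilarity(y.embedding, question) -
      cosineSimilarity(x.embedding, question))
    .slice(0, k);
}
```

The keyword-matching fallback replaces `topK` with a plain term-overlap score when no embeddings are available.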
| Layer | Technology |
|---|---|
| Frontend | React 19, TypeScript, Vite |
| Backend | ASP.NET Core Web API, .NET 8 |
| Local LLM runtime | Ollama |
| Embedding model | nomic-embed-text (via Ollama) |
| Generation model | llama3.2 (via Ollama) |
| PDF extraction | PdfPig |
| Vector store | In-memory (session-scoped) |
| File storage | Local temp folder (session-scoped) |
| Tool | Version | Download |
|---|---|---|
| .NET SDK | 8.0+ | https://dotnet.microsoft.com/download |
| Node.js | 18+ | https://nodejs.org |
| Ollama | latest | https://ollama.com |
Download and install Ollama from https://ollama.com, then start the local server:
```
ollama serve
```

Ollama runs on http://localhost:11434 by default. Leave this terminal open.
Open a new terminal and pull both models (one-time download):
```
ollama pull nomic-embed-text
ollama pull llama3.2
```
`nomic-embed-text` is ~274 MB; `llama3.2` is ~2 GB. Both are stored locally by Ollama and never sent anywhere.
```
cd backend
dotnet run
```

The API starts on http://localhost:5000.
Swagger UI is available at http://localhost:5000/swagger.
On first run, .NET will restore NuGet packages automatically.
The backend will fail to start if `Ollama:BaseUrl` is set to a non-local address — this is intentional to prevent accidental data exfiltration.
Open a second terminal:
```
cd frontend
npm install
npm run dev
```

The app opens at http://localhost:5173.
Vite proxies all `/api` requests to http://localhost:5000, so no CORS setup is needed.
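A proxy entry of this kind typically looks like the sketch below. This is illustrative, not the repository's actual `vite.config.ts`, which may differ.

```typescript
// Illustrative Vite dev-server proxy configuration.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      // Forward /api/* to the local backend so the browser sees one origin.
      "/api": "http://localhost:5000",
    },
  },
});
```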
| Component | URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:5000 |
| Swagger UI | http://localhost:5000/swagger |
| Ollama | http://localhost:11434 |
All backend configuration is in backend/appsettings.json:

```json
{
  "SafeQueryAI": {
    "TempStoragePath": "TempSessions",
    "MaxFileSizeMb": 20,
    "SessionTimeoutMinutes": 60
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text",
    "GenerationModel": "llama3.2"
  }
}
```

To swap models, change `EmbeddingModel` or `GenerationModel` to any model you have pulled with `ollama pull`. For example:

```
ollama pull mistral
```

Then set `"GenerationModel": "mistral"` in appsettings.json and restart the backend.
Privacy guardrail: `Ollama:BaseUrl` must be a loopback address (localhost/127.0.0.1). The application will refuse to start if a remote URL is configured.
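The guardrail amounts to a hostname check at startup. A minimal sketch of that check, in TypeScript for illustration (the real backend implements it in C#, and `isLoopbackUrl` is a hypothetical name):

```typescript
// Accept only loopback Ollama base URLs; anything else aborts startup.
function isLoopbackUrl(baseUrl: string): boolean {
  try {
    const host = new URL(baseUrl).hostname;
    // WHATWG URL keeps brackets around IPv6 hostnames.
    return host === "localhost" || host === "127.0.0.1" || host === "[::1]";
  } catch {
    return false; // malformed URL: refuse to start
  }
}
```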
```
SafeQueryAI/
├── frontend/                        # React + TypeScript + Vite
│   ├── src/
│   │   ├── components/              # UI components
│   │   ├── services/api.ts          # API client layer
│   │   ├── types/api.ts             # Shared TypeScript types
│   │   ├── App.tsx                  # App shell + state management
│   │   └── main.tsx                 # Entry point
│   └── vite.config.ts               # Dev server + API proxy
│
└── backend/                         # ASP.NET Core Web API
    ├── Controllers/                 # HTTP endpoints
    ├── Services/
    │   ├── OllamaService.cs             # Embedding + generation via local Ollama
    │   ├── DocumentIndexingService.cs   # Chunking → embed → vector store
    │   ├── VectorStoreService.cs        # In-memory cosine similarity search
    │   ├── QuestionAnsweringService.cs  # RAG pipeline + keyword fallback
    │   ├── SessionService.cs            # In-memory session state
    │   ├── SessionExpiryService.cs      # Background expiry + cleanup
    │   ├── FileStorageService.cs        # Temp file save/delete
    │   ├── TextExtractionService.cs     # PDF + CSV text extraction
    │   └── Interfaces/                  # Service contracts
    ├── Models/                      # Internal models (DocumentChunk, SessionInfo, etc.)
    ├── Contracts/                   # API request/response DTOs
    ├── appsettings.json
    └── Program.cs
```
| Method | Path | Description |
|---|---|---|
| POST | /api/sessions | Create a new session |
| GET | /api/sessions/{id} | Get session metadata |
| DELETE | /api/sessions/{id} | Clear session, delete files, purge RAG index |
| POST | /api/sessions/{id}/files | Upload a PDF or CSV (triggers embedding) |
| GET | /api/sessions/{id}/files | List uploaded files (metadata only) |
| POST | /api/sessions/{id}/questions | Ask a question (RAG answer) |
| GET | /api/health | Liveness check |
- No database — session data is held in process memory only
- Local LLM only — Ollama runs on your machine; the `BaseUrl` is validated to be a loopback address at startup
- Temporary files — uploads are stored in `TempSessions/`, scoped per session, and deleted when the session clears
- Automatic expiry — sessions inactive for 60 minutes are automatically expired and all associated files, chunks, and embeddings are removed
- Startup cleanup — the `TempSessions/` directory is wiped when the backend starts, removing any files left by a previous crashed process
- No content in logs — logs include only filenames and counts, never file content or extracted text
- Extracted text is server-side only — raw file content is never returned to the client
- No telemetry — no analytics, tracking, or external calls of any kind
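The automatic-expiry rule above reduces to comparing each session's last activity against the configured timeout. A minimal sketch, in TypeScript for illustration (the real backend does this in a .NET background service; all names here are hypothetical):

```typescript
interface Session {
  id: string;
  lastActivity: number; // epoch milliseconds
}

const TIMEOUT_MS = 60 * 60 * 1000; // SessionTimeoutMinutes = 60

// Return the ids of sessions whose inactivity exceeds the timeout;
// the caller would then delete their files, chunks, and embeddings.
function expiredSessions(sessions: Session[], now: number): string[] {
  return sessions
    .filter((s) => now - s.lastActivity > TIMEOUT_MS)
    .map((s) => s.id);
}
```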
- Text-layer PDFs only (no OCR for scanned/image-based PDFs)
- Sessions do not persist across server restarts (by design)
- No authentication — designed for single-user local use only
- Ollama models must be pulled manually before first use
- Large documents with many chunks may slow down the embedding step depending on hardware
- Docker Compose setup to start backend + frontend together
- OCR support for scanned PDFs
- XLSX file support
- Optional streaming responses from the LLM
- UI indicator showing whether Ollama is online