A local RAG (Retrieval-Augmented Generation) server built for quantitative finance research documents. Ingests PDFs, LaTeX source files, and Jupyter notebooks with structure preservation — financial tables stay as tables, mathematical formulas stay as LaTeX, code blocks stay intact. Exposes a REST API and MCP server.
```bash
pip install fellowquant-rag
```

After installation, run `rag-server setup` (or any `rag-server` command) to configure the MCP server in Claude Code.
- Python 3.12+
- Docker (for Qdrant vector database)
- A running LLM endpoint (vLLM or llama.cpp) or AWS Bedrock credentials
- GPU with ~2.5 GB VRAM (BGE-M3 ~1 GB + Qwen3-Reranker ~1.2 GB)
| Command | Description |
|---|---|
| `rag-server setup` | Configure MCP server in Claude Code (global or local scope) |
| `rag-server start-qdrant` | Start Qdrant vector database via Docker |
| `rag-server start` | Start FastAPI REST server on port 8001 |
| `rag-server mcp` | Start MCP stdio server (used internally by Claude Code) |
`rag-server setup` prompts for:

- Global scope (`--scope user`): MCP available in all Claude Code projects
- Local scope: writes `.mcp.json` in the current directory (commit to share with your team)

If the `claude` CLI is not installed, setup writes `.mcp.json` and prints instructions.
- Ingests PDFs (via Docling), `.tex` files, and `.ipynb` notebooks with layout-aware chunking
- Retrieves using three-mode hybrid search: BM25 keyword + BGE-M3 dense + BGE-M3 sparse, fused via RRF and reranked by Qwen3-Reranker-0.6B
- Answers questions with a local LLM (vLLM, llama.cpp, or AWS Bedrock) and inline citations
- Integrates with Claude Code via the MCP stdio protocol — 5 tools: `retrieve`, `ask`, `list_documents`, `get_document`, `delete_document`
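The RRF fusion step above can be sketched as follows. This is a minimal illustration, not the server's implementation; `k=60` is the conventional RRF constant, and the chunk IDs are made up:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of chunk IDs.

    A chunk's fused score is the sum of 1/(k + rank) across every
    list it appears in (rank is 1-based, best first).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Return chunk IDs ordered by fused score, highest first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the three retrieval modes
bm25 = ["c1", "c2", "c3"]
dense = ["c2", "c1", "c4"]
sparse = ["c2", "c3", "c1"]
fused = rrf_fuse([bm25, dense, sparse])  # → ["c2", "c1", "c3", "c4"]
```

RRF needs only ranks, not raw scores, which is why it is a common choice for fusing retrievers whose score scales are incomparable (BM25 vs. cosine similarity). In the real pipeline the fused list is then reranked by the cross-encoder.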
For most users: install via `pip install fellowquant-rag` (see Installation above). The steps below are for contributors cloning the repo directly.

1. Install dependencies

   ```bash
   uv sync
   ```

2. Start Qdrant

   ```bash
   docker compose up -d qdrant
   ```

3. Configure LLM

   Copy and edit the LLM config:

   ```bash
   cp llm.yaml.example llm.yaml
   # Edit llm.yaml: set provider (vllm/llamacpp/bedrock), model, base_url
   ```

4. Run database migrations

   ```bash
   uv run alembic upgrade head
   ```

5. Start the server

   ```bash
   ./scripts/start.sh
   ```

   This sets `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` (prevents CUDA memory fragmentation stalls), binds to `0.0.0.0:8001` with hot-reload, and tees logs to `.logs/rag_server.log`.
All endpoints are under `/api/v1`. A health check is at `/health`.
| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/documents` | Upload a PDF, `.tex`, or `.ipynb` file |
| GET | `/api/v1/documents` | List all documents with indexing status |
| GET | `/api/v1/documents/{id}` | Get metadata for a specific document |
| DELETE | `/api/v1/documents/{id}` | Delete document and all associated chunks/vectors |
| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/retrieve` | Hybrid search — returns ranked chunks with citations |
| POST | `/api/v1/ask` | Full RAG — returns LLM-synthesized answer with citations (streaming SSE by default) |
```json
{
  "query": "Sharpe ratio under non-normal returns",
  "top_k": 10,
  "document_ids": ["uuid-1", "uuid-2"],
  "min_score": 0.3
}
```

`document_ids` and `min_score` are optional. Omit `document_ids` to search across all documents.
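A client might build this payload as follows, dropping the optional fields when unset. The helper name is hypothetical, just a sketch:

```python
def build_retrieve_request(query, top_k=10, document_ids=None, min_score=None):
    """Build a /api/v1/retrieve payload, omitting unset optional fields."""
    payload = {"query": query, "top_k": top_k}
    if document_ids is not None:
        payload["document_ids"] = document_ids
    if min_score is not None:
        payload["min_score"] = min_score
    return payload

# Omitting document_ids searches across all documents
req = build_retrieve_request("Sharpe ratio under non-normal returns", min_score=0.3)
```

Omitting a key entirely (rather than sending `null`) keeps the request compatible with strict schema validation on the server side.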
```json
{
  "query": "What is the Kelly criterion for fractional betting?",
  "top_k": 10
}
```

Returns an SSE token stream by default. Add `?streaming=false` for a single JSON response.
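A minimal sketch of consuming such a stream, assuming plain-token `data:` events terminated by a `[DONE]` sentinel — the server's actual event payload format may differ:

```python
def extract_sse_data(lines):
    """Join the payloads of `data:` lines from an SSE token stream.

    Per the SSE format, a single space after the colon is stripped;
    the rest of the line is the payload (here, one answer token).
    """
    tokens = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]
        if payload and payload != "[DONE]":
            tokens.append(payload)
    return "".join(tokens)

# Hypothetical stream fragment from /api/v1/ask
stream = ["data: The", "data:  Kelly", "data:  criterion", "", "data: [DONE]"]
answer = extract_sse_data(stream)  # → "The Kelly criterion"
```

Note that only the first space after the colon is stripped, so intra-token spacing survives concatenation.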
All errors use the RFC 7807 Problem Details format:

```json
{
  "type": "about:blank",
  "title": "Not Found",
  "status": 404,
  "detail": "Document abc123 not found"
}
```

Upload size limit: 100 MB (returns `413`).
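Because every error shares this shape, clients can summarize failures generically. A minimal sketch (the helper name is hypothetical):

```python
import json

def problem_summary(body):
    """Summarize an RFC 7807 Problem Details body for logging."""
    problem = json.loads(body)
    # `title` names the generic error class; `detail` describes this occurrence
    return f"{problem['status']} {problem['title']}: {problem['detail']}"

body = '{"type": "about:blank", "title": "Not Found", "status": 404, "detail": "Document abc123 not found"}'
msg = problem_summary(body)  # → "404 Not Found: Document abc123 not found"
```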
The `.mcp.json` at the project root registers the MCP server automatically when you open this project in Claude Code.
| Tool | Description |
|---|---|
| `retrieve` | Hybrid search — returns ranked chunks with full citation metadata and scores |
| `ask` | Full RAG pipeline — LLM-synthesized answer with citations; falls back to raw chunks if LLM unavailable |
| `list_documents` | List all documents with metadata and indexing status |
| `get_document` | Get metadata for a document by ID |
| `delete_document` | Delete a document and all associated data |
- File upload is done via the REST API (`POST /api/v1/documents`), not through MCP
- BM25 keyword index is loaded from disk at MCP server startup — restart the MCP server after indexing new documents
- Running the MCP server simultaneously with the FastAPI server uses ~4.4 GB VRAM total (both load BGE-M3 and the reranker)
```bash
# Browse tools interactively (development only)
uv run mcp dev src/rag_server/mcp_server.py

# Re-run setup to (re-)register with Claude Code
rag-server setup
```

| Component | Library | Notes |
|---|---|---|
| PDF parser | Docling (IBM) | 97.9% table accuracy; Granite-Docling VLM for formula→LaTeX |
| LaTeX parser | pylatexenc 2.10 | Direct .tex ingestion |
| Notebook parser | nbformat | Code and output cells preserved |
| Embeddings | BGE-M3 (FlagEmbedding) | Dense + sparse vectors from single inference pass |
| Keyword search | rank-bm25 | Full-corpus BM25; persisted as pickle |
| Vector store | Qdrant v1.16.3 | Local Docker; dense + sparse schema |
| Reranker | Qwen3-Reranker-0.6B | Cross-encoder; ~61 BEIR nDCG@10 |
| LLM | vLLM / llama.cpp / AWS Bedrock | Provider-swappable via llm.yaml |
| Database | SQLite + SQLAlchemy | Document and chunk metadata |
| API | FastAPI + uvicorn | Async; RFC 7807 errors; SSE streaming |
| MCP | fastmcp | stdio transport; 5 tools |
```bash
# Verify the installed wheel (entry points, assets, subcommands)
bash scripts/smoke_test_cli.sh

# REST API — requires running server at localhost:8001
python scripts/verify_api.py

# Earlier phases (run in order against a running server)
python scripts/verify_storage.py
python scripts/verify_ingestion.py
python scripts/verify_retrieval.py
python scripts/verify_llm.py
```

```text
data/
├── uploads/   # Uploaded files, stored as {sha256}{ext}
├── qdrant/    # Qdrant persistent storage (mounted into Docker)
└── bm25.pkl   # BM25 index (rebuilt after each document is indexed)
```
The `DATA_DIR` environment variable controls the base path (default: `./data`).
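The content-addressed `{sha256}{ext}` naming can be sketched as follows, assuming the digest is computed over the raw file bytes (helper name and call are illustrative):

```python
import hashlib
from pathlib import Path

def upload_path(data_dir, file_bytes, original_name):
    """Content-addressed upload path: {data_dir}/uploads/{sha256}{ext}.

    Hashing the bytes means re-uploading an identical file maps to the
    same path, so duplicates naturally deduplicate on disk.
    """
    digest = hashlib.sha256(file_bytes).hexdigest()
    ext = Path(original_name).suffix  # e.g. ".pdf"
    return Path(data_dir) / "uploads" / f"{digest}{ext}"

p = upload_path("./data", b"%PDF-1.7", "paper.pdf")
```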
```bash
DATA_DIR=./data
QDRANT_HOST=localhost      # or container name inside Docker
QDRANT_PORT=6333
MAX_UPLOAD_SIZE=104857600  # 100 MB in bytes
```

```yaml
llm:
  provider: vllm           # vllm | llamacpp | bedrock
  model: Qwen/Qwen2.5-7B-Instruct
  base_url: http://localhost:8000/v1
  context_chunks: 5
  max_context_tokens: 8000
```

See `llm.yaml.example` for all options including Bedrock and the system prompt.
