Inspiration
We’ve been researching retrieval-augmented generation (RAG) and storage-efficient embedding systems at Berkeley, and noticed a gap: most retrieval engines are built for the cloud. They’re powerful but bulky—too large for laptops, edge servers, or mobile devices. We wanted to see if we could bring the same high-recall semantic search to the edge, with minimal storage and no loss in quality.
What it does
LiteRAG is a storage-optimized retrieval engine designed for the edge. It compresses indexes to under 5% of raw data, achieving up to 50× smaller footprints than standard vector databases while maintaining comparable recall. LiteRAG can ingest any GitHub repo, build a graph-based index, and let users query side-by-side against a Chroma baseline—with optional Groq acceleration for instant, context-aware answers.
How we built it
We built LiteRAG with a FastAPI backend managing ingestion, indexing, and evaluation, and a Next.js frontend for visualization and benchmarking. Each run creates isolated Chroma stores and reproducible builds to ensure fair comparisons. We experimented with graph compression, text normalization, and quantization strategies to achieve near-lossless retrieval quality in a tiny footprint.
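The per-run isolation can be sketched roughly like this: derive a deterministic run directory from the corpus contents so identical inputs always map to the same store. This is an illustrative sketch only; `RUNS_DIR`, `run_dir_for`, and the hashing scheme are our assumptions here, not LiteRAG's actual layout.

```python
import hashlib
from pathlib import Path

RUNS_DIR = Path("runs")  # hypothetical root for isolated run artifacts


def run_dir_for(corpus_files: list[Path]) -> Path:
    """Derive a deterministic run directory from corpus contents,
    so identical inputs always map to the same isolated store."""
    digest = hashlib.sha256()
    for f in sorted(corpus_files):           # sort for order-independence
        digest.update(f.name.encode())
        digest.update(f.read_bytes())
    run_id = digest.hexdigest()[:16]
    path = RUNS_DIR / run_id
    path.mkdir(parents=True, exist_ok=True)  # each run gets its own store
    return path
```

Keying the directory on a content hash rather than a timestamp is what makes rebuilds reproducible: rerunning the same repo lands in the same store, while any change to the corpus produces a fresh one.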
Challenges we ran into
Our biggest challenge was balancing recall quality with aggressive compression. Pushing storage down 50× without breaking semantic precision required iterating on embedding sparsification, graph layouts, and normalization pipelines. We also had to ensure consistent benchmarks across multiple frameworks, which meant designing reproducible evaluation loops from scratch.
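One family of quantization strategies in this space can be illustrated with a simple symmetric int8 scheme, which cuts embedding storage 4x versus float32 with small reconstruction error. This is a generic sketch of the technique, not LiteRAG's exact pipeline; the function names are ours.

```python
import numpy as np


def quantize_int8(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-vector int8 quantization: ~4x smaller than float32."""
    scale = np.abs(embeddings).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)   # avoid divide-by-zero on zero vectors
    q = np.round(embeddings / scale).astype(np.int8)
    return q, scale.astype(np.float32)


def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 embeddings from int8 codes."""
    return q.astype(np.float32) * scale
```

The per-vector scale keeps the largest component representable while bounding the rounding error at half a quantization step, which is why recall degrades only slightly at this precision.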
Accomplishments that we're proud of
We achieved a working retrieval system that’s orders of magnitude smaller than existing solutions—yet delivers nearly identical recall quality. LiteRAG runs cleanly on devices with limited memory, cold-starts instantly, and supports deterministic comparisons with Chroma. Seeing a 50× reduction in storage without measurable loss in accuracy was a huge milestone.
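A "no measurable loss" claim ultimately rests on a recall-style comparison against the baseline's results. A minimal version of such a check (our own simplification, not the full evaluation loop) looks like:

```python
def recall_at_k(baseline_ids: list[str], candidate_ids: list[str], k: int = 10) -> float:
    """Fraction of the baseline's top-k results that also appear in the
    candidate index's top-k; 1.0 means identical retrieval quality."""
    base = set(baseline_ids[:k])
    cand = set(candidate_ids[:k])
    return len(base & cand) / max(len(base), 1)
```

Running this over a fixed query set against both the compressed index and the Chroma baseline gives a deterministic, repeatable quality number to track alongside storage size.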
What we learned
We deepened our understanding of graph-based retrieval, index compression, and storage-aware design for RAG systems. We also learned how small architectural decisions—like memory layout or token normalization—can drastically affect both performance and reproducibility. These lessons tie directly into our ongoing research on efficient, adaptive retrieval models.
What's next for LiteRAG
We plan to pitch LiteRAG to a few VCs; we believe this idea can be big.
Built With
- css
- fastapi
- javascript
- next.js
- python
- typescript
