Inspiration
Large language models are compute-hungry, expensive, and slow at scale. I wanted to build something that could dramatically reduce the cost and latency of using ChatGPT, while making it more sustainable and accessible — especially in real-time, edge, or low-resource environments.
What it does
Cache-22 is "ChatGPT, but faster and cheaper." It breaks down user prompts into semantically meaningful components, checks if any were previously answered, and selectively reuses those results — saving time, tokens, and energy. Responses are synthesized into a coherent final answer.
How we built it
- Prompt Decomposition: Prompts are split into atomic questions using GPT-3.5-turbo.
- Similarity-Based Caching: Components are embedded with SentenceTransformer and compared via FAISS.
- Selective Generation: GPT-4 is only called for novel components; cached responses are reused otherwise.
- Lightweight Synthesis: GPT-3.5-turbo stitches partial answers into one fluent response.
- Full-stack app built with Next.js, FastAPI, and OpenAI's API.
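The caching and selective-generation steps above can be sketched in a few dozen lines. This is a minimal, dependency-free illustration, not the project's actual code: a hashed bag-of-words embedding stands in for SentenceTransformer, a brute-force linear scan stands in for the FAISS index, and the `generate` callback stands in for the GPT-4 call.

```python
import math
import re


def embed(text, dim=256):
    """Stand-in embedding: hashed bag of words, L2-normalised.
    In Cache-22 this role is played by a SentenceTransformer model."""
    v = [0.0] * dim
    for tok in re.findall(r"\w+", text.lower()):
        v[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


class SemanticCache:
    """Stand-in for the FAISS index: linear scan over stored unit
    vectors; cosine similarity reduces to a dot product."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, component):
        """Return a cached answer for a sufficiently similar component,
        or None if nothing in the cache clears the threshold."""
        q = embed(component)
        best_sim, best_answer = -1.0, None
        for e, answer in self.entries:
            sim = sum(a * b for a, b in zip(e, q))
            if sim > best_sim:
                best_sim, best_answer = sim, answer
        return best_answer if best_sim >= self.threshold else None

    def store(self, component, answer):
        self.entries.append((embed(component), answer))


def answer_components(components, cache, generate):
    """Selective generation: call the expensive `generate` model (GPT-4
    in the real system) only for components the cache has not seen."""
    parts = []
    for comp in components:
        answer = cache.lookup(comp)
        if answer is None:
            answer = generate(comp)
            cache.store(comp, answer)
        parts.append(answer)
    return parts
```

In the full pipeline, the list returned by `answer_components` is then handed to GPT-3.5-turbo for synthesis into a single fluent reply.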
Challenges we ran into
- Ensuring decomposed prompts preserve numeric and semantic fidelity (especially for math problems).
- Managing token accounting and performance metrics.
- Making responses feel natural when stitched from multiple sources.
- Hooking up the frontend without introducing latency.
Accomplishments that we're proud of
- Meaningful compute/token savings (up to 66% reuse on follow-up prompts).
- Live working demo with metrics, real-time streaming, and a ChatGPT-style frontend.
- Generalizable framework for caching at the subprompt level.
What we learned
- GPT models are surprisingly good at prompt decomposition and synthesis.
- Subprompt-level caching offers a promising middle ground between full memory and raw generation.
- Even small tweaks (like preserving numeric values) can make or break similarity-based caching.
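The numeric-value point matters because "what is 17 * 24" and "what is 18 * 24" embed almost identically yet must never share a cached answer. One way to enforce this (a hypothetical guard illustrating the idea, not the project's exact code) is to require that the numeric literals in a prompt and its cache candidate agree before accepting a similarity hit:

```python
import re

# Matches integers and decimals, with an optional leading minus sign.
_NUM = re.compile(r"-?\d+(?:\.\d+)?")


def numbers_match(prompt, cached_prompt):
    """Hypothetical guard against false cache hits: accept a similarity
    match only if both prompts contain the same numbers, in order."""
    return _NUM.findall(prompt) == _NUM.findall(cached_prompt)
```

Applied after the FAISS nearest-neighbour lookup, a guard like this rejects near-duplicate math prompts that differ only in their operands.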
What's next for Cache-22
- Faster, smoother frontend to feel identical to real chatbots.
- Smarter decomposition with better number handling.
- Persistent vector store and cache across users.
- Fine-tuned synthesis for domain-specific use cases (e.g., customer support, coding).
Built With
- faiss
- fastapi
- gpt-3.5-turbo
- gpt-4
- next.js
- python
- sentencetransformer