Inspiration
Large language models are compute-hungry, expensive, and slow at scale. I wanted to build something that could dramatically reduce the cost and latency of using ChatGPT, while making it more sustainable and accessible — especially in real-time, edge, or low-resource environments.
What it does
Cache-22 is "ChatGPT, but faster and cheaper." It breaks down user prompts into semantically meaningful components, checks if any were previously answered, and selectively reuses those results — saving time, tokens, and energy. Responses are synthesized into a coherent final answer.
How we built it
- Prompt Decomposition: Prompts are split into atomic questions using GPT-3.5-turbo.
- Similarity-Based Caching: Components are embedded with SentenceTransformer and compared via FAISS.
- Selective Generation: GPT-4 is only called for novel components; cached responses are reused otherwise.
- Lightweight Synthesis: GPT-3.5-turbo stitches partial answers into one fluent response.
- Full-stack app built with Next.js, FastAPI, and OpenAI's API.
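The caching and selective-generation steps above can be sketched in a few dozen lines. This is a minimal, dependency-free illustration, not the project's actual code: a hashed bag-of-words embedding stands in for SentenceTransformer, a brute-force linear scan stands in for the FAISS index, and the `generate` callback stands in for the GPT-4 call.

```python
import math
import re


def embed(text, dim=256):
    """Stand-in embedding: hashed bag of words, L2-normalised.
    In Cache-22 this role is played by a SentenceTransformer model."""
    v = [0.0] * dim
    for tok in re.findall(r"\w+", text.lower()):
        v[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


class SemanticCache:
    """Stand-in for the FAISS index: linear scan over stored unit
    vectors; cosine similarity reduces to a dot product."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, component):
        """Return a cached answer for a sufficiently similar component,
        or None if nothing in the cache clears the threshold."""
        q = embed(component)
        best_sim, best_answer = -1.0, None
        for e, answer in self.entries:
            sim = sum(a * b for a, b in zip(e, q))
            if sim > best_sim:
                best_sim, best_answer = sim, answer
        return best_answer if best_sim >= self.threshold else None

    def store(self, component, answer):
        self.entries.append((embed(component), answer))


def answer_components(components, cache, generate):
    """Selective generation: call the expensive `generate` model (GPT-4
    in the real system) only for components the cache has not seen."""
    parts = []
    for comp in components:
        answer = cache.lookup(comp)
        if answer is None:
            answer = generate(comp)
            cache.store(comp, answer)
        parts.append(answer)
    return parts
```

In the full pipeline, the list returned by `answer_components` is then handed to GPT-3.5-turbo for synthesis into a single fluent reply.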
Challenges we ran into
- Ensuring decomposed prompts preserve numeric and semantic fidelity (especially for math problems).
- Managing token accounting and performance metrics.
- Making responses feel natural when stitched from multiple sources.
- Hooking up the frontend without introducing latency.
Accomplishments that we're proud of
- Meaningful compute/token savings (up to 66% reuse on follow-up prompts).
- Live working demo with metrics, real-time streaming, and a ChatGPT-style frontend.
- Generalizable framework for caching at the subprompt level.
What we learned
- GPT models are surprisingly good at prompt decomposition and synthesis.
- Subprompt-level caching offers a promising middle ground between full memory and raw generation.
- Even small tweaks (like preserving numeric values) can make or break similarity-based caching.
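The numeric-value point matters because "what is 17 * 24" and "what is 18 * 24" embed almost identically yet must never share a cached answer. One way to enforce this (a hypothetical guard illustrating the idea, not the project's exact code) is to require that the numeric literals in a prompt and its cache candidate agree before accepting a similarity hit:

```python
import re

# Matches integers and decimals, with an optional leading minus sign.
_NUM = re.compile(r"-?\d+(?:\.\d+)?")


def numbers_match(prompt, cached_prompt):
    """Hypothetical guard against false cache hits: accept a similarity
    match only if both prompts contain the same numbers, in order."""
    return _NUM.findall(prompt) == _NUM.findall(cached_prompt)
```

Applied after the FAISS nearest-neighbour lookup, a guard like this rejects near-duplicate math prompts that differ only in their operands.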
What's next for Cache-22
- Faster, smoother frontend to feel identical to real chatbots.
- Smarter decomposition with better number handling.
- Persistent vector store and cache across users.
- Fine-tuned synthesis for domain-specific use cases (e.g., customer support, coding).
Built With
- faiss
- fastapi
- gpt-3.5-turbo
- gpt-4
- next.js
- python
- sentencetransformer