Inspiration
Humans have developed countless tools to share information and context so we can collaborate better. Now that many teams include AI agents that need to work together, LLMs need the same kind of tool: a shared memory they can stay in sync with. Imagine you and your friend are working on an operating systems project. You generate a plan in ChatGPT, start coding in Claude, and want your friend’s ChatGPT and Claude to understand the same context. Right now, you’d have to manually copy everything between chats and people. D-RAG eliminates that problem. Simply connect through its MCP server and create a project, and the LLMs automatically share and update context for you. The system handles memory syncing behind the scenes, so your agents and teammates stay aligned without extra effort.
What it does
D-RAG, or Dynamic Retrieval-Augmented Generation, creates a shared-memory layer that connects humans and LLMs. It uses Gemini for embeddings and semantic vector search to dynamically retrieve the most relevant context, while Snowflake provides observability and vector querying for deeper insight into memory usage. The MCP server automates everything, managing context in the backend without requiring users to switch to a separate platform. MongoDB stores projects, user data, and contextual relationships, letting each user maintain multiple projects with structured knowledge graphs that evolve automatically as the project grows.
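The user → project → chunk hierarchy might look roughly like the sketch below. The field names are our illustration here, not D-RAG's actual MongoDB schema:

```python
# Hypothetical shape of the MongoDB documents (illustrative field names only).
user = {
    "_id": "user_42",
    "api_key_hash": "<sha256-of-key>",  # hashed credential for MCP access
    "projects": ["proj_os_lab"],        # projects owned by this user
}

project = {
    "_id": "proj_os_lab",
    "owner": "user_42",
    "members": ["user_42", "user_7"],   # teammates sharing this memory space
    "chunks": [
        {
            "chunk_id": "c1",
            "text": "We decided to use a round-robin scheduler.",
            "embedding": [0.12, -0.03, 0.88],  # Gemini embedding (truncated)
            "edges": ["c2"],                   # knowledge-graph links
        },
    ],
}

# The knowledge graph falls out of the chunk list plus its edge lists.
graph_edges = {c["chunk_id"]: c["edges"] for c in project["chunks"]}
```

Keeping edges inline on each chunk means the graph can be rebuilt from a single project query, with no separate edge collection to keep consistent.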
How we built it
We built the MCP server using FastMCP to handle routing, automation, and context synchronization from any LLM client. MongoDB serves as both the primary database and a lightweight vector store, while the Snowflake Cortex API powers our vector search and observability service. We use Gemini's embedding APIs to generate high-quality vector representations of context chunks, improving retrieval accuracy. The system follows a clean hierarchy: each user can have multiple projects, and each project holds its own contextual chunks and graph relationships. Together, these choices enable seamless cross-agent collaboration without breaking stateless architecture principles.
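At its core, the retrieval step ranks stored chunk embeddings by similarity to the query embedding. A minimal pure-Python sketch of that idea (in practice Gemini produces the embeddings and Snowflake Cortex runs the search; the toy vectors and function names here are our own illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]),
                    reverse=True)
    return ranked[:k]

# Toy 3-dimensional embeddings standing in for Gemini's real output.
chunks = [
    {"id": "c1", "text": "scheduler design",  "embedding": [1.0, 0.0, 0.0]},
    {"id": "c2", "text": "filesystem notes",  "embedding": [0.0, 1.0, 0.0]},
    {"id": "c3", "text": "scheduler bug log", "embedding": [0.9, 0.1, 0.0]},
]
top = retrieve([1.0, 0.0, 0.0], chunks, k=2)  # the two scheduler chunks rank first
```

Because each retrieval call takes the query and returns ranked chunks with no session state, it slots cleanly into the stateless MCP design.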
Challenges we ran into
We faced several challenges around hosting and security. AWS firewall rules initially prevented us from connecting to MongoDB securely, which took significant debugging. Working with Snowflake was another challenge due to its strict authentication layers and dependency conflicts across different systems. We also had to carefully design APIs that could bridge Gemini, Snowflake, and the FastMCP server into one unified, stateless framework that would scale easily and avoid breaking under concurrent requests.
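Part of keeping that framework stateless is verifying every request independently. A simplified stdlib-only sketch of bearer-key checking (the key, header, and store here are illustrative; our real server layers JWTs and API keys on top of this):

```python
import hashlib
import hmac

# Hypothetical server-side store of hashed API keys (never store raw keys).
VALID_KEY_HASHES = {hashlib.sha256(b"demo-api-key").hexdigest()}

def authorize(headers: dict) -> bool:
    """Statelessly check an 'Authorization: Bearer <key>' header."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    key = auth.removeprefix("Bearer ").encode()
    digest = hashlib.sha256(key).hexdigest()
    # Constant-time comparison against each stored hash.
    return any(hmac.compare_digest(digest, h) for h in VALID_KEY_HASHES)

ok = authorize({"Authorization": "Bearer demo-api-key"})   # accepted
bad = authorize({"Authorization": "Bearer wrong-key"})     # rejected
```

Since each check needs only the request headers and the hash store, concurrent requests never contend over session state.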
Accomplishments that we're proud of
One of our biggest accomplishments is the interactive knowledge graph that visualizes how different chunks of context connect and influence each other. This feature helps both LLMs and users understand how memory evolves, improving contextual accuracy. We also integrated Snowflake to provide LLM observability, allowing us to track token usage, retrieval frequency, and system costs in real time. Our authentication layer combines JWT and API key authorization for secure multi-user access. Another technical highlight is our Gemini-based embedding pipeline, which chunks content based on semantic meaning rather than size, improving retrieval precision and minimizing hallucination. Finally, we are proud of building a stateless MCP architecture that allows any LLM to connect and collaborate through a shared memory layer, with all agent requests secured by bearer auth.
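The idea behind meaning-based chunking can be sketched as splitting on natural paragraph boundaries and only merging neighbors that stay under a word budget, instead of cutting at a fixed character count. This is a simplification of our pipeline (the real version uses Gemini embeddings to judge boundaries; the budget and helper name are our own):

```python
def chunk_by_paragraphs(text: str, max_words: int = 50) -> list[str]:
    """Split on blank lines, then greedily merge short neighboring paragraphs."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        candidate = (current + "\n\n" + p).strip() if current else p
        if len(candidate.split()) <= max_words:
            current = candidate          # merge: combined chunk still fits
        else:
            if current:
                chunks.append(current)   # flush the finished chunk
            current = p                  # oversized paragraphs stay whole
    if current:
        chunks.append(current)
    return chunks

doc = "Plan the scheduler.\n\nWrite the allocator.\n\n" + " ".join(["word"] * 60)
pieces = chunk_by_paragraphs(doc)  # two short paragraphs merge; the long one stands alone
```

Chunks that end at semantic boundaries embed more cleanly than arbitrary slices, which is what improves retrieval precision downstream.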
What we learned
We learned how to design efficient stateless MCP servers that can route memory updates and multi-agent requests in real time. Working with Gemini embeddings taught us best practices for chunking text by context boundaries and embedding them for optimal vector search results. Integrating Snowflake for observability also gave us insight into how large-scale AI systems can be monitored for usage patterns, retrieval performance, and cost management. Most importantly, we learned that enabling AI collaboration is not just about sharing prompts but about building structured systems for sharing context.
What's next for D-RAG
Next, we plan to add WebSocket support for real-time memory updates so that context evolves as users and agents work. We also want to expand the observability service to analyze which retrieved chunks are most useful and use that data to train a reward-based model that continuously improves retrieval quality. Another major goal is to introduce dashboards where users can see live memory graphs and team-level collaboration metrics. Ultimately, we envision D-RAG Cloud, a scalable service where any LLM can plug in and instantly gain access to a shared, secure, and dynamic memory space for true human-AI teamwork.
Built With
- amazon-web-services
- docker
- fastapi
- gemini
- mcp
- mongodb
- next.js
- python
- snowflake
