Inspiration

Working through dense research documents on complex subjects, such as new deep learning techniques or sophisticated statistics, can be a laborious, linear experience. Text-based summaries fall short when you want to grasp the interplay between fundamental concepts, techniques, and findings. Driven by a passion for data visualization, we set out to transcend the limitations of traditional chat-based interfaces. Our goals were a spatial learning experience that visually lays out these relationships, and a Web3-based micro-transaction model so the underlying AI infrastructure can sustain itself.

What it does

ConceptCanvas is an AI research assistant that turns static PDFs into interactive, conversational knowledge graphs.

Document Ingestion: Users upload PDF research papers, which are then chunked and stored in a MongoDB vector database.

Concept Graphing: Using Google's Gemini 2.5 Flash model, the app extracts the paper's key components as a Directed Acyclic Graph (DAG) that represents how the concepts relate to each other.

Conversational AI with Voice: Users can converse with specific nodes on the graph. Every response is grounded entirely in the research paper's context through a hybrid retrieval system, with ElevenLabs text-to-speech providing spoken answers.
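The ingestion step can be sketched roughly as follows. This is an illustrative outline, not the actual ConceptCanvas code: `chunk_text`, `embed`, and `build_documents` are hypothetical names, and `embed` is a placeholder standing in for a real gemini-embedding-001 API call.

```python
# Sketch of PDF ingestion: split extracted text into overlapping chunks,
# attach an embedding to each, and produce documents ready for insertion
# into a MongoDB vector collection.

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def embed(chunk: str) -> list[float]:
    # Placeholder: in production this would call the Gemini embedding API.
    return [0.0] * 8

def build_documents(text: str) -> list[dict]:
    """One MongoDB document per chunk, carrying text plus its vector."""
    return [{"text": c, "embedding": embed(c)} for c in chunk_text(text)]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side; chunk size and overlap here are arbitrary example values.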

How we built it

We split the architecture into a high-performance backend and a highly interactive frontend to handle the heavy data-processing and rendering requirements.

Backend: Python and FastAPI, with AsyncIOMotorClient for asynchronous MongoDB operations.

AI & Embeddings: The Gemini API (gemini-embedding-001 and gemini-2.5-flash) powers hybrid vector-lexical queries, JSON-schema concept extraction, and user queries. ElevenLabs generates high-fidelity TTS audio.

Frontend: A Next.js application on React 19 with Tailwind CSS.

Visualization: D3.js renders the JSON nodes and edges as an interactive graph.

Deployment: Vultr hosts and scales the backend infrastructure.
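The hybrid vector-lexical scoring idea can be sketched in a few lines. The weighting scheme and function names below are illustrative assumptions, not the exact formulas used in the app:

```python
# Sketch of hybrid retrieval scoring: blend a vector-similarity score
# with a lexical token-overlap score into a single ranking value.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def lexical_overlap(query: str, text: str) -> float:
    """Fraction of query tokens that appear in the candidate text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(q_vec, d_vec, query, text, alpha: float = 0.7) -> float:
    # alpha weights the vector score; (1 - alpha) weights the lexical score.
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * lexical_overlap(query, text)
```

Keeping the lexical term separate is also what makes the rate-limit fallback possible: when embeddings are unavailable, ranking can fall back to the lexical component alone.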

Challenges we ran into

AI Rate Limiting & Fallbacks: API limits during heavy document embedding and generation were a recurring problem. We built a robust fallback system that fails gracefully: if vector embeddings are rate-limited, the system dynamically switches to a lexical-only search so the user can continue querying without interruption.

Enforcing Graph Integrity: LLMs tend to generate cyclical relationships when mapping out concepts. We designed a custom algorithm that enforces a directed acyclic graph by eliminating back edges, keeping the graph mathematically sound.

Structured JSON Constraints: Forcing the LLM to output strict JSON for the D3 frontend, without markdown fences or broken formatting, required heavy prompting guardrails.
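One standard way to enforce acyclicity, which the graph-integrity step above could follow, is a depth-first search that drops any edge pointing back into the current DFS stack. This is a generic sketch with illustrative node/edge shapes, not our exact implementation:

```python
# Sketch of back-edge elimination: DFS over the LLM-generated edges,
# discarding any edge that points into the current recursion stack
# (a back edge, i.e. one that would close a cycle).

def remove_back_edges(nodes: list[str], edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    adj: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)

    kept, on_stack, visited = set(edges), set(), set()

    def dfs(node: str) -> None:
        on_stack.add(node)
        visited.add(node)
        for nxt in adj[node]:
            if nxt in on_stack:          # back edge: would create a cycle
                kept.discard((node, nxt))
            elif nxt not in visited:
                dfs(nxt)
        on_stack.discard(node)

    for n in nodes:
        if n not in visited:
            dfs(n)
    return [e for e in edges if e in kept]
```

Which edges get dropped depends on DFS visit order, so a production version would typically rank edges (e.g. by model confidence) before deciding which one in a cycle to sacrifice.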

Accomplishments that we're proud of

All of this was built within the span of 24 hours (23, with the daylight saving time reset).

The Hybrid Search Engine: Successfully implementing a custom retrieval system that mathematically combines vector search scores with lexical token matching to provide highly accurate citations and text snippets.

Deterministic AI Output: Engineering the system to consistently force the Gemini model to act as a structured data parser, bridging the gap between generative text and strict frontend rendering requirements.

Seamless Audio Integration: Wiring up ElevenLabs to stream natural-sounding voice responses based only on the highly specific context retrieved from the paper.
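A guardrail for the deterministic-output step might look like the following. This is a hedged sketch with a hypothetical `parse_graph_json` helper and illustrative field names, not the exact validation used in the app:

```python
# Sketch of a strict-JSON guardrail: strip any markdown fences the model
# emits despite instructions, then validate the node/edge shape before
# handing the payload to the D3 frontend.
import json
import re

def parse_graph_json(raw: str) -> dict:
    # Remove leading/trailing ```json ... ``` fences if present.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    graph = json.loads(cleaned)  # raises ValueError on broken formatting
    if not isinstance(graph.get("nodes"), list) or not isinstance(graph.get("edges"), list):
        raise ValueError("graph JSON must contain 'nodes' and 'edges' lists")
    return graph
```

Failing loudly here (rather than passing malformed output to D3) makes it easy to trigger a retry with a stricter prompt.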

What we learned

Building this full-stack application stretched our ability to combine the latest technologies into a seamless experience. We gained extensive hands-on experience with Google's Gemini API, including gemini-2.5-flash and gemini-embedding-001, for working with both text and vector representations. We learned how to structure and query high-dimensional data using MongoDB as our vector database. Standing up the backend infrastructure taught us how to use Vultr to bridge our Python/FastAPI backend and our Next.js frontend. Finally, we learned to coordinate React's component lifecycle with D3.js to render the concept graphs.

What's next for ConceptCanvas

Cross-Document Graphing: Allowing the canvas to connect concepts across a library of multiple uploaded PDFs (e.g., tracking how a specific quantitative model evolves across different research papers).

Multi-modal Nodes: Expanding the graph nodes to extract and display equations and charts directly from the source PDFs, rather than just text summaries.

Exportable Context: Allowing users to export their curated concept graphs into a structured format for use in other data science or research pipelines.
