Inspiration
Every student knows the frustration of staring at a dense textbook page, trying to make sense of it while juggling a dozen tools — note apps, search engines, AI chatbots, and PDF viewers. We built Clarify to solve that problem. The inspiration came from our own late-night study sessions where learning felt fragmented and inefficient. We wanted a single place where students could upload their textbooks, ask natural questions, and actually understand what they were reading — visually, interactively, and conversationally. Clarify was born from that idea: to make learning feel less like decoding and more like discovery. It’s built on the belief that education should be clear, intuitive, and accessible — powered by AI that helps students truly connect with their material.
What it does
Clarify transforms static textbooks into interactive learning experiences. Students can upload a PDF, take a screenshot of any section they’re struggling with — a diagram, formula, or dense paragraph — and instantly get an AI-powered explanation using Google’s Gemini. Beyond text, Clarify understands visuals too: it can break down graphs, tables, and equations step-by-step.
But Clarify doesn’t stop there — it brings the learning experience to life through voice. Using ElevenLabs conversational AI, students can actually talk to their textbook, ask follow-up questions, and receive spoken, context-aware answers in real time. On top of that, Clarify includes a clean and modern annotation system, letting users highlight key ideas and jot down notes directly on their PDFs.
The result? A single, AI-powered platform that helps students see, hear, and interact with their learning material — all in one place.
How we built it
We built Clarify using a modern full-stack architecture centered around Next.js 15 and React 19 for a fast, modular, and scalable web experience. The front end was styled with Tailwind CSS 4, enabling a clean, minimalist UI with smooth dark/light theme transitions and responsive layouts across devices.
On the AI side, Google’s Gemini API powers Clarify’s visual understanding and text analysis — it performs OCR on screenshots, interprets diagrams, and generates clear explanations of complex topics. For conversational learning, we integrated ElevenLabs’ conversational AI, allowing students to speak naturally with their textbook and receive spoken responses. These two AI systems are orchestrated through secure API routes built directly into Next.js, ensuring fast and seamless data flow.
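As an illustration of that data flow, here is a minimal sketch of how a captured screenshot might be packaged into a multimodal request body for Gemini's generateContent call. The function name and exact shape are illustrative (the `inlineData` part shape follows Google's JavaScript SDK), not Clarify's actual code:

```typescript
// Hypothetical helper: package a captured screenshot (base64) plus the
// student's question into the multimodal parts shape Gemini expects.
interface GeminiPart {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

interface GeminiRequest {
  contents: { role: "user"; parts: GeminiPart[] }[];
}

function buildScreenshotRequest(base64Png: string, question: string): GeminiRequest {
  return {
    contents: [
      {
        role: "user",
        parts: [
          { text: question },
          { inlineData: { mimeType: "image/png", data: base64Png } },
        ],
      },
    ],
  };
}
```

Building this body inside a Next.js route handler (rather than in the browser) keeps the Gemini API key server-side while the client only ever sends the screenshot and question.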
To support interactive learning, we developed a custom PDF annotation system inspired by DocHub’s simplicity. Users can add resizable, movable notes directly on their textbook pages, with all state managed through TypeScript and React Context API. Finally, we used html2canvas and pdf.js to render and export annotated PDFs with high-quality visuals, completing a fully immersive learning tool.
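The annotation state described above can be sketched as a pure reducer. This is an illustrative simplification, not Clarify's exact code: the real app exposes the state through React Context, but the add/move/resize logic itself can be framework-free and easy to test:

```typescript
// Hypothetical sketch of annotation state as a pure reducer.
interface Note {
  id: string;
  page: number;
  x: number; y: number;          // position in page coordinates
  width: number; height: number; // resizable note box
  text: string;
}

type NoteAction =
  | { type: "add"; note: Note }
  | { type: "move"; id: string; x: number; y: number }
  | { type: "resize"; id: string; width: number; height: number }
  | { type: "remove"; id: string };

function notesReducer(notes: Note[], action: NoteAction): Note[] {
  switch (action.type) {
    case "add":
      return [...notes, action.note];
    case "move":
      return notes.map(n =>
        n.id === action.id ? { ...n, x: action.x, y: action.y } : n
      );
    case "resize":
      return notes.map(n =>
        n.id === action.id ? { ...n, width: action.width, height: action.height } : n
      );
    case "remove":
      return notes.filter(n => n.id !== action.id);
  }
}
```

Feeding a reducer like this to `useReducer` and publishing the result via Context gives every component (toolbar, PDF page, export logic) the same annotation state without prop drilling.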
Throughout development, we collaborated in real time on GitHub, used Turbopack for lightning-fast builds, and focused on creating a smooth, reliable experience that blends AI with intuitive design.
Challenges we ran into
Building Clarify pushed us to solve a wide range of technical and design challenges. One of the hardest parts was getting the screenshot-to-AI analysis pipeline to work smoothly — capturing a selected region from a rendered PDF, converting it into a high-quality image, and sending it to Gemini for OCR and explanation, all without freezing the interface. We also ran into unexpected color and rendering issues with html2canvas and Tailwind’s oklch() colors, which required custom inline style conversions.
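The oklch() issue comes down to color math: html2canvas cannot parse oklch() values, so they must be rewritten as plain rgb() inline styles before capture. As a sketch (illustrative, not Clarify's exact code), the conversion itself follows the published OKLab reference matrices by Björn Ottosson:

```typescript
// Convert an oklch(L C H) color to 8-bit sRGB so it can be written back as a
// plain rgb() inline style that html2canvas understands.
// Matrices follow Björn Ottosson's OKLab reference implementation.
function oklchToRgb(L: number, C: number, Hdeg: number): [number, number, number] {
  const h = (Hdeg * Math.PI) / 180;
  const a = C * Math.cos(h); // OKLCH -> OKLab
  const b = C * Math.sin(h);

  // OKLab -> non-linear LMS
  const l_ = L + 0.3963377774 * a + 0.2158037573 * b;
  const m_ = L - 0.1055613458 * a - 0.0638541728 * b;
  const s_ = L - 0.0894841775 * a - 1.2914855480 * b;

  // cube to get linear LMS
  const l = l_ ** 3, m = m_ ** 3, s = s_ ** 3;

  // linear LMS -> linear sRGB
  const lin = [
     4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
    -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
    -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s,
  ];

  // gamma-encode, clamp to [0, 1], scale to 0..255
  return lin.map(c => {
    const v = c <= 0.0031308 ? 12.92 * c : 1.055 * Math.pow(c, 1 / 2.4) - 0.055;
    return Math.round(Math.min(1, Math.max(0, v)) * 255);
  }) as [number, number, number];
}
```

With a converter like this, a pre-capture pass can walk the computed styles of the target element and inline rgb() equivalents wherever oklch() appears.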
Integrating ElevenLabs’ real-time voice conversation API alongside Gemini’s text-based output presented another challenge — syncing two different AI systems while maintaining a natural flow of interaction. Finally, managing state synchronization between annotation mode, screenshot mode, and the chat interface tested our React architecture to its limits. Balancing complexity, performance, and user simplicity was the biggest challenge — but also the most rewarding one.
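The mode-synchronization problem can be sketched as a tiny state machine. Modeling the active tool as a single union type, rather than separate boolean flags, makes annotation mode, screenshot mode, and chat mutually exclusive by construction (the names below are illustrative, not Clarify's actual identifiers):

```typescript
// Hypothetical sketch: one union value means the UI can never be in two
// modes at once, so modes cannot drift out of sync.
type Mode = "idle" | "annotate" | "screenshot" | "chat";

// Toggling the active mode either turns it off (back to idle) or switches
// to it, which implicitly deactivates whatever mode was active before.
function toggleMode(current: Mode, requested: Exclude<Mode, "idle">): Mode {
  return current === requested ? "idle" : requested;
}
```

Storing this single value in Context means entering screenshot mode automatically exits annotation mode, with no flag cleanup code scattered across components.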
Accomplishments that we're proud of
We’re proud that Clarify evolved into more than just a tool — it became a seamless AI learning companion. We successfully built a fully functional multi-mode learning system that combines vision, text, and speech, all inside a single browser-based app. Our annotation system works with pixel-perfect precision, our Gemini integration can analyze diagrams in seconds, and our ElevenLabs voice tutor brings a new layer of engagement to studying.
We’re also proud of the design quality — the smooth animations, theme transitions, and minimalist interface make Clarify feel polished and modern. Most importantly, we’re proud that Clarify can genuinely help students learn better, not just faster.
What we learned
Throughout the development of Clarify, we learned how to orchestrate multiple AI systems into one unified experience — handling asynchronous data, real-time audio streams, and complex frontend interactions. We deepened our understanding of the Next.js App Router, the React Context API, and modern frontend performance optimization with Turbopack.
Built With
- elevenlabs-api
- github
- google-gemini-api
- html2canvas
- next.js
- pdf.js
- react
- tailwindcss
- typescript
- vercel
