🌟 Inspiration
150 years ago, the smartest student in class was the one with the most knowledge. Today, it's the one who can ask the best questions. Yet, modern classrooms rely on static, disconnected tools - slides, whiteboards, and notes - that fail to capture the dynamic nature of human learning.
Students today juggle lecture slides, notes, videos, and AI tools across different tabs - a fragmented experience that slows real understanding. While LLMs and note-taking tools help, they rarely adapt to how each student actually learns. Many students struggle with passive, screen-based learning experiences - especially neurodiverse or physically limited learners. We wanted to design a tool that makes studying more accessible, tactile, and interactive, enabling learning through movement, speech, and visualization.
We built MindPad to reintroduce that magic - helping students think, speak, and learn like humans, not machines.
🪄 What it does
MindPad is an AI-powered study and exploration platform that helps students learn, revise, and understand course material through gestures, voice, and an intelligent interactive canvas.
Using advanced computer-vision and language-model technology, MindPad blends gesture recognition, voice interaction, and an AI-powered canvas into one seamless experience. MindPad lets students:
- Interact Hands-Free: Use real-time hand-gesture recognition (OpenCV + MediaPipe + custom ANN for detection + classification) to navigate slides, highlight key points, or sketch diagrams on the AI canvas without touching a device.
- Learn by Talking: Speak naturally to the OpenAI Agent SDK-based voice agent to ask questions, generate examples, or request diagrams and clarifications during study sessions.
- Understand Through Visualization: The intelligent canvas can automatically draw charts, concept maps, and summaries in real time from lecture transcripts or uploaded PDFs for more engaging learning experiences.
- Promote Kinesthetic Learning: Encourages active engagement by letting students interact physically with course content through gestures, an approach shown to improve comprehension and memory retention.
- Enhance Accessibility: Designed with a touch-free interface for students with motor or mobility challenges, allowing voice-first and gesture-only navigation.

Together, these tools turn passive reading into active, multimodal learning.
⚙️ How we built it
Architecture Model: (in our gallery)
Languages: Python ∙ HTML ∙ CSS ∙ TypeScript
Frameworks and Tools: OpenCV ∙ MediaPipe ∙ PyAutoGUI ∙ Pynput ∙ OpenAI Agent SDK ∙ Figma ∙ Custom ANN
1. Gesture Recognition and Classification
- We built a real-time gesture detection and control pipeline using OpenCV and MediaPipe.
- Landmark Extraction: MediaPipe captures and tracks 21 hand landmarks per frame.
- Feature Engineering: We calculate pairwise distances and angles between key landmarks to generate gesture vectors.
- Classification Model: A custom Artificial Neural Network (ANN) trained on labeled gesture datasets classifies gestures (e.g., scroll, tap, annotate).
- System Control: Classified gestures are mapped to PyAutoGUI and Pynput actions, enabling users to perform mouse clicks, slide navigation, or drawing without touching a device.
- This pipeline runs with under 200 ms of end-to-end latency and achieves 99% classification accuracy on our test gestures.
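The feature-engineering step above can be sketched as follows. This is a minimal illustration rather than our exact pipeline: the choice of landmark pairs, the two joint angles, and the palm-length normalization are assumptions, and in the real system the resulting vector feeds the trained ANN instead of being used directly.

```python
import math

# MediaPipe hand-landmark indexing (0 = wrist, 4/8/12/16/20 = fingertips);
# the subset used here is illustrative.
TIP_IDS = [4, 8, 12, 16, 20]

def distance(a, b):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def angle_at(a, b, c):
    """Angle (radians) at vertex b formed by points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def gesture_features(landmarks):
    """Build a gesture vector from 21 (x, y) hand landmarks:
    fingertip-to-wrist and fingertip-to-fingertip distances,
    normalized by palm length, plus two example joint angles."""
    wrist = landmarks[0]
    palm = distance(wrist, landmarks[9]) or 1.0  # palm length as scale
    feats = [distance(wrist, landmarks[i]) / palm for i in TIP_IDS]
    for i in range(len(TIP_IDS)):          # pairwise fingertip distances
        for j in range(i + 1, len(TIP_IDS)):
            feats.append(distance(landmarks[TIP_IDS[i]],
                                  landmarks[TIP_IDS[j]]) / palm)
    # example joint angles at the index- and middle-finger PIP joints
    feats.append(angle_at(landmarks[5], landmarks[6], landmarks[8]))
    feats.append(angle_at(landmarks[9], landmarks[10], landmarks[12]))
    return feats
```

Normalizing by palm length makes the vector roughly invariant to how far the hand is from the camera, which keeps the classifier's job simpler.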
2. Voice Agent Integration
- To make interaction natural and multimodal, we integrated a voice-driven AI assistant using the OpenAI Agent SDK.
- The agent performs tool-calling for real-time tasks: generating charts, summarizing concepts, or writing on the web-based canvas.
- Spoken queries are transcribed, processed, and passed through the OpenAI API.
- The agent determines whether to generate text, draw a figure, or retrieve lecture material, returning the result seamlessly to the user interface.
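The dispatch step above can be sketched as a tool registry. This is a simplified stand-in, not the OpenAI Agent SDK itself: the SDK handles transcription and tool selection, while the hypothetical tool names and payload shapes below only illustrate how a selected tool call is routed to a canvas-renderable result.

```python
import json

# Hypothetical tool registry: the agent picks one of these tools from a
# spoken query; each returns a payload the canvas layer can render.
def generate_chart(topic):
    return {"type": "chart", "topic": topic}

def summarize(topic):
    return {"type": "summary", "topic": topic}

def retrieve_material(topic):
    return {"type": "lecture", "topic": topic}

TOOLS = {
    "generate_chart": generate_chart,
    "summarize": summarize,
    "retrieve_material": retrieve_material,
}

def dispatch_tool_call(tool_call_json):
    """Execute a tool call and return the payload for the UI.
    `tool_call_json` mirrors the {"name": ..., "arguments": {...}}
    shape of an LLM tool call."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"type": "text", "error": f"unknown tool {call['name']}"}
    return fn(**call["arguments"])
```

Keeping every tool behind one dispatcher means the canvas only ever consumes one payload format, no matter which modality triggered the request.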
3. Web-Based Intelligent Canvas
- The canvas is the core of MindPad’s user experience: it supports gesture-based drawing, AI annotations, and voice-generated content.
- Students can ask the agent to "draw a graph," "summarize this chapter," or "create a concept map," and the canvas updates dynamically.
- This layer connects all input modes - gesture, voice, and knowledge - into one synchronized workspace.
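One way to picture how the input modes stay synchronized is a single event queue that the canvas drains in arrival order. This is an illustrative sketch under assumed event names, not our production event model.

```python
import queue

# One queue merges events from the gesture thread, the voice agent, and
# AI-generated content; the UI loop drains it so canvas updates are
# applied in arrival order regardless of which modality produced them.
events = queue.Queue()

def emit(source, action, payload=None):
    """Called by any input pipeline (gesture, voice, AI) to post an update."""
    events.put({"source": source, "action": action, "payload": payload})

def drain(canvas_state):
    """Apply all pending events to a simple canvas-state dict."""
    while not events.empty():
        ev = events.get()
        if ev["action"] == "draw":
            canvas_state.setdefault("strokes", []).append(ev["payload"])
        elif ev["action"] == "annotate":
            canvas_state.setdefault("notes", []).append(ev["payload"])
    return canvas_state
```

Funneling every modality through one queue avoids races between, say, a gesture stroke and an AI annotation landing on the canvas at the same moment.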
4. Frontend and UI/UX Design
- We used HTML/CSS/Figma to design an intuitive, distraction-free interface optimized for learning.
- Smooth animations and transitions ensure gesture and voice interactions feel organic.
- We implemented adjustable gesture sensitivity, high-contrast UI themes, and multimodal input support to ensure accessibility for users with different learning and mobility needs.
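The adjustable gesture sensitivity mentioned above can be sketched as exponential smoothing with hysteresis, so a gesture only fires once its confidence stays high and only releases once it drops clearly. The thresholds and the `alpha` knob here are illustrative defaults, not our tuned values.

```python
class GestureSmoother:
    """Exponential smoothing plus hysteresis: a gesture activates only
    when smoothed confidence rises above `on`, and deactivates only when
    it falls below `off`. `alpha` is the user-facing sensitivity knob
    (higher = more responsive, lower = more stable)."""

    def __init__(self, alpha=0.4, on=0.8, off=0.5):
        self.alpha, self.on, self.off = alpha, on, off
        self.level = 0.0
        self.active = False

    def update(self, confidence):
        """Feed one per-frame classifier confidence; returns active state."""
        self.level = self.alpha * confidence + (1 - self.alpha) * self.level
        if not self.active and self.level >= self.on:
            self.active = True
        elif self.active and self.level <= self.off:
            self.active = False
        return self.active
```

The gap between the `on` and `off` thresholds is what prevents flicker when a classifier's confidence hovers near a single cutoff.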
⏳ Challenges we ran into
- Synchronizing gesture recognition, voice commands, and AI input in real time
- Designing smooth hand gesture thresholds for accuracy and comfort
- Integrating OpenAI Agents with web-based rendering and tool calls
- Managing latency across multimodal pipelines
🏆 Accomplishments that we're proud of
- Created an accessible, multimodal workspace that supports diverse learning styles - visual, auditory, and kinesthetic - making technology more human-centered.
- Built a functional gesture + voice + AI canvas system from scratch in under 36 hours
- Created a seamless interface for students to learn dynamically in a multimodal format
- Achieved high accuracy and responsiveness while maintaining a clean UX
💡 What we learned
Building MindPad taught us that true immersion happens when technology adapts to humans, not the other way around. Developing multimodal systems requires careful user-centric calibration, and we saw how even minor gesture differences can make a world of difference. We learned that accessibility and engagement go hand in hand, and that designing for kinesthetic and voice-based input not only improves inclusion but also enhances comprehension for all learners. Integrated well, multimodality helped us unlock creative, human-centered learning and improve focus and retention.
Our biggest takeaway was that the future of education lies in adaptive, expressive learning interfaces that let students think beyond text.
🚀 What's next for MindPad
- Integrate MindPad with learning platforms like Notion, Canvas, and Google Classroom for direct lecture import
- Add personalized learning insights for students via emotion and engagement tracking to detect confusion and tailor feedback
- Expand accessibility features such as customizable gesture ranges, AR sign-language recognition, and real-time captioning for a fully inclusive learning environment
- Support collaborative study sessions via shared multimodal canvases
Built With
- ann
- css
- figma
- gemini-api
- html
- mediapipe
- openai-agent-sdk
- openai-api
- opencv
- pyautogui
- pynput
- python
- typescript


