🌟 Inspiration
150 years ago, the smartest student in class was the one with the most knowledge. Today, it's the one who can ask the best questions. Yet, modern classrooms rely on static, disconnected tools - slides, whiteboards, and notes - that fail to capture the dynamic nature of human learning.
Students today juggle lecture slides, notes, videos, and AI tools across different tabs - a fragmented experience that slows real understanding. While LLMs and note-taking tools help, they rarely adapt to how each student actually learns. Many students struggle with passive, screen-based learning experiences - especially neurodiverse or physically limited learners. We wanted to design a tool that makes studying more accessible, tactile, and interactive, enabling learning through movement, speech, and visualization.
We built MindPad to reintroduce that magic - helping students think, speak, and learn like humans, not machines.
🪄 What it does
MindPad is an AI-powered study and exploration platform that helps students learn, revise, and understand course material through gestures, voice, and an intelligent interactive canvas.
Using advanced computer-vision and language-model technology, MindPad blends gesture recognition, voice interaction, and an AI-powered canvas into one seamless experience. MindPad lets students:
- Interact Hands-Free: Use real-time hand-gesture recognition (OpenCV + MediaPipe + custom ANN for detection + classification) to navigate slides, highlight key points, or sketch diagrams on the AI canvas without touching a device.
- Learn by Talking: Speak naturally to the OpenAI Agent SDK-based voice agent to ask questions, generate examples, or request diagrams and clarifications during study sessions.
- Understand Through Visualization: The intelligent canvas can automatically draw charts, concept maps, and summaries in real time from lecture transcripts or uploaded PDFs for more engaging learning experiences.
- Promote Kinesthetic Learning: Encourages active engagement by letting students interact physically with course content through gestures, an approach shown to improve comprehension and memory retention.
- Enhance Accessibility: Designed with a touch-free interface for students with motor or mobility challenges, allowing voice-first and gesture-only navigation.

Together, these tools turn passive reading into active, multimodal learning.
⚙️ How we built it
Architecture Model: (in our gallery)
Languages: Python ∙ HTML ∙ CSS ∙ TypeScript
Frameworks and Tools: OpenCV ∙ MediaPipe ∙ PyAutoGUI ∙ Pynput ∙ OpenAI Agent SDK ∙ Figma ∙ Custom ANN
1. Gesture Recognition and Classification
- We built a real-time gesture detection and control pipeline using OpenCV and MediaPipe.
- Landmark Extraction: MediaPipe captures and tracks 21 hand landmarks per frame.
- Feature Engineering: We calculate pairwise distances and angles between key landmarks to generate gesture vectors.
- Classification Model: A custom Artificial Neural Network (ANN) trained on labeled gesture datasets classifies gestures (e.g., scroll, tap, annotate).
- System Control: Classified gestures are mapped to PyAutoGUI and Pynput actions, enabling users to perform mouse clicks, slide navigation, or drawing without touching a device.
- This pipeline runs with under 200 ms of end-to-end latency and achieves 99% classification accuracy on our test gestures.
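The feature-engineering step above can be sketched as follows. This is a minimal illustration rather than our exact pipeline: the choice of landmark pairs, the two joint angles, and the palm-length normalization are assumptions, and in the real system the resulting vector feeds the trained ANN instead of being used directly.

```python
import math

# MediaPipe hand-landmark indexing (0 = wrist, 4/8/12/16/20 = fingertips);
# the subset used here is illustrative.
TIP_IDS = [4, 8, 12, 16, 20]

def distance(a, b):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def angle_at(a, b, c):
    """Angle (radians) at vertex b formed by points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def gesture_features(landmarks):
    """Build a gesture vector from 21 (x, y) hand landmarks:
    fingertip-to-wrist and fingertip-to-fingertip distances,
    normalized by palm length, plus two example joint angles."""
    wrist = landmarks[0]
    palm = distance(wrist, landmarks[9]) or 1.0  # palm length as scale
    feats = [distance(wrist, landmarks[i]) / palm for i in TIP_IDS]
    for i in range(len(TIP_IDS)):          # pairwise fingertip distances
        for j in range(i + 1, len(TIP_IDS)):
            feats.append(distance(landmarks[TIP_IDS[i]],
                                  landmarks[TIP_IDS[j]]) / palm)
    # example joint angles at the index- and middle-finger PIP joints
    feats.append(angle_at(landmarks[5], landmarks[6], landmarks[8]))
    feats.append(angle_at(landmarks[9], landmarks[10], landmarks[12]))
    return feats
```

Normalizing by palm length makes the vector roughly invariant to how far the hand is from the camera, which keeps the classifier's job simpler.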
2. Voice Agent Integration
- To make interaction natural and multimodal, we integrated a voice-driven AI assistant using the OpenAI Agent SDK.
- The agent performs tool-calling for real-time tasks: generating charts, summarizing concepts, or writing on the web-based canvas.
- Spoken queries are transcribed, processed, and passed through the OpenAI API.
- The agent determines whether to generate text, draw a figure, or retrieve lecture material, returning the result seamlessly to the user interface.
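The dispatch step above can be sketched as a tool registry. This is a simplified stand-in, not the OpenAI Agent SDK itself: the SDK handles transcription and tool selection, while the hypothetical tool names and payload shapes below only illustrate how a selected tool call is routed to a canvas-renderable result.

```python
import json

# Hypothetical tool registry: the agent picks one of these tools from a
# spoken query; each returns a payload the canvas layer can render.
def generate_chart(topic):
    return {"type": "chart", "topic": topic}

def summarize(topic):
    return {"type": "summary", "topic": topic}

def retrieve_material(topic):
    return {"type": "lecture", "topic": topic}

TOOLS = {
    "generate_chart": generate_chart,
    "summarize": summarize,
    "retrieve_material": retrieve_material,
}

def dispatch_tool_call(tool_call_json):
    """Execute a tool call and return the payload for the UI.
    `tool_call_json` mirrors the {"name": ..., "arguments": {...}}
    shape of an LLM tool call."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"type": "text", "error": f"unknown tool {call['name']}"}
    return fn(**call["arguments"])
```

Keeping every tool behind one dispatcher means the canvas only ever consumes one payload format, no matter which modality triggered the request.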
3. Web-Based Intelligent Canvas
- The canvas is the core of MindPad’s user experience: it supports gesture-based drawing, AI annotations, and voice-generated content.
- Students can ask the agent to "draw a graph," "summarize this chapter," or "create a concept map," and the canvas updates dynamically.
- This layer connects all input modes - gesture, voice, and knowledge - into one synchronized workspace.
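One way to picture how the input modes stay synchronized is a single event queue that the canvas drains in arrival order. This is an illustrative sketch under assumed event names, not our production event model.

```python
import queue

# One queue merges events from the gesture thread, the voice agent, and
# AI-generated content; the UI loop drains it so canvas updates are
# applied in arrival order regardless of which modality produced them.
events = queue.Queue()

def emit(source, action, payload=None):
    """Called by any input pipeline (gesture, voice, AI) to post an update."""
    events.put({"source": source, "action": action, "payload": payload})

def drain(canvas_state):
    """Apply all pending events to a simple canvas-state dict."""
    while not events.empty():
        ev = events.get()
        if ev["action"] == "draw":
            canvas_state.setdefault("strokes", []).append(ev["payload"])
        elif ev["action"] == "annotate":
            canvas_state.setdefault("notes", []).append(ev["payload"])
    return canvas_state
```

Funneling every modality through one queue avoids races between, say, a gesture stroke and an AI annotation landing on the canvas at the same moment.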
4. Frontend and UI/UX Design
- We used HTML/CSS/Figma to design an intuitive, distraction-free interface optimized for learning.
- Smooth animations and transitions ensure gesture and voice interactions feel organic.
- We implemented adjustable gesture sensitivity, high-contrast UI themes, and multimodal input support to ensure accessibility for users with different learning and mobility needs.
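The adjustable gesture sensitivity mentioned above can be sketched as exponential smoothing with hysteresis, so a gesture only fires once its confidence stays high and only releases once it drops clearly. The thresholds and the `alpha` knob here are illustrative defaults, not our tuned values.

```python
class GestureSmoother:
    """Exponential smoothing plus hysteresis: a gesture activates only
    when smoothed confidence rises above `on`, and deactivates only when
    it falls below `off`. `alpha` is the user-facing sensitivity knob
    (higher = more responsive, lower = more stable)."""

    def __init__(self, alpha=0.4, on=0.8, off=0.5):
        self.alpha, self.on, self.off = alpha, on, off
        self.level = 0.0
        self.active = False

    def update(self, confidence):
        """Feed one per-frame classifier confidence; returns active state."""
        self.level = self.alpha * confidence + (1 - self.alpha) * self.level
        if not self.active and self.level >= self.on:
            self.active = True
        elif self.active and self.level <= self.off:
            self.active = False
        return self.active
```

The gap between the `on` and `off` thresholds is what prevents flicker when a classifier's confidence hovers near a single cutoff.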
⏳ Challenges we ran into
- Synchronizing gesture recognition, voice commands, and AI input in real time
- Designing smooth hand gesture thresholds for accuracy and comfort
- Integrating OpenAI Agents with web-based rendering and tool calls
- Managing latency across multimodal pipelines
🏆 Accomplishments that we're proud of
- Created an accessible, multimodal workspace that supports diverse learning styles - visual, auditory, and kinesthetic - making technology more human-centered.
- Built a functional gesture + voice + AI canvas system from scratch in under 36 hours
- Created a seamless interface for students to learn dynamically in a multimodal format
- Achieved high accuracy and responsiveness while maintaining a clean UX
💡 What we learned
Building MindPad taught us that true immersion happens when technology adapts to humans, not the other way around. Developing multimodal systems requires careful user-centric calibration, and we saw how even minor gesture differences can make a world of difference. We learned that accessibility and engagement go hand in hand, and that designing for kinesthetic and voice-based input not only improves inclusion but also enhances comprehension for all learners. Integrated well, multimodality helped us unlock creative, human-centered learning and improve focus and retention.
Our biggest takeaway was that the future of education lies in adaptive, expressive learning interfaces that let students think beyond text.
🚀 What's next for MindPad
- Integrate MindPad with learning platforms like Notion, Canvas, and Google Classroom for direct lecture import
- Add personalized learning insights for students via emotion and engagement tracking to detect confusion and tailor feedback
- Expand accessibility features such as customizable gesture ranges, AR sign-language recognition, and real-time captioning for a fully inclusive learning environment
- Support collaborative study sessions via shared multimodal canvases
Built With
- ann
- css
- figma
- gemini-api
- html
- mediapipe
- openai-agent-sdk
- openai-api
- opencv
- pyautogui
- pynput
- python
- typescript


