Inspiration
It all started when one of our teammates was scrolling through TikTok (as you do) and stumbled across this mind-blowing video of someone playing an "air piano" using computer vision. We couldn't stop talking about how cool it looked!
We immediately knew we wanted to build our own music-making computer vision project, but as we started brainstorming, something important hit us. Not everyone has the same finger dexterity or hand mobility to make complex piano-like gestures. That's when the lightbulb moment happened - what if we used American Sign Language instead?
Suddenly, our "cool tech project" became something so much bigger. We weren't just building a game anymore; we were creating a bridge between communities. ASL signs are designed to be accessible and meaningful, and by using them as our input method, we could make music gaming inclusive for everyone while also helping people learn this beautiful language.
What it does
Picture this: notes are cascading down your screen just like in classic rhythm games, but instead of mashing buttons or tapping tiles, you and a friend are signing letters A through G in American Sign Language. Our computer vision system watches both players' hands in real-time and recognizes when you make the correct ASL sign, triggering the corresponding musical note to play.
Choose your favorite song, and midi-air will challenge you and your teammate to sign along to the melody together. See an 'A' note coming down on your side? Flash the ASL sign for 'A' and hear that note ring out perfectly in time with the music. Miss the timing or sign the wrong letter? You'll know immediately as your combo breaks and the music falters!
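The hit-or-miss judgment described above can be sketched in a few lines. This is purely illustrative, not our actual game code: the function name `judgeHit` and the window sizes (50 ms for a perfect, 120 ms for a good) are assumptions for the sketch.

```typescript
// Hypothetical sketch of a rhythm-game hit window; names and
// window sizes are illustrative assumptions, not midi-air's real tuning.

type Judgment = "perfect" | "good" | "miss";

interface HitResult {
  judgment: Judgment;
  comboBroken: boolean;
}

// Compare when the player signed against when the note was due
// (both in milliseconds) and decide how the hit is scored.
function judgeHit(signTimeMs: number, noteTimeMs: number): HitResult {
  const delta = Math.abs(signTimeMs - noteTimeMs);
  if (delta <= 50) return { judgment: "perfect", comboBroken: false };
  if (delta <= 120) return { judgment: "good", comboBroken: false };
  return { judgment: "miss", comboBroken: true }; // a miss breaks the combo
}
```

Signing the wrong letter would simply be treated as a miss, which is what breaks the combo and makes the music falter.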
The game becomes this incredible collaborative dance between two hands and the music, where every gesture has meaning and every sign creates sound. You're not just playing a game – you're learning ASL together, making music as a team, and having a blast all at the same time. It's accessible gaming at its finest, where the barrier between hearing and deaf communities dissolves into pure, visual rhythm. Whether you're ASL pros showing off your lightning-fast signing skills in perfect sync or complete beginners learning the alphabet together, midi-air creates an unforgettable multiplayer experience where teamwork meets music and communication becomes celebration.
How we built it
Building midi-air was like assembling a high-tech puzzle where every piece had to talk to every other piece – and somehow we made it all work!
The core application was built using Next.js and React with TypeScript for type safety, styled with Tailwind CSS for a clean, responsive interface. We designed the user experience in Figma and implemented smooth animations using Framer Motion to create engaging visual feedback for players. The heart of our project lies in MediaPipe, Google's computer vision framework, which handles real-time hand tracking and gesture recognition through the webcam. This allows us to detect ASL signs with impressive accuracy and speed. We also used sprite assets from Muse Dash.
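To give a feel for the landmark-based approach, here is a minimal toy classifier. MediaPipe Hands really does emit 21 hand landmarks with these fingertip/joint indices, but the classifier itself (`classifySign`, the extension heuristic, the two-letter vocabulary) is a simplified sketch, not our production recognizer.

```typescript
// Toy sketch of landmark-based ASL classification. MediaPipe's 21-landmark
// indexing is real; everything else here is an illustrative simplification.

interface Landmark { x: number; y: number; z: number; }

// Fingertip and PIP-joint indices for index, middle, ring, pinky.
const TIPS = [8, 12, 16, 20];
const PIPS = [6, 10, 14, 18];

// For an upright hand in image coordinates (y grows downward),
// a finger counts as "extended" when its tip sits above its PIP joint.
function extendedFingers(lm: Landmark[]): boolean[] {
  return TIPS.map((tip, i) => lm[tip].y < lm[PIPS[i]].y);
}

// Two-letter toy vocabulary: ASL 'A' is a closed fist, 'B' a flat hand.
function classifySign(lm: Landmark[]): "A" | "B" | null {
  const ext = extendedFingers(lm);
  if (ext.every((e) => !e)) return "A";
  if (ext.every((e) => e)) return "B";
  return null;
}
```

The real system has to handle all seven letters plus hand orientation and jitter, which is where the per-frame tracking and smoothing that MediaPipe provides earn their keep.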
For audio synthesis, we integrated Tone.js to generate musical notes in real-time based on the detected signs, creating an immediate audio-visual feedback loop. Font Awesome provided the iconography to polish the interface.
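The sign-to-sound step boils down to a small mapping. The octave choice and function names below are assumptions for the sketch; the Tone.js calls shown in the comment match its documented `Synth` API.

```typescript
// Sketch: mapping a recognized ASL letter to a playable note name.
// Pinning every letter to the fourth octave is an assumption here,
// not necessarily midi-air's actual tuning.

const NOTE_FOR_SIGN: Record<string, string> = {
  A: "A4", B: "B4", C: "C4", D: "D4", E: "E4", F: "F4", G: "G4",
};

function noteForSign(letter: string): string | null {
  return NOTE_FOR_SIGN[letter.toUpperCase()] ?? null;
}

// In the browser, the looked-up note is then synthesized with Tone.js:
//   const synth = new Tone.Synth().toDestination();
//   synth.triggerAttackRelease(noteForSign("C")!, "8n");
```

Because the lookup and the synthesis are decoupled, the same mapping drives both players' audio and the on-screen note highlighting.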
Challenges we ran into
One of our biggest hurdles was striking the perfect balance between smooth animations and optimal performance. We wanted midi-air to feel responsive and visually engaging, but every animation we added threatened to bog down the real-time computer vision processing. Finding that sweet spot between beautiful visuals and lightning-fast gesture recognition required constant optimization and creative problem-solving.
We also struggled with information design – figuring out exactly what users needed to see on screen without overwhelming them. Too little information and players felt lost; too much and the interface became cluttered and distracting. We went through countless iterations trying to present scores, timing feedback, upcoming notes, and ASL guidance in a way that enhanced gameplay rather than hindering it.

Perhaps our most ambitious challenge was building what's essentially a real-time multiplayer rhythm game entirely in a web browser. Web technologies weren't exactly designed for the low-latency, high-performance requirements of music gaming. We had to push the boundaries of what's possible with web-based computer vision, audio synthesis, and multiplayer synchronization – all while ensuring the game worked reliably across different devices and browsers. It was like trying to fit a console gaming experience into a browser tab.
Accomplishments that we're proud of
We're proudest of pulling off our biggest technical challenge: synchronizing computer vision recognition, audio generation, and game state management across two players simultaneously while maintaining smooth performance. Getting all these systems to work together in a real-time browser environment required careful optimization and state management. The result is a responsive web application that transforms sign language into music seamlessly!
What we learned
This project became an incredible learning experience for our entire team. Several of us dove deep into Next.js for the first time, discovering how powerful modern React frameworks can be for building complex, real-time applications. We also explored the fascinating world of custom machine learning by training our own gesture-recognition models on top of MediaPipe to better recognize ASL signs, which opened our eyes to how accessible computer vision has become. Beyond the technical skills, we learned valuable lessons about balancing user experience with performance constraints, and discovered that some of the most impactful projects come from combining accessibility with cutting-edge technology.
What's next for Midi-Air
Our development roadmap focuses on two key areas of expansion. First, we plan to significantly expand our song library by integrating additional MIDI processing capabilities to support a wider variety of musical genres and complexity levels. Second, we aim to implement sharp and flat notes by developing extended ASL gesture recognition or modifier systems that can represent the full chromatic scale. These enhancements would transform midi-air from a basic rhythm game into a more comprehensive musical education tool capable of handling complex musical arrangements and providing users with a broader range of learning opportunities through ASL-based interaction.
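One possible shape for the sharp/flat modifier system is a second gesture that shifts the natural note by a semitone. This is a speculative sketch of the roadmap idea, not implemented code; the scheme and names are ours to illustrate the point.

```typescript
// Illustrative modifier scheme for the chromatic-scale roadmap item:
// a modifier gesture raises or lowers a natural note by one semitone.
// Assumes single-letter pitch classes with single-digit octaves ("A4").

type Modifier = "natural" | "sharp" | "flat";

const CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

function applyModifier(note: string, mod: Modifier): string {
  const pitch = note.slice(0, -1);          // e.g. "A"
  const octave = Number(note.slice(-1));    // e.g. 4
  let i = CHROMATIC.indexOf(pitch);
  if (mod === "sharp") i += 1;
  if (mod === "flat") i -= 1;
  const wrapped = (i + CHROMATIC.length) % CHROMATIC.length;
  const octaveShift = Math.floor(i / CHROMATIC.length); // B# rolls into the next octave
  return CHROMATIC[wrapped] + (octave + octaveShift);
}
```

Wrapping at the octave boundary keeps edge cases like B-sharp and C-flat musically sensible without special-casing them.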
Built With
- figma
- font-awesome
- framer-motion
- garage-band
- media-pipe
- next.js
- online-sequencer
- react
- tailwind-css
- tone.js
- typescript
