Inspiration
At major concerts like Travis Scott's shows, ASL interpreters bring music to life for deaf and hard-of-hearing audiences. Music has transformative healing power, and everyone deserves to experience it. We wanted to flip this concept—what if hand signs could create music in real-time? What if ASL itself could become an instrument, allowing anyone to perform and shape music through gesture alone?
What it does
Singing Hands translates American Sign Language into live, dynamic music. Users sign lyrics using ASL, and the system generates corresponding instrumental music that responds in real-time to hand movements:
- Hand position (left/right): controls tempo; move right for faster, left for slower
- Hand position (up/down): controls pitch; raise hands for higher notes, lower them for deeper tones
- ASL signs: represent lyrics and musical phrases
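The position-to-parameter mapping above can be sketched as a pure function. This is a minimal illustration, not our production code: the ranges and the `hand_to_music_params` name are hypothetical, and real hand trackers typically report normalized image coordinates where x grows rightward and y grows downward.

```python
def hand_to_music_params(x, y, bpm_range=(60, 180), pitch_range=(48, 84)):
    """Map a normalized hand position (x, y in [0, 1]) to (bpm, midi_note).

    Illustrative ranges: 60-180 BPM for tempo, MIDI notes 48-84 for pitch.
    y is inverted because image coordinates grow downward, but raising
    the hand should raise the pitch.
    """
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    bpm = bpm_range[0] + x * (bpm_range[1] - bpm_range[0])                 # right = faster
    midi = pitch_range[0] + (1.0 - y) * (pitch_range[1] - pitch_range[0])  # up = higher
    return round(bpm), round(midi)
```

Keeping this mapping a pure function makes it trivial to unit-test and to retune the ranges without touching the streaming pipeline.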
The result is an interactive musical instrument controlled entirely through sign language, creating an inclusive bridge between deaf culture and musical expression.
How we built it
- MediaPipe: real-time hand tracking and gesture recognition
- Gemini AI: classifies complex ASL words and multi-hand gestures
- Claude AI: assists with gesture interpretation and workflow logic
- Google DeepMind's Lyria: real-time audio generation engine
- WebSocket architecture: manages live streaming between hand detection, AI classification, and audio synthesis
- Python backend: orchestrates the entire pipeline with low-latency processing
We built a socket-based system that captures hand gestures at 30+ fps, classifies them through our AI pipeline, and streams parameters to Lyria for instant lo-fi music generation.
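One building block of a pipeline like this is making sure the audio engine always reacts to the newest gesture frame rather than working through a backlog. A common low-latency pattern is a "latest-only" channel that overwrites unread frames instead of queueing them; the sketch below shows the idea in plain asyncio, with hypothetical names standing in for our actual WebSocket plumbing.

```python
import asyncio

class LatestOnlyChannel:
    """One-slot channel: the consumer always sees the newest frame.

    Stale gesture frames are silently dropped rather than queued, so a
    slow consumer (e.g. the audio engine) never falls behind a 30+ fps
    producer. Illustrative sketch, not the project's real transport.
    """
    def __init__(self):
        self._event = asyncio.Event()
        self._frame = None

    def publish(self, frame):
        self._frame = frame      # overwrite any unread frame
        self._event.set()

    async def next_frame(self):
        await self._event.wait()
        self._event.clear()
        return self._frame

async def demo():
    chan = LatestOnlyChannel()
    # Producer bursts three frames faster than the consumer reads...
    for i in range(3):
        chan.publish({"seq": i, "x": 0.1 * i})
    # ...so the consumer only ever sees the latest one.
    return await chan.next_frame()

result = asyncio.run(demo())  # result["seq"] is 2
```

Dropping stale frames trades completeness for responsiveness, which is the right trade-off when the data is a continuous control signal rather than discrete events.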
Challenges we ran into
Our original vision was to create a singing AI that would vocalize the signed lyrics in real-time. However, we hit a major technical barrier: the latency between ASL translation and vocal synthesis was too high for a seamless musical experience. The pipeline (gesture → classification → lyric extraction → vocal generation) introduced 2-3 second delays, breaking the real-time magic. Socket session management also proved incredibly complex—maintaining stable WebSocket connections while handling rapid gesture data, AI processing, and audio streaming required careful caching strategies and state management we hadn't anticipated.
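One caching strategy for continuous gesture streams is to quantize hand landmarks onto a coarse grid so near-identical poses share a cache key, and only call the expensive classifier on a cache miss. The sketch below is a simplified stand-in: `fake_model` and the grid-based keying are illustrative, not our actual Gemini integration.

```python
def quantize(landmarks, grid=20):
    """Snap (x, y) landmarks to a coarse grid so near-identical poses
    produce the same cache key. Grid size is an illustrative choice."""
    return tuple((round(x * grid), round(y * grid)) for x, y in landmarks)

_cache = {}
calls = {"n": 0}

def fake_model(key):
    """Stand-in for an expensive AI classification call."""
    return "open_hand" if key else "none"

def classify(landmarks):
    """Memoize classification on the quantized pose: successive frames
    of a held sign hit the cache instead of re-calling the model."""
    key = quantize(landmarks)
    if key not in _cache:
        calls["n"] += 1          # count "API calls" for demonstration
        _cache[key] = fake_model(key)
    return _cache[key]
```

With this keying, two frames of the same held sign that differ only by sensor jitter resolve to one classification call, which is what cuts redundant API traffic during a continuous stream.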
Accomplishments that we're proud of
✨ Achieved real-time instrumental music generation with under 200ms latency
🎵 Dynamic music control: pitch and tempo respond instantly to hand position
🤝 Accurate ASL interpretation for both one-handed and two-handed signs
🎼 Created a playable instrument entirely controlled by sign language
🏆 Won 2nd place at the Claude Hackathon!
What we learned
- WebSocket architecture for real-time AI applications: session management, connection pooling, and state synchronization
- Latency optimization strategies for chaining multiple AI models
- Caching techniques to reduce redundant API calls during continuous gesture streams
- The technical challenges of real-time audio-visual synthesis
- How to pivot quickly when core technical assumptions don't hold
What's next for Singing Hands
We're not giving up on the original vision! Next steps:
- Real-time lyrical translation: faster ASL-to-text pipelines and optimized vocal synthesis to achieve singing with lyrics
- Expanded gesture vocabulary: support for the full ASL alphabet and common phrases
- Multi-instrument support: let users choose instruments (piano, guitar, drums) through gestures
- Performance mode: record and play back ASL musical performances
- Accessibility features: make this a tool for music education in deaf communities
Our goal is to enable full song performances—lyrics, melody, and harmony—using only ASL.
Built With
claude gemini google-deepmind-lyria mediapipe python websockets