Inspiration
I often come up with unique instrument patterns but don’t always have access to a DAW or instrument to record them. To solve this, I created a lightweight React web application that works seamlessly on desktop browsers. This project also served as a personal challenge — to build a fully “vibe-coded” application from start to finish.
What it does
Turn your voice into music. A browser-based, AI-powered drum, guitar and piano melody machine that transcribes your vocalizations into a multi-track MIDI sequence.
How it works
- User Input: The user records their voice directly in the browser or uploads a pre-existing audio file.
- Gemini Transcription: The audio data is sent to the Gemini API with a specialized prompt. The prompt instructs the model to transcribe the sounds (distinguishing between beatboxing and singing) and return each distinct "word" along with its precise start time. A strict JSON schema is enforced on the response to ensure data consistency.
- Vocal Mapping: The application's logic parses the structured response from Gemini. It then cross-references the transcribed words/syllables with the user's custom mappings (e.g., "boom" -> "kick", "do" -> "C4").
- Hit Generation: For each successfully mapped sound, a Hit object is created. This object contains its instrument ID, precise timing within the sequence, and, for melodic hits, the specific note and its duration.
- Sequencer Update: The newly generated hits are dynamically added to the active track in the MIDI editor, providing immediate visual feedback.
- Audio Playback: The Web Audio API is used to schedule and synthesize all instrument sounds based on the data in the sequencer, ensuring sample-accurate playback.
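The transcription step above hinges on the model returning structured JSON. As a minimal sketch, assuming hypothetical field names (`word`, `startTime`) that stand in for whatever the enforced schema actually specifies, the parsing side might look like:

```typescript
// Hypothetical shape of the JSON the Gemini prompt enforces; the exact
// field names used by VoiceBeats are assumptions here.
interface TranscribedWord {
  word: string;      // the vocalized syllable, e.g. "boom" or "do"
  startTime: number; // offset in seconds from the start of the recording
}

// Parse the model's JSON response, dropping any malformed entries so a
// partially bad response cannot corrupt the sequencer state.
function parseTranscription(raw: string): TranscribedWord[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];
  return data.filter(
    (e): e is TranscribedWord =>
      typeof (e as any)?.word === "string" &&
      typeof (e as any)?.startTime === "number"
  );
}
```

Enforcing the schema on the API side and still validating on the client is belt-and-braces: the sequencer only ever sees well-typed entries.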
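The vocal-mapping and hit-generation steps can be sketched together. The instrument and note examples ("boom" -> "kick", "do" -> "C4") come from the description above; the `Hit` field names and the mapping-table shape are assumptions:

```typescript
// Assumed Hit shape: instrument ID, start time, and an optional note for
// melodic hits, per the pipeline described above.
interface Hit {
  instrument: string; // e.g. "kick" or "piano"
  time: number;       // start time in seconds within the sequence
  note?: string;      // melodic hits only, e.g. "C4"
}

type Mapping = { instrument: string; note?: string };

// Cross-reference each transcribed word with the user's custom mappings;
// words without a mapping are simply skipped.
function mapWordsToHits(
  words: { word: string; startTime: number }[],
  mappings: Record<string, Mapping>
): Hit[] {
  const hits: Hit[] = [];
  for (const w of words) {
    const m = mappings[w.word.toLowerCase()];
    if (!m) continue;
    hits.push({ instrument: m.instrument, time: w.startTime, note: m.note });
  }
  return hits;
}
```

Keeping this a pure function of (words, mappings) makes the step easy to re-run whenever the user edits their mappings, without re-calling the API.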
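Sample-accurate Web Audio playback is commonly done with a lookahead scheduler: a coarse JavaScript timer ticks every few tens of milliseconds and schedules every hit falling in the next short window against the audio clock. This is a sketch of that general pattern, not necessarily VoiceBeats' exact implementation; the pure windowing helper is shown, with the browser-only scheduling call left as a comment:

```typescript
// Select the hits that fall inside the next lookahead window. Each timer
// tick, these are handed to the Web Audio API and scheduled against
// AudioContext.currentTime, which is far more precise than setTimeout.
function hitsInWindow<T extends { time: number }>(
  hits: T[],
  windowStart: number, // current playhead position, in seconds
  lookahead: number    // window length in seconds, e.g. 0.1
): T[] {
  return hits.filter(
    (h) => h.time >= windowStart && h.time < windowStart + lookahead
  );
}

// In the browser, a scheduler tick would then do roughly:
//   for (const h of hitsInWindow(allHits, playhead, 0.1)) {
//     const source = buildSourceFor(h);       // hypothetical synth factory
//     source.start(audioCtx.currentTime + (h.time - playhead));
//   }
```

The split matters: the timer only needs to fire often enough to keep the window full, while the audio thread honors the exact `start()` timestamps.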
How I built it
The entire application was built using Google AI Studio and deployed on Cloud Run. All development, from concept to deployment, was done through AI Studio. Read my experience in detail here.
Challenges I ran into
The biggest challenge was learning how to effectively use AI Studio. In my first attempt, I provided overly detailed project structures, which made the process unnecessarily complex. On my second attempt, I refined my approach — limiting each prompt to 1–3 specific tasks based on complexity — which significantly improved the workflow.
Accomplishments that I am proud of
My main goal was to create and deploy a project without writing any code myself. Despite being an experienced developer, I wanted to rely entirely on AI-driven development. After some initial learning curves, I successfully built and deployed a 100% vibe-coded application.
What's next for VoiceBeats
- Make the app responsive on mobile and tablet devices.
- After successfully implementing Gemini Voice Mapping, my next goal is to explore sound mapping, letting users tap, clap, and click to create rhythm patterns.
Built With
- ai-studio
- react

