Inspiration

I often come up with unique instrument patterns but don’t always have access to a DAW or instrument to record them. To solve this, I created a lightweight React web application that works seamlessly on desktop browsers. This project also served as a personal challenge — to build a fully “vibe-coded” application from start to finish.

What it does

Turn your voice into music. A browser-based, AI-powered drum, guitar and piano melody machine that transcribes your vocalizations into a multi-track MIDI sequence.

How it works

  1. User Input: The user records their voice directly in the browser or uploads a pre-existing audio file.
  2. Gemini Transcription: The audio data is sent to the Gemini API with a specialized prompt. The prompt instructs the model to transcribe the sounds (distinguishing between beatboxing and singing) and return each distinct "word" along with its precise start time. A strict JSON schema is enforced on the response to ensure data consistency.
  3. Vocal Mapping: The application's logic parses the structured response from Gemini. It then cross-references the transcribed words/syllables with the user's custom mappings (e.g., "boom" -> "kick", "do" -> "C4").
  4. Hit Generation: For each successfully mapped sound, a Hit object is created. This object contains its instrument ID, precise timing within the sequence, and, for melodic hits, the specific note and its duration.
  5. Sequencer Update: The newly generated hits are dynamically added to the active track in the MIDI editor, providing immediate visual feedback.
  6. Audio Playback: The Web Audio API is used to schedule and synthesize all instrument sounds based on the data in the sequencer, ensuring sample-accurate playback.
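Steps 3 and 4 above can be sketched in TypeScript. The interfaces and field names below (`TranscribedWord`, `Hit`, the mapping shape) are illustrative assumptions, not the app's actual types; the real Gemini response schema may differ.

```typescript
// Hypothetical shape of one entry in Gemini's structured JSON response.
interface TranscribedWord {
  word: string;       // the transcribed syllable, e.g. "boom" or "do"
  startTime: number;  // seconds from the start of the recording
  duration?: number;  // present for sung (melodic) sounds
}

// Hypothetical user mapping: syllable -> instrument, plus a note for melodic instruments.
type Mapping = Record<string, { instrument: string; note?: string }>;

// A single sequencer event, as described in step 4.
interface Hit {
  instrument: string;
  time: number;
  note?: string;
  duration?: number;
}

function mapWordsToHits(words: TranscribedWord[], mappings: Mapping): Hit[] {
  const hits: Hit[] = [];
  for (const w of words) {
    const m = mappings[w.word.toLowerCase()];
    if (!m) continue; // syllables with no user mapping are skipped
    hits.push({
      instrument: m.instrument,
      time: w.startTime,
      // Melodic hits carry a note and a duration (with an assumed fallback).
      ...(m.note ? { note: m.note, duration: w.duration ?? 0.25 } : {}),
    });
  }
  return hits;
}
```

For example, with the mappings from step 3, `"boom"` at 0.0s becomes a kick hit and `"do"` at 0.5s becomes a C4 piano hit, while unmapped syllables are dropped.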
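The scheduling idea in step 6 is that hit times are offsets within the sequence, which get converted to absolute `AudioContext` timestamps at playback. A minimal sketch of that conversion, kept as a pure function so it is independent of the Web Audio API itself (the `leadIn` parameter and event shape are assumptions for illustration):

```typescript
interface SequencerHit {
  instrument: string;
  time: number; // seconds from the start of the sequence
}

interface ScheduledEvent {
  instrument: string;
  startAt: number; // absolute AudioContext time, suitable for source.start(startAt)
}

// Offset every hit by the context's current time plus a small lead-in,
// so the first sound is never scheduled in the past (which the Web Audio
// API would clamp to "play immediately", ruining the groove).
function scheduleHits(
  hits: SequencerHit[],
  ctxCurrentTime: number, // e.g. audioContext.currentTime at the moment playback starts
  leadIn = 0.1
): ScheduledEvent[] {
  return hits.map((h) => ({
    instrument: h.instrument,
    startAt: ctxCurrentTime + leadIn + h.time,
  }));
}
```

In the browser, each `startAt` would be passed to a synthesized source's `start()` call, which is what gives Web Audio its sample-accurate timing.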

How I built it

The entire application was built using Google AI Studio and deployed on Cloud Run. All development, from concept to deployment, was done through AI Studio. Read my experience in detail here.

Challenges I ran into

The biggest challenge was learning how to effectively use AI Studio. In my first attempt, I provided overly detailed project structures, which made the process unnecessarily complex. On my second attempt, I refined my approach — limiting each prompt to 1–3 specific tasks based on complexity — which significantly improved the workflow.

Accomplishments that I am proud of

My main goal was to create and deploy a project without writing any code myself. Despite being an experienced developer, I wanted to rely entirely on AI-driven development. After some initial learning curves, I successfully built and deployed a 100% vibe-coded application.

What's next for VoiceBeats

  • Make the app responsive for mobile and tablet devices.
  • Building on the current Gemini voice mapping, explore non-vocal sound mapping, so users can create rhythm patterns with taps, claps, and clicks.
