Inspiration
We wanted to build something that makes you feel like a kid again. That feeling of drawing your own superhero on the back of a notebook and imagining them in a real fight. We thought: what if you could actually see that happen? What if you could draw a character in the air with your hands, describe their powers with your voice, and then watch them go head-to-head with an anime boss in a cinematic battle? We're all huge anime fans, and the idea of turning a rough sketch into a full-blown animated showdown was too exciting not to build.
What it does
Drawn lets you create your own anime hero and watch them battle iconic bosses. You draw your character mid-air using hand gestures tracked by your webcam. While you draw, you talk out loud to describe your hero's powers, backstory, and abilities. The app listens, understands, and even responds to voice commands like changing pen color, adding fire particle effects, or setting a backdrop.
Once your drawing is done, everything kicks off: the app analyzes your sketch and voice description, generates a full origin story with narration, creates a polished anime portrait of your character, and produces a cinematic 8-second battle video. You hear a sports announcer hyping up the fight, watch the action unfold, and then a dramatic winner reveal tells you if your hero won or lost. Every battle gets saved to a gallery so you can revisit your creations.
How we built it
The frontend is React with Vite and Tailwind CSS. For hand tracking, we run MediaPipe's HandLandmarker GPU-accelerated in the browser on the live webcam feed. We detect a pinch gesture (thumb tip meets index fingertip) to draw, and we built adaptive smoothing with speed-based filtering and quadratic Bezier curve interpolation to make the strokes feel natural instead of jagged.
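The core of the pinch check is just a distance test between two of HandLandmarker's 21 normalized landmarks. A minimal sketch (the 0.05 threshold is an illustrative value, not our exact tuning):

```javascript
// MediaPipe HandLandmarker returns 21 landmarks per hand in
// normalized [0, 1] coordinates. Index 4 is the thumb tip,
// index 8 is the index-finger tip.
const THUMB_TIP = 4;
const INDEX_TIP = 8;
const PINCH_THRESHOLD = 0.05; // illustrative; tune per camera and field of view

function isPinching(landmarks) {
  const thumb = landmarks[THUMB_TIP];
  const index = landmarks[INDEX_TIP];
  return Math.hypot(thumb.x - index.x, thumb.y - index.y) < PINCH_THRESHOLD;
}
```

Each frame, a pinch starts or extends the current stroke; releasing the pinch ends it.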
Voice input runs on the Web Speech API in continuous mode. We parse commands in real time (colors, tools, effects, backdrops) and accumulate the full transcript for character description extraction later.
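The matcher behind those commands is essentially a keyword table run against each transcript chunk from SpeechRecognition. A simplified sketch (the command list and return shapes here are illustrative, not our full set):

```javascript
// Illustrative command table; the real app recognizes many more phrases.
const COMMANDS = [
  { pattern: /\b(red|blue|green|yellow)\b/, action: (m) => ({ type: 'color', value: m[1] }) },
  { pattern: /\bthicker\b/,                 action: () => ({ type: 'brush', value: 'thicker' }) },
  { pattern: /\bfire\b/,                    action: () => ({ type: 'effect', value: 'fire' }) },
  { pattern: /\b(city|forest|space) backdrop\b/, action: (m) => ({ type: 'backdrop', value: m[1] }) },
  { pattern: /\bi'?m done\b/,               action: () => ({ type: 'finish' }) },
];

function parseCommand(transcript) {
  const text = transcript.toLowerCase();
  for (const { pattern, action } of COMMANDS) {
    const match = text.match(pattern);
    if (match) return action(match);
  }
  // No command matched: the text just accumulates into the character description.
  return null;
}
```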
The backend is Express with MongoDB for persistence. When the drawing phase ends, we send the sketch and voice transcript to Google's Gemini 2.0 Flash, which does double duty: it parses the messy voice transcript into clean character lore, AND it visually analyzes the drawing to determine the battle outcome, generate a narrative, write fight commentary, and craft prompts for the other services.
From there, three things happen in parallel: Imagen 3.0 generates a polished anime portrait from the sketch, ElevenLabs converts the origin story into dramatic narration with a dedicated narrator voice, and Veo 3.0 takes the hero image and creates an 8-second cinematic battle video. We use three different ElevenLabs voices for narration, fight commentary, and the winner announcement.
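The parallel fan-out can be sketched like this (the three `generate*` functions are hypothetical stand-ins for the real Imagen, ElevenLabs, and Veo calls):

```javascript
// Hypothetical stand-ins for the real service calls.
async function generatePortrait(sketch) { return 'portrait.png'; }
async function generateNarration(story) { return 'narration.mp3'; }
async function generateBattleVideo(hero) { return 'battle.mp4'; }

async function generateAssets({ sketch, story, hero }) {
  // allSettled instead of all: one failing service must not sink the battle screen.
  const [portrait, narration, video] = await Promise.allSettled([
    generatePortrait(sketch),
    generateNarration(story),
    generateBattleVideo(hero),
  ]);
  return {
    portrait: portrait.status === 'fulfilled' ? portrait.value : null,
    narration: narration.status === 'fulfilled' ? narration.value : null,
    video: video.status === 'fulfilled' ? video.value : null,
  };
}
```

The UI treats each `null` as "this asset didn't make it" and degrades that part of the battle screen instead of blocking.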
Everything ties together in the battle screen: rumble intro with timed word reveals, video playback with live commentary audio, confetti, and a winner card that gets auto-saved to the gallery via Cloudinary and MongoDB.
Challenges we ran into
Hand tracking was brutal at first. Raw landmark positions from MediaPipe are noisy, so early drawings looked like they were made during an earthquake. We had to build a custom adaptive smoothing system where the filter strength changes based on how fast your hand moves. Fast movements get less smoothing (so you can make quick strokes), slow movements get heavy smoothing (so details stay clean). We also added minimum distance filtering and Bezier curve interpolation to get strokes that actually look hand-drawn instead of pixelated.
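The idea boils down to an exponential moving average whose strength scales with hand speed, plus a minimum-distance gate. A sketch with illustrative constants (not our exact tuning):

```javascript
// Exponential smoothing whose strength adapts to hand speed:
// slow movement -> heavy smoothing (clean detail),
// fast movement -> light smoothing (responsive strokes).
const MIN_DISTANCE = 0.002; // ignore micro-jitter, in normalized coords
const SLOW_ALPHA = 0.15;    // heavy smoothing
const FAST_ALPHA = 0.75;    // light smoothing
const SPEED_SCALE = 0.05;   // speed at which smoothing is fully relaxed

function smoothPoint(prev, raw) {
  if (!prev) return raw;
  const speed = Math.hypot(raw.x - prev.x, raw.y - prev.y);
  if (speed < MIN_DISTANCE) return prev; // minimum distance filter
  const t = Math.min(speed / SPEED_SCALE, 1);
  const alpha = SLOW_ALPHA + (FAST_ALPHA - SLOW_ALPHA) * t;
  return {
    x: prev.x + (raw.x - prev.x) * alpha,
    y: prev.y + (raw.y - prev.y) * alpha,
  };
}
```

The smoothed points are then rendered as quadratic Bezier curves through stroke midpoints rather than as straight segments, which is what removes the pixelated look.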
Video generation was another headache. Veo 3.0's safety filter kept rejecting our prompts because words like "explosion," "shatter," and "destroy" are common in anime battle descriptions. We ended up building a sanitizer with 50+ regex patterns that replaces flagged words with safe alternatives before the prompt ever reaches the API. Getting the balance right between "epic anime battle" and "family-friendly enough for the content filter" was a real puzzle.
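The sanitizer itself is a straightforward pass over a replacement table. A few of the rules, with illustrative substitutions (not our exact word choices):

```javascript
// A handful of the replacement rules; the real list has 50+ patterns.
const SAFE_REPLACEMENTS = [
  [/\bexplosions?\b/gi, 'flash of light'],
  [/\bshatters?\b/gi, 'breaks apart'],
  [/\bdestroys?\b/gi, 'overwhelms'],
];

function sanitizePrompt(prompt) {
  // Apply every rule in order before the prompt reaches the video API.
  return SAFE_REPLACEMENTS.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    prompt
  );
}
```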
Synchronizing all the async operations was tricky too. We have narration audio, a generated image, and a video all being created in parallel, and the UI needs to transition smoothly between phases regardless of which finishes first. Lots of refs, state flags, and timeout fallbacks to keep everything feeling cinematic instead of broken.
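Most of those timeout fallbacks reduce to racing a service call against a timer. A minimal sketch (`withTimeout` is our illustration, not a library API):

```javascript
// Resolve with a fallback value if the real work takes too long,
// so the battle screen can advance instead of hanging on one service.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// e.g. const video = await withTimeout(generateBattleVideo(hero), 90_000, null);
```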
Accomplishments that we're proud of
The hand tracking drawing actually feels good. When you pinch to draw and see smooth strokes appear in the air in real time, it genuinely feels magical. The adaptive smoothing was worth every hour we spent on it.
The voice command system during drawing is something we're really happy with. You can say "red," "thicker," "fire," "city backdrop," or "I'm done" and the app just responds instantly while still accumulating your character description in the background. It feels like talking to your canvas.
The full pipeline working end-to-end is wild. You go from waving your hand in front of a webcam to watching a cinematic anime battle with narration, commentary, and a winner reveal. The fact that every piece connects smoothly (hand tracking, voice parsing, image generation, video creation, three different narrator voices, gallery persistence) is something we didn't think we'd actually pull off in a hackathon.
What we learned
MediaPipe in the browser is incredibly powerful but the raw data needs a lot of post-processing to be usable. We learned a ton about signal processing, smoothing algorithms, and gesture detection thresholds.
We learned that content safety filters on generative video are way more aggressive than on text or images, and that prompt engineering for video generation is its own skill. You can't just describe what you want, you have to describe it in a way that won't get flagged.
Coordinating multiple async services (Gemini, Imagen, Veo, ElevenLabs) in a user-facing flow taught us a lot about graceful degradation. Not everything will finish on time, not everything will succeed, and the user experience needs to feel smooth regardless.
What's next for Drawn
We want to add multiplayer battles where two players draw heroes and face off against each other. We also want more bosses (including user-submitted ones), arena customization via voice commands during the battle phase, and a leaderboard based on win streaks. Longer, more dynamic battle videos with multiple rounds would make the fights feel even more intense. We'd also love to bring it to mobile with on-screen touch drawing as an alternative to hand tracking.
Built With
- cloudinary
- computer-vision
- elevenlabs
- express.js
- google-gemini
- javascript
- mediapipe
- mongodb
- react
- tailwind
- veo