Inspiration
The primary inspiration came from the concept of "Autonomous Creativity." We wanted to see if an AI could not just generate a single image, but understand the narrative flow required for a 30-second commercial. The project draws aesthetic inspiration from futuristic, glassmorphic interfaces and the efficiency of modern cloud-based rendering pipelines.
What it does
SynthoReel takes a short text brief (e.g., "A commercial for a sleek mountain bike in the Alps") and:
- Architects a Script: Uses `gemini-3-flash-preview` to break the concept into logical scenes with professional narration.
- Visualizes Assets: Generates bespoke 16:9 high-resolution images for every scene using `gemini-2.5-flash-image`.
- Synthesizes Voice: Produces human-like voiceovers for the script using `gemini-2.5-flash-preview-tts`.
- Simulates Production: Applies cinematic camera movements (Pan, Zoom, Dolly) and syncs them to the audio duration in a real-time web player.
How we built it
The application is built on a modern React 19 frontend, styled with Tailwind CSS for a responsive UI. The core "intelligence" is provided by the @google/genai SDK.
We implemented a custom Production Orchestrator in App.tsx that manages the state transitions of the pipeline:
$$L_{total} = T_{script} + \max(T_{img1...n}, T_{audio1...n})$$
Where $L_{total}$ is the total production latency, optimized by running asset generation in parallel tasks.
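The latency model can be sketched in TypeScript as follows. The helper names (`estimateLatency`, `generateAssets`) are illustrative, not the actual `App.tsx` code: the point is that the script step runs first, then every per-scene image and audio task starts concurrently, so the pipeline pays only for the slowest single asset task.

```typescript
// Sketch of the latency formula above: T_script plus the max over
// all parallel image and audio generation times (names are hypothetical).
function estimateLatency(scriptMs: number, imageMs: number[], audioMs: number[]): number {
  return scriptMs + Math.max(...imageMs, ...audioMs);
}

// Assumed shape of the parallel fan-out used for asset generation:
// every task is kicked off at once and awaited together.
async function generateAssets<T>(tasks: Array<() => Promise<T>>): Promise<T[]> {
  return Promise.all(tasks.map((run) => run()));
}
```

For example, with a 1-second script step and per-scene asset times of 3s/2.5s (images) and 1.2s/4s (audio), the estimated total is 5 seconds, not the 10.7 seconds a sequential pipeline would take.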
Challenges we ran into
- Permission Paradigms: Navigating the 403 Permission Denied errors when switching between "Pro" and "Flash" models. We solved this by implementing a robust API key selection flow and defaulting to the high-compatibility `gemini-2.5-flash-image`.
- Audio Buffering: The Gemini TTS API returns raw PCM data. We had to implement a custom `createWavHeader` function to wrap this raw data in a format the browser could play natively.
- Ken Burns Syncing: Ensuring that CSS animations for camera movements perfectly matched the variable length of the generated audio files.
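A WAV wrapper for raw PCM boils down to writing a standard 44-byte RIFF header in front of the sample data. The sketch below is one plausible implementation of such a `createWavHeader`; the project's actual signature may differ, and the 24 kHz mono 16-bit defaults are assumptions, not confirmed output parameters of the TTS API.

```typescript
// Build a 44-byte RIFF/WAVE header for raw PCM data so the browser's
// <audio> element (or an AudioContext) can decode it natively.
// Defaults (24 kHz, mono, 16-bit) are assumptions for illustration.
function createWavHeader(
  dataLength: number,
  sampleRate = 24000,
  numChannels = 1,
  bitsPerSample = 16,
): ArrayBuffer {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  const byteRate = (sampleRate * numChannels * bitsPerSample) / 8;
  const blockAlign = (numChannels * bitsPerSample) / 8;

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // total file size minus 8 bytes
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeString(36, "data");
  view.setUint32(40, dataLength, true);     // length of the PCM payload
  return header;
}
```

Concatenating this header with the raw PCM bytes into a single `Blob` of type `audio/wav` yields a file the browser can play without any decoding library.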
Accomplishments that we're proud of
- Zero-Edit Workflow: A user can go from an empty screen to a playing commercial without clicking a single "Edit" button.
- The "Ken Burns" Engine: Creating a CSS-based motion engine that breathes life into static images through coordinated transforms.
- Error Recovery: Implementing a system that detects missing API permissions and allows the user to rotate keys without losing their current production progress.
What we learned
We learned that multimodal LLMs are not just content generators; they are excellent project managers. Using a model like gemini-3-flash-preview to decide the "camera action" for another model's "image generation" creates a feedback loop that feels surprisingly cohesive.
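One way to picture this "model as project manager" pattern is a structured scene object in which the script model's output includes the camera direction that the downstream image and motion layers execute. The interface below is an assumed shape for illustration, not the project's actual schema.

```typescript
// Hypothetical structured output from the script model: alongside the
// narration and image prompt, it also directs the camera action that
// the Ken Burns layer will perform on the generated image.
interface Scene {
  narration: string;
  imagePrompt: string;
  cameraAction: "pan" | "zoom" | "dolly";
}

const exampleScene: Scene = {
  narration: "Carbon meets granite at three thousand meters.",
  imagePrompt: "A sleek mountain bike on an alpine ridge at golden hour, 16:9",
  cameraAction: "dolly",
};
```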
What's next for SynthoReel
The next evolution of SynthoReel is the transition from static scenes to full motion video. We plan to integrate the Veo 3.1 models to generate 720p/1080p video clips for each scene, effectively moving from a "slideshow with motion" to a true cinematic experience.
Built With
- gemini3
- typescript