Inspiration
The primary inspiration came from the concept of "Autonomous Creativity." We wanted to see if an AI could not just generate a single image, but understand the narrative flow required for a 30-second commercial. The project draws aesthetic inspiration from futuristic, glassmorphic interfaces and the efficiency of modern cloud-based rendering pipelines.
What it does
SynthoReel takes a short text brief (e.g., "A commercial for a sleek mountain bike in the Alps") and:
- Architects a Script: Uses `gemini-3-flash-preview` to break the concept into logical scenes with professional narration.
- Visualizes Assets: Generates bespoke 16:9 high-resolution images for every scene using `gemini-2.5-flash-image`.
- Synthesizes Voice: Produces human-like voiceovers for the script using `gemini-2.5-flash-preview-tts`.
- Simulates Production: Applies cinematic camera movements (Pan, Zoom, Dolly) and syncs them to the audio duration in a real-time web player.
How we built it
The application is built on a modern React 19 frontend, styled with Tailwind CSS for a responsive UI. The core "intelligence" is provided by the @google/genai SDK.
We implemented a custom Production Orchestrator in App.tsx that manages the state transitions of the pipeline:
$$L_{total} = T_{script} + \max(T_{img1...n}, T_{audio1...n})$$
Where $L_{total}$ is the total production latency, optimized by running asset generation in parallel tasks.
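The latency model can be sketched in TypeScript as follows. The helper names (`estimateLatency`, `generateAssets`) are illustrative, not the actual `App.tsx` code: the point is that the script step runs first, then every per-scene image and audio task starts concurrently, so the pipeline pays only for the slowest single asset task.

```typescript
// Sketch of the latency formula above: T_script plus the max over
// all parallel image and audio generation times (names are hypothetical).
function estimateLatency(scriptMs: number, imageMs: number[], audioMs: number[]): number {
  return scriptMs + Math.max(...imageMs, ...audioMs);
}

// Assumed shape of the parallel fan-out used for asset generation:
// every task is kicked off at once and awaited together.
async function generateAssets<T>(tasks: Array<() => Promise<T>>): Promise<T[]> {
  return Promise.all(tasks.map((run) => run()));
}
```

For example, with a 1-second script step and per-scene asset times of 3s/2.5s (images) and 1.2s/4s (audio), the estimated total is 5 seconds, not the 10.7 seconds a sequential pipeline would take.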
Challenges we ran into
- Permission Paradigms: Navigating the 403 Permission Denied errors when switching between "Pro" and "Flash" models. We solved this by implementing a robust API key selection flow and defaulting to the high-compatibility `gemini-2.5-flash-image`.
- Audio Buffering: The Gemini TTS API returns raw PCM data. We had to implement a custom `createWavHeader` function to wrap this raw data in a format the browser could play natively.
- Ken Burns Syncing: Ensuring that CSS animations for camera movements perfectly matched the variable length of the generated audio files.
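A WAV wrapper for raw PCM boils down to writing a standard 44-byte RIFF header in front of the sample data. The sketch below is one plausible implementation of such a `createWavHeader`; the project's actual signature may differ, and the 24 kHz mono 16-bit defaults are assumptions, not confirmed output parameters of the TTS API.

```typescript
// Build a 44-byte RIFF/WAVE header for raw PCM data so the browser's
// <audio> element (or an AudioContext) can decode it natively.
// Defaults (24 kHz, mono, 16-bit) are assumptions for illustration.
function createWavHeader(
  dataLength: number,
  sampleRate = 24000,
  numChannels = 1,
  bitsPerSample = 16,
): ArrayBuffer {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  const byteRate = (sampleRate * numChannels * bitsPerSample) / 8;
  const blockAlign = (numChannels * bitsPerSample) / 8;

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // total file size minus 8 bytes
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeString(36, "data");
  view.setUint32(40, dataLength, true);     // length of the PCM payload
  return header;
}
```

Concatenating this header with the raw PCM bytes into a single `Blob` of type `audio/wav` yields a file the browser can play without any decoding library.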
Accomplishments that we're proud of
- Zero-Edit Workflow: A user can go from an empty screen to a playing commercial without clicking a single "Edit" button.
- The "Ken Burns" Engine: Creating a CSS-based motion engine that breathes life into static images through coordinated transforms.
- Error Recovery: Implementing a system that detects missing API permissions and allows the user to rotate keys without losing their current production progress.
What we learned
We learned that multimodal LLMs are not just content generators; they are excellent project managers. Using a model like gemini-3-flash-preview to decide the "camera action" for another model's "image generation" creates a feedback loop that feels surprisingly cohesive.
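One way to picture this "model as project manager" pattern is a structured scene object in which the script model's output includes the camera direction that the downstream image and motion layers execute. The interface below is an assumed shape for illustration, not the project's actual schema.

```typescript
// Hypothetical structured output from the script model: alongside the
// narration and image prompt, it also directs the camera action that
// the Ken Burns layer will perform on the generated image.
interface Scene {
  narration: string;
  imagePrompt: string;
  cameraAction: "pan" | "zoom" | "dolly";
}

const exampleScene: Scene = {
  narration: "Carbon meets granite at three thousand meters.",
  imagePrompt: "A sleek mountain bike on an alpine ridge at golden hour, 16:9",
  cameraAction: "dolly",
};
```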
What's next for SynthoReel
The next evolution of SynthoReel is the transition from static scenes to full motion video. We plan to integrate the Veo 3.1 models to generate 720p/1080p video clips for each scene, effectively moving from a "slideshow with motion" to a true cinematic experience.
Built With
- gemini3
- typescript