∞‑V (InfiniTV)

Team: Sidharth Anand, Rohan Shah, Yash Pradhan
Event: Hack Berkeley 2025
Tracks: Creative, AI, Voice, Multimodal

🚀 Inspiration

Short-form video tools like Pika and Runway have revolutionized visual generation, but storytelling tools remain fragmented and short-lived. We aimed to create a seamless way to generate longer-form, narrative-driven scenes from just a single prompt—combining scriptwriting, voice acting, and visual storytelling into one pipeline. Our goal was to empower creators, educators, and learners to bring rich, multi-character scenes to life with no technical skills required.

🎥 What it does

∞‑V (InfiniTV) turns a one-line prompt into a fully rendered, 2–3 minute animated video scene. The tool:

  • Expands a short prompt into a multi-turn scene using Gemini and Groq.

  • Generates character-specific voice lines using ElevenLabs.

  • Renders the script into a playable scene with Ren’Py, or optionally into a video using AI image tools.

The pipeline is fully automated, but flexible enough for human editing, making it ideal for creative prototyping, educational storytelling, or interactive learning tools.
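Internally, each stage hands the next a structured scene rather than free text. A minimal sketch of what that hand-off might look like (the JSON field names here are our illustrative assumption, not InfiniTV's exact schema):

```python
import json
from dataclasses import dataclass

@dataclass
class Line:
    speaker: str
    text: str

@dataclass
class Scene:
    title: str
    characters: list  # character names
    lines: list       # list of Line

def parse_scene(raw: str) -> Scene:
    """Parse the JSON the script-generation LLM is prompted to emit."""
    data = json.loads(raw)
    return Scene(
        title=data["title"],
        characters=data["characters"],
        lines=[Line(d["speaker"], d["text"]) for d in data["dialogue"]],
    )

# The kind of output the prompt asks the LLM for (hypothetical example):
raw = """{
  "title": "The Lost Map",
  "characters": ["Ava", "Ben"],
  "dialogue": [
    {"speaker": "Ava", "text": "We should never have opened it."},
    {"speaker": "Ben", "text": "Too late now."}
  ]
}"""
scene = parse_scene(raw)
```

Keeping the scene as data like this is what makes the pipeline "flexible enough for human editing": a user can tweak the JSON before voices or visuals are generated.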

🛠️ How we built it

  • Prompt → Script: Gemini and a Groq-hosted agent expand the prompt into a structured scene, including:

    • Character descriptions
    • Dialogues with speaker labels
    • Scene transitions and timings
  • Script → Voice: We use ElevenLabs for natural-sounding, character-specific text-to-speech, with voice regeneration and playback support.

  • Voice + Script → Scene: Our renderer uses Ren’Py to create a playable 2D visual novel-style scene. For more cinematic animations, we experimented with OpenAI’s image model and RunwayML.

  • Frontend + Backend:

    • React frontend (bootstrapped with Vercel v0)
    • Flask backend orchestrating the generation pipeline
    • File storage organized by prompt-driven project folders
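The Ren’Py step above largely reduces to templating: the structured scene is flattened into a `.rpy` script, and each project's files live under a folder derived from the prompt. A rough sketch of both pieces (function names and the exact folder layout are our illustration, not the precise implementation):

```python
import re
from pathlib import Path

def slugify(prompt: str) -> str:
    """Derive a filesystem-safe project folder name from the prompt."""
    return re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")[:40]

def render_rpy(characters, dialogue) -> str:
    """Flatten a structured scene into a Ren'Py script.

    `dialogue` is a list of (speaker, text, audio_file) tuples; the
    `voice` statement plays the matching generated clip for each line.
    """
    out = []
    for name in characters:
        out.append(f'define {name.lower()} = Character("{name}")')
    out.append("")
    out.append("label start:")
    for speaker, text, audio in dialogue:
        out.append(f'    voice "{audio}"')
        out.append(f'    {speaker.lower()} "{text}"')
    out.append("    return")
    return "\n".join(out)

def scaffold_project(root: Path, prompt: str) -> Path:
    """Create the per-prompt folder layout the backend writes into."""
    project = root / slugify(prompt)
    for sub in ("script", "audio", "renpy"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `render_rpy(["Ava"], [("Ava", "Hello.", "audio/ava_0.ogg")])` yields a script whose `label start:` block voices and prints each line in order.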

🧱 Challenges we ran into

  • TTS Quality + Integration: We initially used Vapi for speech synthesis but found it lacked the text-to-speech capabilities we required. We pivoted mid-hackathon to ElevenLabs, which required rewriting parts of our pipeline.

  • Prompt Engineering for Scenes: Expanding a single sentence into a full three-act scene (start, middle, end) with character development and timing was non-trivial; it took multiple iterations of prompt tuning and fine-grained instructions to get a consistent structure from the LLM. Groq-hosted models alone didn't produce scenes with enough structure or detail, so we integrated Gemini alongside Groq, which significantly improved the quality and depth of the generated scripts.

  • Image Generation + Ren’Py: Ren’Py doesn't support dynamic image generation. Adding AI image generation (via OpenAI or AnimateDiff) introduced roadblocks such as rate limits, poor visual alignment, and integration delays. This remains an area of experimentation.
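The mid-hackathon pivot to ElevenLabs mostly meant rebuilding the per-line synthesis call. A hedged sketch of that request (the endpoint shape follows ElevenLabs' public REST API; the character-to-voice mapping and IDs are placeholders we made up):

```python
# Build (but don't send) one text-to-speech request per dialogue line.
# Endpoint shape per ElevenLabs' REST docs; keys and voice IDs below
# are placeholders, not real credentials.
ELEVENLABS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

# Hypothetical mapping from scene characters to ElevenLabs voice IDs.
CHARACTER_VOICES = {"Ava": "voice-id-ava", "Ben": "voice-id-ben"}

def build_tts_request(speaker: str, text: str, api_key: str):
    """Return the (url, headers, payload) for one line of dialogue."""
    voice_id = CHARACTER_VOICES[speaker]
    url = ELEVENLABS_URL.format(voice_id=voice_id)
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {"text": text}  # model and voice settings omitted for brevity
    return url, headers, payload

# Sending it is a plain POST (urllib, requests, or the ElevenLabs SDK);
# the response body is audio bytes written into the project's audio folder.
```

Because each line is an independent request, regeneration of a single voice line (as the playback UI supports) only re-runs one call rather than the whole scene.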

🏆 Accomplishments that we're proud of

  • Fully functional pipeline from prompt to rendered scene

  • Successfully integrated Gemini, Groq, ElevenLabs, and Ren’Py

  • Built an interactive UI with audio playback and live scene preview

  • Clear educational and creative use cases, easy to demo and iterate

  • Strong team collaboration with clear division of backend, frontend, and AI tasks

📚 What we learned

  • How to design a modular, multi-agent AI pipeline

  • Effective prompt engineering strategies for scene generation

  • Trade-offs between real-time generation and pre-rendered assets

  • The limitations of visual storytelling engines like Ren’Py in AI workflows

  • Importance of fallbacks and flexibility when core tools don’t work as expected

🔮 What's next for InfiniTV

  • Manual script editing: Let users fine-tune auto-generated scenes

  • Emotion tagging + voice expression syncing: More nuanced speech performance

  • Background music & sound effects: Enrich scene atmosphere

  • 3D rendering experiments: Extend visuals beyond 2D novels

Built With

  • elevenlabs
  • flask
  • gemini
  • groq
  • openai
  • react
  • renpy
  • runwayml
