∞‑V (InfiniTV)

Team: Sidharth Anand, Rohan Shah, Yash Pradhan
Event: Hack Berkeley 2025
Tracks: Creative, AI, Voice, Multimodal

🚀 Inspiration

Short-form video tools like Pika and Runway have revolutionized visual generation, but storytelling tools remain fragmented and short-lived. We aimed to create a seamless way to generate longer-form, narrative-driven scenes from just a single prompt—combining scriptwriting, voice acting, and visual storytelling into one pipeline. Our goal was to empower creators, educators, and learners to bring rich, multi-character scenes to life with no technical skills required.

🎥 What it does

∞‑V (InfiniTV) turns a one-line prompt into a fully rendered, 2–3 minute animated video scene. The tool:

  • Expands a short prompt into a multi-turn scene using Gemini and Groq.

  • Generates character-specific voice lines using ElevenLabs.

  • Renders the script into a playable scene with Ren’Py, or optionally into a video using AI image tools.

The pipeline is fully automated, but flexible enough for human editing, making it ideal for creative prototyping, educational storytelling, or interactive learning tools.
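Internally, each stage hands the next a structured scene rather than free text. A minimal sketch of what that hand-off might look like (the JSON field names here are our illustrative assumption, not InfiniTV's exact schema):

```python
import json
from dataclasses import dataclass

@dataclass
class Line:
    speaker: str
    text: str

@dataclass
class Scene:
    title: str
    characters: list  # character names
    lines: list       # list of Line

def parse_scene(raw: str) -> Scene:
    """Parse the JSON the script-generation LLM is prompted to emit."""
    data = json.loads(raw)
    return Scene(
        title=data["title"],
        characters=data["characters"],
        lines=[Line(d["speaker"], d["text"]) for d in data["dialogue"]],
    )

# The kind of output the prompt asks the LLM for (hypothetical example):
raw = """{
  "title": "The Lost Map",
  "characters": ["Ava", "Ben"],
  "dialogue": [
    {"speaker": "Ava", "text": "We should never have opened it."},
    {"speaker": "Ben", "text": "Too late now."}
  ]
}"""
scene = parse_scene(raw)
```

Keeping the scene as data like this is what makes the pipeline "flexible enough for human editing": a user can tweak the JSON before voices or visuals are generated.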

🛠️ How we built it

  • Prompt → Script: Gemini and a Groq-hosted agent expand the prompt into a structured scene, including:

    • Character descriptions
    • Dialogues with speaker labels
    • Scene transitions and timings
  • Script → Voice: We use ElevenLabs for natural-sounding, character-specific text-to-speech, with voice regeneration and playback support.

  • Voice + Script → Scene: Our renderer uses Ren’Py to create a playable 2D visual novel-style scene. For more cinematic animations, we experimented with OpenAI’s image model and RunwayML.

  • Frontend + Backend:

    • React frontend (bootstrapped with Vercel v0)
    • Flask backend orchestrating the generation pipeline
    • File storage organized by prompt-driven project folders
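The Ren’Py step above largely reduces to templating: the structured scene is flattened into a `.rpy` script, and each project's files live under a folder derived from the prompt. A rough sketch of both pieces (function names and the exact folder layout are our illustration, not the precise implementation):

```python
import re
from pathlib import Path

def slugify(prompt: str) -> str:
    """Derive a filesystem-safe project folder name from the prompt."""
    return re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")[:40]

def render_rpy(characters, dialogue) -> str:
    """Flatten a structured scene into a Ren'Py script.

    `dialogue` is a list of (speaker, text, audio_file) tuples; the
    `voice` statement plays the matching generated clip for each line.
    """
    out = []
    for name in characters:
        out.append(f'define {name.lower()} = Character("{name}")')
    out.append("")
    out.append("label start:")
    for speaker, text, audio in dialogue:
        out.append(f'    voice "{audio}"')
        out.append(f'    {speaker.lower()} "{text}"')
    out.append("    return")
    return "\n".join(out)

def scaffold_project(root: Path, prompt: str) -> Path:
    """Create the per-prompt folder layout the backend writes into."""
    project = root / slugify(prompt)
    for sub in ("script", "audio", "renpy"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `render_rpy(["Ava"], [("Ava", "Hello.", "audio/ava_0.ogg")])` yields a script whose `label start:` block voices and prints each line in order.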

🧱 Challenges we ran into

  • TTS Quality + Integration: We initially used Vapi for speech synthesis but found it lacked the text-to-speech capabilities we required. We pivoted mid-hackathon to ElevenLabs, which required rewriting parts of our pipeline.

  • Prompt Engineering for Scenes: Expanding a single sentence into a full three-act scene (start, middle, end) with character development and timing was non-trivial; it took multiple iterations of prompt tuning and fine-grained instructions to get a consistent structure from the LLM. Groq-hosted models alone didn't produce scenes with enough structure or detail, so we integrated Gemini alongside Groq, which significantly improved the quality and depth of the generated scripts.

  • Image Generation + Ren’Py: Ren’Py doesn't support dynamic image generation. Adding AI image generation (via OpenAI or AnimateDiff) introduced roadblocks such as rate limits, poor visual alignment, and integration delays. This remains an area of experimentation.
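The mid-hackathon pivot to ElevenLabs mostly meant rebuilding the per-line synthesis call. A hedged sketch of that request (the endpoint shape follows ElevenLabs' public REST API; the character-to-voice mapping and IDs are placeholders we made up):

```python
# Build (but don't send) one text-to-speech request per dialogue line.
# Endpoint shape per ElevenLabs' REST docs; keys and voice IDs below
# are placeholders, not real credentials.
ELEVENLABS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

# Hypothetical mapping from scene characters to ElevenLabs voice IDs.
CHARACTER_VOICES = {"Ava": "voice-id-ava", "Ben": "voice-id-ben"}

def build_tts_request(speaker: str, text: str, api_key: str):
    """Return the (url, headers, payload) for one line of dialogue."""
    voice_id = CHARACTER_VOICES[speaker]
    url = ELEVENLABS_URL.format(voice_id=voice_id)
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {"text": text}  # model and voice settings omitted for brevity
    return url, headers, payload

# Sending it is a plain POST (urllib, requests, or the ElevenLabs SDK);
# the response body is audio bytes written into the project's audio folder.
```

Because each line is an independent request, regeneration of a single voice line (as the playback UI supports) only re-runs one call rather than the whole scene.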

🏆 Accomplishments that we're proud of

  • Fully functional pipeline from prompt to rendered scene

  • Successfully integrated Gemini, Groq, ElevenLabs, and Ren’Py

  • Built an interactive UI with audio playback and live scene preview

  • Clear educational and creative use cases, easy to demo and iterate

  • Strong team collaboration with clear division of backend, frontend, and AI tasks

📚 What we learned

  • How to design a modular, multi-agent AI pipeline

  • Effective prompt engineering strategies for scene generation

  • Trade-offs between real-time generation and pre-rendered assets

  • The limitations of visual storytelling engines like Ren’Py in AI workflows

  • Importance of fallbacks and flexibility when core tools don’t work as expected

🔮 What's next for InfiniTV

  • Manual script editing: Let users fine-tune auto-generated scenes

  • Emotion tagging + voice expression syncing: More nuanced speech performance

  • Background music & sound effects: Enrich scene atmosphere

  • 3D rendering experiments: Extend visuals beyond 2D novels

Built With

  • elevenlabs
  • flask
  • gemini
  • groq
  • openai
  • react
  • renpy
  • runwayml
