Turn long-form videos into viral shorts instantly using Generative AI, Computer Vision, and Audio Alignment.
ViralReel AI is a fully automated video repurposing pipeline. It takes a long-form video (YouTube URL or file upload), intelligently analyzes the content to find "viral hooks," and autonomously renders vertical (9:16) shorts with Face Tracking and Karaoke Subtitles.
Unlike basic wrappers, this project implements a custom rendering engine using OpenCV and multithreading, rendering multiple reels in parallel for better throughput.
(Screen recording of the application interface)
- 🧠 AI Content Curator: Uses Google Gemini 2.5 Flash to analyze transcripts and identify the most engaging 30-60 second segments based on viral storytelling principles.
- 🗣️ Word-Level Alignment: Powered by WhisperX, providing millisecond-accurate timestamps for subtitles (Forced Alignment).
- 👀 Smart Face Tracking: Uses MediaPipe to detect the speaker and dynamically crop landscape video into vertical format, keeping the subject centered.
- ⚡ Parallel Rendering Engine: Renders 3 reels simultaneously using Python's `ThreadPoolExecutor`, maximizing GPU/CPU usage.
- 🎨 Custom Karaoke Engine: A bespoke renderer built on `PIL` and `OpenCV` that draws professional "Alex Hormozi style" subtitles with active word highlighting and auto-wrapping titles.
- 🌐 Universal Downloader: Integrated `yt-dlp` to download H.264/AVC web-compatible footage.
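The parallel rendering idea above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `render_reel` is a hypothetical stand-in for the real OpenCV render loop, which releases the GIL during heavy frame work and so benefits from threads.

```python
from concurrent.futures import ThreadPoolExecutor

def render_reel(clip):
    # Hypothetical stand-in for the real OpenCV/FFmpeg render loop.
    return f"reel_{clip['id']}.mp4"

clips = [{"id": i} for i in range(3)]

# Render all three reels concurrently; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = list(pool.map(render_reel, clips))

print(outputs)  # ['reel_0.mp4', 'reel_1.mp4', 'reel_2.mp4']
```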
The pipeline consists of 5 distinct "Brains" working in sequence:
- Ingestion Layer: Downloads the video and extracts raw audio (`16kHz PCM`).
- Transcription Layer (WhisperX): Transcribes the audio and performs forced alignment to get `{word: start_time, end_time}` JSON data.
- Intelligence Layer (Gemini): Reads the transcript and identifies viral hooks, returning strict start/end timestamps and engaging titles.
- Vision Layer (MediaPipe): Scans video frames to calculate the "Center of Interest" (Face) for dynamic cropping.
- Rendering Layer (OpenCV + PIL):
- Composites the crop.
- Draws the dynamic karaoke overlay.
  - Encodes to `H.264` (ultrafast preset) for instant playback on web/mobile.
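The Vision Layer's "Center of Interest" cropping boils down to simple window math: take the full frame height, derive the 9:16 width, center the window on the face, and clamp it to the frame edges. A minimal sketch, assuming a hypothetical `vertical_crop_window` helper and a face center x coordinate already extracted from MediaPipe:

```python
def vertical_crop_window(frame_w, frame_h, face_cx):
    """Compute a 9:16 crop window centered on the detected face.

    face_cx is the face center's x coordinate in pixels (hypothetical
    input; in the real pipeline it would come from MediaPipe).
    """
    crop_w = int(frame_h * 9 / 16)          # full height, 9:16 width
    x0 = int(face_cx - crop_w / 2)          # center the window on the face
    x0 = max(0, min(x0, frame_w - crop_w))  # clamp inside the frame
    return x0, 0, crop_w, frame_h

# 1920x1080 landscape frame, face near the left edge:
print(vertical_crop_window(1920, 1080, 100))  # (0, 0, 607, 1080)
# Face in the middle of the frame:
print(vertical_crop_window(1920, 1080, 960))  # (656, 0, 607, 1080)
```

In practice the per-frame face center would also be smoothed over time so the crop window glides rather than snapping to every detection.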
- Python 3.10+
- FFmpeg installed on the system (`sudo apt install ffmpeg`)
- GPU recommended (NVIDIA T4 or better for WhisperX)
```bash
git clone https://github.com/LakhindarPal/ViralReel-AI.git
cd ViralReel-AI
pip install -r requirements.txt
```

You need a Google Gemini API key (free tier available at Google AI Studio). Create a `.env` file or export the key in your terminal:

```bash
export GOOGLE_API_KEY="your_api_key_here"
```

Run the Gradio interface:
```bash
python app.py
```

- Open the local URL provided (e.g., `http://127.0.0.1:7860`).
- Input: Paste a YouTube URL or upload an MP4 file.
- Click: "Generate Reels".
- Wait: The system logs will update in real-time as it downloads, transcribes, thinks, and renders.
- Result: 3 ready-to-upload viral shorts will appear with their specific titles.
You can tweak the constants in `app.py` to change the behavior:
```python
MAX_DURATION = 60  # Hard cap for reel length (seconds)
BATCH_SIZE = 16    # Whisper inference batch size (lower if VRAM is limited)
DEVICE = "cuda"    # "cpu" or "cuda"
```

- Subtitle Jitter: Solved by replacing the sliding-window logic with a "chunking" algorithm that groups words into blocks of 3 for readability.
- Web Playback Issues: OpenCV defaults to raw codecs. Implemented an FFmpeg post-processing step that enforces the `yuv420p` pixel format and `libx264` encoding for browser compatibility.
- 403 Forbidden Errors: Hardened the `yt-dlp` downloader with custom User-Agent headers to mimic a real Chrome browser.
Distributed under the Apache-2.0 license. See LICENSE file for more information.
Built with ❤️ by Lakhindar Pal