AI-powered tool to automatically generate engaging YouTube Shorts from long-form videos. Uses GPT-class LLM highlight detection and Whisper to extract the most viral-worthy moments and crop them vertically for social media.
Looking for an api to build your own Opus-clip style product ? Check the api below
https://muapi.ai/playground/autocrop
https://muapi.ai/playground/ai-clipping
- 🎬 YouTube In, Vertical Out: Hand it any YouTube URL — get back N viral-ready 9:16 mp4s
- 🤖 Virality-Aware Highlight Selection: Clips ranked on hooks, emotional peaks, opinion bombs, revelation moments, conflict, quotable lines, story peaks, and practical value — not just generic "interesting"
- 📈 Score + Hook + Reason for Every Clip: Each highlight comes with a viral score, an opening hook line, and a one-sentence explanation of why it works
- 🎤 Local Whisper Transcription: No managed transcription service — your audio stays on your machine
- 🧩 Long-Video Aware: Videos over 30 minutes are auto-chunked with overlap so nothing gets missed
- ♻️ Smart Dedupe: Overlapping highlights are collapsed by score so you never get two near-duplicate clips
- 🎯 Smart Vertical Crop: Auto-cropping handles face tracking and screen recordings automatically — no Haar cascades, no OpenCV setup
- 📱 Any Aspect Ratio: 9:16 for TikTok/Reels/Shorts, 1:1 for square, anything else by flag
- 🧰 CLI + Python Library: Use it from the shell or import
generate_shorts(...)into your own pipeline - 📦 JSON Output:
--output-jsondumps the full result (transcript + every candidate highlight + final clip URLs) for downstream automation
Want better results without the setup? The AI Clipping API offers improved clip selection, faster processing, and no dependencies to manage.
- Python 3.10+
- FFmpeg (required by Whisper)
- An API key for the clipping/highlight backend
-
Clone the repository:
git clone https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator.git cd AI-Youtube-Shorts-Generator -
Install system dependencies:
Ubuntu/Debian:
sudo apt install -y ffmpeg
macOS:
brew install ffmpeg
Windows:
- Install FFmpeg and add it to PATH
-
Create and activate a virtual environment:
python3.10 -m venv venv source venv/bin/activate -
Install Python dependencies:
pip install -r requirements.txt
-
Set up environment variables:
Create a
.envfile in the project root (copy from.env.example):MUAPI_API_KEY=your_api_key_here
python main.py "https://www.youtube.com/watch?v=VIDEO_ID"python main.py "https://www.youtube.com/watch?v=VIDEO_ID" \
--num-clips 5 \
--aspect-ratio 9:16 \
--whisper-model base \
--output-json result.jsonDrop in a hosted mp4 URL directly via the Python API (the CLI is YouTube-first):
from shorts_generator import generate_shorts
result = generate_shorts(
"https://www.youtube.com/watch?v=...",
num_clips=5,
aspect_ratio="9:16",
)
for short in result["shorts"]:
print(short["score"], short["title"], short["clip_url"])Create a urls.txt file with one URL per line, then:
xargs -a urls.txt -I{} python main.py "{}"| Flag | Default | Notes |
|---|---|---|
--num-clips |
3 |
How many shorts to render |
--aspect-ratio |
9:16 |
Any ratio; 9:16 for TikTok/Reels, 1:1 for square |
--format |
720 |
Source download resolution: 360 / 480 / 720 / 1080 |
--whisper-model |
base |
tiny / base / small / medium / large |
--language |
auto | Force Whisper language code (e.g. en) |
--output-json |
— | Dump the full result (transcript + all candidates) to a file |
- Download: Fetches the source video from YouTube
- Transcribe: Local Whisper produces a timestamped transcript
- Detect content type: An LLM classifies the video (podcast, interview, tutorial, vlog, etc.) and density, so the prompt can be tuned per content style
- Long-video chunking: Videos > 30 min are split into 20-min overlapping chunks
- Highlight ranking: An LLM scans the transcript through a virality framework — hook moments, emotional peaks, opinion bombs, revelations, conflict, quotables, story peaks, practical value — and emits ranked candidates with scores 0–100
- Dedupe: Overlapping candidates are collapsed by score (>50% overlap → keep the higher score)
- Top-N selection: The top
--num-clipscandidates are selected - Auto-crop: Each highlight is rendered as a vertical short at the requested aspect ratio
Output: a list of mp4 URLs plus, for each clip, its title, viral score, hook sentence, and a one-line reason explaining why it should perform.
Console output looks like:
========================================================================
Highlights: 7 candidates → kept top 3
========================================================================
#1 score=92 124.3s → 187.6s
title: The one mistake that cost me $50K
hook: "Nobody talks about this, but it killed my first startup..."
clip: https://.../short_1.mp4
#2 score=88 ...
--output-json result.json produces:
{
"source_video_url": "...",
"transcript": { "duration": 1873.4, "segments": [...] },
"highlights": [ {...}, {...}, ... ],
"shorts": [
{
"title": "...",
"start_time": 124.3,
"end_time": 187.6,
"score": 92,
"hook_sentence": "...",
"virality_reason": "...",
"clip_url": "https://.../short_1.mp4"
}
]
}Edit shorts_generator/highlights.py:
- Virality framework:
VIRALITY_CRITERIA— the ranked list of signals the LLM optimizes for - System prompt:
HIGHLIGHT_SYSTEM_PROMPT— duration sweet spot, hook rules, JSON schema - Chunk size:
CHUNK_SIZE_SECONDS(default 1200) — chunk length for long videos - Long-video threshold:
LONG_VIDEO_THRESHOLD(default 1800) — videos longer than this are chunked - Chunk overlap:
CHUNK_OVERLAP_SECONDS(default 60) — overlap between chunks so cross-boundary clips aren't missed
Edit shorts_generator/config.py (or set env vars):
MUAPI_POLL_INTERVAL(default 5s) — seconds between job-status pollsMUAPI_POLL_TIMEOUT(default 1800s) — give up after this long
tiny/base— fast, English-leaning, fine for clean speechsmall/medium— better for accents, music backgroundslarge— highest accuracy, much slower; recommended only with a GPU
AI-Youtube-Shorts-Generator/
├── main.py CLI entry point
├── requirements.txt
├── .env.example
└── shorts_generator/
├── config.py env / settings
├── muapi.py generic submit + poll wrapper
├── downloader.py YouTube source download
├── transcriber.py local Whisper
├── highlights.py LLM virality ranking + chunking + dedupe
├── clipper.py vertical auto-crop
└── pipeline.py end-to-end orchestrator
Whisper needs ffmpeg for audio decoding. Install it with brew install ffmpeg (macOS) or sudo apt install ffmpeg (Ubuntu).
The video may have no detectable speech, or it may be in a language Whisper struggles with. Try --whisper-model medium and --language en (or the correct language code).
The AI Clipping API uses an improved algorithm that produces higher-quality clips with better highlight detection.
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.
