SamurAIGPT/AI-Youtube-Shorts-Generator

AI YouTube Shorts Generator

AI-powered tool to automatically generate engaging YouTube Shorts from long-form videos. Uses GPT-class LLM highlight detection and Whisper to extract the most viral-worthy moments and crop them vertically for social media.

Looking for an API to build your own Opus Clip-style product? Check the APIs below:

https://muapi.ai/playground/autocrop

https://muapi.ai/playground/ai-clipping

Features

  • 🎬 YouTube In, Vertical Out: Hand it any YouTube URL — get back N viral-ready 9:16 mp4s
  • 🤖 Virality-Aware Highlight Selection: Clips ranked on hooks, emotional peaks, opinion bombs, revelation moments, conflict, quotable lines, story peaks, and practical value — not just generic "interesting"
  • 📈 Score + Hook + Reason for Every Clip: Each highlight comes with a viral score, an opening hook line, and a one-sentence explanation of why it works
  • 🎤 Local Whisper Transcription: No managed transcription service — your audio stays on your machine
  • 🧩 Long-Video Aware: Videos over 30 minutes are auto-chunked with overlap so nothing gets missed
  • ♻️ Smart Dedupe: Overlapping highlights are collapsed by score so you never get two near-duplicate clips
  • 🎯 Smart Vertical Crop: Cropping handles face tracking and screen recordings automatically, with no Haar cascades and no OpenCV setup
  • 📱 Any Aspect Ratio: 9:16 for TikTok/Reels/Shorts, 1:1 for square, anything else by flag
  • 🧰 CLI + Python Library: Use it from the shell or import generate_shorts(...) into your own pipeline
  • 📦 JSON Output: --output-json dumps the full result (transcript + every candidate highlight + final clip URLs) for downstream automation

Quick Start (No Setup)

Want better results without the setup? The AI Clipping API offers improved clip selection, faster processing, and no dependencies to manage.


Installation (Self-Hosted)

Prerequisites

  • Python 3.10+
  • FFmpeg (required by Whisper)
  • An API key for the clipping/highlight backend

Steps

  1. Clone the repository:

    git clone https://github.com/SamurAIGPT/AI-Youtube-Shorts-Generator.git
    cd AI-Youtube-Shorts-Generator
  2. Install system dependencies:

    Ubuntu/Debian:

    sudo apt install -y ffmpeg

    macOS:

    brew install ffmpeg

    Windows:

    • Install FFmpeg and add it to PATH
  3. Create and activate a virtual environment:

    python3.10 -m venv venv
    source venv/bin/activate    # on Windows: venv\Scripts\activate
  4. Install Python dependencies:

    pip install -r requirements.txt
  5. Set up environment variables:

    Create a .env file in the project root (copy from .env.example):

    MUAPI_API_KEY=your_api_key_here
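
A minimal sketch of how the key could be validated at startup, assuming a plain KEY=VALUE .env format. parse_env_text and require_api_key are hypothetical helpers written for illustration, not functions from this repository; the project may load the file differently.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="abc" and KEY=abc behave the same.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: dict[str, str]) -> str:
    """Return MUAPI_API_KEY, or fail early if it is unset or still the placeholder."""
    key = env.get("MUAPI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("MUAPI_API_KEY is missing; set it in .env or the environment")
    return key
```

Failing before the download and transcription steps avoids spending minutes on Whisper only to hit an authentication error at the clipping stage.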

Usage

Single video

python main.py "https://www.youtube.com/watch?v=VIDEO_ID"

With options

python main.py "https://www.youtube.com/watch?v=VIDEO_ID" \
    --num-clips 5 \
    --aspect-ratio 9:16 \
    --whisper-model base \
    --output-json result.json

Hosted mp4 (Python API)

The CLI is YouTube-first. To process a hosted mp4, pass its URL to the Python API in place of the YouTube URL shown here:

from shorts_generator import generate_shorts

result = generate_shorts(
    "https://www.youtube.com/watch?v=...",
    num_clips=5,
    aspect_ratio="9:16",
)
for short in result["shorts"]:
    print(short["score"], short["title"], short["clip_url"])

Batch processing

Create a urls.txt file with one URL per line, then:

xargs -I{} python main.py "{}" < urls.txt

CLI flags

Flag             Default   Notes
--num-clips      3         How many shorts to render
--aspect-ratio   9:16      Any ratio; 9:16 for TikTok/Reels, 1:1 for square
--format         720       Source download resolution: 360 / 480 / 720 / 1080
--whisper-model  base      tiny / base / small / medium / large
--language       auto      Force Whisper language code (e.g. en)
--output-json    (none)    Dump the full result (transcript + all candidates) to a file
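
For reference, the flags above map naturally onto an argparse parser. This is a sketch under the assumption that main.py uses argparse; build_parser is a hypothetical name, and the real entry point may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented CLI flags and defaults (illustrative, not the real main.py)."""
    parser = argparse.ArgumentParser(description="Generate vertical shorts from a YouTube URL")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("--num-clips", type=int, default=3, help="how many shorts to render")
    parser.add_argument("--aspect-ratio", default="9:16", help="e.g. 9:16 or 1:1")
    parser.add_argument("--format", default="720", choices=["360", "480", "720", "1080"],
                        help="source download resolution")
    parser.add_argument("--whisper-model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--language", default="auto", help="Whisper language code, e.g. en")
    parser.add_argument("--output-json", default=None, metavar="PATH",
                        help="write the full result to this file")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```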

How It Works

  1. Download: Fetches the source video from YouTube
  2. Transcribe: Local Whisper produces a timestamped transcript
  3. Detect content type: An LLM classifies the video (podcast, interview, tutorial, vlog, etc.) and density, so the prompt can be tuned per content style
  4. Long-video chunking: Videos > 30 min are split into 20-min overlapping chunks
  5. Highlight ranking: An LLM scans the transcript through a virality framework — hook moments, emotional peaks, opinion bombs, revelations, conflict, quotables, story peaks, practical value — and emits ranked candidates with scores 0–100
  6. Dedupe: Overlapping candidates are collapsed by score (>50% overlap → keep the higher score)
  7. Top-N selection: The top --num-clips candidates are selected
  8. Auto-crop: Each highlight is rendered as a vertical short at the requested aspect ratio

Output: a list of mp4 URLs plus, for each clip, its title, viral score, hook sentence, and a one-line reason explaining why it should perform.
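
Steps 6 and 7 can be sketched as a greedy pass over candidates sorted by score. The source does not say how overlap is measured, so this sketch assumes it is the shared duration divided by the shorter clip's duration; overlap_ratio and dedupe_and_select are hypothetical names, and the actual logic in highlights.py may differ.

```python
def overlap_ratio(a: dict, b: dict) -> float:
    """Shared duration divided by the shorter clip's duration (an assumed definition)."""
    shared = min(a["end_time"], b["end_time"]) - max(a["start_time"], b["start_time"])
    shorter = min(a["end_time"] - a["start_time"], b["end_time"] - b["start_time"])
    return max(shared, 0.0) / shorter if shorter > 0 else 0.0


def dedupe_and_select(candidates: list[dict], num_clips: int = 3,
                      threshold: float = 0.5) -> list[dict]:
    """Keep the highest-scoring clip of each overlapping group, then take the top N."""
    kept: list[dict] = []
    # Visiting in descending score order means a kept clip always outranks
    # any later candidate that overlaps it.
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if all(overlap_ratio(cand, k) <= threshold for k in kept):
            kept.append(cand)
    return kept[:num_clips]


if __name__ == "__main__":
    demo = [
        {"start_time": 100.0, "end_time": 160.0, "score": 80},
        {"start_time": 120.0, "end_time": 180.0, "score": 92},
        {"start_time": 300.0, "end_time": 340.0, "score": 70},
    ]
    for clip in dedupe_and_select(demo):
        print(clip["score"], clip["start_time"], clip["end_time"])
```

In the demo, the 80-point clip shares 40 of its 60 seconds with the 92-point clip, so only the higher-scoring one survives.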

Output

Console output looks like:

========================================================================
Highlights:    7 candidates → kept top 3
========================================================================

#1  score=92  124.3s → 187.6s
     title:  The one mistake that cost me $50K
     hook:   "Nobody talks about this, but it killed my first startup..."
     clip:   https://.../short_1.mp4

#2  score=88  ...

--output-json result.json produces:

{
  "source_video_url": "...",
  "transcript": { "duration": 1873.4, "segments": [...] },
  "highlights": [ {...}, {...}, ... ],
  "shorts": [
    {
      "title": "...",
      "start_time": 124.3,
      "end_time": 187.6,
      "score": 92,
      "hook_sentence": "...",
      "virality_reason": "...",
      "clip_url": "https://.../short_1.mp4"
    }
  ]
}
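
For downstream automation, the JSON file can be consumed with the standard library alone. The sketch below relies only on the field names shown in the sample above; load_result and summarize_shorts are hypothetical helper names.

```python
import json


def load_result(path: str) -> dict:
    """Read a file written by --output-json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_shorts(result: dict) -> list[str]:
    """Return one line per short (score, duration, title, URL), highest score first."""
    shorts = sorted(result.get("shorts", []), key=lambda s: s["score"], reverse=True)
    lines = []
    for s in shorts:
        duration = s["end_time"] - s["start_time"]
        lines.append(f'{s["score"]:>3}  {duration:6.1f}s  {s["title"]}  {s["clip_url"]}')
    return lines
```

A typical call would be print("\n".join(summarize_shorts(load_result("result.json")))).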

Configuration

Highlight selection criteria

Edit shorts_generator/highlights.py:

  • Virality framework: VIRALITY_CRITERIA — the ranked list of signals the LLM optimizes for
  • System prompt: HIGHLIGHT_SYSTEM_PROMPT — duration sweet spot, hook rules, JSON schema
  • Chunk size: CHUNK_SIZE_SECONDS (default 1200) — chunk length for long videos
  • Long-video threshold: LONG_VIDEO_THRESHOLD (default 1800) — videos longer than this are chunked
  • Chunk overlap: CHUNK_OVERLAP_SECONDS (default 60) — overlap between chunks so cross-boundary clips aren't missed
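
To see how the three chunking settings interact, here is a small sketch that computes the chunk windows for a given duration. The constants mirror the defaults listed above; the windowing logic itself (each chunk starting CHUNK_SIZE_SECONDS minus CHUNK_OVERLAP_SECONDS after the previous one) is an assumption, and chunk_windows is a hypothetical name.

```python
CHUNK_SIZE_SECONDS = 1200      # default from the list above
LONG_VIDEO_THRESHOLD = 1800    # default from the list above
CHUNK_OVERLAP_SECONDS = 60     # default from the list above


def chunk_windows(duration: float) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds covering the whole video."""
    if duration <= LONG_VIDEO_THRESHOLD:
        return [(0.0, duration)]  # short videos are processed in one pass
    # Assumed stepping: consecutive chunks share CHUNK_OVERLAP_SECONDS of audio.
    step = CHUNK_SIZE_SECONDS - CHUNK_OVERLAP_SECONDS
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + CHUNK_SIZE_SECONDS, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows


if __name__ == "__main__":
    for window in chunk_windows(3600.0):
        print(window)
```

Under these assumptions a one-hour video yields four windows, the last one shorter than the rest.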

Polling / timeout

Edit shorts_generator/config.py (or set env vars):

  • MUAPI_POLL_INTERVAL (default 5s) — seconds between job-status polls
  • MUAPI_POLL_TIMEOUT (default 1800s) — give up after this long
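
The two settings describe a standard submit-and-poll loop. The sketch below shows the general pattern only: poll_until_done is a hypothetical function, the "completed" and "failed" status values are assumptions, and the real wrapper in muapi.py may behave differently.

```python
import os
import time
from typing import Callable

# Env var names and defaults come from the list above; values are in seconds.
POLL_INTERVAL = float(os.environ.get("MUAPI_POLL_INTERVAL", "5"))
POLL_TIMEOUT = float(os.environ.get("MUAPI_POLL_TIMEOUT", "1800"))


def poll_until_done(fetch_status: Callable[[], dict],
                    interval: float = POLL_INTERVAL,
                    timeout: float = POLL_TIMEOUT,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> dict:
    """Call fetch_status until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        # Terminal status names are assumed for illustration.
        if status.get("status") in ("completed", "failed"):
            return status
        if clock() + interval > deadline:
            raise TimeoutError(f"job still running after {timeout:.0f}s")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting.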

Whisper model size

  • tiny / base — fast, English-leaning, fine for clean speech
  • small / medium — better for accents, music backgrounds
  • large — highest accuracy, much slower; recommended only with a GPU

Project Structure

AI-Youtube-Shorts-Generator/
├── main.py                       CLI entry point
├── requirements.txt
├── .env.example
└── shorts_generator/
    ├── config.py                 env / settings
    ├── muapi.py                  generic submit + poll wrapper
    ├── downloader.py             YouTube source download
    ├── transcriber.py            local Whisper
    ├── highlights.py             LLM virality ranking + chunking + dedupe
    ├── clipper.py                vertical auto-crop
    └── pipeline.py               end-to-end orchestrator

Troubleshooting

ffmpeg not found on PATH

Whisper needs ffmpeg for audio decoding. Install it with brew install ffmpeg (macOS) or sudo apt install ffmpeg (Ubuntu).
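
To confirm the fix from Python, a quick check with the standard library is enough. ffmpeg_available and require_ffmpeg are hypothetical helper names, not part of this project.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the binary can be found on PATH."""
    return shutil.which(binary) is not None


def require_ffmpeg() -> None:
    """Raise a readable error before Whisper fails with a less obvious one."""
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found on PATH; install it and reopen your terminal")
```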

Whisper produced no segments

The video may have no detectable speech, or it may be in a language Whisper struggles with. Try --whisper-model medium and --language en (or the correct language code).

Looking for better results?

The AI Clipping API uses an improved algorithm that produces higher-quality clips with better highlight detection.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request.

License

This project is licensed under the MIT License.
