hammerlyrodrigo/video_translator

Video to SRT Subtitle Translator

This application transcribes speech from a video or audio file (single file or whole folder), optionally translates it to a target language, and generates SRT subtitle files. It uses OpenAI Whisper for transcription and Hugging Face Transformers for translation.

Installation

  1. Clone the repository:

    git clone https://github.com/hammerlyrodrigo/video_translator.git
    cd video_translator
  2. Install Conda if it is not already installed (see the Miniconda download page).

  3. Create and activate the environment:

    conda env create -f environment.yml
    conda activate video_translator_env

    Note: ensure ffmpeg is listed in environment.yml so ffmpeg and ffprobe are installed into the environment:

    dependencies:
      - python=3.9
      - ffmpeg
      # ...
  4. (Optional) If you want to use CUDA, ensure you have the appropriate NVIDIA drivers installed.
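
Before passing cuda to -td or -trd, it can help to confirm that PyTorch actually sees a GPU. The following is a minimal sketch, not code from this repository: pick_device is a hypothetical helper, and it assumes PyTorch is the backend installed by environment.yml (it falls back to cpu if torch cannot be imported).

```python
def pick_device(requested=None):
    """Return "cuda" only when it is both wanted and usable, else "cpu".

    Hypothetical helper for illustration; not part of this project's CLI.
    """
    try:
        import torch  # assumption: PyTorch is installed in the conda env
        cuda_ok = torch.cuda.is_available()
    except ImportError:
        cuda_ok = False

    if requested == "cpu":
        return "cpu"
    # requested is "cuda" or None: use the GPU only if it is really there.
    return "cuda" if cuda_ok else "cpu"


if __name__ == "__main__":
    print("Device that would be used:", pick_device())
```

If this prints cpu on a machine with an NVIDIA card, the driver or the CUDA build of PyTorch is the first thing to check.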

Usage

Run the CLI after activating the environment:

python src/main.py --input-video path/to/video.mp4 --output-srt path/to/output.srt --output-lang en

or use conda run:

conda run -n video_translator_env python src/main.py --input-video path/to/video.mp4 -o out.srt -ol es

You can also transcribe an entire folder of media files (batch mode) with --input-folder. See examples below.

Use --help for full option list:

python src/main.py --help

Command-Line Arguments (high level)

  • -i, --input-video PATH : Path to input video or audio file (single-file mode).
  • -I, --input-folder PATH : Path to a folder containing media files. All supported files in the folder will be processed (batch mode).
  • -o, --output-srt PATH : Path to save the output SRT (single-file mode). If omitted, the SRT is placed next to the input file, with the same name and a .srt extension.
  • -od, --output-dir PATH : Target directory for SRT files when using --input-folder. If omitted, each SRT is placed next to its source file.
  • -il, --input-lang TEXT : Input language code (e.g., 'en', 'ru'). If not provided, Whisper will auto-detect.
  • -ol, --output-lang TEXT : Output language code for translation (e.g., 'es', 'fr'). Required unless --transcribe-only is set.
  • -t, --transcribe-only : Only transcribe the input, do not translate.
  • -wm, --whisper-model TEXT : Whisper model size (tiny, base, small, medium, large, ...).
  • -td, --translator-device TEXT : Device for translation model (cuda or cpu).
  • -trd, --transcriber-device TEXT : Device for transcription model (cuda or cpu).
  • --temp-audio-dir PATH : Directory for temporary audio files (default: temp_audio).
  • -ldo, --lang-detect-offset TEXT : Timestamp offset to start a short slice for language detection (e.g. 90, 01:30, 00:01:30).
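
The three offset formats accepted by --lang-detect-offset (90, 01:30, 00:01:30) all describe the same position in seconds. As an illustration only, a parser for them could look like the sketch below; parse_offset is a hypothetical name, and the project's real parsing may differ.

```python
def parse_offset(value):
    """Convert "SS", "MM:SS" or "HH:MM:SS" into a number of seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unsupported offset format: {value!r}")

    seconds = 0.0
    for part in parts:
        # Each additional field to the left is worth 60x the previous one.
        seconds = seconds * 60 + float(part)
    if seconds < 0:
        raise ValueError(f"Offset must not be negative: {value!r}")
    return seconds


if __name__ == "__main__":
    for text in ("90", "01:30", "00:01:30"):
        print(text, "->", parse_offset(text))
```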

Features

  • Transcription: Uses OpenAI Whisper for high-quality speech-to-text conversion
  • Translation: Leverages Hugging Face Transformers for accurate language translation
  • Batch Processing: Process entire folders of media files automatically
  • Language Detection: Automatic source language detection with optional manual hints
  • Multiple Formats: Supports common video and audio formats
  • SRT Output: Generates standard SRT subtitle files with proper timing
  • GPU Acceleration: CUDA support for faster processing (optional)
  • Repetition Filtering: Automatically detects and cleans up Whisper's repetitive transcription loops
  • GUI Integration: KDE context menu integration for easy right-click processing
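
For readers unfamiliar with the SRT layout mentioned above, each subtitle is a numbered block with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the text. The sketch below shows one way to build that from Whisper-style segments (dicts with start, end and text keys). The function names are hypothetical and this is not necessarily how the project writes its files.

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Join segments into SRT text; blocks are separated by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.25, "text": " Hello."},
        {"start": 1.25, "end": 3.0, "text": " Welcome back."},
    ]
    print(segments_to_srt(demo))
```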

Supported Formats

Supported video formats: .mp4, .mkv, .mov, .avi, .wmv
Supported audio formats: .mp3, .wav, .flac, .aac, .m4a

The CLI first checks file extensions and will fall back to ffprobe to detect supported media when the extension is unknown. Ensure ffmpeg/ffprobe are available in your environment.
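
The extension-then-ffprobe check can be pictured roughly as follows. This is a sketch under assumptions, not the repository's code: the helper names are invented, and treating "has at least one audio stream" as the ffprobe criterion is a guess at what makes a file usable for transcription.

```python
import shutil
import subprocess
from pathlib import Path

# Extensions listed in this README.
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".wmv"}
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".aac", ".m4a"}


def has_audio_stream(path):
    """Ask ffprobe whether the file contains at least one audio stream."""
    if shutil.which("ffprobe") is None:
        return False  # ffprobe is not on PATH, so nothing can be probed
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=codec_type", "-of", "csv=p=0", str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "audio" in result.stdout


def is_supported_media(path):
    """Fast path on the extension, ffprobe only for unknown extensions."""
    if Path(path).suffix.lower() in VIDEO_EXTS | AUDIO_EXTS:
        return True
    return has_audio_stream(path)


if __name__ == "__main__":
    print(is_supported_media("lecture.MKV"))
```

Checking the extension first keeps batch runs fast, since ffprobe is only started for files the extension list cannot classify.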

Examples

Single-file transcribe + translate:

python src/main.py -i myvideo.mp4 -o subtitles.srt -ol es

Single-file transcribe only:

python src/main.py -i myaudio.mp3 -o transcript.srt --transcribe-only

Batch: process all supported files in a folder, write SRTs next to each source:

python src/main.py -I /path/to/media_folder -td cpu -trd cpu -ol es

Batch: process folder and place all SRTs in a target directory:

python src/main.py -I /path/to/media_folder -od /path/to/output_srt_dir -ol es

Batch: process a folder where MKV files are mixed with other files (folder mode only picks up supported media):

python src/main.py -I /home/agentic/videos -od /home/agentic/videos/subtitles -ol en

Notes:

  • For single-file mode, if -o/--output-srt is omitted the SRT file will be created next to the source file with the .srt extension.
  • For folder mode, each processed file will produce an SRT with the original filename stem and .srt extension.
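
The naming rules in these notes amount to "same stem, .srt extension, optionally in another directory". A small pathlib sketch of that rule, with default_srt_path as a hypothetical helper name rather than a function from this project:

```python
from pathlib import Path


def default_srt_path(source, output_dir=None):
    """Return where the SRT for `source` would go under the rules above."""
    source = Path(source)
    if output_dir is None:
        # Next to the source file, same name, .srt extension.
        return source.with_suffix(".srt")
    # Same filename stem, but inside the chosen output directory.
    return Path(output_dir) / f"{source.stem}.srt"


if __name__ == "__main__":
    print(default_srt_path("/media/talk.mkv"))
    print(default_srt_path("/media/talk.mkv", "/media/subtitles"))
```

Note that in this sketch two sources sharing a stem (for example talk.mkv and talk.mp3) would map to the same SRT name inside a shared output directory.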

KDE Desktop Integration

For KDE users, the project includes a context menu integration that lets you right-click video or audio files and translate them directly from the Dolphin file manager.

Setup (automatic with the provided script):

  1. A service menu file is installed to ~/.local/share/kio/servicemenus/video-translator.desktop
  2. Right-click on any video/audio file → "Video Translator" submenu
  3. Choose your target language or "Transcribe Only"
  4. A terminal window shows progress and results

Available Options:

  • Translate to English
  • Translate to Spanish
  • Translate to French
  • Transcribe Only (no translation)

The context menu automatically handles file paths with spaces and provides visual feedback during processing.

Troubleshooting

  • If you get ffmpeg/ffprobe not found errors, ensure ffmpeg is installed and on PATH in the conda env:

    conda activate video_translator_env
    conda install ffmpeg
    ffmpeg -version
    ffprobe -version
  • If the tool refuses a file, run ffprobe path/to/file to confirm ffmpeg recognizes it.

  • Repetitive Text in Transcripts: The tool automatically detects and cleans up Whisper's repetitive loops (like "yeah, yeah, yeah..." repeated dozens of times). The cleaning process:

    • Limits consecutive word repetitions to a maximum of 5 instances
    • Detects segments where a single word makes up >70% of the content
    • Replaces obviously looped segments with "[unclear audio]"
    • Preserves natural speech patterns and reasonable repetitions
  • KDE Context Menu Issues: If the right-click menu doesn't appear or work:

    # Rebuild KDE service cache (on Plasma 6 the command is kbuildsycoca6)
    kbuildsycoca5
    # Check if the service file exists
    ls ~/.local/share/kio/servicemenus/video-translator.desktop
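
The repetition cleanup rules listed earlier in this section (cap of 5 consecutive repeats, the >70% single-word test, the "[unclear audio]" placeholder) can be sketched as below. This is an illustration, not the project's implementation: clean_segment is a hypothetical name, and the 10-word minimum before a segment counts as a loop is an assumption added so that short natural repetitions survive.

```python
import re
from collections import Counter

MAX_REPEATS = 5          # README: at most 5 consecutive repetitions
DOMINANCE = 0.70         # README: one word makes up >70% of the segment
MIN_WORDS_FOR_LOOP = 10  # assumption: shorter segments are never replaced
PLACEHOLDER = "[unclear audio]"


def _norm(word):
    """Lowercase and drop punctuation so "Yeah," and "yeah" compare equal."""
    return re.sub(r"[^\w']+", "", word.lower())


def clean_segment(text):
    words = text.split()
    if not words:
        return text
    keys = [_norm(w) for w in words]

    # Rule 1: a long segment dominated by one word is treated as a loop.
    counts = Counter(k for k in keys if k)
    if counts and len(words) >= MIN_WORDS_FOR_LOOP:
        top = counts.most_common(1)[0][1]
        if top / len(words) > DOMINANCE:
            return PLACEHOLDER

    # Rule 2: otherwise cap runs of the same word at MAX_REPEATS.
    kept, run, prev = [], 0, None
    for word, key in zip(words, keys):
        run = run + 1 if (key and key == prev) else 1
        prev = key
        if run <= MAX_REPEATS:
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    print(clean_segment("yeah, " * 30))
    print(clean_segment("no no no I said no"))
```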

License

MIT
