This application transcribes speech from a video or audio file (single file or whole folder), optionally translates it to a target language, and generates SRT subtitle files. It uses OpenAI Whisper for transcription and Hugging Face Transformers for translation.
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/video_translator.git
  cd video_translator
  ```
- Install Conda (if not already installed): see the Miniconda download page.
- Create and activate the environment:

  ```bash
  conda env create -f environment.yml
  conda activate video_translator_env
  ```

  Note: ensure `ffmpeg` is listed in `environment.yml` so that `ffmpeg` and `ffprobe` are installed into the environment:

  ```yaml
  dependencies:
    - python=3.9
    - ffmpeg
    # ...
  ```
- (Optional) To use CUDA, ensure you have the appropriate NVIDIA drivers installed.
Run the CLI after activating the environment:

```bash
python src/main.py --input-video path/to/video.mp4 --output-srt path/to/output.srt --output-lang en
```

or use `conda run`:

```bash
conda run -n video_translator_env python src/main.py --input-video path/to/video.mp4 -o out.srt -ol es
```

You can also transcribe an entire folder of media files (batch mode) with `--input-folder`. See the examples below.
Use `--help` for the full option list:

```bash
python src/main.py --help
```

- `-i, --input-video PATH`: Path to the input video or audio file (single-file mode).
- `-I, --input-folder PATH`: Path to a folder containing media files. All supported files in the folder will be processed (batch mode).
- `-o, --output-srt PATH`: Path to save the output SRT (single-file mode). If omitted, the SRT is placed next to the input with the same name and a `.srt` extension.
- `-od, --output-dir PATH`: Target directory for SRT files when using `--input-folder`. If omitted, each SRT is placed next to its source file.
- `-il, --input-lang TEXT`: Input language code (e.g. `en`, `ru`). If not provided, Whisper will auto-detect.
- `-ol, --output-lang TEXT`: Output language code for translation (e.g. `es`, `fr`). Required unless `--transcribe-only` is set.
- `-t, --transcribe-only`: Only transcribe the input; do not translate.
- `-wm, --whisper-model TEXT`: Whisper model size (`tiny`, `base`, `small`, `medium`, `large`, ...).
- `-td, --translator-device TEXT`: Device for the translation model (`cuda` or `cpu`).
- `-trd, --transcriber-device TEXT`: Device for the transcription model (`cuda` or `cpu`).
- `--temp-audio-dir PATH`: Directory for temporary audio files (default: `temp_audio`).
- `-ldo, --lang-detect-offset TEXT`: Timestamp offset at which to start a short slice for language detection (e.g. `90`, `01:30`, `00:01:30`).
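The `-ldo` offset formats listed above can all be normalized to seconds with a simple positional rule. A minimal sketch (the function name is illustrative, not the tool's actual implementation):

```python
def parse_offset(value: str) -> float:
    """Parse '90', '01:30', or '00:01:30' into seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds
```

Under this rule, `90`, `01:30`, and `00:01:30` all resolve to 90 seconds.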
- Transcription: Uses OpenAI Whisper for high-quality speech-to-text conversion
- Translation: Leverages Hugging Face Transformers for accurate language translation
- Batch Processing: Process entire folders of media files automatically
- Language Detection: Automatic source language detection with optional manual hints
- Multiple Formats: Supports common video and audio formats
- SRT Output: Generates standard SRT subtitle files with proper timing
- GPU Acceleration: CUDA support for faster processing (optional)
- Repetition Filtering: Automatically detects and cleans up Whisper's repetitive transcription loops
- GUI Integration: KDE context menu integration for easy right-click processing
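The SRT output mentioned above pairs a numbered cue with an `HH:MM:SS,mmm` time range. A minimal sketch of that formatting (hypothetical helpers, assuming segment times in seconds):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms_total = int(round(seconds * 1000))
    s, ms = divmod(ms_total, 1000)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One SRT cue: index line, time range, text, trailing blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n"
```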
Supported common video formats: .mp4, .mkv, .mov, .avi, .wmv
Supported audio formats: .mp3, .wav, .flac, .aac, .m4a
The CLI checks file extensions first and falls back to `ffprobe` to detect supported media when the extension is unknown. Ensure `ffmpeg`/`ffprobe` are available in your environment.
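The extension check with `ffprobe` fallback could look roughly like this (a sketch; the function name and exact `ffprobe` invocation are illustrative, assuming `ffprobe` is on PATH):

```python
import subprocess
from pathlib import Path

SUPPORTED_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".wmv",
                  ".mp3", ".wav", ".flac", ".aac", ".m4a"}

def is_supported_media(path: str) -> bool:
    """Accept known extensions; otherwise ask ffprobe to identify the file."""
    if Path(path).suffix.lower() in SUPPORTED_EXTS:
        return True
    # Unknown extension: ffprobe exits non-zero if it cannot read the file
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=format_name",
         "-of", "default=noprint_wrappers=1", path],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and bool(result.stdout.strip())
```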
Single-file transcribe + translate:

```bash
python src/main.py -i myvideo.mp4 -o subtitles.srt -ol es
```

Single-file transcribe only:

```bash
python src/main.py -i myaudio.mp3 -o transcript.srt --transcribe-only
```

Batch: process all supported files in a folder, writing SRTs next to each source:

```bash
python src/main.py -I /path/to/media_folder -td cpu -trd cpu -ol es
```

Batch: process a folder and place all SRTs in a target directory:

```bash
python src/main.py -I /path/to/media_folder -od /path/to/output_srt_dir -ol es
```

Batch example for MKV files mixed with others (folder mode filters supported extensions):

```bash
python src/main.py -I /home/agentic/videos -od /home/agentic/videos/subtitles -ol en
```

Notes:
- For single-file mode, if `-o/--output-srt` is omitted, the SRT file is created next to the source file with a `.srt` extension.
- For folder mode, each processed file produces an SRT with the original filename stem and a `.srt` extension.
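The naming rule in these notes amounts to swapping the media extension for `.srt`, optionally redirecting into the `-od` directory. A sketch (hypothetical helper):

```python
from pathlib import Path
from typing import Optional

def default_srt_path(media_path: str, output_dir: Optional[str] = None) -> Path:
    """Same filename stem with .srt; placed in output_dir if given,
    otherwise next to the source file."""
    src = Path(media_path)
    target = Path(output_dir) if output_dir else src.parent
    return target / (src.stem + ".srt")
```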
For KDE users, the project includes a context menu integration that lets you right-click on video files and translate them directly from the Dolphin file manager.
Setup (automatic with the provided script):

- A service menu file is installed to `~/.local/share/kio/servicemenus/video-translator.desktop`
- Right-click on any video/audio file → "Video Translator" submenu
- Choose your target language or "Transcribe Only"
- A terminal window shows progress and results
Available Options:
- Translate to English
- Translate to Spanish
- Translate to French
- Transcribe Only (no translation)
The context menu automatically handles file paths with spaces and provides visual feedback during processing.
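For reference, a KDE service menu of this kind typically looks like the following sketch. The paths, action names, and MIME list here are illustrative assumptions, not the file the install script actually writes:

```ini
[Desktop Entry]
Type=Service
X-KDE-ServiceTypes=KonqPopupMenu/Plugin
MimeType=video/mp4;video/x-matroska;audio/mpeg;
Actions=translateEn;transcribeOnly
X-KDE-Submenu=Video Translator

[Desktop Action translateEn]
Name=Translate to English
Exec=konsole -e bash -c 'conda run -n video_translator_env python /path/to/src/main.py -i %f -ol en; read'

[Desktop Action transcribeOnly]
Name=Transcribe Only
Exec=konsole -e bash -c 'conda run -n video_translator_env python /path/to/src/main.py -i %f -t; read'
```

KDE substitutes `%f` with the clicked file's path, which is how paths with spaces are handled.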
- If you get ffmpeg/ffprobe "not found" errors, ensure `ffmpeg` is installed and on PATH in the conda env:

  ```bash
  conda activate video_translator_env
  conda install ffmpeg
  ffmpeg -version
  ffprobe -version
  ```
- If the tool refuses a file, run `ffprobe path/to/file` to confirm ffmpeg recognizes it.
- Repetitive Text in Transcripts: the tool automatically detects and cleans up Whisper's repetitive loops (like "yeah, yeah, yeah..." repeated dozens of times). The cleaning process:
  - Limits consecutive word repetitions to a maximum of 5 instances
  - Detects segments where a single word makes up >70% of the content
  - Replaces obviously looped segments with "[unclear audio]"
  - Preserves natural speech patterns and reasonable repetitions
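A minimal sketch of this kind of cleanup, with thresholds matching the bullets above (illustrative; not the tool's exact code):

```python
from collections import Counter

MAX_REPEATS = 5     # cap on consecutive identical words
LOOP_RATIO = 0.70   # fraction above which a segment counts as a loop

def clean_repetitions(text: str) -> str:
    """Heuristically clean Whisper-style repetition loops from a segment."""
    words = text.split()
    if not words:
        return text
    norm = [w.lower().strip(".,!?") for w in words]
    # A single word dominating a long segment is treated as a loop artifact.
    _, count = Counter(norm).most_common(1)[0]
    if len(words) >= 10 and count / len(words) > LOOP_RATIO:
        return "[unclear audio]"
    # Otherwise cap consecutive repetitions of the same word.
    cleaned, prev, streak = [], None, 0
    for w, key in zip(words, norm):
        streak = streak + 1 if key == prev else 1
        if streak <= MAX_REPEATS:
            cleaned.append(w)
        prev = key
    return " ".join(cleaned)
```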
- KDE Context Menu Issues: if the right-click menu doesn't appear or work:

  ```bash
  # Rebuild the KDE service cache
  kbuildsycoca5
  # Check that the service file exists
  ls ~/.local/share/kio/servicemenus/video-translator.desktop
  ```
License: MIT