Transcribe audio/video of any length on Apple Silicon — no API keys, no cloud, no repeated-output bugs.
Installable as an Agent Skill — works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, and more.
This is for you if:
- You have an Apple Silicon Mac (M1 or newer)
- You need to transcribe long audio/video locally — podcasts, lectures, meetings, interviews
- You don't want to upload files to the cloud, or you're offline
- You've hit the mlx_whisper hallucination bug on long files
mlx_whisper is fast and accurate on Apple Silicon, but long files trigger a hallucination bug: the model gets stuck and repeats the same sentence for minutes at a time. There is no single flag that fully fixes this.
Split → Transcribe → Merge.
This script cuts the file into 20-minute segments, transcribes each one with hallucination-resistant parameters, and merges everything back into a single output file with continuous timestamps. Segments overlap by 10 seconds so no words are lost at boundaries. Intermediate files are kept on disk so interrupted runs can be resumed.
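The timestamp bookkeeping above can be sketched in a few lines. This is an illustrative sketch, not the project's actual code; the 1200 s segment length and 10 s overlap are the script's documented defaults, and the function names are hypothetical:

```python
SEGMENT = 1200   # 20-minute segments (the script's default)
OVERLAP = 10     # seconds shared between adjacent segments

def segment_starts(duration_s: float) -> list:
    """Start offsets (in seconds) for each chunk of a long file.

    Each new segment begins OVERLAP seconds before the previous one
    ends, so no words are lost at the boundary.
    """
    starts, t = [], 0.0
    while t < duration_s:
        starts.append(t)
        t += SEGMENT - OVERLAP
    return starts

def to_global(local_ts: float, segment_start: float) -> float:
    """Map a timestamp inside one segment back onto the full file."""
    return segment_start + local_ts

# A 45-minute (2700 s) file needs three segments:
print(segment_starts(2700))  # [0.0, 1190.0, 2380.0]
```

Merging then only has to shift each segment's local timestamps by its start offset and drop duplicate words inside the 10-second overlap window.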
```bash
brew install ffmpeg
pip install mlx-whisper
```

Works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, and any tool that supports the Agent Skills open standard.
Install:
```bash
git clone https://github.com/1c7/mlx-whisper-long
cd mlx-whisper-long
cp -r skills/mlx-whisper-long ~/.agents/skills/
ln -s ~/.agents/skills/mlx-whisper-long ~/.claude/skills/mlx-whisper-long
```

Then just tell Claude:
- `Transcribe this file: lecture.mp4`
- `Transcribe lecture.mp4, it's a medical lecture about cancer immunotherapy.`
- `Transcribe lecture.mp4 and output all formats to ~/Desktop.`
```bash
git clone https://github.com/1c7/mlx-whisper-long
cd mlx-whisper-long

# Basic
python scripts/transcribe_long.py lecture.mp4

# With a domain prompt (improves accuracy for technical content)
python scripts/transcribe_long.py lecture.mp4 --prompt "Medical lecture about cancer immunotherapy and CRISPR."

# Output all formats to a specific directory
python scripts/transcribe_long.py lecture.mp4 --output-dir ~/Desktop --output-format all

# Force a language
python scripts/transcribe_long.py lecture.mp4 --language zh

# Pass any mlx_whisper argument through directly
python scripts/transcribe_long.py lecture.mp4 -- --temperature 0 --beam-size 1
```

| Option | Default | Description |
|---|---|---|
| `--model` | `mlx-community/whisper-large-v3-turbo` | Model to use. Swap in `mlx-community/whisper-large-v3` for maximum accuracy |
| `--output-dir` | Same directory as input | Where to write output files |
| `--output-format` | `srt` | `srt`, `txt`, `vtt`, `tsv`, `json`, or `all` |
| `--prompt` | (none) | Domain hint passed as `--initial-prompt`; improves proper-noun accuracy |
| `--language` | auto-detect | Language code, e.g. `zh`, `en`, `ja` |
| `--segment-seconds` | `1200` | Segment length in seconds |
| `--overlap-seconds` | `10` | Overlap at segment boundaries |
The script sets these mlx_whisper flags by default:
- `--condition-on-previous-text False`: prevents one bad segment from poisoning the ones after it
- `--no-speech-threshold 0.6`: skips silent segments instead of hallucinating text
- `--compression-ratio-threshold 2.4`: detects and discards repeated-output segments
- `--word-timestamps True`: enables word-level timing in the output
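As a rough sketch of how these defaults could be combined with user passthrough arguments (the flag names mirror the mlx_whisper CLI options above, but `DEFAULT_FLAGS` and `build_command` are hypothetical names, not the project's actual code):

```python
import shlex

# The anti-hallucination defaults listed above, as (flag, value) pairs.
DEFAULT_FLAGS = [
    ("--condition-on-previous-text", "False"),
    ("--no-speech-threshold", "0.6"),
    ("--compression-ratio-threshold", "2.4"),
    ("--word-timestamps", "True"),
]

def build_command(audio_path, extra_args=None):
    """Assemble an mlx_whisper invocation for one segment.

    Arguments the user passes after `--` are appended last, so they
    can override the defaults.
    """
    cmd = ["mlx_whisper", audio_path]
    for flag, value in DEFAULT_FLAGS:
        cmd.extend([flag, value])
    cmd.extend(extra_args or [])
    return cmd

print(shlex.join(build_command("segment_000.wav", ["--temperature", "0"])))
```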
Segment files are saved to `<filename>_segments/` next to the output. If the script is interrupted, re-running the same command will skip already-completed segments and pick up where it left off. Delete the `_segments/` directory after confirming the output.
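The resume check can be sketched like this (a minimal illustration; the `segment_{i:03d}.json` naming is hypothetical and the script's actual segment filenames may differ):

```python
from pathlib import Path

def pending_segments(segments_dir: Path, n_segments: int) -> list:
    """Indices of segments that still need transcription.

    A segment counts as done when its per-segment output file already
    exists on disk, so re-running after an interrupt skips finished work.
    """
    return [
        i for i in range(n_segments)
        if not (segments_dir / f"segment_{i:03d}.json").exists()
    ]
```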
| Model | Speed | Accuracy | Recommended for |
|---|---|---|---|
| `whisper-large-v3-turbo` | ~3-4× faster | Slightly lower | Most use cases |
| `whisper-large-v3` | Baseline | Highest | Technical content, heavy accents |
Adding a good --prompt narrows the accuracy gap between turbo and v3 significantly.
| Tool | When to use it instead |
|---|---|
| `mlx_whisper` directly | Your file is short (< 10 min); no need for this wrapper |
| whisper.cpp | You don't use Claude Code and prefer a C++ CLI with more tuning options |
| MacWhisper | You want a GUI and don't mind paying |
| OpenAI Whisper API | You don't have a Mac, or don't care about local processing |
Why this project: mlx_whisper is the fastest option on Apple Silicon, but breaks on long files. This fixes that, and wraps it as a Claude Code skill so you don't have to remember the command.
MIT