mlx-whisper-long

中文 (Chinese version)

Transcribe audio/video of any length on Apple Silicon — no API keys, no cloud, no repeated-output bugs.

Installable as an Agent Skill — works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, and more.

This is for you if:

  • You have an Apple Silicon Mac (M1 or newer)
  • You need to transcribe long audio/video locally — podcasts, lectures, meetings, interviews
  • You don't want to upload files to the cloud, or you're offline
  • You've hit the mlx_whisper hallucination bug on long files

The problem

mlx_whisper is fast and accurate on Apple Silicon, but long files trigger a hallucination bug: the model gets stuck and repeats the same sentence for minutes at a time. There is no single flag that fully fixes this.

The solution

Split → Transcribe → Merge.

This script cuts the file into 20-minute segments, transcribes each one with hallucination-resistant parameters, and merges everything back into a single output file with continuous timestamps. Segments overlap by 10 seconds so no words are lost at boundaries. Intermediate files are kept on disk so interrupted runs can be resumed.
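The segment plan described above can be sketched as a pure function: given the file's total duration, a segment length, and an overlap, produce the `(start, length)` pairs that would later be handed to ffmpeg. This is an illustrative sketch under the defaults stated above (20-minute segments, 10-second overlap), not the script's actual code; the function name is invented.

```python
def plan_segments(total_s: float, seg_s: float = 1200.0, overlap_s: float = 10.0):
    """Return (start, length) pairs covering total_s seconds, with each
    segment starting overlap_s before the previous one ended."""
    step = seg_s - overlap_s          # how far each new segment advances
    plan, t = [], 0.0
    while t < total_s:
        plan.append((t, min(seg_s, total_s - t)))
        if t + seg_s >= total_s:      # this segment already reaches the end
            break
        t += step
    return plan

# A 45-minute (2700 s) file with the defaults:
print(plan_segments(2700))
# → [(0.0, 1200.0), (1190.0, 1200.0), (2380.0, 320.0)]
```

Note how segment 2 starts at 1190 s, ten seconds before segment 1 ends at 1200 s, so a word spoken across the cut appears in both segments and survives the merge.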

Requirements

```shell
brew install ffmpeg
pip install mlx-whisper
```

Usage — Agent Skill (recommended)

Works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, and any tool that supports the Agent Skills open standard.

Install:

```shell
git clone https://github.com/1c7/mlx-whisper-long
cd mlx-whisper-long
cp -r skills/mlx-whisper-long ~/.agents/skills/
ln -s ~/.agents/skills/mlx-whisper-long ~/.claude/skills/mlx-whisper-long
```

Then just tell Claude:

Transcribe this file: lecture.mp4

Transcribe lecture.mp4, it's a medical lecture about cancer immunotherapy.

Transcribe lecture.mp4 and output all formats to ~/Desktop.

Usage — command line

```shell
git clone https://github.com/1c7/mlx-whisper-long
cd mlx-whisper-long

# Basic
python scripts/transcribe_long.py lecture.mp4

# With domain prompt — improves accuracy for technical content
python scripts/transcribe_long.py lecture.mp4 --prompt "Medical lecture about cancer immunotherapy and CRISPR."

# Output to specific directory, all formats
python scripts/transcribe_long.py lecture.mp4 --output-dir ~/Desktop --output-format all

# Force language
python scripts/transcribe_long.py lecture.mp4 --language zh

# Pass any mlx_whisper argument directly
python scripts/transcribe_long.py lecture.mp4 -- --temperature 0 --beam-size 1
```
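The `--` pass-through in the last command is a common CLI convention: everything after the first literal `--` is forwarded to `mlx_whisper` untouched. A minimal sketch of the idea (the real script's argument parsing may differ; the helper name is invented):

```python
def split_passthrough(argv):
    """Split CLI args at the first literal "--": the left side is for
    this script, the right side is forwarded to mlx_whisper verbatim."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

ours, forwarded = split_passthrough(
    ["lecture.mp4", "--output-format", "all", "--", "--temperature", "0", "--beam-size", "1"]
)
print(forwarded)  # ['--temperature', '0', '--beam-size', '1']
```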

Options

| Option | Default | Description |
|---|---|---|
| `--model` | `mlx-community/whisper-large-v3-turbo` | Model to use. Swap in `mlx-community/whisper-large-v3` for maximum accuracy |
| `--output-dir` | Same directory as input | Where to write output files |
| `--output-format` | `srt` | `srt`, `txt`, `vtt`, `tsv`, `json`, or `all` |
| `--prompt` | (none) | Domain hint passed as `--initial-prompt`. Improves proper noun accuracy |
| `--language` | auto-detect | Language code, e.g. `zh`, `en`, `ja` |
| `--segment-seconds` | `1200` | Segment length in seconds |
| `--overlap-seconds` | `10` | Overlap at segment boundaries |
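Merging per-segment subtitles into one continuous file means re-basing each cue from segment-local time to whole-file time, then rendering it in SRT's `HH:MM:SS,mmm` form. A hypothetical sketch of that timestamp arithmetic (helper names invented, not taken from the script):

```python
def fmt_srt(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def rebase(cue_start: float, cue_end: float, segment_offset: float):
    """Shift a cue from segment-local time to whole-file time."""
    return fmt_srt(cue_start + segment_offset), fmt_srt(cue_end + segment_offset)

# A cue 12.5 s into the segment that starts at 1190 s of the full file:
print(rebase(12.5, 14.0, 1190.0))  # ('00:20:02,500', '00:20:04,000')
```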

Why these parameters

The script sets these mlx_whisper flags by default:

  • --condition-on-previous-text False — prevents a bad segment from poisoning subsequent ones
  • --no-speech-threshold 0.6 — skips silent segments instead of hallucinating text
  • --compression-ratio-threshold 2.4 — detects and discards repeated-output segments
  • --word-timestamps True — enables word-level timing in output
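
The compression-ratio check works because looped text compresses far better than normal speech: compress the transcript and compare sizes. A sketch of the heuristic using the same 2.4 threshold (this illustrates the idea described above, not the library's exact implementation):

```python
import zlib

def compression_ratio(text: str) -> float:
    """Ratio of raw bytes to zlib-compressed bytes; repetition inflates it."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))

normal = "The lecture covers checkpoint inhibitors, CAR-T therapy, and trial design."
looped = "Thank you for watching. " * 40   # a classic hallucination loop

print(compression_ratio(normal) < 2.4)  # True — segment is kept
print(compression_ratio(looped) > 2.4)  # True — segment is discarded as repetition
```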

Resume after interruption

Segment files are saved to <filename>_segments/ next to the output. If the script is interrupted, re-running the same command will skip already-completed segments and pick up where it left off. Delete the _segments/ directory after confirming the output.
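Resume can be as simple as checking, before transcribing each segment, whether its finished transcript is already on disk. A hypothetical sketch (the directory layout and file names here are assumed from the description above, not read from the script):

```python
from pathlib import Path
import tempfile

def pending_segments(seg_dir: Path, n_segments: int):
    """Yield indices of segments whose transcript is not on disk yet."""
    for i in range(n_segments):
        if not (seg_dir / f"segment_{i:03d}.json").exists():
            yield i

# Example: of 4 segments, 0 and 2 finished before an interruption.
d = Path(tempfile.mkdtemp())
for i in (0, 2):
    (d / f"segment_{i:03d}.json").write_text("{}")
print(list(pending_segments(d, 4)))  # [1, 3]
```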

Model comparison

| Model | Speed | Accuracy | Recommended for |
|---|---|---|---|
| `whisper-large-v3-turbo` | ~3-4× faster | Slightly lower | Most use cases |
| `whisper-large-v3` | Baseline | Highest | Technical content, heavy accents |

Adding a good --prompt narrows the accuracy gap between turbo and v3 significantly.

Alternatives

| Tool | When to use it instead |
|---|---|
| `mlx_whisper` directly | Your file is short (< 10 min); just use it, no need for this wrapper |
| `whisper.cpp` | You don't use Claude Code and prefer a C++ CLI with more tuning options |
| MacWhisper | You want a GUI and don't mind paying |
| OpenAI Whisper API | You don't have a Mac, or don't need local processing |

Why this project: mlx_whisper is the fastest option on Apple Silicon, but breaks on long files. This fixes that, and wraps it as a Claude Code skill so you don't have to remember the command.

License

MIT
