mlx-whisper-long

中文 (Chinese version)

Transcribe audio/video of any length on Apple Silicon — no API keys, no cloud, no repeated-output bugs.

Installable as an Agent Skill — works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, and more.

This is for you if:

  • You have an Apple Silicon Mac (M1 or newer)
  • You need to transcribe long audio/video locally — podcasts, lectures, meetings, interviews
  • You don't want to upload files to the cloud, or you're offline
  • You've hit the mlx_whisper hallucination bug on long files

The problem

mlx_whisper is fast and accurate on Apple Silicon, but long files trigger a hallucination bug: the model gets stuck and repeats the same sentence for minutes at a time. There is no single flag that fully fixes this.

The solution

Split → Transcribe → Merge.

This script cuts the file into 20-minute segments, transcribes each one with hallucination-resistant parameters, and merges everything back into a single output file with continuous timestamps. Segments overlap by 10 seconds so no words are lost at boundaries. Intermediate files are kept on disk so interrupted runs can be resumed.
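The segment plan described above can be sketched as a pure function: given the file's total duration, a segment length, and an overlap, produce the `(start, length)` pairs that would later be handed to ffmpeg. This is an illustrative sketch under the defaults stated above (20-minute segments, 10-second overlap), not the script's actual code; the function name is invented.

```python
def plan_segments(total_s: float, seg_s: float = 1200.0, overlap_s: float = 10.0):
    """Return (start, length) pairs covering total_s seconds, with each
    segment starting overlap_s before the previous one ended."""
    step = seg_s - overlap_s          # how far each new segment advances
    plan, t = [], 0.0
    while t < total_s:
        plan.append((t, min(seg_s, total_s - t)))
        if t + seg_s >= total_s:      # this segment already reaches the end
            break
        t += step
    return plan

# A 45-minute (2700 s) file with the defaults:
print(plan_segments(2700))
# → [(0.0, 1200.0), (1190.0, 1200.0), (2380.0, 320.0)]
```

Note how segment 2 starts at 1190 s, ten seconds before segment 1 ends at 1200 s, so a word spoken across the cut appears in both segments and survives the merge.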

Requirements

```shell
brew install ffmpeg
pip install mlx-whisper
```

Usage — Agent Skill (recommended)

Works with Claude Code, Cursor, GitHub Copilot, Gemini CLI, and any tool that supports the Agent Skills open standard.

Install:

```shell
git clone https://github.com/1c7/mlx-whisper-long
cd mlx-whisper-long
cp -r skills/mlx-whisper-long ~/.agents/skills/
ln -s ~/.agents/skills/mlx-whisper-long ~/.claude/skills/mlx-whisper-long
```

Then just tell Claude:

Transcribe this file: lecture.mp4

Transcribe lecture.mp4, it's a medical lecture about cancer immunotherapy.

Transcribe lecture.mp4 and output all formats to ~/Desktop.

Usage — command line

```shell
git clone https://github.com/1c7/mlx-whisper-long
cd mlx-whisper-long

# Basic
python scripts/transcribe_long.py lecture.mp4

# With domain prompt — improves accuracy for technical content
python scripts/transcribe_long.py lecture.mp4 --prompt "Medical lecture about cancer immunotherapy and CRISPR."

# Output to specific directory, all formats
python scripts/transcribe_long.py lecture.mp4 --output-dir ~/Desktop --output-format all

# Force language
python scripts/transcribe_long.py lecture.mp4 --language zh

# Pass any mlx_whisper argument directly
python scripts/transcribe_long.py lecture.mp4 -- --temperature 0 --beam-size 1
```
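The `--` pass-through in the last command is a common CLI convention: everything after the first literal `--` is forwarded to `mlx_whisper` untouched. A minimal sketch of the idea (the real script's argument parsing may differ; the helper name is invented):

```python
def split_passthrough(argv):
    """Split CLI args at the first literal "--": the left side is for
    this script, the right side is forwarded to mlx_whisper verbatim."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

ours, forwarded = split_passthrough(
    ["lecture.mp4", "--output-format", "all", "--", "--temperature", "0", "--beam-size", "1"]
)
print(forwarded)  # ['--temperature', '0', '--beam-size', '1']
```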

Options

| Option | Default | Description |
|---|---|---|
| `--model` | `mlx-community/whisper-large-v3-turbo` | Model to use. Swap in `mlx-community/whisper-large-v3` for maximum accuracy |
| `--output-dir` | Same directory as input | Where to write output files |
| `--output-format` | `srt` | `srt`, `txt`, `vtt`, `tsv`, `json`, or `all` |
| `--prompt` | (none) | Domain hint passed as `--initial-prompt`. Improves proper noun accuracy |
| `--language` | auto-detect | Language code, e.g. `zh`, `en`, `ja` |
| `--segment-seconds` | `1200` | Segment length in seconds |
| `--overlap-seconds` | `10` | Overlap at segment boundaries |
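Merging per-segment subtitles into one continuous file means re-basing each cue from segment-local time to whole-file time, then rendering it in SRT's `HH:MM:SS,mmm` form. A hypothetical sketch of that timestamp arithmetic (helper names invented, not taken from the script):

```python
def fmt_srt(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def rebase(cue_start: float, cue_end: float, segment_offset: float):
    """Shift a cue from segment-local time to whole-file time."""
    return fmt_srt(cue_start + segment_offset), fmt_srt(cue_end + segment_offset)

# A cue 12.5 s into the segment that starts at 1190 s of the full file:
print(rebase(12.5, 14.0, 1190.0))  # ('00:20:02,500', '00:20:04,000')
```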

Why these parameters

The script sets these mlx_whisper flags by default:

  • --condition-on-previous-text False — prevents a bad segment from poisoning subsequent ones
  • --no-speech-threshold 0.6 — skips silent segments instead of hallucinating text
  • --compression-ratio-threshold 2.4 — detects and discards repeated-output segments
  • --word-timestamps True — enables word-level timing in output
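
The compression-ratio check works because looped text compresses far better than normal speech: compress the transcript and compare sizes. A sketch of the heuristic using the same 2.4 threshold (this illustrates the idea described above, not the library's exact implementation):

```python
import zlib

def compression_ratio(text: str) -> float:
    """Ratio of raw bytes to zlib-compressed bytes; repetition inflates it."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))

normal = "The lecture covers checkpoint inhibitors, CAR-T therapy, and trial design."
looped = "Thank you for watching. " * 40   # a classic hallucination loop

print(compression_ratio(normal) < 2.4)  # True — segment is kept
print(compression_ratio(looped) > 2.4)  # True — segment is discarded as repetition
```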

Resume after interruption

Segment files are saved to <filename>_segments/ next to the output. If the script is interrupted, re-running the same command will skip already-completed segments and pick up where it left off. Delete the _segments/ directory after confirming the output.
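Resume can be as simple as checking, before transcribing each segment, whether its finished transcript is already on disk. A hypothetical sketch (the directory layout and file names here are assumed from the description above, not read from the script):

```python
from pathlib import Path
import tempfile

def pending_segments(seg_dir: Path, n_segments: int):
    """Yield indices of segments whose transcript is not on disk yet."""
    for i in range(n_segments):
        if not (seg_dir / f"segment_{i:03d}.json").exists():
            yield i

# Example: of 4 segments, 0 and 2 finished before an interruption.
d = Path(tempfile.mkdtemp())
for i in (0, 2):
    (d / f"segment_{i:03d}.json").write_text("{}")
print(list(pending_segments(d, 4)))  # [1, 3]
```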

Model comparison

| Model | Speed | Accuracy | Recommended for |
|---|---|---|---|
| `whisper-large-v3-turbo` | ~3-4× faster | Slightly lower | Most use cases |
| `whisper-large-v3` | Baseline | Highest | Technical content, heavy accents |

Adding a good --prompt narrows the accuracy gap between turbo and v3 significantly.

Alternatives

| Tool | When to use it instead |
|---|---|
| `mlx_whisper` directly | Your file is short (< 10 min); just use it, no need for this wrapper |
| `whisper.cpp` | You don't use Claude Code and prefer a C++ CLI with more tuning options |
| MacWhisper | You want a GUI and don't mind paying |
| OpenAI Whisper API | You don't have a Mac, or don't need local processing |

Why this project: mlx_whisper is the fastest option on Apple Silicon, but breaks on long files. This fixes that, and wraps it as a Claude Code skill so you don't have to remember the command.

License

MIT
