GitHub - Hantok/local-whisper-backend: Self-host an OpenAI-compatible /v1/audio/transcriptions API with Faster-Whisper. Deploy in minutes on any machine, serve Whisper large-v3-turbo with automatic compute fallbacks, and integrate instantly with n8n, FastAPI, or your existing pipelines. Includes systemd service, health checks, and a lightweight test suite.

Local Whisper Backend (large-v3-turbo)

Run whisper_server.py to self-host an OpenAI-compatible /v1/audio/transcriptions endpoint powered by faster-whisper. It defaults to the large-v3-turbo model, deploys on any machine in a few commands, and can be dropped into n8n or other workflows without code changes.

Install dependencies

Select the installation method for your architecture:

🍎 macOS (ARM64 / Apple Silicon)

Option A: Conda (Recommended) Handles architecture-specific libraries (FFmpeg) automatically.

conda create -y -n whisper-env python=3.12 ffmpeg
conda activate whisper-env
pip install -r requirements.txt

Option B: Standard .venv Note: If you get "architecture mismatch" errors, ensure your FFmpeg is installed via ARM64 Homebrew (/opt/homebrew).

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

🖥️ Linux/Windows/Intel Mac

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Start the Whisper server

Choose the optimal settings based on your hardware:

🍎 Apple Silicon (M1/M2/M3)

export LOCAL_WHISPER_MODEL=large-v3-turbo
export LOCAL_WHISPER_DEVICE=cpu
export LOCAL_WHISPER_COMPUTE_TYPE=int8
export LOCAL_WHISPER_BEAM_SIZE=1
uvicorn whisper_server:app --host 0.0.0.0 --port 9000

🟢 NVIDIA GPU (x86_64)

export LOCAL_WHISPER_MODEL=large-v3-turbo
export LOCAL_WHISPER_DEVICE=cuda
export LOCAL_WHISPER_COMPUTE_TYPE=float16
uvicorn whisper_server:app --host 0.0.0.0 --port 9000

🖥️ Generic CPU (x86_64)

export LOCAL_WHISPER_MODEL=large-v3-turbo
export LOCAL_WHISPER_DEVICE=cpu
export LOCAL_WHISPER_COMPUTE_TYPE=int8_float32
uvicorn whisper_server:app --host 0.0.0.0 --port 9000

If a compute type is unsupported (for example float16 on CPU), the server automatically retries with int8 or auto based modes. The service also aliases large-v3-turbo to large-v3 because faster-whisper uses the base model name for the turbo weights.

The server exposes:

POST /v1/audio/transcriptions – accepts multipart/form-data (file, optional model) or allows ?model= as a query parameter and returns OpenAI-style JSON with text and segments.
GET /healthz – readiness probe.

Example request (curl)

curl -sS -X POST http://127.0.0.1:9000/v1/audio/transcriptions \
  -H "Accept: application/json" \
  -F "file=@file_0.oga;filename=file_0.oga;type=audio/ogg" \
  -F "model=small"

Wire it up to the main API

In the environment that runs main.py, point the integration to the local server:

export WHISPER_BASE_URL=http://127.0.0.1:9000/v1
export WHISPER_MODEL=large-v3-turbo
export TMP_DIR=/tmp/yt-audio
uvicorn main:app --host 0.0.0.0 --port 5005

Your YouTube transcription endpoint will now fall back to the self-hosted Whisper large-v3-turbo server whenever subtitles are unavailable.

Run as a systemd service

Update User= and the absolute paths in whisper-local-api.service if your repo lives somewhere other than /mnt/ssd/whisper-local-api.
Optionally create /etc/default/whisper-local-api to override defaults (e.g. LOCAL_WHISPER_MODEL=large-v3-turbo, LOCAL_WHISPER_DEVICE=auto, LOCAL_WHISPER_COMPUTE_TYPE=float16).

Install and enable the service:

sudo cp whisper-local-api.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now whisper-local-api.service

Check status and logs:

systemctl status whisper-local-api.service
journalctl -u whisper-local-api.service -f

Use with n8n

The server is OpenAI-compatible, so you can drop it into n8n’s “OpenAI Audio Transcription” (or HTTP Request) node by pointing the base URL to your instance:

Base URL: http://<your-host>:9000/v1
Endpoint: /audio/transcriptions
Model: large-v3-turbo (or whichever you’ve preloaded)
Auth: none required (unless you front it with your own proxy)

Example n8n configuration:

Notes and troubleshooting

Models must already be present on disk; if a requested model is missing the API returns 503 with guidance to download it first.
Successful requests log a short transcript preview to help spot test results in the logs.
The server will fall back to a supported compute type if the preferred one (e.g., float16 on CPU) fails.

Run the test suite

Tests live in tests/test_whisper_server.py and use the bundled test-speech.mp3 fixture; they mock the transcription call so they run offline and quickly.

source .venv/bin/activate
pytest

OpenClaw Skill (ClawHub)

If you use OpenClaw and want one-command agent setup, use the packaged skill from this repo:

Skill source: openclaw-skill/whisper-local-api/
Privacy focused: Provides 100% offline, highly-accurate faster-whisper transcription for agent voice commands without cloud telemetry

Package command to deploy to ClawHub:

python ~/.npm-global/lib/node_modules/openclaw/skills/skill-creator/scripts/package_skill.py openclaw-skill/whisper-local-api

This generates a .skill bundle you can securely upload.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
openclaw-skill/whisper-local-api		openclaw-skill/whisper-local-api
tests		tests
.gitignore		.gitignore
README.md		README.md
n8n-configuration.jpeg		n8n-configuration.jpeg
requirements.txt		requirements.txt
run_server.sh		run_server.sh
test-speech.mp3		test-speech.mp3
whisper-local-api.service		whisper-local-api.service
whisper_server.py		whisper_server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Local Whisper Backend (large-v3-turbo)

Install dependencies

🍎 macOS (ARM64 / Apple Silicon)

🖥️ Linux/Windows/Intel Mac

Start the Whisper server

🍎 Apple Silicon (M1/M2/M3)

🟢 NVIDIA GPU (x86_64)

🖥️ Generic CPU (x86_64)

Example request (curl)

Wire it up to the main API

Run as a systemd service

Use with n8n

Notes and troubleshooting

Run the test suite

OpenClaw Skill (ClawHub)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Local Whisper Backend (large-v3-turbo)

Install dependencies

🍎 macOS (ARM64 / Apple Silicon)

🖥️ Linux/Windows/Intel Mac

Start the Whisper server

🍎 Apple Silicon (M1/M2/M3)

🟢 NVIDIA GPU (x86_64)

🖥️ Generic CPU (x86_64)

Example request (curl)

Wire it up to the main API

Run as a systemd service

Use with n8n

Notes and troubleshooting

Run the test suite

OpenClaw Skill (ClawHub)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages