openwakeword-trainer

Train custom wake word models with openWakeWord. The toolkit is a granular 13-step pipeline with compatibility patches for torchaudio 2.10+, Piper TTS, and speechbrain. It generates tiny ONNX models (~200 KB) for real-time keyword detection, like building your own "Hey Siri" trigger.

What It Does

This toolkit automates the entire openWakeWord training process:

Synthesizes thousands of speech clips using Piper TTS with varied voices and accents
Augments clips with real-world noise, music, and room impulse responses
Trains a small DNN classifier optimized for always-on, low-latency detection
Exports a tiny ONNX model you can deploy anywhere

The result is a ~200 KB model that runs on CPU in real-time with negligible resource usage.

Prerequisites

Requirement	Details
WSL2 or Linux	Ubuntu recommended (`wsl --install -d Ubuntu` on Windows)
NVIDIA GPU	CUDA drivers installed (WSL2 includes CUDA passthrough automatically)
Disk space	~15 GB free (temporary downloads; deletable after training)
Python 3.10+	Inside WSL2/Linux (`python3 --version`)
Time	~1–2 hours with GPU, 12–24 hours CPU-only

Verify CUDA (WSL2)

wsl
nvidia-smi

You should see your GPU listed. If not, update your NVIDIA Windows driver to the latest version.
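
The prerequisites and the GPU check above can also be verified from Python before starting a long run. The following is a minimal sketch, not part of the toolkit: the function name `preflight` is hypothetical, and the thresholds are taken from the prerequisites table. It only checks that `nvidia-smi` is on the PATH, not that a GPU is actually usable.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # from the prerequisites table
MIN_FREE_GB = 15       # from the prerequisites table


def preflight(path="."):
    """Return a dict of check name -> bool for the basic prerequisites."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "disk": free_gb >= MIN_FREE_GB,
        # Only confirms the binary is on PATH, not that a GPU is usable.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }
```

Calling `preflight()` from the repo directory returns something like a mapping of `python`, `disk`, and `nvidia_smi` to booleans; any `False` entry points at the prerequisite to fix first.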

Quick Start

Option A: One-liner

# From PowerShell (Windows) — cd to the repo first:
cd path\to\openwakeword-trainer
wsl -- bash train.sh

# Or from within WSL2/Linux:
cd /mnt/c/path/to/openwakeword-trainer
bash train.sh

This creates an isolated virtualenv, installs dependencies, downloads datasets, trains the model, and exports the result.

Option B: Step-by-step

# Enter WSL2 and navigate to the repo
wsl
cd /mnt/c/path/to/openwakeword-trainer

# Create & activate a training venv (use native filesystem, not /mnt/c/)
python3 -m venv ~/.oww-trainer-venv
source ~/.oww-trainer-venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the full pipeline
python train_wakeword.py

# Or resume from where you left off
python train_wakeword.py --from augment

Train Your Own Wake Word

Copy the example config:

cp configs/hey_echo.yaml configs/my_word.yaml

Edit configs/my_word.yaml:

model_name: "my_word"
target_phrase:
  - "hey computer"
custom_negative_phrases:
  - "hey commuter"
  - "computer"
  - "hey"

Train:

python train_wakeword.py --config configs/my_word.yaml

Find your model in export/my_word.onnx (and export/my_word.onnx.data).

Pipeline Steps

The pipeline runs 13 granular steps, each with built-in verification. If a step fails, the pipeline stops immediately and prints the command to resume from that step.

#	Step	Description	Time
1	`check-env`	Verify Python, CUDA, critical imports	instant
2	`apply-patches`	Patch torchaudio/speechbrain/piper compat	instant
3	`download`	Download datasets, Piper TTS model, tools	~30 min
4	`verify-data`	Check all data files present & sizes	instant
5	`resolve-config`	Resolve config paths to absolute	instant
6	`generate`	Generate clips via Piper TTS	~10 min (GPU)
7	`resample-clips`	Spot-check clip sample rates	instant
8	`verify-clips`	Verify clip counts and directories	instant
9	`augment`	Augment clips & extract mel features	~30 min
10	`verify-features`	Check `.npy` feature files & shapes	instant
11	`train`	Train DNN model + ONNX export	~30 min (GPU)
12	`verify-model`	Load-test with ONNX Runtime	instant
13	`export`	Copy model to `export/` directory	instant

If any step fails:

Pipeline stopped.  Fix the issue above, then resume:
  python train_wakeword.py --from <failed-step>
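
The stop-and-resume behaviour described above can be sketched as an ordered list of step names plus a slice. This is an illustrative sketch only, not the code in train_wakeword.py: `steps_from`, `run`, and the `handlers` mapping are hypothetical names, and each handler is assumed to return True on success.

```python
# Step names and order are taken from the pipeline table above.
STEPS = [
    "check-env", "apply-patches", "download", "verify-data",
    "resolve-config", "generate", "resample-clips", "verify-clips",
    "augment", "verify-features", "train", "verify-model", "export",
]


def steps_from(start):
    """Return the steps to run when resuming from `start` (inclusive)."""
    if start not in STEPS:
        raise ValueError(f"unknown step: {start!r}")
    return STEPS[STEPS.index(start):]


def run(steps, handlers):
    """Run steps in order; stop at the first failure and return its name."""
    for name in steps:
        if not handlers[name]():  # assumption: handlers return True on success
            print("Pipeline stopped.  Fix the issue above, then resume:")
            print(f"  python train_wakeword.py --from {name}")
            return name
    return None
```

With this layout, resuming from `augment` runs the last five steps and skips everything whose output already exists.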

CLI Reference

# Full pipeline (all 13 steps)
python train_wakeword.py

# Use a custom config
python train_wakeword.py --config configs/my_word.yaml

# Resume from a specific step
python train_wakeword.py --from augment

# Run exactly one step
python train_wakeword.py --step verify-clips

# Check current state without side effects
python train_wakeword.py --verify-only

# Show all available steps
python train_wakeword.py --list-steps
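
For readers who want to wrap or extend the CLI, the flags above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the actual parser in train_wakeword.py: `build_parser` and the `from_step` destination are hypothetical names, the real default for `--config` is not stated here, and treating `--from` and `--step` as mutually exclusive is a guess.

```python
import argparse


def build_parser():
    """Declare the flags listed in the CLI reference (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="train_wakeword.py")
    # The real default config path is not documented here, so None is used.
    parser.add_argument("--config", default=None, help="path to a YAML config")
    # Assumption: resuming and running a single step cannot be combined.
    group = parser.add_mutually_exclusive_group()
    # `from` is a Python keyword, so the value is stored under `from_step`.
    group.add_argument("--from", dest="from_step", metavar="STEP")
    group.add_argument("--step", metavar="STEP")
    parser.add_argument("--verify-only", action="store_true")
    parser.add_argument("--list-steps", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["--from", "augment"])` yields a namespace whose `from_step` is `"augment"` and whose `step` is `None`.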

Using Your Model

The export step produces two files that must be kept together:

hey_echo.onnx — the model graph (~14 KB)
hey_echo.onnx.data — external weights (~200 KB)

Copy both files to your project. The trained model works with any openWakeWord-compatible runtime:

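import numpy as np
from openwakeword.model import Model

# inference_framework="onnx" selects ONNX Runtime for .onnx models
oww = Model(wakeword_models=["export/hey_echo.onnx"], inference_framework="onnx")

# Feed 16 kHz, 16-bit mono PCM frames (multiples of 80 ms, i.e. 1280 samples).
# A silent placeholder frame stands in for microphone input here.
audio_frame = np.zeros(1280, dtype=np.int16)
prediction = oww.predict(audio_frame)  # dict mapping model name to score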
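
To try the model on a recording rather than a live microphone, the audio has to be cut into fixed-size frames first. The sketch below uses only the standard library and assumes a 16 kHz, mono, 16-bit WAV file; `iter_frames` is a hypothetical helper, and dropping the trailing partial frame is a simplification.

```python
import wave

FRAME_SAMPLES = 1280  # 80 ms at 16 kHz


def iter_frames(wav_path, frame_samples=FRAME_SAMPLES):
    """Yield raw 16-bit mono PCM chunks of `frame_samples` samples each."""
    with wave.open(wav_path, "rb") as wav:
        fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if fmt != (16000, 1, 2):
            raise ValueError("expected 16 kHz, mono, 16-bit PCM")
        frame_bytes = frame_samples * 2  # 2 bytes per sample
        while True:
            chunk = wav.readframes(frame_samples)
            if len(chunk) < frame_bytes:
                break  # drop the trailing partial frame
            yield chunk
```

Each chunk can then be converted with `np.frombuffer(chunk, dtype=np.int16)` and passed to `oww.predict(...)`.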

Or with ONNX Runtime directly:

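import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("export/hey_echo.onnx")
# Input shape: [1, 16, 96], float32: 16 frames of 96-dimensional features
# derived from the mel spectrogram by openWakeWord's feature extraction.
# Zeros are a placeholder for real features.
features = np.zeros((1, 16, 96), dtype=np.float32)
result = sess.run(None, {"x": features})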
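
Because the graph file and its external weights must stay together, it can help to check for both before loading. The helper below is a small sketch with a hypothetical name (`missing_model_files`); it assumes the weights file is always named by appending `.data` to the graph filename, as in the file list above.

```python
from pathlib import Path


def missing_model_files(onnx_path):
    """Return the files missing from the <name>.onnx / <name>.onnx.data pair."""
    graph = Path(onnx_path)
    weights = graph.with_name(graph.name + ".data")
    return [str(p) for p in (graph, weights) if not p.is_file()]
```

An empty list means both files are present; otherwise the list names what still needs to be copied next to the model.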

Configuration Reference

See configs/hey_echo.yaml for a fully commented example. Key settings:

Setting	Default	Description
`model_name`	—	Name for the model (used for filenames)
`target_phrase`	—	List of phrases to detect
`custom_negative_phrases`	`[]`	Phrases to explicitly reject
`n_samples`	`50000`	Number of positive training clips
`tts_batch_size`	`25`	Piper TTS batch size (reduce for low VRAM)
`model_type`	`"dnn"`	`"dnn"` or `"rnn"`
`layer_size`	`32`	Hidden layer size (32=fast, 64/128=higher capacity)
`steps`	`50000`	Training steps
`target_false_positives_per_hour`	`0.2`	Target false positive rate
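
The defaults in the table can be mirrored in a small dataclass to catch obvious mistakes before a long training run. This is a sketch, not the toolkit's own config loader: `TrainingConfig` and `problems` are hypothetical names, only the settings listed above are covered, and the checks (for example, rejecting a phrase that is both a target and a negative) are assumptions about what counts as a mistake.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    model_name: str
    target_phrase: list
    custom_negative_phrases: list = field(default_factory=list)
    n_samples: int = 50000
    tts_batch_size: int = 25
    model_type: str = "dnn"
    layer_size: int = 32
    steps: int = 50000
    target_false_positives_per_hour: float = 0.2

    def problems(self):
        """Return a list of human-readable problems; empty means OK."""
        issues = []
        if not self.model_name:
            issues.append("model_name is required")
        if not self.target_phrase:
            issues.append("target_phrase needs at least one phrase")
        if self.model_type not in ("dnn", "rnn"):
            issues.append("model_type must be 'dnn' or 'rnn'")
        overlap = set(self.target_phrase) & set(self.custom_negative_phrases)
        if overlap:
            issues.append(f"both target and negative: {sorted(overlap)}")
        for name in ("n_samples", "tts_batch_size", "layer_size", "steps"):
            if getattr(self, name) <= 0:
                issues.append(f"{name} must be positive")
        return issues
```

For the `my_word` example, `TrainingConfig("my_word", ["hey computer"], ["hey commuter", "computer", "hey"]).problems()` returns an empty list.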

Threshold Tuning

After training, tune the detection threshold for your use case:

Problem	Fix
False activations (triggers when you didn't say it)	Increase threshold: 0.5 → 0.6 → 0.7
Missed activations (need to over-pronounce)	Decrease threshold: 0.5 → 0.4 → 0.3
False triggers on similar words	Add to `custom_negative_phrases` and retrain
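
One way to pick a threshold is to record the per-frame scores from a test session and count how many activations each candidate value would have produced. The sketch below assumes an activation fires when the score is at or above the threshold and that consecutive frames above it count as a single activation; both are assumptions, and the function names are hypothetical.

```python
def count_activations(scores, threshold):
    """Count rising edges: a run of frames at or above threshold is one activation."""
    count, active = 0, False
    for score in scores:
        above = score >= threshold
        if above and not active:
            count += 1
        active = above
    return count


def sweep(scores, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Map each candidate threshold to the number of activations it would produce."""
    return {t: count_activations(scores, t) for t in thresholds}
```

Running `sweep` on scores recorded while the wake word was not spoken shows how many false activations each threshold would allow; running it on a session with a known number of utterances shows how many would be missed.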

Compatibility Patches

This toolkit includes automatic patches for known breaking changes in modern dependency versions:

Issue	Affected	Patch
`torchaudio.load()` removed	torchaudio ≥2.10	Soundfile-based replacement with automatic 22050→16000 Hz resampling
`torchaudio.info()` removed	torchaudio ≥2.10	Soundfile-based metadata reader
`torchaudio.list_audio_backends()` removed	torchaudio ≥2.10	Returns `["soundfile"]` for speechbrain compat
`pkg_resources` removed	setuptools ≥82	Auto-installs setuptools<82
Piper API change	piper-sample-generator v2+	Auto-resolves `model=` kwarg

Patches are applied and verified automatically during the apply-patches step.
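
The general pattern behind such patches is to install a replacement only when the original attribute is missing. The sketch below illustrates that pattern on a stand-in object; it is not the code in compat.py, `ensure_attr` is a hypothetical helper, and `fake_torchaudio` is a placeholder rather than the real module.

```python
import types


def ensure_attr(module, name, fallback):
    """Install `fallback` as module.<name> only if the attribute is missing."""
    if hasattr(module, name):
        return False
    setattr(module, name, fallback)
    return True


def list_audio_backends():
    # Mirrors the table above: report soundfile as the only backend.
    return ["soundfile"]


# Stand-in for a torchaudio build that no longer has list_audio_backends().
fake_torchaudio = types.SimpleNamespace()
patched = ensure_attr(fake_torchaudio, "list_audio_backends", list_audio_backends)
```

Guarding with `hasattr` keeps the patch harmless on older versions where the original function still exists.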

Cleanup

After training, reclaim disk space:

rm -rf data/          # ~12 GB of downloaded datasets
rm -rf output/        # intermediate training artifacts

Keep only the export/ directory with your trained model.
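
The same cleanup can be scripted in Python, for instance at the end of an automated run. This is a minimal sketch with a hypothetical function name; it removes only `data/` and `output/` under the given root and never touches `export/`.

```python
import shutil
from pathlib import Path


def cleanup(repo_root, targets=("data", "output")):
    """Delete the intermediate directories under repo_root; return what was removed."""
    root = Path(repo_root)
    removed = []
    for name in targets:
        path = root / name
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

The return value lists which directories were actually deleted, so a second call on the same tree returns an empty list.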

Troubleshooting

`piper-phonemize` fails to install

This package only has Linux wheels. Make sure you're running inside WSL2, not native Windows.

`nvidia-smi` not found in WSL2

Update your NVIDIA Windows driver to the latest version. CUDA passthrough for WSL2 ships with the Windows driver, so do not install a separate NVIDIA driver inside WSL2.

Training is very slow

Verify CUDA is available: python -c "import torch; print(torch.cuda.is_available())". If this prints False, PyTorch cannot see the GPU and every step falls back to CPU (expect the 12–24 hour CPU-only runtime).

Out of GPU memory

Reduce tts_batch_size in your config (e.g., 25 → 10).

Download stalls

Re-run the script — all downloads are idempotent and resume where they left off.

License

MIT — see LICENSE.

Acknowledgments

openWakeWord by David Scripka
Piper by Rhasspy for synthetic TTS
Built with PyTorch, ONNX Runtime, and speechbrain
