whisper-dict

whisper-dict is a Zig-based system-wide dictation daemon.

The goal is simple: hold a configured key, speak, and release. whisper-dict will:

record audio from your microphone while the key is held,
run whisper-cli to transcribe the speech to text,
inject the transcription into the currently focused app as if it had been typed on a keyboard.
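
The transcribe-and-inject half of that flow can be sketched in a few lines of shell. This is a hypothetical illustration, not whisper-dict's actual implementation: it assumes a WAV file has already been recorded and that wtype is the injection backend. The names transcribe_and_inject, WHISPER_CLI, and INJECT_CMD are invented for this example. The whisper-cli flags used (-m, -f, -l, -np, -nt) are regular whisper.cpp options.

```sh
# Hypothetical sketch; WHISPER_CLI and INJECT_CMD are illustrative
# overrides, not whisper-dict options.
WHISPER_CLI=${WHISPER_CLI:-whisper-cli}
INJECT_CMD=${INJECT_CMD:-wtype}

# transcribe_and_inject WAV MODEL [LANGUAGE]
transcribe_and_inject() {
  wav=$1
  model=$2
  language=${3:-auto}

  # -np suppresses progress output and -nt drops timestamps,
  # so only the recognized text is written to stdout.
  text=$("$WHISPER_CLI" -m "$model" -f "$wav" -l "$language" -np -nt) || return 1

  # Join lines, squeeze repeated spaces, and trim both ends.
  text=$(printf '%s' "$text" | tr -s '\n' ' ' | sed 's/^ *//; s/ *$//')

  # Nothing recognized: skip injection.
  [ -n "$text" ] || return 0

  # Type the text into the focused window.
  "$INJECT_CMD" "$text"
}
```

A call such as transcribe_and_inject /tmp/clip.wav ~/.cache/whisper-dict/models/ggml-large-v3-turbo.bin would type the result into the focused window. On X11 the last step would need xdotool type instead of wtype.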

Usage

Run with default model:

whisper-dict

Run with a custom model:

whisper-dict --model models/ggml-base.en.bin

Run with a custom recordings directory:

whisper-dict --recordings-dir recordings

Run with automatic language detection:

whisper-dict --language auto

Run with a fixed language (example: Czech):

whisper-dict --language cs

Run with a custom push-to-talk key (example: F8):

whisper-dict --trigger-key f8

Skip low-confidence transcriptions (example threshold: 0.40):

whisper-dict --min-confidence 0.40

Skip very short accidental taps (example: at least 250ms):

whisper-dict --min-recording-ms 250

Note: models ending in .en are English-only. For Czech or other languages, use a multilingual model (for example large-v3-turbo, medium, small, base).
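
A wrapper script could catch this mismatch before launching the daemon. The sketch below is hypothetical (is_english_only_model and check_model_language are not part of whisper-dict) and relies only on the .en naming convention described above. Letting auto pass is a choice made for this sketch.

```sh
# Hypothetical helpers; not part of whisper-dict.

# Succeeds when the model file name marks an English-only model.
is_english_only_model() {
  case ${1##*/} in
    *.en.bin|*.en) return 0 ;;
    *) return 1 ;;
  esac
}

# check_model_language MODEL LANGUAGE
# Fails when an English-only model is paired with an explicit non-English language.
check_model_language() {
  if is_english_only_model "$1" && [ "$2" != en ] && [ "$2" != auto ]; then
    echo "model $1 is English-only but language is '$2'" >&2
    return 1
  fi
}
```

For example, check_model_language models/ggml-base.en.bin cs fails, while the same call with a multilingual model succeeds.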

The --trigger-key option accepts key names (for example rightctrl, leftalt, f8, capslock) or a numeric evdev key code. The available named key aliases are defined in trigger_key_descriptors in src/config.zig.

To find a numeric evdev key code on Linux, run sudo evtest, select your keyboard, press the desired key, and use the reported code value from EV_KEY (for example code 97 (KEY_RIGHTCTRL) means --trigger-key 97). Reference key definitions are in Linux input-event-codes.h.
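
The name-or-number behavior can be pictured as a small lookup that falls through to numeric input. The shell sketch below is only an illustration: trigger_key_code is a hypothetical helper, and it covers just the four names mentioned above with their codes from input-event-codes.h. The authoritative alias table remains the one in src/config.zig.

```sh
# Hypothetical sketch of name-or-number resolution; the real alias
# table lives in src/config.zig and may contain more names.
trigger_key_code() {
  case $1 in
    rightctrl) echo 97 ;;   # KEY_RIGHTCTRL
    leftalt)   echo 56 ;;   # KEY_LEFTALT
    f8)        echo 66 ;;   # KEY_F8
    capslock)  echo 58 ;;   # KEY_CAPSLOCK
    ''|*[!0-9]*)
      # Neither a known name nor a plain number.
      echo "unknown trigger key: $1" >&2
      return 1
      ;;
    *) echo "$1" ;;         # already a numeric evdev code
  esac
}
```

With this mapping, rightctrl and 97 resolve to the same code, which matches the evtest example above.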

Default model path is ~/.cache/whisper-dict/models/ggml-large-v3-turbo.bin. Default recordings directory is /tmp/whisper-dict-recordings. Default language is auto. Default trigger key is rightctrl. Default minimum confidence is 0.00 (disabled). Default minimum recording length is 0ms (disabled).

Download models with whisper-cpp-download-ggml-model. The full list of available model names is documented in whisper.cpp: https://github.com/ggml-org/whisper.cpp/blob/master/models/README.md#available-models

Example (download large-v3-turbo into the default model directory):

whisper-cpp-download-ggml-model large-v3-turbo ~/.cache/whisper-dict/models
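
If you launch whisper-dict from your own script, a download-if-missing step could look roughly like this. It is a sketch under assumptions: ensure_model and DOWNLOAD_CMD are hypothetical names, and the ggml-<name>.bin file name is inferred from the default model path above.

```sh
# Hypothetical helper; DOWNLOAD_CMD is an illustrative override.
DOWNLOAD_CMD=${DOWNLOAD_CMD:-whisper-cpp-download-ggml-model}

# ensure_model NAME [DIR]
# Prints the model path, downloading the model first if the file is missing.
ensure_model() {
  name=$1
  dir=${2:-$HOME/.cache/whisper-dict/models}
  path=$dir/ggml-$name.bin

  if [ ! -f "$path" ]; then
    # Send downloader output to stderr so stdout carries only the path.
    mkdir -p "$dir" && "$DOWNLOAD_CMD" "$name" "$dir" >&2 || return 1
  fi

  printf '%s\n' "$path"
}
```

Usage could then be whisper-dict --model "$(ensure_model large-v3-turbo)".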

Required external commands (non-NixOS)

On NixOS these commands are wired in automatically. On other systems, install them manually and make sure they are available in PATH.

whisper-cli (required): from whisper.cpp
whisper-cpp-download-ggml-model (needed only for the model download command in this README): from whisper.cpp
Recording backend (install at least one):
- arecord from alsa-utils
- ffmpeg from FFmpeg
Text injection backend (install at least one):
- wtype for Wayland from wtype
- xdotool for X11 from xdotool
eww (optional, only for the recording overlay): from elkowar/eww
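
A quick way to verify the list above on a non-NixOS system is to probe PATH with command -v. The following is a minimal sketch; first_available and check_dependencies are hypothetical helpers, and the order in which backends are tried here is arbitrary rather than a statement about which one whisper-dict prefers.

```sh
# Hypothetical helpers for checking external commands in PATH.

# Prints the first of the given commands that is found in PATH.
first_available() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}

# Reports each missing requirement on stderr; fails if any is missing.
check_dependencies() {
  status=0
  first_available whisper-cli >/dev/null ||
    { echo "missing: whisper-cli" >&2; status=1; }
  first_available arecord ffmpeg >/dev/null ||
    { echo "missing recording backend: arecord or ffmpeg" >&2; status=1; }
  first_available wtype xdotool >/dev/null ||
    { echo "missing injection backend: wtype or xdotool" >&2; status=1; }
  return "$status"
}
```

Running check_dependencies before starting the daemon prints one line per missing requirement and returns a non-zero status if anything is absent.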

Linux permissions

For whisper-dict to run as a regular user (without root), the user running the daemon needs:

Read access to /dev/input/event* (required for global trigger key capture).
Write access to the recordings directory (default: /tmp/whisper-dict-recordings, or the path set by --recordings-dir).
Microphone access through your audio stack (used by arecord or ffmpeg).

Most distros gate /dev/input/event* behind the input group. A common setup is:

sudo usermod -aG input $USER

Then log out and log back in so that the new group membership takes effect.
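
To confirm that the permissions are in place, you can count how many event devices the current user can read. This is a small sketch; count_readable_event_devices is a hypothetical helper, and its optional directory argument exists only to make it easy to try against a test directory.

```sh
# Hypothetical helper; not part of whisper-dict.

# count_readable_event_devices [DIR]
# Prints how many DIR/event* entries the current user can read.
count_readable_event_devices() {
  dir=${1:-/dev/input}
  n=0
  for dev in "$dir"/event*; do
    # An unmatched glob stays literal and fails the -r test.
    if [ -r "$dev" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

A result of 0 from count_readable_event_devices suggests that the group change has not taken effect yet; id -nG shows whether input is among the current session's groups.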

Home Manager module

The flake exports a Home Manager module at homeManagerModules.whisper-dict (also aliased as homeManagerModules.default).

Example:

{
  imports = [ inputs.whisper-dict.homeManagerModules.whisper-dict ];

  services.whisper-dict = {
    enable = true;
    model = "large-v3-turbo";
    language = "auto";
    triggerKey = "f8";
    minConfidence = 0.4;
    minRecordingMs = 250;
    modelsDir = "${config.xdg.cacheHome}/whisper-dict/models";
    recordingsDir = "/tmp/whisper-dict-recordings";
  };
}

When enabled, it creates a systemd --user service named whisper-dict that:

runs whisper-dict in the background,
downloads the configured model with whisper-cpp-download-ggml-model before start (if missing),
stores models in services.whisper-dict.modelsDir,
sets transcription language with services.whisper-dict.language (default "auto"),
sets push-to-talk key with services.whisper-dict.triggerKey (default "rightctrl"),
filters low-confidence transcriptions with services.whisper-dict.minConfidence (default 0.0, disabled),
skips short taps with services.whisper-dict.minRecordingMs (default 0, disabled),
stores audio/transcription outputs in services.whisper-dict.recordingsDir (default: /tmp/whisper-dict-recordings).
