Whisper Dictation

🎤 Aqua Voice-like local speech-to-text dictation for NixOS

Fast, accurate, privacy-first voice input powered by whisper.cpp. Press and hold a hotkey, speak, release to paste transcribed text anywhere.

Features

🔒 100% Local & Private - No cloud, no telemetry, works offline
⚡ Real-time Feedback - Floating UI shows live recording and transcription status
🎯 Push-to-Talk - Hold Super+Period, speak, release to paste
🧠 Technical Accuracy - Optimized for developer/AI workflows
🌍 Multilingual - Support for 99 languages via Whisper
🎨 Native GNOME - GTK4 UI, Wayland-compatible

Installation

On NixOS (Recommended)

Add to your flake.nix:

{
  inputs.whisper-dictation.url = "github:jacopone/whisper-dictation";

  # In your NixOS configuration (needs `inputs` and `system` in scope)
  environment.systemPackages = [
    inputs.whisper-dictation.packages.${system}.default
  ];

  # Enable auto-start
  systemd.user.services.whisper-dictation = {
    enable = true;
    wantedBy = [ "graphical-session.target" ];
  };
}

Manual Installation

# Clone repository
git clone https://github.com/jacopone/whisper-dictation.git
cd whisper-dictation

# Enter development environment
nix develop

# Run directly
python -m whisper_dictation.daemon

Quick Start

First-Time Setup

Ensure you're in the input group (required for keyboard monitoring):

sudo usermod -aG input $USER
# ⚠️ Log out and back in (or reboot) for the group change to take effect

# Verify group membership
groups | grep input

Download Whisper model (first time only):

# Create models directory
mkdir -p ~/.local/share/whisper-models
cd ~/.local/share/whisper-models

# For fast dictation (recommended - 4-6s processing):
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

# Or for better accuracy (20-30s processing):
curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin

Start ydotoold daemon (required for text pasting):

ydotoold --socket-path="/run/user/$(id -u)/.ydotool_socket" --socket-perm=0600 &

# Verify it's running
pgrep -a ydotoold
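
The socket lives in the per-user runtime directory, so its path depends on the UID of whoever runs the daemon. A minimal sketch of how a script could derive it for the current user; the helper name `ydotool_socket_path` is hypothetical and not part of this project:

```python
import os


def ydotool_socket_path(env=None):
    """Return the per-user ydotool socket path (hypothetical helper).

    Prefers XDG_RUNTIME_DIR and falls back to /run/user/<uid>.
    """
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
    return os.path.join(runtime_dir, ".ydotool_socket")
```

Passing a custom mapping as `env` makes the function easy to check without touching the real environment.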

Daily Usage

Enter development environment:

cd ~/whisper-dictation
nix develop  # Or use direnv if configured

Start the dictation daemon (choose one option):

Option A: Using just recipes (recommended):

just run             # Use config file settings
just run-auto        # Auto-detect language
just run-en          # English only
just run-fast        # Use base model (faster)
just run-verbose     # With debug output

Option B: Command-line flags (temporary override):

python -m whisper_dictation.daemon --verbose --language auto  # Auto-detect
python -m whisper_dictation.daemon --verbose --language en
python -m whisper_dictation.daemon --verbose --model base  # Override model

Option C: Edit config file (persistent setting):

vim ~/.config/whisper-dictation/config.yaml
# Change: language: auto  (or en, it, es, fr, etc.)
# Change: model: base     (or tiny, small, medium, large)
just run

Using Dictation

1. Click in any text field (browser, editor, terminal, etc.)
2. Press and hold Super+Period (⊞ + .)
3. Speak clearly in your chosen language
4. Release the keys → the transcribed text is pasted once processing finishes (about 4-6s with the base model)

Tips:

Speak naturally, no need to pause between words
Works in any application (Wayland-compatible)
Auto-detect mode handles mixed Italian/English seamlessly
Use just run-verbose to troubleshoot hotkey detection

Configuration

Edit ~/.config/whisper-dictation/config.yaml:

# Hotkey configuration
hotkey:
  key: period           # Any key from evdev.ecodes (e.g., period, comma, space)
  modifiers:
    - super             # Can use: super, ctrl, alt, shift

# Whisper settings
whisper:
  model: base           # Options: tiny (1-2s), base (4-6s), small (10-15s), medium (20-30s), large (40-60s)
  language: auto        # Options: auto, en, it, es, fr, de, etc. (99 languages supported)
  threads: 4            # CPU threads for transcription (adjust based on your system)

# UI settings
ui:
  show_waveform: false  # Show visual waveform during recording
  theme: dark           # Options: dark, light, auto

# Post-processing
processing:
  remove_filler_words: true   # Remove "um", "uh", etc.
  auto_capitalize: true       # Capitalize first letter of sentences
  auto_punctuate: false       # Auto-add punctuation (experimental)
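
To illustrate what the `remove_filler_words` and `auto_capitalize` options could do to a transcript, here is a minimal sketch. The filler list, the function names, and the regular expressions are assumptions for illustration; the project's actual post-processing may differ.

```python
import re

# Illustrative filler list (an assumption, not the project's actual list)
FILLER_WORDS = ("um", "uh", "erm", "hmm")


def remove_filler_words(text):
    """Drop standalone fillers plus an optional trailing comma, then tidy spacing."""
    pattern = r"\b(?:" + "|".join(FILLER_WORDS) + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def auto_capitalize(text):
    """Uppercase the first letter of the text and of each sentence after . ! ?"""
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


def postprocess(text, remove_fillers=True, capitalize=True):
    """Apply the enabled steps in order: fillers first, then capitalization."""
    if remove_fillers:
        text = remove_filler_words(text)
    if capitalize:
        text = auto_capitalize(text)
    return text
```

Fillers are removed before capitalizing so that a sentence starting with "um" still ends up with a capitalized first word.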

Quick config changes:

# Change language to auto-detect
sed -i 's/language: .*/language: auto/' ~/.config/whisper-dictation/config.yaml

# Change model to base (faster)
sed -i 's/model: .*/model: base/' ~/.config/whisper-dictation/config.yaml
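
Note that the `sed` patterns above replace everything after the key, including any trailing comment on that line. As a hedged alternative, this sketch changes only the value and keeps indentation and comments. The function name `set_yaml_scalar` is hypothetical, and it assumes simple `key: value` lines whose values do not contain `#`.

```python
import re


def set_yaml_scalar(text, key, value):
    """Replace the value on the first `key: ...` line, keeping indent and comment."""
    pattern = re.compile(
        r"^(?P<head>[ \t]*" + re.escape(key) + r":[ \t]*)"
        r"(?P<val>[^#\n]*?)"
        r"(?P<tail>[ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: m.group("head") + value + m.group("tail"), text, count=1
    )
    if count == 0:
        raise KeyError(key)  # key not present in the file
    return new_text
```

A caller would read `config.yaml`, pass its contents through this function, and write the result back.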

How It Works

1. Keyboard Monitoring - evdev captures low-level key events
2. Audio Recording - ffmpeg records mic input while key is held
3. Transcription - whisper.cpp processes audio locally
4. Text Insertion - ydotool pastes text into active window
5. UI Feedback - GTK4 window shows real-time status
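
The push-to-talk logic that ties these stages together can be sketched as a small state machine. This is an illustration only: the class name `PushToTalk` is hypothetical, and plain strings stand in for the evdev key codes the real daemon would receive.

```python
class PushToTalk:
    """Tracks held keys and reports when the hotkey combo starts or stops."""

    def __init__(self, key="period", modifiers=("super",)):
        self.key = key
        self.modifiers = frozenset(modifiers)
        self.held = set()
        self.recording = False

    def on_event(self, name, pressed):
        """Feed one key event; return "start", "stop", or None."""
        if pressed:
            self.held.add(name)
        else:
            self.held.discard(name)
        combo_down = self.key in self.held and self.modifiers <= self.held
        if combo_down and not self.recording:
            self.recording = True
            return "start"  # begin audio capture
        if self.recording and not combo_down:
            self.recording = False
            return "stop"  # end capture, then transcribe and paste
        return None
```

Repeated press events for a held key (auto-repeat) return `None`, so recording starts only once per hold.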

Model Selection Guide

Model	Size	Speed	Accuracy	Use Case
tiny	75 MB	~1-2s ⚡	60%	Quick notes, testing
base	142 MB	~4-6s ⚡⚡	70%	Recommended for speed
small	466 MB	~10-15s	80%	Balanced performance
medium	1.5 GB	~20-30s	85%	High accuracy for LLMs
large	2.9 GB	~40-60s	90%	Maximum accuracy

Performance notes:

Times measured on CPU (4 threads)
GPU acceleration can reduce times by 5-10x
base model recommended for Aqua Voice-like speed
Switch models by editing model: in config.yaml
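
As a worked example of reading the table, the sketch below picks the most accurate model whose upper latency bound fits a time budget. The latency figures are the upper ends of the CPU ranges listed above; the function name `pick_model` is hypothetical.

```python
# (model name, upper latency bound in seconds on a 4-thread CPU), from the table above,
# ordered from least to most accurate
MODELS = [
    ("tiny", 2),
    ("base", 6),
    ("small", 15),
    ("medium", 30),
    ("large", 60),
]


def pick_model(max_seconds):
    """Return the most accurate model that fits the latency budget, or None."""
    fitting = [name for name, upper in MODELS if upper <= max_seconds]
    return fitting[-1] if fitting else None
```

For instance, a 20-second budget selects `small`, while a budget under 2 seconds fits no model on CPU.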

Requirements

OS: NixOS or any Linux with Nix
Desktop: GNOME (Wayland) or other Wayland compositor
Permissions: User must be in input group for keyboard monitoring
Audio: PulseAudio or PipeWire

Troubleshooting

No audio recording

# Check microphone
ffmpeg -f pulse -i default -t 1 test.wav

# Check PulseAudio/PipeWire
pactl list sources short

Keyboard events not detected

# Add user to input group
sudo usermod -aG input $USER
# Log out and back in (or reboot) for the group change to take effect

# Verify input group membership
groups | grep input

ydotool not working

# Start ydotool daemon manually
ydotoold --socket-path="/run/user/$(id -u)/.ydotool_socket" --socket-perm=0600 &

# Verify socket exists
ls -la "/run/user/$(id -u)/.ydotool_socket"

# Check if ydotoold is running
pgrep -a ydotoold

Slow transcription

# Option 1: Switch to faster model in config
vim ~/.config/whisper-dictation/config.yaml
# Change: model: base  (4-6s) or tiny (1-2s)

# Option 2: Override with command-line flag
python -m whisper_dictation.daemon --verbose --model base

# Option 3: Download faster model
cd ~/.local/share/whisper-models
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

Virtual keyboard detected instead of real keyboard

The daemon automatically filters out virtual devices (ydotoold, xdotool). If issues persist, run just run-verbose and check the log to see which devices are detected.
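
A name-based filter of this kind might look like the sketch below. The marker substrings and function names are assumptions for illustration; the daemon's actual detection logic is not shown in this README.

```python
# Substrings treated as signs of a virtual input device (assumed list)
VIRTUAL_MARKERS = ("ydotool", "xdotool", "virtual")


def is_virtual_device(name):
    """Return True if the device name contains any virtual-device marker."""
    lowered = name.lower()
    return any(marker in lowered for marker in VIRTUAL_MARKERS)


def real_keyboards(device_names):
    """Keep only device names that do not look virtual."""
    return [name for name in device_names if not is_virtual_device(name)]
```

Filtering matters because ydotoold injects key events through its own virtual device, which the daemon would otherwise read back as keyboard input.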

Language not detected correctly

# Option 1: Use auto-detection mode
just run-auto

# Option 2: Specify language explicitly
just run-en                                        # English
python -m whisper_dictation.daemon --language it   # Italian

# Option 3: Check config file
cat ~/.config/whisper-dictation/config.yaml
# Should show: language: auto (or en, it, etc.)

Hotkey not working

# Test with debug mode
just run-verbose

# Check for keybinding conflicts in GNOME
gnome-control-center keyboard

# Verify you see "HOTKEY COMBO DETECTED" when pressing Super+Period
# If you see "has_mods=False", verify Super key is being detected

Nix environment issues

# Rebuild environment
nix develop

# If packages missing, update
nix flake update

# Re-fetch inputs, treating cached downloads as stale
nix develop --refresh

Development

See DEVELOPMENT.md for a comprehensive development guide.

Quick Commands

# Enter dev shell
nix develop

# Show all available commands
just

# Run the daemon
just run

# Run tests
just test

# Format code
just format

# Check all quality gates (lint + test)
just check

# Test dictation without keyboard (records 5 seconds)
just test-dictation

# Check setup status
just status

# Build nix package
just build

Available Recipes

Run just to see all available commands:

just run - Run daemon with default settings
just run-verbose - Run with verbose output
just run-auto - Run with auto language detection
just run-en - Run with English
just run-fast - Run with base model (faster)
just test - Run test suite
just test-cov - Run tests with coverage
just format - Auto-format code (Black + Ruff)
just lint - Check code style
just check - Run all quality checks
just test-dictation - Test recording/transcription
just download-model-base - Download base model (~142MB)
just download-model-medium - Download medium model (~1.5GB)
just start-ydotool - Start ydotool daemon
just status - Show setup status

Comparison to Other Tools

Feature	Whisper Dictation	Aqua Voice	Talon Voice
Privacy	✅ 100% Local	❌ Cloud	✅ Local
Cost	✅ Free	💲 $8/mo	💲 $15/mo
NixOS Support	✅ Native	❌ No	⚠️ Manual
Technical Terms	⚠️ 65-85%	✅ 97%	✅ 95%
Wayland	✅ Yes	⚠️ Limited	❌ X11 only
Real-time	✅ Yes	✅ Yes	✅ Yes

Roadmap

Streaming transcription (live preview while speaking)
Custom vocabulary training
Command mode (voice commands for actions)
Integration with LLM APIs (Claude, GPT-4)
Multi-backend support (Avalon API, Deepgram)
Voice profiles for different contexts

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE

Acknowledgments

whisper.cpp - Fast whisper implementation
Aqua Voice - UI/UX inspiration
ydotool - Wayland input automation

Made with ❤️ for the NixOS community
