
Whisper Dictation

🎤 Aqua Voice-like local speech-to-text dictation for NixOS

Fast, accurate, privacy-first voice input powered by whisper.cpp. Press and hold a hotkey, speak, release to paste transcribed text anywhere.

Features

  • 🔒 100% Local & Private - No cloud, no telemetry, works offline
  • ⚡ Real-time Feedback - Live transcription with floating UI
  • 🎯 Push-to-Talk - Hold Super+Period, speak, release to paste
  • 🧠 Technical Accuracy - Optimized for developer/AI workflows
  • 🌍 Multilingual - Support for 99 languages via Whisper
  • 🎨 Native GNOME - GTK4 UI, Wayland-compatible

Installation

On NixOS (Recommended)

Add to your flake.nix:

{
  inputs.whisper-dictation.url = "github:jacopone/whisper-dictation";

  # In your NixOS configuration module (with `inputs` and `system` in scope)
  environment.systemPackages = [
    inputs.whisper-dictation.packages.${system}.default
  ];

  # Enable auto-start
  systemd.user.services.whisper-dictation = {
    enable = true;
    wantedBy = [ "graphical-session.target" ];
  };
}

Manual Installation

# Clone repository
git clone https://github.com/jacopone/whisper-dictation.git
cd whisper-dictation

# Enter development environment
nix develop

# Run directly
python -m whisper_dictation.daemon

Quick Start

First-Time Setup

  1. Ensure you're in the input group (required for keyboard monitoring):

    sudo usermod -aG input $USER
    # ⚠️ Log out and back in for the group change to take effect (a reboot also works)
    
    # Verify group membership
    groups | grep input
  2. Download Whisper model (first time only):

    # Create models directory
    mkdir -p ~/.local/share/whisper-models
    cd ~/.local/share/whisper-models
    
    # For fast dictation (recommended - 4-6s processing):
    curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
    
    # Or for better accuracy (20-30s processing):
    curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
  3. Start ydotoold daemon (required for text pasting):

    # Use your own UID rather than a hard-coded 1000
    ydotoold --socket-path="/run/user/$(id -u)/.ydotool_socket" --socket-perm=0600 &
    
    # Verify it's running
    pgrep -a ydotoold
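
The three setup steps above can be checked in one go. The following is a minimal sketch, not the project's own `just status` implementation: the function names are hypothetical, and the model directory and socket path are simply the ones used in the steps above.

```python
import grp
import os
from pathlib import Path


def setup_problems(group_names, model_dir, socket_path):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if "input" not in group_names:
        problems.append("user is not in the 'input' group")
    if not any(Path(model_dir).glob("ggml-*.bin")):
        problems.append(f"no ggml-*.bin model found in {model_dir}")
    if not Path(socket_path).exists():
        problems.append(f"ydotoold socket missing at {socket_path}")
    return problems


def current_group_names():
    """Names of the groups the *current process* belongs to.

    A freshly added group only shows up here after a new login session.
    """
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:  # gid without a name entry
            pass
    return names


def check_this_machine():
    """Run the checks against the paths used in this README."""
    return setup_problems(
        current_group_names(),
        Path.home() / ".local/share/whisper-models",
        f"/run/user/{os.getuid()}/.ydotool_socket",
    )
```

Calling `check_this_machine()` returns an empty list when all three prerequisites are in place; otherwise each entry names the step that still needs attention.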

Daily Usage

Enter development environment:

cd ~/whisper-dictation
nix develop  # Or use direnv if configured

Start the dictation daemon (choose one option):

Option A: Using just recipes (recommended):

just run             # Use config file settings
just run-auto        # Auto-detect language
just run-en          # English only
just run-fast        # Use base model (faster)
just run-verbose     # With debug output

Option B: Command-line flags (temporary override):

python -m whisper_dictation.daemon --verbose --language auto  # Auto-detect
python -m whisper_dictation.daemon --verbose --language en
python -m whisper_dictation.daemon --verbose --model base  # Override model

Option C: Edit config file (persistent setting):

vim ~/.config/whisper-dictation/config.yaml
# Change: language: auto  (or en, it, es, fr, etc.)
# Change: model: base     (or tiny, small, medium, large)
just run
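
Options B and C differ only in precedence: a command-line flag overrides the config file for that run, while the file supplies the persistent value. Here is a rough sketch of that resolution order, assuming the YAML file has already been parsed into a dict. The helper names and the `DEFAULTS` table are illustrative, not the daemon's actual code; only the `--model`, `--language`, and `--verbose` flags come from the commands above.

```python
import argparse
import copy

# Assumed built-in defaults, mirroring the sample config in this README.
DEFAULTS = {"whisper": {"model": "base", "language": "auto", "threads": 4}}


def parse_args(argv):
    """Parse the flags shown in Option B."""
    parser = argparse.ArgumentParser(prog="whisper_dictation.daemon")
    parser.add_argument("--model")
    parser.add_argument("--language")
    parser.add_argument("--verbose", action="store_true")
    return parser.parse_args(argv)


def effective_settings(file_config, args):
    """Merge defaults < config file < command-line flags."""
    settings = copy.deepcopy(DEFAULTS)
    settings["whisper"].update(file_config.get("whisper", {}))
    for key in ("model", "language"):
        value = getattr(args, key)
        if value is not None:  # flag was given, so it wins for this run
            settings["whisper"][key] = value
    return settings
```

With a config file that says `model: medium`, running with `--model base` yields `base` for that session and leaves the file untouched.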

Using Dictation

  1. Click in any text field (browser, editor, terminal, etc.)
  2. Press and hold Super+Period (⊞ + .)
  3. Speak clearly in your chosen language
  4. Release the key → text is pasted once transcription finishes (about 4-6s with the base model)

Tips:

  • Speak naturally, no need to pause between words
  • Works in any application (Wayland-compatible)
  • Auto-detect mode handles mixed Italian/English seamlessly
  • Use just run-verbose to troubleshoot hotkey detection

Configuration

Edit ~/.config/whisper-dictation/config.yaml:

# Hotkey configuration
hotkey:
  key: period           # Any key from evdev.ecodes (e.g., period, comma, space)
  modifiers:
    - super             # Can use: super, ctrl, alt, shift

# Whisper settings
whisper:
  model: base           # Options: tiny (1-2s), base (4-6s), small (10-15s), medium (20-30s), large (40-60s)
  language: auto        # Options: auto, en, it, es, fr, de, etc. (99 languages supported)
  threads: 4            # CPU threads for transcription (adjust based on your system)

# UI settings
ui:
  show_waveform: false  # Show visual waveform during recording
  theme: dark           # Options: dark, light, auto

# Post-processing
processing:
  remove_filler_words: true   # Remove "um", "uh", etc.
  auto_capitalize: true       # Capitalize first letter of sentences
  auto_punctuate: false       # Auto-add punctuation (experimental)
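
To make the `processing` options concrete, here is a small sketch of what `remove_filler_words` and `auto_capitalize` could do to a transcript. It is an illustration under assumptions, not the daemon's implementation: the filler list beyond "um" and "uh" and the regular expressions are made up for this example.

```python
import re

# "um" and "uh" come from the config comment; the rest are assumed.
FILLERS = ("um", "uh", "erm", "hmm")

# Whole-word match, plus an optional trailing comma and whitespace.
_FILLER_RE = re.compile(r"\b(?:%s)\b,?\s*" % "|".join(FILLERS), re.IGNORECASE)


def remove_filler_words(text):
    """Drop filler words and collapse any leftover double spaces."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def auto_capitalize(text):
    """Uppercase the first letter of the text and of each sentence."""
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


def postprocess(text, remove_fillers=True, capitalize=True):
    """Apply the enabled steps in order: fillers first, then capitalization."""
    if remove_fillers:
        text = remove_filler_words(text)
    if capitalize:
        text = auto_capitalize(text)
    return text
```

Fillers are removed before capitalizing so that a sentence starting with "um" still ends up with a capital first letter.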

Quick config changes:

# Change language to auto-detect
sed -i 's/language: .*/language: auto/' ~/.config/whisper-dictation/config.yaml

# Change model to base (faster)
sed -i 's/model: .*/model: base/' ~/.config/whisper-dictation/config.yaml

How It Works

  1. Keyboard Monitoring - evdev captures low-level key events
  2. Audio Recording - ffmpeg records mic input while key is held
  3. Transcription - whisper.cpp processes audio locally
  4. Text Insertion - ydotool pastes text into active window
  5. UI Feedback - GTK4 window shows real-time status
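
Steps 2 to 4 can be sketched as three external commands chained together. This is a simplified outline under assumptions, not the daemon's real code: the `whisper-cli` binary name depends on how whisper.cpp was built, typing via `ydotool type` is one possible way to insert text, and `wait_for_release` is a hypothetical callback that blocks while the hotkey is held.

```python
import signal
import subprocess


def record_command(wav_path, source="default"):
    """ffmpeg call recording 16 kHz mono audio from PulseAudio/PipeWire."""
    return ["ffmpeg", "-y", "-f", "pulse", "-i", source,
            "-ar", "16000", "-ac", "1", str(wav_path)]


def transcribe_command(wav_path, model_path, language="auto", threads=4,
                       binary="whisper-cli"):
    """whisper.cpp CLI call; the binary name is an assumption."""
    return [binary, "-m", str(model_path), "-f", str(wav_path),
            "-l", language, "-t", str(threads), "--no-timestamps"]


def type_command(text):
    """ydotool call that types the text; '--' guards text starting with '-'."""
    return ["ydotool", "type", "--", text]


def dictate_once(wait_for_release, wav_path, model_path):
    """Record while the key is held, then transcribe and type the result."""
    recorder = subprocess.Popen(record_command(wav_path))
    wait_for_release()                    # hypothetical: blocks until key-up
    recorder.send_signal(signal.SIGINT)   # lets ffmpeg finalize the WAV file
    recorder.wait()
    result = subprocess.run(transcribe_command(wav_path, model_path),
                            capture_output=True, text=True, check=True)
    text = result.stdout.strip()
    if text:
        subprocess.run(type_command(text), check=True)
    return text
```

Building each command as a list (rather than a shell string) avoids quoting problems when the transcript contains spaces or quotes.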

Model Selection Guide

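| Model  | Size   | Speed       | Accuracy | Use Case                |
|--------|--------|-------------|----------|-------------------------|
| tiny   | 75 MB  | ~1-2s ⚡    | 60%      | Quick notes, testing    |
| base   | 142 MB | ~4-6s ⚡⚡  | 70%      | Recommended for speed   |
| small  | 466 MB | ~10-15s     | 80%      | Balanced performance    |
| medium | 1.5 GB | ~20-30s     | 85%      | High accuracy for LLMs  |
| large  | 2.9 GB | ~40-60s     | 90%      | Maximum accuracy        |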

Performance notes:

  • Times measured on CPU (4 threads)
  • GPU acceleration can reduce times by 5-10x
  • base model recommended for Aqua Voice-like speed
  • Switch models by editing model: in config.yaml
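
The guide above boils down to "pick the largest model that fits your latency budget". A small sketch of that rule follows, using the upper end of each CPU timing from the table. The function name and the `gpu_speedup` parameter are illustrative, and the speedup factor is only the rough 5-10x range quoted above, not a measurement.

```python
# (model name, worst-case seconds on a 4-thread CPU), from the table above,
# ordered from smallest to largest model.
MODELS = [
    ("tiny", 2),
    ("base", 6),
    ("small", 15),
    ("medium", 30),
    ("large", 60),
]


def pick_model(max_seconds, gpu_speedup=1.0):
    """Return the largest model whose worst-case time fits the budget.

    Returns None when even the tiny model is too slow.
    """
    best = None
    for name, seconds in MODELS:
        if seconds / gpu_speedup <= max_seconds:
            best = name
    return best
```

For example, a 6-second budget selects `base` on CPU, while the same budget with an assumed 5x GPU speedup would allow `medium`.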

Requirements

  • OS: NixOS or any Linux with Nix
  • Desktop: GNOME (Wayland) or other Wayland compositor
  • Permissions: User must be in input group for keyboard monitoring
  • Audio: PulseAudio or PipeWire

Troubleshooting

No audio recording

# Check microphone
ffmpeg -f pulse -i default -t 1 test.wav

# Check PulseAudio/PipeWire
pactl list sources short

Keyboard events not detected

# Add user to input group
sudo usermod -aG input $USER
# Log out and back in for the group change to take effect (a reboot also works)

# Verify input group membership
groups | grep input

ydotool not working

# Start ydotool daemon manually
ydotoold --socket-path="/run/user/$(id -u)/.ydotool_socket" --socket-perm=0600 &

# Verify socket exists (uses your own UID rather than a hard-coded 1000)
ls -la "/run/user/$(id -u)/.ydotool_socket"

# Check if ydotoold is running
pgrep -a ydotoold

Slow transcription

# Option 1: Switch to faster model in config
vim ~/.config/whisper-dictation/config.yaml
# Change: model: base  (4-6s) or tiny (1-2s)

# Option 2: Override with command-line flag
python -m whisper_dictation.daemon --verbose --model base

# Option 3: Download faster model
cd ~/.local/share/whisper-models
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

Virtual keyboard detected instead of real keyboard

The daemon automatically filters out virtual devices (ydotoold, xdotool). If issues persist, run just run-verbose and check the log output to see which devices are detected.
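
For intuition, filtering by device name could look like the sketch below. It is an assumption about the approach, not the daemon's actual filter: the marker list and function names are made up for illustration.

```python
# Substrings that suggest a synthetic input device (assumed list).
VIRTUAL_MARKERS = ("ydotool", "xdotool", "virtual")


def is_virtual(device_name):
    """True when the device name looks like a virtual keyboard."""
    lowered = device_name.lower()
    return any(marker in lowered for marker in VIRTUAL_MARKERS)


def real_keyboards(devices):
    """Keep only physical devices from an iterable of (path, name) pairs."""
    return [(path, name) for path, name in devices if not is_virtual(name)]
```

With python-evdev, the `(path, name)` pairs could be gathered from `evdev.list_devices()` and `evdev.InputDevice(path).name`. Comparing that list with the verbose log shows whether a physical keyboard is being excluded by mistake.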

Language not detected correctly

# Option 1: Use auto-detection mode
just run-auto

# Option 2: Specify language explicitly
just run-en                                        # English
python -m whisper_dictation.daemon --language it   # Italian

# Option 3: Check config file
cat ~/.config/whisper-dictation/config.yaml
# Should show: language: auto (or en, it, etc.)

Hotkey not working

# Test with debug mode
just run-verbose

# Check for keybinding conflicts in GNOME
gnome-control-center keyboard

# Verify you see "HOTKEY COMBO DETECTED" when pressing Super+Period
# If you see "has_mods=False", verify Super key is being detected
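
The `has_mods` value in the debug output reflects whether every configured modifier is currently held. Below is a minimal sketch of that kind of state tracking, assuming key events arrive as evdev-style names (`KEY_DOT` is the evdev name for the period key). The class and its return values are hypothetical, not the daemon's internals.

```python
# Left/right variants that satisfy each modifier name from config.yaml.
MODIFIER_KEYS = {
    "super": {"KEY_LEFTMETA", "KEY_RIGHTMETA"},
    "ctrl": {"KEY_LEFTCTRL", "KEY_RIGHTCTRL"},
    "alt": {"KEY_LEFTALT", "KEY_RIGHTALT"},
    "shift": {"KEY_LEFTSHIFT", "KEY_RIGHTSHIFT"},
}


class HotkeyTracker:
    """Tracks pressed keys and reports when the push-to-talk combo changes."""

    def __init__(self, key="KEY_DOT", modifiers=("super",)):
        self.key = key
        self.modifiers = modifiers
        self.pressed = set()
        self.active = False

    def has_mods(self):
        """True when at least one variant of every modifier is held."""
        return all(self.pressed & MODIFIER_KEYS[m] for m in self.modifiers)

    def handle(self, key_name, is_down):
        """Feed one key event; returns "start", "stop", or None."""
        if is_down:
            self.pressed.add(key_name)
        else:
            self.pressed.discard(key_name)
        combo = self.key in self.pressed and self.has_mods()
        if combo and not self.active:
            self.active = True
            return "start"   # begin recording
        if not combo and self.active:
            self.active = False
            return "stop"    # releasing either key ends recording
        return None
```

If the period key registers but `has_mods()` stays false, the Super key events are not reaching the tracker, which usually points to a GNOME keybinding grabbing them or to the wrong input device being monitored.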

Nix environment issues

# Rebuild environment
nix develop

# If packages missing, update
nix flake update

# Treat cached downloads as stale, re-fetch, and rebuild
nix develop --refresh

Development

See DEVELOPMENT.md for a comprehensive development guide.

Quick Commands

# Enter dev shell
nix develop

# Show all available commands
just

# Run the daemon
just run

# Run tests
just test

# Format code
just format

# Check all quality gates (lint + test)
just check

# Test dictation without keyboard (records 5 seconds)
just test-dictation

# Check setup status
just status

# Build nix package
just build

Available Recipes

Run just to see all available commands:

  • just run - Run daemon with default settings
  • just run-verbose - Run with verbose output
  • just run-auto - Run with auto language detection
  • just run-en - Run with English
  • just run-fast - Run with base model (faster)
  • just test - Run test suite
  • just test-cov - Run tests with coverage
  • just format - Auto-format code (Black + Ruff)
  • just lint - Check code style
  • just check - Run all quality checks
  • just test-dictation - Test recording/transcription
  • just download-model-base - Download base model (~142MB)
  • just download-model-medium - Download medium model (~1.5GB)
  • just start-ydotool - Start ydotool daemon
  • just status - Show setup status

Comparison to Other Tools

| Feature         | Whisper Dictation | Aqua Voice  | Talon Voice  |
|-----------------|-------------------|-------------|--------------|
| Privacy         | ✅ 100% Local     | ❌ Cloud    | ✅ Local     |
| Cost            | ✅ Free           | 💲 $8/mo    | 💲 $15/mo    |
| NixOS Support   | ✅ Native         | ❌ No       | ⚠️ Manual    |
| Technical Terms | ⚠️ 65-85%         | ✅ 97%      | ✅ 95%       |
| Wayland         | ✅ Yes            | ⚠️ Limited  | ❌ X11 only  |
| Real-time       | ✅ Yes            | ✅ Yes      | ✅ Yes       |

Roadmap

  • Streaming transcription (live preview while speaking)
  • Custom vocabulary training
  • Command mode (voice commands for actions)
  • Integration with LLM APIs (Claude, GPT-4)
  • Multi-backend support (Avalon API, Deepgram)
  • Voice profiles for different contexts

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE

Acknowledgments


Made with ❤️ for the NixOS community
