🎤 Aqua Voice-like local speech-to-text dictation for NixOS
Fast, accurate, privacy-first voice input powered by whisper.cpp. Press and hold a hotkey, speak, release to paste transcribed text anywhere.
- 🔒 100% Local & Private - No cloud, no telemetry, works offline
- ⚡ Real-time Feedback - Live transcription with floating UI
- 🎯 Push-to-Talk - Hold Super+Period, speak, release to paste
- 🔧 Technical Accuracy - Optimized for developer/AI workflows
- 🌍 Multilingual - Support for 99 languages via Whisper
- 🎨 Native GNOME - GTK4 UI, Wayland-compatible
Add to your `flake.nix`:

```nix
{
  inputs.whisper-dictation.url = "github:jacopone/whisper-dictation";

  # In your configuration
  environment.systemPackages = [
    inputs.whisper-dictation.packages.${system}.default
  ];

  # Enable auto-start
  systemd.user.services.whisper-dictation = {
    enable = true;
    wantedBy = [ "graphical-session.target" ];
  };
}
```

Or run from a local clone:

```bash
# Clone repository
git clone https://github.com/jacopone/whisper-dictation.git
cd whisper-dictation

# Enter development environment
nix develop

# Run directly
python -m whisper_dictation.daemon
```
Ensure you're in the `input` group (required for keyboard monitoring):

```bash
sudo usermod -aG input $USER
# ⚠️ Logout and login required (not just reboot!)

# Verify group membership
groups | grep input
```
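The same prerequisite can be verified programmatically. A minimal stdlib sketch (the helper names `has_input_group` and `readable_event_devices` are illustrative, not part of whisper-dictation's API):

```python
"""Sketch: verify the keyboard-monitoring prerequisites from Python."""
import glob
import grp
import os


def has_input_group() -> bool:
    """True if the current process's groups include the `input` group."""
    try:
        input_gid = grp.getgrnam("input").gr_gid
    except KeyError:
        return False  # no `input` group on this system
    return input_gid in os.getgroups()


def readable_event_devices() -> list[str]:
    """Device nodes under /dev/input the current user can actually read."""
    return [d for d in glob.glob("/dev/input/event*") if os.access(d, os.R_OK)]


if __name__ == "__main__":
    print("in input group:", has_input_group())
    print("readable devices:", len(readable_event_devices()))
```

If `readable_event_devices()` comes back empty after adding yourself to the group, you most likely still need to log out and back in.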
Download a Whisper model (first time only):

```bash
# Create models directory
mkdir -p ~/.local/share/whisper-models
cd ~/.local/share/whisper-models

# For fast dictation (recommended - 4-6s processing):
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

# Or for better accuracy (20-30s processing):
curl -L -o ggml-medium.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
```
Start the ydotoold daemon (required for text pasting):

```bash
ydotoold --socket-path=/run/user/1000/.ydotool_socket --socket-perm=0600 &

# Verify it's running
pgrep -a ydotoold
```
Enter the development environment:

```bash
cd ~/whisper-dictation
nix develop  # Or use direnv if configured
```

Start the dictation daemon (choose one option):
Option A: Using just recipes (recommended):

```bash
just run          # Use config file settings
just run-auto     # Auto-detect language
just run-en       # English only
just run-fast     # Use base model (faster)
just run-verbose  # With debug output
```

Option B: Command-line flags (temporary override):
```bash
python -m whisper_dictation.daemon --verbose --language auto  # Auto-detect
python -m whisper_dictation.daemon --verbose --language en
python -m whisper_dictation.daemon --verbose --model base     # Override model
```

Option C: Edit config file (persistent setting):
```bash
vim ~/.config/whisper-dictation/config.yaml
# Change: language: auto (or en, it, es, fr, etc.)
# Change: model: base (or tiny, small, medium, large)
just run
```

To dictate:

- Click in any text field (browser, editor, terminal, etc.)
- Press and hold **Super+Period**
- Speak clearly in your chosen language
- Release the key → text appears instantly!
Tips:

- Speak naturally, no need to pause between words
- Works in any application (Wayland-compatible)
- Auto-detect mode handles mixed Italian/English seamlessly
- Use `just run-verbose` to troubleshoot hotkey detection
Edit `~/.config/whisper-dictation/config.yaml`:

```yaml
# Hotkey configuration
hotkey:
  key: period        # Any key from evdev.ecodes (e.g., period, comma, space)
  modifiers:
    - super          # Can use: super, ctrl, alt, shift

# Whisper settings
whisper:
  model: base        # Options: tiny (1-2s), base (4-6s), small (10-15s), medium (20-30s), large (40-60s)
  language: auto     # Options: auto, en, it, es, fr, de, etc. (99+ languages supported)
  threads: 4         # CPU threads for transcription (adjust based on your system)

# UI settings
ui:
  show_waveform: false  # Show visual waveform during recording
  theme: dark           # Options: dark, light, auto

# Post-processing
processing:
  remove_filler_words: true   # Remove "um", "uh", etc.
  auto_capitalize: true       # Capitalize first letter of sentences
  auto_punctuate: false       # Auto-add punctuation (experimental)
```

Quick config changes:
```bash
# Change language to auto-detect
sed -i 's/language: .*/language: auto/' ~/.config/whisper-dictation/config.yaml

# Change model to base (faster)
sed -i 's/model: .*/model: base/' ~/.config/whisper-dictation/config.yaml
```

How it works:

1. **Keyboard Monitoring** - `evdev` captures low-level key events
2. **Audio Recording** - `ffmpeg` records mic input while the key is held
3. **Transcription** - `whisper.cpp` processes audio locally
4. **Text Insertion** - `ydotool` pastes text into the active window
5. **UI Feedback** - a GTK4 window shows real-time status
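The first four stages boil down to three external commands. A sketch of the invocations as command-line builders; the binary names and flags (`whisper-cli`, `--no-timestamps`, the 16 kHz mono resample) are assumptions based on the tools' common CLIs, not whisper-dictation's exact calls:

```python
"""Sketch: the record -> transcribe -> paste pipeline as subprocess commands."""


def record_cmd(wav: str, seconds: int = 5) -> list[str]:
    """ffmpeg: capture the default PulseAudio/PipeWire source to 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-f", "pulse", "-i", "default",
            "-t", str(seconds), "-ar", "16000", "-ac", "1", wav]


def transcribe_cmd(wav: str, model: str, language: str = "auto") -> list[str]:
    """whisper.cpp: print the transcript without timestamps."""
    return ["whisper-cli", "-m", model, "-f", wav,
            "-l", language, "--no-timestamps"]


def paste_cmd(text: str) -> list[str]:
    """ydotool: type the text into the focused window."""
    return ["ydotool", "type", "--", text]
```

Each list would be handed to `subprocess.run(...)` in turn, with the transcript captured from whisper.cpp's stdout and fed to `paste_cmd`.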
| Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 39 MB | ~1-2s ⚡ | 60% | Quick notes, testing |
| base | 142 MB | ~4-6s ⚡⚡ | 70% | Recommended for speed |
| small | 466 MB | ~10-15s | 80% | Balanced performance |
| medium | 1.5 GB | ~20-30s | 85% | High accuracy for LLMs |
| large | 2.9 GB | ~40-60s | 90% | Maximum accuracy |
Performance notes:

- Times measured on CPU (4 threads)
- GPU acceleration can reduce times by 5-10x
- base model recommended for Aqua Voice-like speed
- Switch models by editing `model:` in config.yaml
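The table's tradeoff can be turned into a one-liner for picking a model under a latency budget. A sketch using the approximate per-utterance times above (the `MODELS` table and `pick_model` helper are illustrative, not part of the project):

```python
"""Sketch: pick the most accurate model that fits a latency budget."""
MODELS = {  # name: (size, approx. seconds per utterance on CPU/4 threads, accuracy)
    "tiny":   ("39 MB",  2,  0.60),
    "base":   ("142 MB", 6,  0.70),
    "small":  ("466 MB", 15, 0.80),
    "medium": ("1.5 GB", 30, 0.85),
    "large":  ("2.9 GB", 60, 0.90),
}


def pick_model(max_seconds: float) -> str:
    """Most accurate model whose processing time fits the budget."""
    fitting = [(acc, name) for name, (_, t, acc) in MODELS.items() if t <= max_seconds]
    return max(fitting)[1] if fitting else "tiny"
```

For example, a 10-second budget selects `base`; anything under 2 seconds falls back to `tiny`.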
- OS: NixOS or any Linux with Nix
- Desktop: GNOME (Wayland) or other Wayland compositor
- Permissions: User must be in the `input` group for keyboard monitoring
- Audio: PulseAudio or PipeWire
```bash
# Check microphone
ffmpeg -f pulse -i default -t 1 test.wav

# Check PulseAudio/PipeWire
pactl list sources short
```

```bash
# Add user to input group
sudo usermod -aG input $USER
# Logout/login required (not just reboot)

# Verify input group membership
groups | grep input
```

```bash
# Start ydotool daemon manually
ydotoold --socket-path=/run/user/1000/.ydotool_socket --socket-perm=0600 &

# Verify socket exists
ls -la /run/user/1000/.ydotool_socket

# Check if ydotoold is running
pgrep -a ydotoold
```

```bash
# Option 1: Switch to faster model in config
vim ~/.config/whisper-dictation/config.yaml
# Change: model: base (4-6s) or tiny (1-2s)

# Option 2: Override with command-line flag
python -m whisper_dictation.daemon --verbose --model base

# Option 3: Download faster model
cd ~/.local/share/whisper-models
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```

The daemon automatically filters out virtual devices (ydotoold, xdotool). If issues persist, check the logs with `run-daemon-debug` to see which devices are detected.
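That virtual-device filtering amounts to a name-based blocklist. A pure-logic sketch (the `VIRTUAL_MARKERS` tuple and helper names are illustrative; the real daemon inspects evdev device names):

```python
"""Sketch: skip virtual input devices so the daemon doesn't read its own output."""
VIRTUAL_MARKERS = ("ydotool", "xdotool", "virtual")


def is_virtual(device_name: str) -> bool:
    """True if the device name suggests a synthetic/virtual input device."""
    name = device_name.lower()
    return any(marker in name for marker in VIRTUAL_MARKERS)


def filter_keyboards(names: list[str]) -> list[str]:
    """Keep only device names that look like real hardware."""
    return [n for n in names if not is_virtual(n)]
```

Without this step, text typed by ydotool would be picked back up as key events, so the filter breaks the feedback loop.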
```bash
# Option 1: Use auto-detection mode
run-daemon-auto

# Option 2: Specify language explicitly
run-daemon-en   # English
run-daemon-it   # Italian

# Option 3: Check config file
cat ~/.config/whisper-dictation/config.yaml
# Should show: language: auto (or en, it, etc.)
```

```bash
# Test with debug mode
run-daemon-debug

# Check for keybinding conflicts in GNOME
gnome-control-center keyboard

# Verify you see "HOTKEY COMBO DETECTED" when pressing Super+Period
# If you see "has_mods=False", verify the Super key is being detected
```

```bash
# Rebuild environment
nix develop

# If packages are missing, update
nix flake update

# Clear the nix cache and rebuild
nix develop --refresh
```

See DEVELOPMENT.md for a comprehensive development guide.
```bash
# Enter dev shell
nix develop

# Show all available commands
just

# Run the daemon
just run

# Run tests
just test

# Format code
just format

# Check all quality gates (lint + test)
just check

# Test dictation without keyboard (records 5 seconds)
just test-dictation

# Check setup status
just status

# Build nix package
just build
```

Run `just` to see all available commands:
- `just run` - Run daemon with default settings
- `just run-verbose` - Run with verbose output
- `just run-auto` - Run with auto language detection
- `just run-en` - Run with English
- `just run-fast` - Run with base model (faster)
- `just test` - Run test suite
- `just test-cov` - Run tests with coverage
- `just format` - Auto-format code (Black + Ruff)
- `just lint` - Check code style
- `just check` - Run all quality checks
- `just test-dictation` - Test recording/transcription
- `just download-model-base` - Download base model (~142 MB)
- `just download-model-medium` - Download medium model (~1.5 GB)
- `just start-ydotool` - Start ydotool daemon
- `just status` - Show setup status
| Feature | Whisper Dictation | Aqua Voice | Talon Voice |
|---|---|---|---|
| Privacy | ✅ 100% Local | ❌ Cloud | ✅ Local |
| Cost | ✅ Free | 💲 $8/mo | 💲 $15/mo |
| NixOS Support | ✅ Native | ❌ No | |
| Technical Terms | ✅ 97% | ✅ 95% | |
| Wayland | ✅ Yes | ❌ X11 only | |
| Real-time | ✅ Yes | ✅ Yes | ✅ Yes |
- Streaming transcription (live preview while speaking)
- Custom vocabulary training
- Command mode (voice commands for actions)
- Integration with LLM APIs (Claude, GPT-4)
- Multi-backend support (Avalon API, Deepgram)
- Voice profiles for different contexts
Contributions welcome! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE
- whisper.cpp - Fast whisper implementation
- Aqua Voice - UI/UX inspiration
- ydotool - Wayland input automation
Made with ❤️ for the NixOS community