A system application that transcribes your speech in real-time and types it wherever your cursor is positioned.
- Real-time transcription using whisper.cpp
- Automatic text injection at cursor position (works in browsers, text editors, terminals, etc.)
- Global hotkey support (Ctrl+Shift+V by default)
- Cross-platform (Linux X11 and Wayland)
- Low latency with configurable audio chunk sizes
- Linux (X11 or Wayland)
- SDL2 (libsdl2-dev)
- X11 libraries (libx11-dev, libxtst-dev) - for X11 text injection
- Wayland tools (optional): ydotool or wtype - for Wayland text injection
- Clipboard tools (fallback): xclip or xsel - for clipboard-based text injection
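To see which of the optional runtime tools are already installed, a small check like the one below may help. It is a minimal sketch: the function name `check_tools` is hypothetical and not part of this repository, and it only looks for commands on `PATH` (it does not check the `-dev` library packages).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are on PATH.
check_tools() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Runtime tools named in the requirements list.
check_tools ydotool wtype xclip xsel
```

Only one Wayland tool and one clipboard tool are needed, so a single "missing" line in each pair is not a problem.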
You need a whisper model file. Download one from whisper.cpp:
cd ../whisper.cpp/models
./download-ggml-model.sh base.en
Recommended models:
- tiny.en - Fastest, lower accuracy
- base.en - Good balance (recommended)
- small.en - Better accuracy, slower
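Before launching, it can be useful to confirm that the downloaded model file is where the `-m` option expects it. The sketch below assumes the `ggml-<name>.bin` naming seen in the usage examples and the sibling-directory layout; `model_path` is a hypothetical helper, not something shipped with the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the model file path for a model name,
# following the ggml-<name>.bin pattern used in this README.
model_path() {
    local name="$1"
    local models_dir="${2:-../whisper.cpp/models}"
    echo "$models_dir/ggml-$name.bin"
}

path="$(model_path base.en)"
if [ -f "$path" ]; then
    echo "model ready: $path"
else
    echo "model not found: $path"
fi
```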
This project requires whisper.cpp as a dependency. Ensure whisper.cpp is located as a sibling directory:
voice-ai/
├── whisper.cpp/ # whisper.cpp repository
└── live-type/ # This repository
cd live-type
cmake -B build
cmake --build build -j
The binary will be at: build/bin/live-type
Note: The build process will automatically:
- Find whisper.cpp in the parent directory
- Build whisper.cpp with SDL2 support enabled
- Link against whisper.cpp libraries
Toggle Mode (default):
./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin
# Press Ctrl+Shift+V to start/stop recording
Push-to-Talk Mode (Keyboard):
./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --push-to-talk "Ctrl+Shift" Insert
# Press and hold Ctrl+Shift+Insert to record, release to stop
Mouse Button Toggle:
./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --mouse-button BTN_SIDE
# Click your side mouse button to start/stop recording
Mouse Button Push-to-Talk:
./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --mouse-ptt BTN_EXTRA
# Press and hold your extra mouse button to record, release to stop
Combined (Keyboard + Mouse):
./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --push-to-talk "Ctrl+Shift" Insert --mouse-ptt BTN_SIDE
# Use either keyboard or mouse button to record
Note:
- Avoid using Space as the push-to-talk key, as it conflicts with many system shortcuts.
- Recommended keyboard alternatives: Insert, Grave (backtick), F12, or Backslash.
- Common mouse buttons: BTN_SIDE, BTN_EXTRA, BTN_FORWARD, BTN_BACK
- Use ./test_mouse_buttons.sh to identify your mouse buttons
-m, --model PATH Path to whisper model (required)
-l, --language LANG Language code (default: en)
-t, --threads N Number of threads (default: auto)
Keyboard Hotkeys:
--toggle-key MODS KEY Toggle hotkey (default: Ctrl+Shift V)
--push-to-talk MODS KEY Push-to-talk hotkey (press and hold)
Mouse Buttons:
--mouse-button BUTTON Toggle mouse button (click to start/stop)
--mouse-ptt BUTTON Push-to-talk mouse button (hold to record)
-h, --help Show help message
- Ctrl+Shift+V - Toggle recording (start/stop) - default keyboard hotkey
- Mouse Button - Toggle or push-to-talk (if configured)
- Ctrl+C - Exit application
To find out which button names correspond to your physical mouse buttons, use the included test script:
./test_mouse_buttons.sh
This will show you the Linux button names (e.g., BTN_SIDE, BTN_EXTRA) as you press each button on your mouse.
- Audio Capture: Uses SDL2 to capture microphone input at 16kHz
- Transcription: Processes audio chunks (3-5 seconds) through whisper.cpp
- Text Injection: Types transcribed text at current cursor position using:
  - X11: XTest extension for direct key simulation
  - Wayland: ydotool or wtype command-line tools
  - Fallback: Clipboard + Ctrl+V simulation
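The backend choice described in the list above can be sketched as a small decision function. This is an illustration only, not the application's actual C++ logic: `pick_backend` and the backend labels it prints are hypothetical, and the order of preference (ydotool before wtype) is an assumption based on ydotool being marked as recommended.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose an injection backend from the session type
# (first argument) and the list of installed tools (remaining arguments).
pick_backend() {
    local session="$1"
    shift
    local available=" $* "   # padded so whole-word matching works
    case "$session" in
        x11)
            echo "xtest"
            return
            ;;
        wayland)
            case "$available" in *" ydotool "*) echo "ydotool"; return ;; esac
            case "$available" in *" wtype "*) echo "wtype"; return ;; esac
            ;;
    esac
    # Nothing direct is usable: fall back to clipboard + Ctrl+V.
    echo "clipboard"
}

# Collect the Wayland tools that are actually installed, then decide.
installed=""
for tool in ydotool wtype; do
    if command -v "$tool" >/dev/null 2>&1; then
        installed="$installed $tool"
    fi
done
pick_backend "${XDG_SESSION_TYPE:-unknown}" $installed
```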
Uses X11 XTest extension to simulate keyboard input. Works in most applications.
On Wayland, ydotool or wtype must be installed:
# Install ydotool (recommended)
sudo apt install ydotool # Ubuntu/Debian
# or
sudo dnf install ydotool # Fedora
# Or install wtype
sudo apt install wtype # Ubuntu/Debian
Note: On Wayland, you may need to run ydotool daemon first:
sudo ydotool daemon
If direct text injection fails, the application falls back to:
- Copying text to clipboard (xclip or xsel)
- Simulating Ctrl+V to paste
- Check display server: Run echo $XDG_SESSION_TYPE to see if you're on X11 or Wayland
- X11: Ensure libxtst-dev is installed
- Wayland: Install and run ydotool daemon: sudo ydotool daemon
- Permissions: Some applications may block simulated keyboard input
- Check microphone permissions
- List audio devices: The app will show available capture devices on startup
- Ensure SDL2 audio is working: Test with whisper.cpp examples
- Ensure no other application is using Ctrl+Shift+V
- On Wayland, hotkey detection may be limited - consider using mouse buttons instead
- Check that you have permission to read the mouse device file
- Recommended: Install the device-specific udev rule (more secure):
  sudo cp 99-logitech-g502x-mouse-button.rules /etc/udev/rules.d/
  sudo udevadm control --reload-rules
  sudo udevadm trigger
  Then replug your mouse
- Alternative (less secure): Add yourself to the input group: sudo usermod -a -G input $USER
  ⚠️ This grants access to ALL input devices (potential keylogging risk)
- Use ./test_mouse_buttons.sh to verify your button names
- Make sure the mouse is connected and recognized by the system
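To check the read-permission point directly, the sketch below lists input event devices and whether the current user can read them. It assumes the mouse is exposed as an `event*` node under `/dev/input`, which is the usual Linux layout but is not stated in this README; `check_input_access` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list event devices in a directory (default
# /dev/input) and report whether the current user can read each one.
check_input_access() {
    local dir="${1:-/dev/input}"
    local dev
    for dev in "$dir"/event*; do
        # Skip the unexpanded pattern when no devices exist.
        [ -e "$dev" ] || continue
        if [ -r "$dev" ]; then
            echo "readable: $dev"
        else
            echo "not readable: $dev"
        fi
    done
    return 0
}

check_input_access
```

If every device is reported as not readable, the udev rule or the input group change has most likely not taken effect yet.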
- Use a smaller model (tiny.en instead of base.en)
- Reduce transcription frequency by adjusting chunk sizes in code
- Reduce number of threads: -t 2
live-type/
├── CMakeLists.txt # Build configuration
├── README.md # This file
├── .gitignore # Git ignore rules
├── src/ # Source files (.cpp)
│ ├── main.cpp
│ ├── live_type_app.cpp
│ ├── transcription.cpp
│ ├── text_injector.cpp
│ └── hotkey_manager.cpp
├── include/ # Header files (.h)
│ ├── live_type_app.h
│ ├── transcription.h
│ ├── text_injector.h
│ └── hotkey_manager.h
└── docs/ # Documentation
    ├── QUICKSTART.md
    └── INSTALL_DEPS.md
You can modify the following in src/live_type_app.cpp:
- m_step_ms: Audio chunk size (default: 5000ms)
- m_length_ms: Audio buffer length (default: 8000ms)
- Hotkey: Modify in src/main.cpp or src/live_type_app.cpp
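When tuning these values, it can help to see what they mean in samples at the 16 kHz capture rate mentioned earlier. The arithmetic below is a rough sketch: `ms_to_samples` is a hypothetical helper, and it only converts durations; it makes no claim about how the application lays out its buffers internally.

```shell
#!/usr/bin/env bash
SAMPLE_RATE=16000   # capture rate stated in this README

# Convert a duration in milliseconds to a sample count at SAMPLE_RATE.
ms_to_samples() {
    echo $(( $1 * SAMPLE_RATE / 1000 ))
}

echo "m_step_ms=5000   -> $(ms_to_samples 5000) samples"
echo "m_length_ms=8000 -> $(ms_to_samples 8000) samples"
```

With the defaults, a step is 80000 samples and the buffer is 128000 samples, so halving `m_step_ms` halves the amount of new audio handed to each transcription pass.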
- Text injection may not work in all applications (some security-focused apps block it)
- Wayland support requires external tools (ydotool/wtype)
- Real-time transcription has inherent latency (1-3 seconds depending on model)
- Best results with clear speech and minimal background noise
- GUI with system tray icon
- Configurable hotkeys via GUI
- Audio level visualization
- Multiple language support
- Custom word replacements/corrections
Same as whisper.cpp (MIT License)