Live Type - Real-time Speech-to-Text Typing

A system application that transcribes your speech in real time and types it wherever your cursor is positioned.

Features

Real-time transcription using whisper.cpp
Automatic text injection at cursor position (works in browsers, text editors, terminals, etc.)
Global hotkey support (Ctrl+Shift+V by default)
Works on Linux under both X11 and Wayland
Low latency with configurable audio chunk sizes

Requirements

System Dependencies

Linux (X11 or Wayland)
SDL2 (libsdl2-dev)
X11 libraries (libx11-dev, libxtst-dev) - for X11 text injection
Wayland tools (optional): ydotool or wtype - for Wayland text injection
Clipboard tools (fallback): xclip or xsel - for clipboard-based text injection

Whisper Model

You need a whisper model file. Download one with the script in the whisper.cpp repository:

cd ../whisper.cpp/models
./download-ggml-model.sh base.en

Recommended models:

tiny.en - Fastest, lower accuracy
base.en - Good balance (recommended)
small.en - Better accuracy, slower

Building

Prerequisites

This project requires whisper.cpp as a dependency. Ensure whisper.cpp is located as a sibling directory:

voice-ai/
├── whisper.cpp/     # whisper.cpp repository
└── live-type/        # This repository

Build Instructions

cd live-type
cmake -B build
cmake --build build -j

The binary will be at: build/bin/live-type

Note: The build process will automatically:

Find whisper.cpp in the parent directory
Build whisper.cpp with SDL2 support enabled
Link against whisper.cpp libraries

Usage

Basic Usage

Toggle Mode (default):

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin
# Press Ctrl+Shift+V to start/stop recording

Push-to-Talk Mode (Keyboard):

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --push-to-talk "Ctrl+Shift" Insert
# Press and hold Ctrl+Shift+Insert to record, release to stop

Mouse Button Toggle:

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --mouse-button BTN_SIDE
# Click your side mouse button to start/stop recording

Mouse Button Push-to-Talk:

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --mouse-ptt BTN_EXTRA
# Press and hold your extra mouse button to record, release to stop

Combined (Keyboard + Mouse):

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --push-to-talk "Ctrl+Shift" Insert --mouse-ptt BTN_SIDE
# Use either keyboard or mouse button to record

Note:

Avoid using Space as the push-to-talk key as it conflicts with many system shortcuts.
Recommended keyboard alternatives: Insert, Grave (backtick), F12, or Backslash.
Common mouse buttons: BTN_SIDE, BTN_EXTRA, BTN_FORWARD, BTN_BACK
Use ./test_mouse_buttons.sh to identify your mouse buttons

Command Line Options

-m, --model PATH           Path to whisper model (required)
-l, --language LANG        Language code (default: en)
-t, --threads N            Number of threads (default: auto)

Keyboard Hotkeys:
--toggle-key MODS KEY      Toggle hotkey (default: Ctrl+Shift V)
--push-to-talk MODS KEY    Push-to-talk hotkey (press and hold)

Mouse Buttons:
--mouse-button BUTTON      Toggle mouse button (click to start/stop)
--mouse-ptt BUTTON         Push-to-talk mouse button (hold to record)

-h, --help                 Show help message
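
The hotkey options take a modifier string and a key as two separate arguments (for example "Ctrl+Shift" Insert). A minimal sketch of how such a pair could be split into its parts is shown below. The Hotkey struct and parse_hotkey function are hypothetical names chosen for illustration; they are not taken from the project sources, and the real parser may validate names or handle case differently.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for a parsed hotkey (not the project's actual type).
struct Hotkey {
    std::vector<std::string> modifiers;  // e.g. {"Ctrl", "Shift"}
    std::string key;                     // e.g. "Insert"
};

// Split a "+"-separated modifier string and pair it with the key argument.
// Empty segments (as in "Ctrl++Shift") are skipped rather than rejected.
Hotkey parse_hotkey(const std::string& mods, const std::string& key) {
    Hotkey hotkey;
    hotkey.key = key;

    std::istringstream stream(mods);
    std::string part;
    while (std::getline(stream, part, '+')) {
        if (!part.empty()) {
            hotkey.modifiers.push_back(part);
        }
    }
    return hotkey;
}
```

With this sketch, parse_hotkey("Ctrl+Shift", "Insert") yields two modifiers and the key Insert, which matches the MODS KEY shape of --toggle-key and --push-to-talk.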

Controls

Ctrl+Shift+V - Toggle recording (start/stop) - default keyboard hotkey
Mouse Button - Toggle or push-to-talk (if configured)
Ctrl+C - Exit application

Identifying Mouse Buttons

To find out which button names correspond to your physical mouse buttons, use the included test script:

./test_mouse_buttons.sh

This will show you the Linux button names (e.g., BTN_SIDE, BTN_EXTRA) as you press each button on your mouse.

How It Works

Audio Capture: Uses SDL2 to capture microphone input at 16 kHz
Transcription: Processes audio chunks (3-5 seconds) through whisper.cpp
Text Injection: Types transcribed text at current cursor position using:
- X11: XTest extension for direct key simulation
- Wayland: ydotool or wtype command-line tools
- Fallback: Clipboard + Ctrl+V simulation
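
The choice between the three injection paths can be sketched as a small decision function. This is a simplified illustration, not the project's actual logic: the Backend enum, pick_backend and current_session_type are hypothetical names, and preferring ydotool over wtype is an assumption based on ydotool being the recommended tool in this README.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical set of injection backends described above.
enum class Backend { XTest, Ydotool, Wtype, Clipboard };

// Read the session type that the desktop exports ("x11", "wayland", or empty).
std::string current_session_type() {
    const char* value = std::getenv("XDG_SESSION_TYPE");
    return value ? value : "";
}

// Choose a backend from the session type and the tools that are installed.
// Anything that cannot be handled directly falls back to the clipboard.
Backend pick_backend(const std::string& session_type,
                     bool has_ydotool, bool has_wtype) {
    if (session_type == "x11") {
        return Backend::XTest;
    }
    if (session_type == "wayland") {
        if (has_ydotool) return Backend::Ydotool;  // assumed preference
        if (has_wtype)   return Backend::Wtype;
    }
    return Backend::Clipboard;
}
```

Keeping the decision in a pure function like this makes it easy to test each combination without needing a running X11 or Wayland session.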

Text Injection Methods

X11 (Default on most Linux systems)

Uses X11 XTest extension to simulate keyboard input. Works in most applications.

Wayland

Requires ydotool or wtype to be installed:

# Install ydotool (recommended)
sudo apt install ydotool  # Ubuntu/Debian
# or
sudo dnf install ydotool  # Fedora

# Or install wtype
sudo apt install wtype  # Ubuntu/Debian

Note: On Wayland, you may need to start the ydotool daemon (ydotoold) first:

sudo ydotoold

Fallback Method

If direct text injection fails, the application falls back to:

Copying text to clipboard (xclip or xsel)
Simulating Ctrl+V to paste
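
The first step of the fallback can be sketched by piping the text into a clipboard tool. The example below is an assumption about how this might be done, not the project's code: copy_to_clipboard is a hypothetical helper, and it uses the standard invocation xclip -selection clipboard (the xsel equivalent is xsel --clipboard --input). Simulating the Ctrl+V keystroke afterwards is not covered here.

```cpp
#include <stdio.h>

#include <string>

// Pipe `text` into a clipboard command's standard input.
// Returns true only if all bytes were written and the command exited with 0.
// Note: if the command is missing, the write can raise SIGPIPE; a real
// application would typically ignore SIGPIPE once at startup.
bool copy_to_clipboard(const std::string& text,
                       const char* command = "xclip -selection clipboard") {
    FILE* pipe = popen(command, "w");
    if (pipe == nullptr) {
        return false;
    }
    const bool wrote_all =
        fwrite(text.data(), 1, text.size(), pipe) == text.size();
    const int status = pclose(pipe);  // waits for the command to finish
    return wrote_all && status == 0;
}
```

Passing the command as a parameter lets the caller switch to xsel when xclip is not installed.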

Troubleshooting

Text Not Typing

Check display server: Run echo $XDG_SESSION_TYPE to see if you're on X11 or Wayland
X11: Ensure libxtst-dev is installed
Wayland: Install ydotool and start its daemon: sudo ydotoold
Permissions: Some applications may block simulated keyboard input

Audio Not Capturing

Check microphone permissions
List audio devices: The app will show available capture devices on startup
Ensure SDL2 audio is working: Test with whisper.cpp examples

Hotkey Not Working

Ensure no other application is using Ctrl+Shift+V
On Wayland, hotkey detection may be limited - consider using mouse buttons instead

Mouse Button Not Working

Check that you have permission to read the mouse device file

Recommended: Install the device-specific udev rule (more secure):

sudo cp 99-logitech-g502x-mouse-button.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger

Then replug your mouse

Alternative (less secure): Add yourself to the input group:
```
sudo usermod -a -G input $USER
```
⚠️ This grants access to ALL input devices (potential keylogging risk). Log out and back in for the group change to take effect.
Use ./test_mouse_buttons.sh to verify your button names
Make sure the mouse is connected and recognized by the system

High CPU Usage

Use a smaller model (tiny.en instead of base.en)
Reduce transcription frequency by increasing the chunk size (m_step_ms) in src/live_type_app.cpp
Reduce number of threads: -t 2

Project Structure

live-type/
├── CMakeLists.txt      # Build configuration
├── README.md           # This file
├── .gitignore          # Git ignore rules
├── src/                # Source files (.cpp)
│   ├── main.cpp
│   ├── live_type_app.cpp
│   ├── transcription.cpp
│   ├── text_injector.cpp
│   └── hotkey_manager.cpp
├── include/            # Header files (.h)
│   ├── live_type_app.h
│   ├── transcription.h
│   ├── text_injector.h
│   └── hotkey_manager.h
└── docs/               # Documentation
    ├── QUICKSTART.md
    └── INSTALL_DEPS.md

Configuration

You can modify the following in src/live_type_app.cpp:

m_step_ms: Audio chunk size (default: 5000ms)
m_length_ms: Audio buffer length (default: 8000ms)
Default hotkey: Change it in src/main.cpp or src/live_type_app.cpp, or override it at runtime with --toggle-key
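
To get a feel for what these values mean in samples, the arithmetic can be worked out at the 16 kHz capture rate. The sketch below is illustrative only: ms_to_samples and the constant names are hypothetical, and the reading that the buffer carries older audio forward as context for each new chunk is an assumption, not something confirmed by the source.

```cpp
#include <cstddef>

// Capture rate stated in "How It Works".
constexpr int kSampleRate = 16000;

// Convert a duration in milliseconds to a number of mono samples.
constexpr std::size_t ms_to_samples(int ms) {
    return static_cast<std::size_t>(ms) * kSampleRate / 1000;
}

// Defaults listed above.
constexpr std::size_t kStepSamples   = ms_to_samples(5000);  // 80000 samples
constexpr std::size_t kLengthSamples = ms_to_samples(8000);  // 128000 samples

// Assumed interpretation: audio older than the newest chunk that still
// fits in the buffer (3000 ms by default).
constexpr std::size_t kCarriedOver = kLengthSamples - kStepSamples;  // 48000
```

Lowering m_step_ms reduces the wait before text appears but makes transcription run more often, which raises CPU usage.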

Limitations

Text injection may not work in all applications (some security-focused apps block it)
Wayland support requires external tools (ydotool/wtype)
Real-time transcription has inherent latency (1-3 seconds depending on model)
Best results with clear speech and minimal background noise

Future Enhancements

GUI with system tray icon
Configurable hotkeys via GUI
Audio level visualization
Multiple language support
Custom word replacements/corrections

License

Same as whisper.cpp (MIT License)
