
deksprime/live-type


Live Type - Real-time Speech-to-Text Typing

A system application that transcribes your speech in real-time and types it wherever your cursor is positioned.

Features

  • Real-time transcription using whisper.cpp
  • Automatic text injection at cursor position (works in browsers, text editors, terminals, etc.)
  • Global hotkey support (Ctrl+Shift+V by default)
  • Works under both Linux display servers (X11 and Wayland)
  • Low latency, with audio chunk sizes configurable in the source (see Configuration)

Requirements

System Dependencies

  • Linux (X11 or Wayland)
  • SDL2 (libsdl2-dev)
  • X11 libraries (libx11-dev, libxtst-dev) - for X11 text injection
  • Wayland tools (optional): ydotool or wtype - for Wayland text injection
  • Clipboard tools (fallback): xclip or xsel - for clipboard-based text injection

Whisper Model

You need a whisper model file. Download one with the script shipped in whisper.cpp:

cd ../whisper.cpp/models
./download-ggml-model.sh base.en

Recommended models:

  • tiny.en - Fastest, lower accuracy
  • base.en - Good balance (recommended)
  • small.en - Better accuracy, slower
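Models with the .en suffix are English-only, so a non-English -l value cannot be honored with them. The sketch below checks a model path against the requested language, assuming the ggml-<name>.bin naming produced by download-ggml-model.sh (as in ggml-base.en.bin). is_english_only_model and language_supported are hypothetical helpers, not part of live-type:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if the file follows whisper.cpp's
// "ggml-<name>.en.bin" naming, i.e. it is an English-only model.
bool is_english_only_model(const std::string& path) {
    const std::string suffix = ".en.bin";
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: true if the requested language code can be used
// with this model (English-only models accept only "en").
bool language_supported(const std::string& model_path, const std::string& lang) {
    return !is_english_only_model(model_path) || lang == "en";
}
```

For example, language_supported("../whisper.cpp/models/ggml-base.en.bin", "de") returns false, so such a mismatch could be reported at startup instead of silently producing English output.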

Building

Prerequisites

This project depends on whisper.cpp, which must be checked out as a sibling directory of live-type:

voice-ai/
├── whisper.cpp/     # whisper.cpp repository
└── live-type/        # This repository

Build Instructions

cd live-type
cmake -B build
cmake --build build -j

The binary will be at: build/bin/live-type

Note: The build process will automatically:

  • Find whisper.cpp in the parent directory
  • Build whisper.cpp with SDL2 support enabled
  • Link against whisper.cpp libraries

Usage

Basic Usage

Toggle Mode (default):

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin
# Press Ctrl+Shift+V to start/stop recording

Push-to-Talk Mode (Keyboard):

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --push-to-talk "Ctrl+Shift" Insert
# Press and hold Ctrl+Shift+Insert to record, release to stop

Mouse Button Toggle:

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --mouse-button BTN_SIDE
# Click your side mouse button to start/stop recording

Mouse Button Push-to-Talk:

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --mouse-ptt BTN_EXTRA
# Press and hold your extra mouse button to record, release to stop

Combined (Keyboard + Mouse):

./build/bin/live-type -m ../whisper.cpp/models/ggml-base.en.bin --push-to-talk "Ctrl+Shift" Insert --mouse-ptt BTN_SIDE
# Use either keyboard or mouse button to record
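In the combined setup, either input can start recording. One plausible way to merge several sources is sketched below, assuming recording stays on while any push-to-talk source is held or a toggle has been switched on. RecordingState and its source labels are hypothetical, and live-type's own logic in hotkey_manager.cpp may differ:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical sketch: merges toggle and push-to-talk inputs from several
// sources (e.g. "keyboard", "mouse") into a single "is recording" flag.
class RecordingState {
public:
    // A toggle hotkey or toggle mouse button was clicked.
    void on_toggle() { toggled_ = !toggled_; }

    // A push-to-talk source was pressed (held down) or released.
    void on_ptt(const std::string& source, bool pressed) {
        if (pressed) held_.insert(source);
        else held_.erase(source);
    }

    // Record while toggled on, or while any push-to-talk source is held.
    bool recording() const { return toggled_ || !held_.empty(); }

private:
    bool toggled_ = false;
    std::set<std::string> held_;  // push-to-talk sources currently held
};
```

Tracking held sources in a set means that releasing the keyboard keys does not end a recording the mouse button is still holding open.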

Note:

  • Avoid using Space as the push-to-talk key as it conflicts with many system shortcuts.
  • Recommended keyboard alternatives: Insert, Grave (the backtick key, `), F12, or Backslash (\).
  • Common mouse buttons: BTN_SIDE, BTN_EXTRA, BTN_FORWARD, BTN_BACK
  • Use ./test_mouse_buttons.sh to identify your mouse buttons

Command Line Options

-m, --model PATH           Path to whisper model (required)
-l, --language LANG        Language code (default: en)
-t, --threads N            Number of threads (default: auto)

Keyboard Hotkeys:
--toggle-key MODS KEY      Toggle hotkey (default: Ctrl+Shift V)
--push-to-talk MODS KEY    Push-to-talk hotkey (press and hold)

Mouse Buttons:
--mouse-button BUTTON      Toggle mouse button (click to start/stop)
--mouse-ptt BUTTON         Push-to-talk mouse button (hold to record)

-h, --help                 Show help message
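Each MODS KEY pair above splits into a +-separated modifier list and a key name. A minimal parsing sketch follows. The Hotkey struct and parse_hotkey are hypothetical, and accepting Alt and Super as modifier names is an assumption (the examples above only use Ctrl and Shift):

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical representation of a "MODS KEY" pair such as "Ctrl+Shift" "V".
struct Hotkey {
    bool ctrl = false, shift = false, alt = false, super = false;
    std::string key;
};

// Splits MODS on '+' and sets one flag per recognised modifier.
// Returns std::nullopt for an unknown modifier or an empty key.
std::optional<Hotkey> parse_hotkey(const std::string& mods, const std::string& key) {
    if (key.empty()) return std::nullopt;
    Hotkey hk;
    hk.key = key;
    std::istringstream in(mods);
    std::string token;
    while (std::getline(in, token, '+')) {
        if (token == "Ctrl") hk.ctrl = true;
        else if (token == "Shift") hk.shift = true;
        else if (token == "Alt") hk.alt = true;      // assumed modifier name
        else if (token == "Super") hk.super = true;  // assumed modifier name
        else return std::nullopt;                    // unknown or empty token
    }
    return hk;
}
```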

Controls

  • Ctrl+Shift+V - Toggle recording (start/stop) - default keyboard hotkey
  • Mouse Button - Toggle or push-to-talk (if configured)
  • Ctrl+C - Exit application

Identifying Mouse Buttons

To find out which button names correspond to your physical mouse buttons, use the included test script:

./test_mouse_buttons.sh

This will show you the Linux button names (e.g., BTN_SIDE, BTN_EXTRA) as you press each button on your mouse.
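These names are Linux evdev key codes. Reading a /dev/input/eventN device yields struct input_event records, where a button press arrives as type EV_KEY with value 1 and a release with value 0. The sketch below formats such records using the kernel's BTN_* constants. describe_button_event is a hypothetical helper, and this is not necessarily how test_mouse_buttons.sh works:

```cpp
#include <cassert>
#include <linux/input.h>  // struct input_event, EV_KEY, BTN_* codes
#include <string>

// Maps common mouse button codes to their kernel names ("" if not listed).
std::string button_name(unsigned short code) {
    switch (code) {
        case BTN_LEFT:    return "BTN_LEFT";
        case BTN_RIGHT:   return "BTN_RIGHT";
        case BTN_MIDDLE:  return "BTN_MIDDLE";
        case BTN_SIDE:    return "BTN_SIDE";
        case BTN_EXTRA:   return "BTN_EXTRA";
        case BTN_FORWARD: return "BTN_FORWARD";
        case BTN_BACK:    return "BTN_BACK";
        default:          return "";
    }
}

// Formats a button event as e.g. "BTN_SIDE pressed"; returns "" for
// non-button events, unlisted codes, and auto-repeat (value 2).
std::string describe_button_event(const input_event& ev) {
    if (ev.type != EV_KEY) return "";
    const std::string name = button_name(ev.code);
    if (name.empty()) return "";
    if (ev.value == 1) return name + " pressed";
    if (ev.value == 0) return name + " released";
    return "";
}
```

In a real reader, each input_event would come from read(fd, &ev, sizeof ev) on an opened event device, which requires the permissions described under Mouse Button Not Working.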

How It Works

  1. Audio Capture: Uses SDL2 to capture microphone input at 16 kHz, the sample rate whisper.cpp models expect
  2. Transcription: Processes audio chunks (3-5 seconds) through whisper.cpp
  3. Text Injection: Types transcribed text at current cursor position using:
    • X11: XTest extension for direct key simulation
    • Wayland: ydotool or wtype command-line tools
    • Fallback: Clipboard + Ctrl+V simulation
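Steps 1 and 2 amount to a sliding window over the 16 kHz sample stream. The sketch below assumes that, each time a step's worth of new audio has arrived, the most recent window is handed to whisper.cpp. SlidingWindow is a hypothetical class, and the step and length values in the example mirror the m_step_ms and m_length_ms defaults listed under Configuration:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical sliding window over 16 kHz mono samples: every `step` new
// samples it hands back the most recent `length` samples for transcription.
class SlidingWindow {
public:
    SlidingWindow(std::size_t step, std::size_t length) : step_(step), length_(length) {
        assert(step > 0 && step <= length);
    }

    // Appends captured samples; returns true once a full step has arrived.
    bool push(const std::vector<float>& samples) {
        for (float s : samples) {
            buf_.push_back(s);
            if (buf_.size() > length_) buf_.pop_front();  // keep at most `length`
        }
        pending_ += samples.size();
        return pending_ >= step_;
    }

    // Returns the current window (up to `length` samples), resets the step count.
    std::vector<float> take() {
        pending_ = 0;
        return std::vector<float>(buf_.begin(), buf_.end());
    }

private:
    std::size_t step_, length_, pending_ = 0;
    std::deque<float> buf_;
};
```

With a 5000 ms step and an 8000 ms length, the first run sees 5 s of audio and later runs see the latest 8 s, so consecutive runs overlap by 3 s.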

Text Injection Methods

X11 (used in X11 sessions)

Uses X11 XTest extension to simulate keyboard input. Works in most applications.

Wayland

Requires ydotool or wtype to be installed:

# Install ydotool (recommended)
sudo apt install ydotool  # Ubuntu/Debian
# or
sudo dnf install ydotool  # Fedora

# Or install wtype
sudo apt install wtype  # Ubuntu/Debian

Note: On Wayland, you may need to run ydotool daemon first:

sudo ydotool daemon
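Both tools can type a string passed as a command-line argument: ydotool type TEXT and wtype TEXT. The sketch below builds those argument vectors. wayland_type_argv is a hypothetical helper, and whether live-type invokes the tools in exactly this form is an assumption:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: argument vector for typing `text` with a Wayland tool.
// The text stays a single argv element, so no shell quoting is involved.
std::vector<std::string> wayland_type_argv(const std::string& tool, const std::string& text) {
    assert(tool == "ydotool" || tool == "wtype");
    if (tool == "ydotool") return {"ydotool", "type", text};
    return {"wtype", text};
}
```

Passing the vector to execvp, rather than building a shell command string, keeps quotes and apostrophes in transcribed text from being interpreted by a shell. Text that starts with - may still be read as an option by either tool; check its manual page.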

Fallback Method

If direct text injection fails, the application falls back to:

  1. Copying text to clipboard (xclip or xsel)
  2. Simulating Ctrl+V to paste
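The fallback amounts to trying injection methods in order until one succeeds. A minimal sketch, assuming each method reports success as a bool; Injector and inject_with_fallback are hypothetical names, and the real methods (XTest, ydotool/wtype, xclip/xsel plus Ctrl+V) are represented by plain callables:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical injector: returns true if the text was delivered.
using Injector = std::function<bool(const std::string&)>;

// Tries each injector in order (direct methods first, clipboard last) and
// returns the index of the one that succeeded, or -1 if all failed.
int inject_with_fallback(const std::vector<Injector>& chain, const std::string& text) {
    for (std::size_t i = 0; i < chain.size(); ++i) {
        if (chain[i](text)) return static_cast<int>(i);
    }
    return -1;
}
```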

Troubleshooting

Text Not Typing

  1. Check display server: Run echo $XDG_SESSION_TYPE to see if you're on X11 or Wayland
  2. X11: Ensure libxtst-dev is installed
  3. Wayland: Install and run ydotool daemon: sudo ydotool daemon
  4. Permissions: Some applications may block simulated keyboard input
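Step 1 can also be automated. The sketch below classifies the session from environment values, assuming XDG_SESSION_TYPE is checked first and WAYLAND_DISPLAY and DISPLAY serve as fallbacks. detect_session is hypothetical and not taken from live-type's source:

```cpp
#include <cassert>
#include <string>

enum class Session { X11, Wayland, Unknown };

// Hypothetical sketch: classify the session from XDG_SESSION_TYPE, falling
// back to WAYLAND_DISPLAY, then DISPLAY. Values are passed in (e.g. from
// std::getenv) so the logic can be tested without a desktop.
Session detect_session(const char* xdg_session_type, const char* wayland_display,
                       const char* display) {
    const std::string type = xdg_session_type ? xdg_session_type : "";
    if (type == "x11") return Session::X11;
    if (type == "wayland") return Session::Wayland;
    if (wayland_display && *wayland_display) return Session::Wayland;
    if (display && *display) return Session::X11;
    return Session::Unknown;
}
```

Call it as detect_session(std::getenv("XDG_SESSION_TYPE"), std::getenv("WAYLAND_DISPLAY"), std::getenv("DISPLAY")). WAYLAND_DISPLAY is checked before DISPLAY because Xwayland also sets DISPLAY inside Wayland sessions.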

Audio Not Capturing

  1. Check microphone permissions
  2. List audio devices: The app will show available capture devices on startup
  3. Ensure SDL2 audio capture works: test with one of whisper.cpp's SDL2-based examples (e.g. the stream example)

Hotkey Not Working

  1. Ensure no other application is using Ctrl+Shift+V
  2. On Wayland, hotkey detection may be limited - consider using mouse buttons instead

Mouse Button Not Working

  1. Check that you have permission to read the mouse device file
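  2. Recommended: Install the device-specific udev rule (more secure). As its filename indicates, the included rule targets the Logitech G502 X; for another mouse, adapt it to that device: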
    sudo cp 99-logitech-g502x-mouse-button.rules /etc/udev/rules.d/
    sudo udevadm control --reload-rules
    sudo udevadm trigger
    Then replug your mouse
  3. Alternative (less secure): Add yourself to the input group:
    sudo usermod -a -G input $USER
    ⚠️ This grants access to ALL input devices (potential keylogging risk). Log out and back in for the new group membership to take effect.
  4. Use ./test_mouse_buttons.sh to verify your button names
  5. Make sure the mouse is connected and recognized by the system
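For step 1, the POSIX access() call reports whether the current user can read a device node. A minimal sketch; device_read_problem is a hypothetical helper, and the device path you pass (a /dev/input/eventN node) depends on your system:

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <unistd.h>  // access(), R_OK

// Hypothetical helper: returns "" if the current user may read `device`,
// otherwise the reason reported by access() (e.g. "Permission denied").
std::string device_read_problem(const std::string& device) {
    if (access(device.c_str(), R_OK) == 0) return "";
    return std::strerror(errno);
}
```

A "Permission denied" result suggests the udev rule or group change above has not taken effect yet.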

High CPU Usage

  1. Use a smaller model (tiny.en instead of base.en)
  2. Reduce transcription frequency by increasing m_step_ms in src/live_type_app.cpp (see Configuration)
  3. Reduce number of threads: -t 2
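The -t option trades speed for CPU load: fewer threads keep whisper.cpp on fewer cores, at the cost of slower transcription. The sketch below shows one way an explicit count and an "auto" default could be resolved. resolve_threads is hypothetical, and both the use of 0 for auto and the cap of 4 are illustrative assumptions, not live-type's documented behavior:

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical resolution of "-t N": a positive value is used as-is;
// 0 ("auto") uses the hardware thread count, capped at 4 for illustration.
int resolve_threads(int requested, unsigned hw = std::thread::hardware_concurrency()) {
    assert(requested >= 0);
    if (requested > 0) return requested;
    if (hw == 0) return 1;  // hardware_concurrency() returns 0 when unknown
    return static_cast<int>(std::min(hw, 4u));
}
```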

Project Structure

live-type/
├── CMakeLists.txt      # Build configuration
├── README.md           # This file
├── .gitignore          # Git ignore rules
├── src/                # Source files (.cpp)
│   ├── main.cpp
│   ├── live_type_app.cpp
│   ├── transcription.cpp
│   ├── text_injector.cpp
│   └── hotkey_manager.cpp
├── include/            # Header files (.h)
│   ├── live_type_app.h
│   ├── transcription.h
│   ├── text_injector.h
│   └── hotkey_manager.h
└── docs/               # Documentation
    ├── QUICKSTART.md
    └── INSTALL_DEPS.md

Configuration

You can modify the following in src/live_type_app.cpp:

  • m_step_ms: Audio chunk size (default: 5000ms)
  • m_length_ms: Audio buffer length (default: 8000ms)
  • Default hotkey: modify in src/main.cpp or src/live_type_app.cpp (at runtime, use --toggle-key or --push-to-talk instead)
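The two tunables are in milliseconds while capture runs at 16 kHz, so each maps to a sample count. A small sketch of that arithmetic at the defaults listed above; ChunkConfig, ms_to_samples and overlap_ms are hypothetical names, and reading m_step_ms as "how often a transcription runs" is an assumption:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kSampleRate = 16000;  // whisper.cpp expects 16 kHz audio

// Number of samples covering `ms` milliseconds at kSampleRate.
constexpr std::int64_t ms_to_samples(std::int64_t ms) { return ms * kSampleRate / 1000; }

// Hypothetical mirror of the two tunables (assumed semantics in comments).
struct ChunkConfig {
    std::int64_t step_ms = 5000;    // m_step_ms: interval between transcriptions
    std::int64_t length_ms = 8000;  // m_length_ms: audio seen by each transcription
};

// Audio shared by consecutive windows (length minus step).
constexpr std::int64_t overlap_ms(const ChunkConfig& c) { return c.length_ms - c.step_ms; }
```

At the defaults, each window holds 128000 samples and consecutive windows share 3000 ms (48000 samples) of audio. A larger step_ms means fewer transcription runs and lower CPU use, at the cost of text appearing later.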

Limitations

  • Text injection may not work in all applications (some security-focused apps block it)
  • Wayland support requires external tools (ydotool/wtype)
  • Real-time transcription has inherent latency (1-3 seconds depending on model)
  • Best results with clear speech and minimal background noise

Future Enhancements

  • GUI with system tray icon
  • Configurable hotkeys via GUI
  • Audio level visualization
  • Multiple language support
  • Custom word replacements/corrections

License

Same as whisper.cpp (MIT License)

About

Real-time speech-to-text that types anywhere on Linux
