hyprvox


Voice input for AI workflows on Linux.

The Problem

You're deep in a session with a coding agent. You know exactly what you want to ask — a complex refactor, a debugging question, a feature request. But now you have to type it all out.

By the time you're done, you've lost the thread.

Context switching kills flow. And typing at 40 WPM when you can speak at 150 WPM is a bottleneck you don't need.

The Solution

Press a key. Speak. Press again. Paste.

hyprvox is a voice-to-text daemon for Linux. It runs in the background, transcribes when you need it, and puts the result on your clipboard — ready to paste into Claude, Copilot, or whatever agent you're working with.

Built for Hyprland/Wayland first. Works on X11 too.

Quick Start

Prerequisites

# Install Bun (if not already installed)
curl -fsSL https://bun.sh/install | bash

# Install ffmpeg (required for Opus audio conversion)
# Arch:   sudo pacman -S ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Fedora: sudo dnf install ffmpeg

Installation

git clone https://github.com/Snehit70/hyprvox.git
cd hyprvox
bun install

bun run index.ts config init   # Set up API keys (Groq + Deepgram)
bun run index.ts install       # Install as systemd service

Press Right Ctrl to record. Press again to stop. Paste anywhere.

Works on both Wayland and X11. On X11/GNOME/KDE, the built-in hotkey works out of the box. On Wayland (Hyprland, Sway), see compositor keybind setup for reliable system-wide hotkeys.
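The press-to-toggle model is simple enough to sketch. The snippet below is an illustrative state machine, not hyprvox's actual implementation; the state-file path and function name are hypothetical stand-ins for state the real daemon keeps internally:

```typescript
// Hypothetical press-to-toggle sketch. hyprvox's daemon tracks recording
// state internally; here a temp file stands in for that state.
import { existsSync, writeFileSync, unlinkSync } from "node:fs";

const STATE = "/tmp/hyprvox-recording.sketch";

function toggle(): "started" | "stopped" {
  if (existsSync(STATE)) {
    unlinkSync(STATE); // second press: stop, transcribe, copy to clipboard
    return "stopped";
  }
  writeFileSync(STATE, String(Date.now())); // first press: begin recording
  return "started";
}
```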

For AI Agents

Click to expand setup prompt

Copy this prompt to your coding agent:

Install and configure hyprvox on this Linux system:

1. Clone: git clone https://github.com/Snehit70/hyprvox.git
2. Install: cd hyprvox && bun install
3. Run `bun run index.ts config init` — I'll provide API keys when prompted:
   - Groq API key (get from console.groq.com)
   - Deepgram API key (get from console.deepgram.com)
4. Install service: bun run index.ts install
5. For Hyprland, add keybind to ~/.config/hypr/hyprland.conf:
    bind = , code:105, exec, bun run /path/to/hyprvox/index.ts toggle
    # code:105 = Right Control (use `wev` to find other key codes)
6. For Hyprland overlay, add to ~/.config/hypr/UserConfigs/WindowRules.conf:
    windowrule = match:class hyprvox-overlay, float on
    windowrule = match:class hyprvox-overlay, pin on
    windowrule = match:class hyprvox-overlay, no_focus on
    windowrule = match:class hyprvox-overlay, no_shadow on
    windowrule = match:class hyprvox-overlay, no_anim on
    windowrule = match:class hyprvox-overlay, move ((monitor_w-window_w)*0.5) (monitor_h-window_h-50)
7. Reload: hyprctl reload
8. Verify: bun run index.ts health

How It Works

Dual-engine transcription. Audio goes to both Groq (Whisper V3) and Deepgram (Nova-3) in parallel. Results are merged with an LLM for better accuracy. If one fails, the other continues.
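The fan-out-and-fallback pattern described above can be sketched in TypeScript. The engine and merge functions here are stand-in stubs for illustration, not hyprvox's actual API:

```typescript
// Stand-in engine stubs; the real calls hit the Groq and Deepgram HTTP APIs.
async function transcribeGroq(_audio: Uint8Array): Promise<string> {
  return "hello world";
}
async function transcribeDeepgram(_audio: Uint8Array): Promise<string> {
  throw new Error("simulated engine failure");
}
// Stand-in merge; the real merge asks an LLM to reconcile both transcripts.
function mergeWithLLM(a: string, b: string): string {
  return a.length >= b.length ? a : b;
}

async function transcribe(audio: Uint8Array): Promise<string> {
  // Fan out to both engines in parallel; allSettled lets the survivor
  // win if one engine fails.
  const results = await Promise.allSettled([
    transcribeGroq(audio),
    transcribeDeepgram(audio),
  ]);
  const ok = results
    .filter((r): r is PromiseFulfilledResult<string> => r.status === "fulfilled")
    .map((r) => r.value);
  if (ok.length === 0) throw new Error("both engines failed");
  return ok.length === 2 ? mergeWithLLM(ok[0], ok[1]) : ok[0];
}
```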

Streaming or batch. ~500ms latency in streaming mode. Higher accuracy in batch mode. Your choice.

Runs as a daemon. Systemd service starts on login. Always ready when you need it.
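A systemd user service of roughly this shape is what the install command sets up. The unit below is a hedged sketch: the path and the `daemon` subcommand in ExecStart are assumptions for illustration, not copied from the installer.

```
# ~/.config/systemd/user/hyprvox.service (illustrative sketch)
[Unit]
Description=hyprvox voice-to-text daemon

[Service]
ExecStart=/usr/bin/bun run /path/to/hyprvox/index.ts daemon
Restart=on-failure

[Install]
WantedBy=default.target
```

If installing a unit like this by hand, enable it with `systemctl --user enable --now hyprvox`.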

Performance

Metric                 Value
Median latency         882 ms
Real-time factor       39x faster than real-time
Dual-engine success    93.5%
Filler words removed   12.3% (by LLM cleanup)
LLM merge overhead     ~280 ms

The LLM doesn't just merge — it removes filler words ("um", "uh"), false starts, and self-corrections automatically.
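For illustration only, here is a crude regex version of filler removal. The real cleanup is done by the LLM, which also catches false starts and self-corrections that a regex cannot:

```typescript
// Toy filler stripper; hyprvox does this with an LLM, not a regex.
const FILLERS = /\b(?:um+|uh+|erm?)\b[,.]?\s*/gi;

function stripFillers(text: string): string {
  return text.replace(FILLERS, "").replace(/\s{2,}/g, " ").trim();
}
```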

The Overlay

A small waveform appears at the bottom of your screen while recording — visual feedback that it's listening.

Overlay showing waveform during recording

For Hyprland, add these window rules:

# ~/.config/hypr/UserConfigs/WindowRules.conf
windowrule = match:class hyprvox-overlay, float on
windowrule = match:class hyprvox-overlay, pin on
windowrule = match:class hyprvox-overlay, no_focus on
windowrule = match:class hyprvox-overlay, no_shadow on
windowrule = match:class hyprvox-overlay, no_anim on
windowrule = match:class hyprvox-overlay, move ((monitor_w-window_w)*0.5) (monitor_h-window_h-50)

Installation

Dependencies

Click to expand

Audio: alsa-utils

  • Arch: sudo pacman -S alsa-utils
  • Ubuntu: sudo apt install alsa-utils
  • Fedora: sudo dnf install alsa-utils

Clipboard

  • Wayland: wl-clipboard
  • X11: xclip or xsel

Permissions

sudo usermod -aG audio,input $USER
# Log out and back in

API Keys

Provider   Purpose             Link
Groq       Whisper V3 (fast)   console.groq.com
Deepgram   Nova-3 (accurate)   console.deepgram.com

Run bun run index.ts config init to set them up.

Usage

bun run index.ts status      # Check daemon status
bun run index.ts health      # Test system setup
bun run index.ts toggle      # Start/stop recording
bun run index.ts history     # View past transcriptions
bun run index.ts logs        # Tail daemon logs
bun run index.ts errors      # Show last error
bun run index.ts config init # Set up API keys
bun run index.ts boost add   # Add custom vocabulary

Configuration

Config file: ~/.config/hypr/vox/config.json

{
  "apiKeys": { "groq": "...", "deepgram": "..." },
  "transcription": {
    "streaming": true,
    "boostWords": ["Hyprland", "WebSocket", "refactor"]
  }
}

  • Streaming mode — ~500ms latency, slightly lower accuracy.
  • Batch mode — 2-8 seconds, higher accuracy.
  • Boost words — improve recognition for technical terms.

Full options: Configuration Guide

Hyprland Setup

Add keybind for global hotkey:

# ~/.config/hypr/hyprland.conf
bind = , code:105, exec, bun run /path/to/hyprvox/index.ts toggle
# code:105 = Right Control

Use wev | grep -A5 "key event" to find key codes.

Binding the key in the compositor bypasses XWayland's global-hotkey limitations on Wayland.

Full guide: Wayland Support

Troubleshooting

Problem               Fix
Hotkey not working    Add user to input group; use compositor binds on Wayland
No audio              Add user to audio group
Clipboard issues      Install wl-clipboard (Wayland) or xclip (X11)
Service won't start   Check logs: journalctl --user -u hyprvox -f

Full guide: Troubleshooting

Documentation

Release Workflow

  • Use Conventional Commits on branches merged into main; feat: triggers a minor bump and fix: triggers a patch bump.
  • .github/workflows/release-please.yml opens or updates the release PR, and .github/workflows/release.yml publishes tagged releases after tests pass.
  • Release Please uses release-please-config.json and .release-please-manifest.json to track the root package version.
  • Set repository Actions permissions to Read and write, and enable Allow GitHub Actions to create and approve pull requests or provide a RELEASE_PLEASE_TOKEN secret with repo scope.

License

MIT
