Voice-to-Text macOS App

A macOS application that captures microphone input, transcribes speech using a local Whisper model (whisper.cpp), and automatically types the transcribed text into the currently focused text input field.

Features

Hotkey Control: Press Right Command to toggle recording (Start/Stop)
Audio Feedback: Distinct tones when recording starts/stops
Local Transcription: Uses pywhispercpp with configurable Whisper models for offline speech-to-text
Automatic Text Injection: Types transcribed text directly into any focused text field using AppleScript (macOS native)
Privacy-First: All processing happens locally on your Mac
Startup Info: Shows selected audio input device and model on launch

Installation

Prerequisites

Python 3.12+
uv package manager (recommended)
macOS (for text injection and optimizations)

Setup

Clone the repository
```
git clone <repo-url>
cd v2t
```
Install dependencies
```
uv sync
```
Model Setup (Important): The app needs a Whisper model in GGML format. In network-restricted environments the automatic download may fail, in which case you must provide the model file yourself.
- Option A (Automatic): The app attempts to download the model on first run.
- Option B (Manual): If automatic download fails:
  1. Download or convert a Whisper model to GGML format (e.g., ggml-small.en.bin).
  2. Place it at: models/whisper-cpp/ggml-model.bin
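
To confirm that a manually placed model is where the app expects it, a small check like the one below can help. This is a minimal sketch, not part of the project: the helper name `find_local_model` is hypothetical, and it only assumes the path given in Option B, relative to the project directory.

```python
from pathlib import Path
from typing import Optional

# Path from Option B above, relative to the project directory.
DEFAULT_MODEL_PATH = Path("models/whisper-cpp/ggml-model.bin")


def find_local_model(project_dir: Path) -> Optional[Path]:
    """Return the manually placed model file if it exists, else None.

    Hypothetical helper for illustration; the app's own lookup may differ.
    """
    candidate = project_dir / DEFAULT_MODEL_PATH
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    model = find_local_model(Path.cwd())
    if model is None:
        print(f"No model found at {DEFAULT_MODEL_PATH}; follow Option B.")
    else:
        size_mb = model.stat().st_size / 1e6
        print(f"Found model: {model} ({size_mb:.0f} MB)")
```

Run it from the project directory (the `v2t` folder created by the clone step) before launching the app.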

Run the App

```
# Using the launcher
./start.sh

# OR directly with uv (from the project directory)
uv run python ./main.py
```

Permissions

The app requires the following macOS permissions:

Microphone Access: Allow when prompted.
Accessibility Access: Required for input monitoring (hotkeys) and text injection.
- Go to System Settings > Privacy & Security > Accessibility
- Add/Enable your Terminal app (e.g., iTerm, Terminal, VS Code)
Input Monitoring: Required for global hotkey listening.
Automation (System Events): Required for AppleScript text injection.

On startup, the app performs a best-effort permission preflight and requests any missing access where macOS allows prompting.
If a permission was previously denied, macOS may not show the prompt again; in that case, grant it manually in System Settings.

Configuration

You can configure the Whisper model using the V2T_MODEL environment variable:

```
# Use a different model size
V2T_MODEL=tiny.en ./start.sh
V2T_MODEL=medium.en ./start.sh
V2T_MODEL=large-v3-turbo ./start.sh

# Or use a custom model path
V2T_MODEL=/path/to/your/model.bin ./start.sh

# With uv run directly
V2T_MODEL=large-v3 uv run python ./main.py

# Export for the session
export V2T_MODEL=medium.en
./start.sh
```

Available models:

| Model | Size (parameters) | Speed | Best for |
| --- | --- | --- | --- |
| `tiny.en` | 39M | Fastest | Quick drafts, low latency |
| `base.en` | 74M | Fast | Good balance for English |
| `small.en` | 244M | Moderate | Default - accurate English |
| `medium.en` | 769M | Slow | High accuracy English |
| `large-v3` | 1.5G | Slowest | Multilingual, accents |
| `large-v3-turbo` | 809M | Slow | Faster large model |

The .en models are English-only but faster and more accurate for English speech.
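
The `V2T_MODEL` variable accepts either a model name or a file path, so the app has to tell the two apart. The sketch below shows one plausible way to do that; it is not taken from `config.py`. The function name `resolve_model` and the rule "anything that is not a listed name is a path" are assumptions made for illustration.

```python
import os

# Names from the "Available models" list; small.en is the documented default.
KNOWN_MODELS = (
    "tiny.en", "base.en", "small.en", "medium.en", "large-v3", "large-v3-turbo",
)
DEFAULT_MODEL = "small.en"


def resolve_model(env=None):
    """Return ("name", model) or ("path", file_path) based on V2T_MODEL.

    Hypothetical helper: the real app may apply different rules.
    """
    env = os.environ if env is None else env
    value = env.get("V2T_MODEL", "").strip()
    if not value:
        return ("name", DEFAULT_MODEL)  # variable unset or empty
    if value in KNOWN_MODELS:
        return ("name", value)
    # Assumption: any other value is a custom model path.
    return ("path", os.path.expanduser(value))


if __name__ == "__main__":
    kind, value = resolve_model()
    print(f"Using model {kind}: {value}")
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment, for example `resolve_model({"V2T_MODEL": "tiny.en"})`.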

Sound Type

You can configure the audio feedback sounds using the V2T_SOUND environment variable:

```
# Use bloop sound effects (default)
./start.sh

# Use warm bloop tones with rich harmonics
V2T_SOUND=warm ./start.sh

# Use simple sine wave tones (880Hz/440Hz)
V2T_SOUND=simple ./start.sh

# Use short click sounds
V2T_SOUND=click ./start.sh
```

| Value | Description |
| --- | --- |
| `bloop` | Bloop sound effects from wav files (default) |
| `warm` | Warm bloop tones with rich harmonics |
| `simple` | Simple sine wave tones |
| `click` | Short click sounds |
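
For a sense of what the `simple` option involves, the sketch below generates a short sine tone using only the standard library. It is an illustration, not the project's code: the function name `sine_tone`, the sample rate, the duration, the amplitude, and the 10 ms fade are all assumptions. Only the 880 Hz and 440 Hz frequencies come from this README, and which one marks start versus stop is also assumed here.

```python
import math

SAMPLE_RATE = 44_100  # assumed sample rate in Hz


def sine_tone(freq_hz, duration_s=0.15, sample_rate=SAMPLE_RATE, amplitude=0.3):
    """Return a list of float samples in [-amplitude, amplitude].

    A short linear fade-in and fade-out avoids audible clicks at the edges.
    """
    n = round(duration_s * sample_rate)
    fade = max(1, int(0.01 * sample_rate))  # 10 ms fade, in samples
    samples = []
    for i in range(n):
        # Envelope ramps 0 -> 1 at the start and 1 -> 0 at the end.
        envelope = min(1.0, i / fade, (n - 1 - i) / fade)
        phase = 2 * math.pi * freq_hz * i / sample_rate
        samples.append(amplitude * envelope * math.sin(phase))
    return samples


# Assumption: the higher tone signals start, the lower tone signals stop.
START_TONE = sine_tone(880)
STOP_TONE = sine_tone(440)
```

With the project's audio stack, samples like these could be played by converting them to a `float32` NumPy array and passing that to `sounddevice.play` along with the sample rate.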

Usage

Launch the app.
Press Right Command once to start recording.
Speak your text.
Press Right Command again to stop.
Wait a moment for transcription; the text will appear in your active window.
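
The press-to-start, press-to-stop flow above is a two-state toggle. The following sketch models it in isolation; the class name `RecordingToggle` and its callbacks are hypothetical and do not reflect how `main.py` is actually organized.

```python
class RecordingToggle:
    """Flip between idle and recording on each hotkey press."""

    def __init__(self, on_start, on_stop):
        self.recording = False
        self._on_start = on_start
        self._on_stop = on_stop

    def press(self):
        """Handle one hotkey press and return the new recording state."""
        self.recording = not self.recording
        if self.recording:
            self._on_start()  # e.g. play start tone, open the microphone
        else:
            self._on_stop()   # e.g. play stop tone, transcribe, inject text
        return self.recording


if __name__ == "__main__":
    toggle = RecordingToggle(
        on_start=lambda: print("recording..."),
        on_stop=lambda: print("stopped; transcribing"),
    )
    toggle.press()  # first press starts recording
    toggle.press()  # second press stops it
```

In the real app, something like a `pynput.keyboard.Listener` whose `on_press` callback compares the key against `pynput.keyboard.Key.cmd_r` could call `press()`; that wiring is left out here so the sketch runs without extra permissions.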

Technical Details

Language: Python 3.12
Transcription: pywhispercpp (Bindings for whisper.cpp)
Default model: small.en (GGML format)
Audio: sounddevice + numpy
Input/Output: pynput (monitoring), AppleScript (injection)
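
AppleScript injection typically means asking System Events to type a string through `osascript`. The sketch below shows that general technique, with the quoting it requires. It is not copied from `injector.py`, and the function names are hypothetical. It does not handle newlines or other special characters in the text.

```python
import subprocess


def build_keystroke_script(text):
    """Build an AppleScript command that types `text` via System Events.

    Backslashes and double quotes are escaped so the text stays inside
    the AppleScript string literal.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'tell application "System Events" to keystroke "{escaped}"'


def inject_text(text):
    """Type `text` into the focused field (macOS only).

    Requires the Accessibility and Automation permissions described
    in the Permissions section.
    """
    subprocess.run(["osascript", "-e", build_keystroke_script(text)], check=True)


if __name__ == "__main__":
    # Print the script instead of running it, so this is safe on any OS.
    print(build_keystroke_script('She said "hello"'))
```

Passing the script as a single list element to `subprocess.run` avoids shell quoting issues, so only AppleScript's own string escaping needs attention.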

License

[Add License Here]
