Echo - Voice Input Application

Offline voice input desktop application optimized for Apple Silicon

Features

High-Accuracy Speech Recognition: Powered by MLX-Audio with Whisper/Qwen3-ASR models
Model Switching: Switch between Whisper and Qwen3-ASR models in Settings
LLM-Based Post-Processing: Optional cleanup of filler words and self-corrections using on-device LLM
Context-Aware Formatting: Detects active application for context-appropriate output
Customizable Prompts: Edit post-processing behavior via Advanced Settings
Always-On Transcription: Continuous listening mode with automatic speech detection (Silero VAD)
Global Hotkey: System-wide keyboard shortcut for instant recording
Real-time Transcription: Immediate transcription after recording ends
Auto Text Insertion: Automatically paste transcription into active applications
Transcription History: SQLite-backed history with full-text search for always-on mode
Fully Offline: No internet connection required - all processing happens locally

System Requirements

OS: macOS 14.0 (Sonoma) or later
CPU: Apple Silicon (M1/M2/M3/M4)
Memory: 8GB+ recommended
Storage: 2GB+ free space (varies by model size)

Supported Models

Qwen3-ASR (Recommended)

Qwen3-ASR-0.6B-8bit - Lightweight, faster inference (default)
Qwen3-ASR-1.7B-8bit - High accuracy, 52 languages supported

Whisper (OpenAI)

whisper-large-v3-turbo - Balanced performance
whisper-large-v3 - Highest accuracy
whisper-medium / small / base / tiny - Lightweight models

Installation

Download the latest release from the Releases page:

Download the .dmg file
Open the DMG and drag Echo to your Applications folder
Launch Echo from Applications
Grant microphone permissions when prompted

The app is self-contained - no additional Python installation required!

Usage

Launch the app
Press and hold Cmd+Shift+Space to start recording
Speak your message
Release the key when finished
Transcription appears automatically
Text is auto-inserted into the active application (if enabled in Settings)

Always-On Transcription

Echo also supports continuous listening mode, which runs in the background and automatically detects speech:

Click the "Start Listening" button in the app
Echo continuously monitors the microphone using Silero VAD (Voice Activity Detection)
When speech is detected, it automatically records the segment
After a silence pause (1.5s), the segment is transcribed and saved to history
Click "Stop Listening" to end the session

Transcription history is stored in a local SQLite database and can be searched via the History panel.

Settings

Customize Echo via the Settings panel:

ASR Model: Choose between Qwen3-ASR or Whisper models
Hotkey: Customize the recording keyboard shortcut
Recognition Language: Auto-detect or manually specify language
Input Device: Select your preferred microphone
Auto Insert: Enable/disable automatic paste after transcription
Post-Processing: Enable LLM-based cleanup to remove filler words and self-corrections
Advanced Settings: Customize the post-processing prompt for specialized use cases

Development Setup

Prerequisites

Node.js 20+
Rust 1.83+
Python 3.11 (ARM native) - Required for building the ASR engine

Installation

# Install Node.js dependencies
npm install

# Install Rust dependencies (handled automatically by Tauri)
cd src-tauri && cargo build

# Build Python ASR engine binary (required for development)
cd python-engine
./build.sh  # Creates venv automatically and builds binary

Development Server

npm run tauri:dev

Building

# Build frontend only
npm run build

# Build full Tauri application
npm run tauri:build

# Rebuild Python engine binary (after engine.py changes)
cd python-engine && ./build.sh

Tech Stack

Frontend: React 18, TypeScript, Vite, Tailwind CSS
Backend: Tauri 2.x, Rust
Speech Recognition: MLX-Audio, Whisper, Qwen3-ASR (bundled with PyInstaller)
Post-Processing: MLX LLM (Qwen3-1.7B-4bit) for transcription cleanup
Platform: macOS 14.0+ on Apple Silicon

Architecture

Echo uses a multi-process architecture:

Tauri App (Rust): Main application, hotkey handling, audio capture, VAD, active app detection
React Frontend: User interface, settings management, transcription history
Python ASR Engine (Sidecar): Standalone PyInstaller binary running MLX-Audio for speech recognition
JSON-RPC Communication: Rust backend communicates with Python engine via stdin/stdout
LLM Post-Processor: Optional on-device cleanup using Qwen3-1.7B-4bit with context awareness
Continuous Pipeline (Rust): Streaming audio → Silero VAD (ONNX) → segment detection → ASR → SQLite

The ASR engine is lazily loaded - models download on first use and remain cached locally. The post-processor LLM auto-loads on startup if enabled.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - see LICENSE for details

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.github/workflows		.github/workflows
docs		docs
echo-cli		echo-cli
public		public
python-engine		python-engine
scripts		scripts
src-tauri		src-tauri
src		src
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
design_doc.md		design_doc.md
float.html		float.html
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
postcss.config.js		postcss.config.js
tailwind.config.js		tailwind.config.js
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Echo - Voice Input Application

Features

System Requirements

Supported Models

Qwen3-ASR (Recommended)

Whisper (OpenAI)

Installation

Usage

Always-On Transcription

Settings

Development Setup

Prerequisites

Installation

Development Server

Building

Tech Stack

Architecture

Contributing

License

About

Uh oh!

Releases 10

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Echo - Voice Input Application

Features

System Requirements

Supported Models

Qwen3-ASR (Recommended)

Whisper (OpenAI)

Installation

Usage

Always-On Transcription

Settings

Development Setup

Prerequisites

Installation

Development Server

Building

Tech Stack

Architecture

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 10

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages