Offline voice input desktop application optimized for Apple Silicon
- High-Accuracy Speech Recognition: Powered by MLX-Audio with Whisper/Qwen3-ASR models
- Model Switching: Switch between Whisper and Qwen3-ASR models in Settings
- LLM-Based Post-Processing: Optional cleanup of filler words and self-corrections using on-device LLM
- Context-Aware Formatting: Detects active application for context-appropriate output
- Customizable Prompts: Edit post-processing behavior via Advanced Settings
- Always-On Transcription: Continuous listening mode with automatic speech detection (Silero VAD)
- Global Hotkey: System-wide keyboard shortcut for instant recording
- Fast Transcription: Results appear immediately after recording ends
- Auto Text Insertion: Automatically paste transcription into active applications
- Transcription History: SQLite-backed history with full-text search for always-on mode
- Fully Offline: No internet connection required - all processing happens locally
- OS: macOS 14.0 (Sonoma) or later
- CPU: Apple Silicon (M1/M2/M3/M4)
- Memory: 8GB+ recommended
- Storage: 2GB+ free space (varies by model size)
- `Qwen3-ASR-0.6B-8bit` - Lightweight, faster inference (default)
- `Qwen3-ASR-1.7B-8bit` - High accuracy, 52 languages supported
- `whisper-large-v3-turbo` - Balanced performance
- `whisper-large-v3` - Highest accuracy
- `whisper-medium/small/base/tiny` - Lightweight models
Download the latest release from the Releases page:
- Download the `.dmg` file
- Open the DMG and drag Echo to your Applications folder
- Launch Echo from Applications
- Grant microphone permissions when prompted
The app is self-contained - no additional Python installation required!
- Launch the app
- Press and hold `Cmd+Shift+Space` to start recording
- Speak your message
- Release the key when finished
- Transcription appears automatically
- Text is auto-inserted into the active application (if enabled in Settings)
Echo also supports continuous listening mode, which runs in the background and automatically detects speech:
- Click the "Start Listening" button in the app
- Echo continuously monitors the microphone using Silero VAD (Voice Activity Detection)
- When speech is detected, it automatically records the segment
- After a silence pause (1.5s), the segment is transcribed and saved to history
- Click "Stop Listening" to end the session
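The segmentation logic above can be sketched as a small loop (an illustrative Python sketch, not Echo's Rust implementation; `is_speech` stands in for Silero VAD's per-frame decision):

```python
# Sketch of VAD-driven segmentation: frames accumulate while speech is
# detected; a segment is closed after 1.5 s of continuous silence.
SILENCE_TIMEOUT_S = 1.5
FRAME_S = 0.03  # 30 ms frames, a typical VAD hop size

def segment_audio(frames, is_speech):
    """Group frames into speech segments, closing each segment after
    SILENCE_TIMEOUT_S of silence. `is_speech` is a per-frame predicate
    (Silero VAD in Echo; any callable here)."""
    segments, current, silence = [], [], 0.0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silence = 0.0
        elif current:
            silence += FRAME_S
            if silence >= SILENCE_TIMEOUT_S:  # 1.5 s pause: flush segment
                segments.append(current)
                current, silence = [], 0.0
    if current:  # flush a trailing segment when the stream ends
        segments.append(current)
    return segments
```

Each flushed segment would then be handed to the ASR engine and the result written to history.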
Transcription history is stored in a local SQLite database and can be searched via the History panel.
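A minimal sketch of such a searchable store, using Python's stdlib `sqlite3` with the FTS5 extension (the `history` table layout here is hypothetical, not Echo's actual schema):

```python
import sqlite3

# In-memory DB for illustration; Echo persists to a local file.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE history USING fts5(text, timestamp)")

def save(text, ts):
    db.execute("INSERT INTO history (text, timestamp) VALUES (?, ?)", (text, ts))

def search(query):
    # FTS5 MATCH performs full-text search over the stored transcriptions.
    return [row[0] for row in
            db.execute("SELECT text FROM history WHERE history MATCH ?", (query,))]

save("schedule a meeting for tomorrow", "2024-01-01T10:00")
save("buy groceries after work", "2024-01-01T11:00")
print(search("meeting"))  # → ['schedule a meeting for tomorrow']
```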
Customize Echo via the Settings panel:
- ASR Model: Choose between Qwen3-ASR or Whisper models
- Hotkey: Customize the recording keyboard shortcut
- Recognition Language: Auto-detect or manually specify language
- Input Device: Select your preferred microphone
- Auto Insert: Enable/disable automatic paste after transcription
- Post-Processing: Enable LLM-based cleanup to remove filler words and self-corrections
- Advanced Settings: Customize the post-processing prompt for specialized use cases
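Echo's cleanup step is LLM-based (Qwen3-1.7B-4bit); as a crude non-LLM illustration of the kind of edit it performs, a regex stand-in might strip a few common English fillers:

```python
import re

# Crude stand-in for the LLM post-processing pass: remove common filler
# words. Echo's real cleanup uses an on-device LLM, not regexes.
FILLER_RE = re.compile(r"\b(um|uh)\b[,.]?\s*", re.IGNORECASE)

def cleanup(text):
    return FILLER_RE.sub("", text).strip()

print(cleanup("Um, so I, uh, wanted to say hello"))  # → "so I, wanted to say hello"
```

The LLM approach goes further than this: it can also undo self-corrections ("meet at three, no, four" → "meet at four") and adapt its output to the active application.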
- Node.js 20+
- Rust 1.83+
- Python 3.11 (ARM native) - Required for building the ASR engine
```bash
# Install Node.js dependencies
npm install

# Install Rust dependencies (handled automatically by Tauri)
cd src-tauri && cargo build

# Build Python ASR engine binary (required for development)
cd python-engine
./build.sh   # Creates venv automatically and builds binary

# Run the development server
npm run tauri:dev
```

```bash
# Build frontend only
npm run build

# Build full Tauri application
npm run tauri:build

# Rebuild Python engine binary (after engine.py changes)
cd python-engine && ./build.sh
```

- Frontend: React 18, TypeScript, Vite, Tailwind CSS
- Backend: Tauri 2.x, Rust
- Speech Recognition: MLX-Audio, Whisper, Qwen3-ASR (bundled with PyInstaller)
- Post-Processing: MLX LLM (Qwen3-1.7B-4bit) for transcription cleanup
- Platform: macOS 14.0+ on Apple Silicon
Echo uses a multi-process architecture:
- Tauri App (Rust): Main application, hotkey handling, audio capture, VAD, active app detection
- React Frontend: User interface, settings management, transcription history
- Python ASR Engine (Sidecar): Standalone PyInstaller binary running MLX-Audio for speech recognition
- JSON-RPC Communication: Rust backend communicates with Python engine via stdin/stdout
- LLM Post-Processor: Optional on-device cleanup using Qwen3-1.7B-4bit with context awareness
- Continuous Pipeline (Rust): Streaming audio → Silero VAD (ONNX) → segment detection → ASR → SQLite
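The sidecar protocol can be illustrated in a few lines of Python (the `transcribe` method name and parameters are hypothetical, not Echo's actual message schema; only the newline-delimited JSON-RPC shape is the point):

```python
import json

def handle_request(line, methods):
    """Dispatch one newline-delimited JSON-RPC request (as read from
    stdin) and return the serialized response (as written to stdout)."""
    req = json.loads(line)
    result = methods[req["method"]](**req.get("params", {}))
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# Hypothetical handler standing in for the MLX-Audio transcription call.
methods = {"transcribe": lambda audio_path: f"transcript of {audio_path}"}

request = json.dumps({"jsonrpc": "2.0", "id": 1,
                      "method": "transcribe",
                      "params": {"audio_path": "segment.wav"}})
print(handle_request(request, methods))
```

Running the engine as a separate PyInstaller binary keeps the Python/MLX dependency tree out of the Rust process; the Rust side only needs to spawn the sidecar and exchange JSON lines over its pipes.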
The ASR engine is lazily loaded - models download on first use and remain cached locally. The post-processor LLM auto-loads on startup if enabled.
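Lazy loading of that kind can be sketched as follows (illustrative only; `loader` is a placeholder for the real MLX-Audio model loader):

```python
class LazyASREngine:
    """Defers model loading until the first transcription request,
    then keeps the loaded model cached for subsequent calls."""
    def __init__(self, model_name, loader):
        self.model_name = model_name
        self._loader = loader       # e.g. the MLX-Audio model loader
        self._model = None

    def transcribe(self, audio):
        if self._model is None:     # first use: download/load, then cache
            self._model = self._loader(self.model_name)
        return self._model(audio)

# Fake loader to show the model is only loaded once.
load_calls = []
def fake_loader(name):
    load_calls.append(name)
    return lambda audio: f"[{name}] {audio}"

engine = LazyASREngine("whisper-large-v3-turbo", fake_loader)
engine.transcribe("a.wav")
engine.transcribe("b.wav")
print(load_calls)  # loader invoked only once → ['whisper-large-v3-turbo']
```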
Contributions are welcome! Please feel free to submit issues or pull requests.
MIT License - see LICENSE for details