VoxSherpa TTS is listed in the official README of k2-fsa/sherpa-onnx, the core inference library powering this app.
Most TTS apps make you choose between quality and privacy. Cloud-based tools like ElevenLabs sound incredible, but they require internet, send your text to remote servers, and charge per character.
VoxSherpa breaks that tradeoff.
It runs two professional-grade neural engines entirely on your device:
| Engine | Quality | Speed | Best For |
|---|---|---|---|
| Kokoro-82M | Studio-grade · rivals ElevenLabs | Slower on budget hardware | Audiobooks, voiceovers, professional content |
| Piper / VITS | Natural · clear | Fast on any device | Daily use, quick synthesis |
- Kokoro-82M: 82-million-parameter neural model. Multilingual support including Hindi, English, British English, French, Spanish, Chinese, Japanese and 50+ more languages. Same architecture used by top-tier commercial TTS services.
- Piper / VITS: Fast, lightweight, natural. Generates speech in seconds on any Android device.
- All processing happens on your device
- No internet required after model download
- No account, no telemetry, no data collection
- Your text never leaves your phone
- Download models directly from the app
- Import your own `.onnx` models from local storage
- Multiple models installed simultaneously
- Smart storage tracking
- Real-time waveform visualization
- Adjustable speed and pitch
- Play, pause, and replay generated audio
- Export as WAV with correct sample rate per model
- Save all generated audio locally
- Favorites system for quick access
- View generation history with timestamps
- Voice model attribution per recording
- Smart Punctuation: natural pauses after sentence breaks
- Emotion Tags: `[whisper]`, `[angry]`, `[happy]` support
- Per-model voice selection (Kokoro supports 100+ speakers)
- Theme-aware UI
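How the app handles emotion tags internally is not described here. As a rough illustration only, the sketch below shows one way inline tags such as `[whisper]` could be split from the text they govern before synthesis. The class name `EmotionTagSketch`, the `Segment` type, and the `"neutral"` default are all hypothetical and are not taken from the VoxSherpa source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not VoxSherpa's actual tag parser.
final class EmotionTagSketch {
    // Matches only the three tags named in the feature list.
    private static final Pattern TAG = Pattern.compile("\\[(whisper|angry|happy)\\]");

    // A run of text plus the emotion in effect for it.
    static final class Segment {
        final String emotion;
        final String text;

        Segment(String emotion, String text) {
            this.emotion = emotion;
            this.text = text;
        }
    }

    // Splits the input at each tag; text before the first tag is "neutral" (an assumed default).
    static List<Segment> parse(String input) {
        List<Segment> out = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        String emotion = "neutral";
        int last = 0;
        while (m.find()) {
            addIfNotBlank(out, emotion, input.substring(last, m.start()));
            emotion = m.group(1);
            last = m.end();
        }
        addIfNotBlank(out, emotion, input.substring(last));
        return out;
    }

    // Skips segments that contain only whitespace.
    private static void addIfNotBlank(List<Segment> out, String emotion, String text) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            out.add(new Segment(emotion, trimmed));
        }
    }

    public static void main(String[] args) {
        String demo = "Hello there. [whisper] Keep this quiet. [happy] We did it!";
        for (Segment s : parse(demo)) {
            System.out.println(s.emotion + ": " + s.text);
        }
    }
}
```

Each segment could then be synthesized with whatever voice or prosody settings the engine associates with that emotion; that mapping is outside the scope of this sketch.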
User Text
    │
    ├─── Kokoro Engine (KokoroEngine.java)
    │      ├── Sherpa-ONNX JNI → ONNX Runtime → CPU/NNAPI
    │      └── kokoro-multi-lang-v1_0 (82M params, FP32)
    │
    └─── Piper / VITS Engine (VoiceEngine.java)
           ├── Sherpa-ONNX JNI → ONNX Runtime → CPU
           └── VITS model (language-specific)
Built with:
- Sherpa-ONNX: on-device neural inference
- Kokoro-82M: multilingual neural TTS model
- Piper: fast local TTS
- Android AudioTrack API: low-latency PCM playback
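The PCM samples that feed playback are also what the WAV export feature has to wrap in a file header, and the header must carry the sample rate of the model that produced the audio. The sketch below is an illustration under assumptions, not the app's code: it assumes the engine yields mono float samples in the range -1.0 to 1.0, and the class name `WavSketch` and the 24000 Hz value in `main` are example choices only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: wraps mono float samples in a 16-bit PCM WAV container.
final class WavSketch {
    static byte[] toWav(float[] samples, int sampleRate) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        final int byteRate = sampleRate * blockAlign;        // bytes per second of audio
        final int dataSize = samples.length * blockAlign;

        // WAV fields are little-endian; the canonical PCM header is 44 bytes.
        ByteBuffer buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataSize);                 // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                            // fmt chunk size for plain PCM
        buf.putShort((short) 1);                   // audio format 1 = PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);                    // must match the model's output rate
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataSize);

        for (float s : samples) {
            // Clamp to [-1, 1] before scaling to the signed 16-bit range.
            float clamped = Math.max(-1f, Math.min(1f, s));
            buf.putShort((short) Math.round(clamped * 32767f));
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // One second of silence at an example rate of 24000 Hz.
        byte[] wav = toWav(new float[24000], 24000);
        System.out.println("WAV size in bytes: " + wav.length);
    }
}
```

Writing the wrong rate into the header does not change the samples, but players will interpret them at that rate, so the audio sounds pitched up or down. That is why the rate should come from the model rather than being hard-coded.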
Generation speed depends entirely on your device's processor:
| Device Tier | Kokoro | Piper |
|---|---|---|
| 🟢 Flagship (Snapdragon 8 Gen 3) | ~20-40 sec/min audio | ~5 sec/min audio |
| 🟡 Mid-range (8-core) | ~60-90 sec/min audio | ~10 sec/min audio |
| 🔴 Budget (6-core) | ~2-3 min/min audio | ~20 sec/min audio |
Kokoro prioritizes quality over speed by design. It uses the same 82M-parameter architecture that powers premium commercial TTS, and running it entirely offline on a mobile CPU pushes the hardware to its limits.
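The table's "sec/min audio" figures can be read as a real-time factor: generation seconds divided by audio seconds. The sketch below only does that arithmetic to estimate how long a longer text might take. The class and method names are hypothetical, and the inputs are the approximate figures from the table rather than measurements.

```java
// Hypothetical helper: turns the table's "seconds per minute of audio" into time estimates.
final class SpeedEstimateSketch {
    // Real-time factor: values below 1.0 mean generation is faster than playback.
    static double realTimeFactor(double secondsPerAudioMinute) {
        return secondsPerAudioMinute / 60.0;
    }

    // Estimated generation time for a clip of the given length.
    static double estimateGenerationSeconds(double audioSeconds, double secondsPerAudioMinute) {
        return audioSeconds * realTimeFactor(secondsPerAudioMinute);
    }

    public static void main(String[] args) {
        double chapterSeconds = 10 * 60; // a 10-minute audiobook chapter
        // Mid-range device, Kokoro: roughly 60 to 90 seconds per minute of audio.
        double low = estimateGenerationSeconds(chapterSeconds, 60);
        double high = estimateGenerationSeconds(chapterSeconds, 90);
        System.out.println("Estimated generation time: " + low + " to " + high + " seconds");
    }
}
```

By this reading, Kokoro on a mid-range device runs at or slower than real time, while Piper stays well below a factor of 1.0 on every tier in the table.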
Update: Thanks to the amazing support from this community, the 14-day closed testing is complete, and VoxSherpa TTS is currently under Production Review by Google Play!
While we wait for the app to go publicly live, you can still get Early Access to the stable V2.5 directly from the Play Store.
What's new in V2.5 (Stable):
- System-wide TTS engine: use VoxSherpa in any app (Chrome, WhatsApp, etc.)
- PDF to Audio
- TXT to Audio
- Interactive mini-player, smoother UI, and improved audio generation
How to join Early Access:
- Fill out the form below with your Google Play email.
- I will manually add you to the early access list.
- You will receive a direct Play Store link to install the app.
Source code for V2.5 will be pushed to the GitHub Main branch once the production version is officially live on the Play Store.
VoxSherpa supports importing custom .onnx models without any server:
- Place your `.onnx` model + `tokens.txt` on device storage
- Open Models tab → tap + → Import Local Model
- Select your files
Works with any Sherpa-ONNX-compatible TTS model.
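Before importing, it can help to confirm that the folder actually holds the two files the steps above mention. The sketch below is a hypothetical pre-flight check, not the app's import logic: the class `ModelImportCheckSketch` and the method `looksImportable` are invented for illustration, and it checks only for a `.onnx` file and a `tokens.txt`, not for anything else a particular model family might need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical pre-flight check for a local model folder.
final class ModelImportCheckSketch {
    // True when dir contains tokens.txt and at least one regular file ending in .onnx.
    static boolean looksImportable(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        if (!Files.isRegularFile(dir.resolve("tokens.txt"))) {
            return false;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.anyMatch(p -> Files.isRegularFile(p)
                    && p.getFileName().toString().endsWith(".onnx"));
        } catch (IOException e) {
            // Surface listing failures (for example, missing permissions) to the caller.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no path is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir.toAbsolutePath() + " importable: " + looksImportable(dir));
    }
}
```

A check like this only confirms the files are present. Whether the model loads still depends on it being a TTS model that Sherpa-ONNX supports.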
VoxSherpa is open source. Contributions welcome:
- Bug reports via Issues
- Feature requests via Discussions
- Pull requests for fixes and improvements
Copyright (C) 2025 CodeBySonu95
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
https://www.gnu.org/licenses/gpl-3.0.html
- k2-fsa/sherpa-onnx: the inference engine that makes this possible
- hexgrad/Kokoro-82M: the neural model behind studio-quality synthesis
- rhasspy/piper: fast local TTS engine
Built with obsession. Runs without internet.
VoxSherpa: because your voice deserves to stay yours.




