Clone a voice in 5 seconds to generate arbitrary speech in real-time
-
Updated
Mar 9, 2026 - Python
Clone a voice in 5 seconds to generate arbitrary speech in real-time
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Generate audiobooks from e-books, voice cloning & 1158+ languages!
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix级字幕切割、翻译、对齐、甚至加上配音,一键全自动视频搬运AI字幕组
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
A simple, high-quality voice conversion tool focused on ease of use and performance.
MARS5 speech model (TTS) from CAMB.AI
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.
GPT-SoVITS ONNX Inference Engine & Model Converter
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
A Python/Pytorch app for easily synthesising human voices
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
A webui for different audio related Neural Networks
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.
Add a description, image, and links to the voice-cloning topic page so that developers can more easily learn about it.
To associate your repository with the voice-cloning topic, visit your repo's landing page and select "manage topics."