A PyTorch-based Speech Toolkit
Updated Mar 26, 2026 - Python
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Reading list for research topics in multimodal machine learning
Foundation Architecture for (M)LLMs
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
WaveNet vocoder
AI powered speech denoising and enhancement
Controllable and fast Text-to-Speech for over 7000 languages!
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Become a cracked AI/ML Research Engineer
General Speech Restoration
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
SincNet is a neural architecture for efficiently processing raw audio samples.
Open source audio annotation tool for humans
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection