Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Updated Oct 16, 2025
Rapida is an open-source, end-to-end voice AI orchestration platform for building real-time conversational voice agents with audio streaming, STT, TTS, VAD, multi-channel integration, agent state management, and observability.
🇺🇦 Open Source Ukrainian Text-to-Speech datasets
A Docker-based OpenAI-compatible Text-to-Speech API server powered by Kyutai's TTS models with GPU acceleration support.
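An OpenAI-compatible TTS server like the one above typically accepts POST requests in the shape of OpenAI's `/v1/audio/speech` endpoint. The sketch below only builds such a request without sending it; the base URL and model name are assumptions, not this project's documented defaults.

```python
import json
from urllib import request

# Assumption: a local OpenAI-compatible server listening on this URL.
BASE_URL = "http://localhost:8000/v1/audio/speech"

def build_speech_request(text: str, voice: str = "default") -> request.Request:
    """Build a POST request following the OpenAI /v1/audio/speech schema."""
    payload = {
        "model": "tts-1",          # model name is server-dependent (assumption)
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("Hello from a local TTS server.")
```

Sending the request with `urllib.request.urlopen(req)` would return raw audio bytes on a compatible server.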
Just a simple multimodal avatar interaction platform
A voice-based AI chat interface built with Next.js and ElevenLabs. Start and stop real-time conversations with an animated UI that reflects agent status. Fully responsive and deployable via Vercel with environment-based agent configuration.
🇺🇦 Ukrainian RAD-TTS++ models (decoder + models with 3 voices) and HiFiGAN model
A unified benchmarking framework for evaluating Voice AI agents across conversational quality, audio realism, latency metrics, and safety guardrails with scalable multi-language stress testing.
Production-ready examples for Brainiall Speech AI APIs — Pronunciation Assessment, STT, TTS. Python, JavaScript, curl, and MCP configs.
Enterprise-Grade Secure ASR Diarization Pipeline - HIPAA-compliant speech processing service combining automatic speech recognition with speaker diarization. Features modular architecture, comprehensive security, and production-ready deployment.
MCP Server for Brainiall Speech AI - pronunciation assessment, speech-to-text, and text-to-speech
End-to-end speaker diarization and transcription pipeline using Whisper, VAD, and clustering in Python.
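The clustering step in such a diarization pipeline groups per-segment speaker embeddings by similarity. A minimal stdlib sketch, assuming embeddings are already extracted (the greedy centroid approach and the 0.75 threshold are illustrative, not this repository's method):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_speakers(embeddings, threshold=0.75):
    """Greedy online clustering: attach each segment embedding to the most
    similar existing speaker centroid, or open a new speaker label."""
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            # update the centroid as a running mean
            centroids[best] = [
                c + (e - c) / counts[best] for c, e in zip(centroids[best], emb)
            ]
            labels.append(best)
    return labels
```

Real pipelines usually replace this with agglomerative or spectral clustering over embeddings from a pretrained speaker model, aligned against VAD segment boundaries.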
Open source AI voice calling agent for Twilio phone calls, built with FastAPI, Google ADK, and Gemini Live API
Voice agent prototype for structured clinical interviewing, with VAD-based interruption handling, modular ASR/LLM/TTS backends, and dialogue workflow control.
Provides Whisper-based audio transcription and translation via lightweight C++ libraries for easy integration into LLM projects.
Open-source real-time Voice AI infrastructure in Go. Stream audio via WebRTC or WebSocket, connect STT → LLM → TTS pipelines, and build scalable voice agents and conversational AI applications.
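The STT → LLM → TTS pipeline pattern described above can be sketched with three concurrent stages joined by queues. This is a Python `asyncio` illustration of the pattern only; the stage functions are stubs standing in for real backends, not this Go project's API.

```python
import asyncio

# Stub stages standing in for real STT / LLM / TTS backends (assumptions).
async def stt_stage(audio_q, text_q):
    while (chunk := await audio_q.get()) is not None:
        await text_q.put(f"transcript({chunk})")
    await text_q.put(None)  # propagate end-of-stream downstream

async def llm_stage(text_q, reply_q):
    while (text := await text_q.get()) is not None:
        await reply_q.put(f"reply({text})")
    await reply_q.put(None)

async def tts_stage(reply_q, out):
    while (reply := await reply_q.get()) is not None:
        out.append(f"audio({reply})")

async def run_pipeline(chunks):
    """Run all three stages concurrently over a finite stream of chunks."""
    audio_q, text_q, reply_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    out = []
    for c in chunks:
        audio_q.put_nowait(c)
    audio_q.put_nowait(None)  # end-of-stream sentinel
    await asyncio.gather(
        stt_stage(audio_q, text_q),
        llm_stage(text_q, reply_q),
        tts_stage(reply_q, out),
    )
    return out

result = asyncio.run(run_pipeline(["c1", "c2"]))
```

In a streaming system the queues would carry partial transcripts and audio frames over WebRTC or WebSocket rather than complete strings.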