Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Updated Oct 16, 2025
Rapida is an open-source, end-to-end voice AI orchestration platform for building real-time conversational voice agents with audio streaming, STT, TTS, VAD, multi-channel integration, agent state management, and observability.
🇺🇦 Open Source Ukrainian Text-to-Speech datasets
A Docker-based OpenAI-compatible Text-to-Speech API server powered by Kyutai's TTS models with GPU acceleration support.
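An OpenAI-compatible TTS server like the one above typically accepts POST requests in the shape of OpenAI's `/v1/audio/speech` endpoint. The sketch below only builds such a request without sending it; the base URL and model name are assumptions, not this project's documented defaults.

```python
import json
from urllib import request

# Assumption: a local OpenAI-compatible server listening on this URL.
BASE_URL = "http://localhost:8000/v1/audio/speech"

def build_speech_request(text: str, voice: str = "default") -> request.Request:
    """Build a POST request following the OpenAI /v1/audio/speech schema."""
    payload = {
        "model": "tts-1",          # model name is server-dependent (assumption)
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("Hello from a local TTS server.")
```

Sending the request with `urllib.request.urlopen(req)` would return raw audio bytes on a compatible server.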
Just a simple multimodal avatar interaction platform
A voice-based AI chat interface built with Next.js and ElevenLabs. Start and stop real-time conversations with an animated UI that reflects agent status. Fully responsive and deployable via Vercel with environment-based agent configuration.
🇺🇦 Ukrainian RAD-TTS++ models (decoder + models with 3 voices) and HiFiGAN model
A unified benchmarking framework for evaluating Voice AI agents across conversational quality, audio realism, latency metrics, and safety guardrails with scalable multi-language stress testing.
Production-ready examples for Brainiall Speech AI APIs — Pronunciation Assessment, STT, TTS. Python, JavaScript, curl, and MCP configs.
Enterprise-Grade Secure ASR Diarization Pipeline - HIPAA-compliant speech processing service combining automatic speech recognition with speaker diarization. Features modular architecture, comprehensive security, and production-ready deployment.
MCP Server for Brainiall Speech AI - pronunciation assessment, speech-to-text, and text-to-speech
End-to-end speaker diarization and transcription pipeline using Whisper, VAD, and clustering in Python.
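The clustering step in such a diarization pipeline groups per-segment speaker embeddings by similarity. A minimal stdlib sketch, assuming embeddings are already extracted (the greedy centroid approach and the 0.75 threshold are illustrative, not this repository's method):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_speakers(embeddings, threshold=0.75):
    """Greedy online clustering: attach each segment embedding to the most
    similar existing speaker centroid, or open a new speaker label."""
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            # update the centroid as a running mean
            centroids[best] = [
                c + (e - c) / counts[best] for c, e in zip(centroids[best], emb)
            ]
            labels.append(best)
    return labels
```

Real pipelines usually replace this with agglomerative or spectral clustering over embeddings from a pretrained speaker model, aligned against VAD segment boundaries.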
Open source AI voice calling agent for Twilio phone calls, built with FastAPI, Google ADK, and Gemini Live API
Voice agent prototype for structured clinical interviewing, with VAD-based interruption handling, modular ASR/LLM/TTS backends, and dialogue workflow control.
Provides Whisper-based audio transcription and translation via lightweight C++ libraries for easy integration into LLM projects.
Open-source real-time Voice AI infrastructure in Go. Stream audio via WebRTC or WebSocket, connect STT → LLM → TTS pipelines, and build scalable voice agents and conversational AI applications.
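The STT → LLM → TTS pipeline pattern described above can be sketched with three concurrent stages joined by queues. This is a Python `asyncio` illustration of the pattern only; the stage functions are stubs standing in for real backends, not this Go project's API.

```python
import asyncio

# Stub stages standing in for real STT / LLM / TTS backends (assumptions).
async def stt_stage(audio_q, text_q):
    while (chunk := await audio_q.get()) is not None:
        await text_q.put(f"transcript({chunk})")
    await text_q.put(None)  # propagate end-of-stream downstream

async def llm_stage(text_q, reply_q):
    while (text := await text_q.get()) is not None:
        await reply_q.put(f"reply({text})")
    await reply_q.put(None)

async def tts_stage(reply_q, out):
    while (reply := await reply_q.get()) is not None:
        out.append(f"audio({reply})")

async def run_pipeline(chunks):
    """Run all three stages concurrently over a finite stream of chunks."""
    audio_q, text_q, reply_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    out = []
    for c in chunks:
        audio_q.put_nowait(c)
    audio_q.put_nowait(None)  # end-of-stream sentinel
    await asyncio.gather(
        stt_stage(audio_q, text_q),
        llm_stage(text_q, reply_q),
        tts_stage(reply_q, out),
    )
    return out

result = asyncio.run(run_pipeline(["c1", "c2"]))
```

In a streaming system the queues would carry partial transcripts and audio frames over WebRTC or WebSocket rather than complete strings.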