Moshi: open-source speech-text foundation model for real-time full-duplex voice dialogue. Uses Mimi neural audio codec. PyTorch, MLX (Apple Silicon) and Rust backends. Moshika & Moshiko voices.
rust pytorch moshi speech-to-text mimi mlx voice-assistant streaming-audio full-duplex huggingface voice-ai apple-silicon foundation-models text-to-text neural-codec real-time-diagnosis
-
Updated
Jan 31, 2026 - Python