Introduction

FluidAudio is a Swift SDK for fully local, low-latency audio AI on Apple devices. All inference runs on the Apple Neural Engine (ANE), keeping the CPU and GPU free for your app.

At a Glance

Capability	Model	Speed	Accuracy / Notes	Languages
Transcription	Parakeet TDT 0.6B	210x RTFx	2.5% WER (en), 14.7% avg (25 lang)	25 European
Streaming ASR	Parakeet EOU 120M	12x RTFx	4.9% WER (en)	English
Speaker Diarization	Pyannote CoreML	122x RTFx	15% DER (offline)	Language-agnostic
Streaming Diarization	Sortformer	127x RTFx	31.7% DER	Language-agnostic
Voice Activity	Silero VAD v6	1230x RTFx	96% accuracy	Language-agnostic
Text-to-Speech	Kokoro 82M	23x RTFx	48 voices	English
Text-to-Speech	PocketTTS 155M	Streaming	~80ms first audio	English

All benchmarks were run on an M4 Pro: ASR on LibriSpeech / FLEURS, diarization on VoxConverse / AMI, and VAD on VOiCES / MUSAN. See the full benchmarks for per-language breakdowns and device comparisons.
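
The Speed column reports RTFx, the inverse real-time factor: seconds of audio processed per second of wall-clock compute. As a rough illustration of what those figures imply, the sketch below converts an RTFx value into an estimated processing time. The helper `estimatedProcessingSeconds` is hypothetical and not part of FluidAudio; real throughput will vary with device and audio.

```swift
import Foundation

/// Hypothetical helper (not part of FluidAudio): estimates wall-clock
/// processing time from an RTFx figure, where
/// RTFx = audio duration / processing time.
func estimatedProcessingSeconds(audioSeconds: Double, rtfx: Double) -> Double {
    precondition(rtfx > 0, "RTFx must be positive")
    return audioSeconds / rtfx
}

// One hour of audio at the 210x RTFx listed for Parakeet TDT on an M4 Pro.
let oneHour = 3600.0
let tdtSeconds = estimatedProcessingSeconds(audioSeconds: oneHour, rtfx: 210)
print(String(format: "%.1f s", tdtSeconds)) // about 17.1 s
```

By the same arithmetic, the 12x RTFx listed for streaming ASR means one second of audio takes roughly 83 ms to process.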

When to Use Which

Transcription

Need	Use	Why
Transcribe recordings/files	Parakeet TDT v3	Fastest, 25 languages, 210x real-time
English-only, best accuracy	Parakeet TDT v2	2.1% WER vs 2.5% on LibriSpeech
Live captions as user speaks	Parakeet EOU	160ms chunks, end-of-utterance detection
Domain-specific terms (names, jargon)	TDT + CTC vocabulary boosting	99.3% precision, 85.2% recall on earnings calls

Speaker Diarization

Need	Use	Why
Best accuracy (post-recording)	Offline pipeline (VBx)	15% DER, full pyannote-compatible pipeline
Real-time “who’s speaking now”	Streaming pipeline	26% DER at 5s chunks, speaker tracking across chunks
Simple 2-4 speaker meetings	Sortformer	Single model, no clustering, 32% DER

Voice Activity Detection

Need	Use	Why
Segment audio before ASR	Offline segmentation	Clean segments with min/max duration controls
Real-time speech detection	Streaming VAD	Per-chunk events with hysteresis
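
Hysteresis in this context means using two thresholds instead of one, so that a speech probability hovering near a single cutoff does not make the detector flicker between states. A minimal sketch of the idea, assuming per-chunk speech probabilities in the range 0 to 1; the type `HysteresisGate`, the event names, and the 0.6 / 0.4 thresholds are illustrative assumptions, not FluidAudio's API or defaults:

```swift
/// Events emitted when the gate changes state. Names are illustrative.
enum SpeechEvent: Equatable {
    case speechStart(chunk: Int)
    case speechEnd(chunk: Int)
}

/// Hypothetical two-threshold gate over per-chunk speech probabilities.
struct HysteresisGate {
    let onThreshold: Double   // enter speech at or above this value
    let offThreshold: Double  // leave speech only below this value
    private(set) var isSpeaking = false
    private var chunkIndex = 0

    init(onThreshold: Double = 0.6, offThreshold: Double = 0.4) {
        precondition(offThreshold <= onThreshold, "off must not exceed on")
        self.onThreshold = onThreshold
        self.offThreshold = offThreshold
    }

    /// Feeds one chunk's probability; returns an event only on a state change.
    mutating func process(probability: Double) -> SpeechEvent? {
        defer { chunkIndex += 1 }
        if !isSpeaking && probability >= onThreshold {
            isSpeaking = true
            return .speechStart(chunk: chunkIndex)
        }
        if isSpeaking && probability < offThreshold {
            isSpeaking = false
            return .speechEnd(chunk: chunkIndex)
        }
        return nil
    }
}

var gate = HysteresisGate()
let probabilities = [0.1, 0.7, 0.5, 0.45, 0.3, 0.2]
let events = probabilities.compactMap { gate.process(probability: $0) }
print(events)
```

With these inputs the gate reports a start at chunk 1 and an end at chunk 4. The chunks at 0.5 and 0.45 fall between the two thresholds, so they stay inside the speech segment instead of ending it.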

Text-to-Speech

Need	Use	Why
Highest quality, full generation	Kokoro	48 voices, SSML support, flow matching
Streaming audio (start playing fast)	PocketTTS	~80ms to first audio, no espeak dependency

Platform Support

Platform	Package
Swift (iOS / macOS)	FluidAudio
React Native / Expo	@fluidinference/react-native-fluidaudio
Rust / Tauri	fluidaudio-rs

Showcase

40+ apps use FluidAudio for local speech recognition, speaker diarization, and text-to-speech.

App	Description
Voice Ink	Local AI for instant, private transcription with near-perfect accuracy. Uses Parakeet ASR.
Spokenly	Mac dictation app for fast, accurate voice-to-text; supports real-time dictation and file transcription. Uses Parakeet ASR and speaker diarization.
Slipbox	Privacy-first meeting assistant for real-time conversation intelligence. Uses Parakeet ASR (iOS) and speaker diarization across platforms.
Talat	Privacy-focused AI meeting notes app. Featured in TechCrunch. Uses Parakeet ASR.
Paraspeech	AI-powered voice-to-text. Fully offline. No subscriptions.
OpenOats	Open-source meeting note-taker that transcribes conversations in real time and surfaces relevant notes from your knowledge base.
Senko	A very fast and accurate speaker diarization pipeline. A good example of Python integration.
macos-speech-server	OpenAI compatible STT/transcription and TTS/speech API server.
Whisper Mate	Transcribes movies and audio locally; records and transcribes in real time from speakers or system apps. Uses speaker diarization.
BoltAI	Write content 10x faster using Parakeet models.
Voxeoflow	Mac dictation app with real-time translation. Lightning-fast transcription in over 100 languages.
WhisKey	Privacy-first voice dictation keyboard for iOS and macOS. On-device transcription with 12+ languages, AI meeting summaries, and mindmap generation.
Summit AI Notes	Local meeting transcription and summarization with speaker identification. Supports 100+ languages.
Snaply	Free, fast, 100% local AI dictation for Mac.
Enconvo	AI Agent Launcher for macOS with voice input, live captions, and text-to-speech.
Speakmac	Mac app that lets you type anywhere on your Mac using your voice. Fully local, private dictation built on FluidAudio.
Starling	Open-source, fully local voice-to-text transcription with auto-paste at your cursor.
Altic/Fluid Voice	Lightweight, fully free, open-source voice-to-text dictation for macOS.
SamScribe	Open-source macOS app that captures and transcribes audio from your microphone and meeting apps in real-time.
Dictate Anywhere	Native macOS dictation app with global Fn key activation. Dictate into any app with 25 language support.
Hex	macOS app that lets you press-and-hold a hotkey to record your voice, transcribe it, and paste into any application.
Super Voice Assistant	Open-source macOS voice assistant with local transcription.
VoiceTypr	Open-source voice-to-text dictation for macOS and Windows.
Ora	Local voice assistant for macOS with speech recognition and text-to-speech.
Flowstay	Easy text-to-speech, local post-processing and Claude Code integration for macOS. Free forever.
Meeting Transcriber	macOS menu bar app that auto-detects, records, and transcribes meetings with dual-track speaker diarization.
Hitoku Draft	A local, private, voice writing assistant on your macOS menu bar.
Audite	macOS menu-bar app that records meetings and transcribes them locally into Markdown notes for Obsidian.
Muesli	Native macOS dictation and meeting transcription with ~0.13s latency. Automatic speaker diarization.
NanoVoice	Free iOS voice keyboard for fast, private dictation in any app.
MiniWhisper	Open-source macOS menu bar app for quick local voice-to-text with minimal setup.
Volocal	Fully local voice AI on iOS. Uses streaming Parakeet EOU ASR and streaming PocketTTS.
VivaDicta	Open-source iOS voice-to-text app with system-wide AI voice keyboard. 15+ AI providers, 40+ AI presets.
hongbomiao.com	A personal R&D lab that facilitates knowledge sharing.
mac-whisper-speedtest	Comparison of different local ASR models, including one of the first versions of FluidAudio’s ASR models.

Requirements

macOS 14+ / iOS 17+
Swift 5.10+
Apple Silicon recommended

Model Conversion

All FluidAudio models are converted through möbius, our open-source model conversion framework. It handles export, numerical validation, and quantization for CoreML and other edge runtimes. See the möbius docs to convert your own models.

