FluidAudio is a Swift SDK for fully local, low-latency audio AI on Apple devices. All inference runs on the Apple Neural Engine (ANE), keeping CPU and GPU free for your app.

At a Glance

| Capability | Model | Speed | Accuracy | Languages |
| --- | --- | --- | --- | --- |
| Transcription | Parakeet TDT 0.6B | 210x RTFx | 2.5% WER (en), 14.7% avg (25 lang) | 25 European |
| Streaming ASR | Parakeet EOU 120M | 12x RTFx | 4.9% WER (en) | English |
| Speaker Diarization | Pyannote CoreML | 122x RTFx | 15% DER (offline) | Language-agnostic |
| Streaming Diarization | Sortformer | 127x RTFx | 31.7% DER | Language-agnostic |
| Voice Activity | Silero VAD v6 | 1230x RTFx | 96% accuracy | Language-agnostic |
| Text-to-Speech | Kokoro 82M | 23x RTFx | 48 voices | English |
| Text-to-Speech | PocketTTS 155M | Streaming | ~80ms first audio | English |

All benchmarks on M4 Pro. ASR on LibriSpeech / FLEURS, diarization on VoxConverse / AMI, VAD on VOiCES / MUSAN. See full benchmarks for per-language breakdowns and device comparisons.
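
To make the Speed column concrete, here is a minimal sketch of the arithmetic, assuming RTFx means seconds of audio processed per second of compute (the usual inverse real-time factor). The function names are illustrative and are not part of the FluidAudio API.

```swift
import Foundation

/// Inverse real-time factor: seconds of audio processed per second of compute.
func realTimeFactorX(audioSeconds: Double, processingSeconds: Double) -> Double {
    precondition(processingSeconds > 0, "processing time must be positive")
    return audioSeconds / processingSeconds
}

/// Estimated compute time for a clip at a given speed-up over real time.
func estimatedProcessingSeconds(audioSeconds: Double, speedup: Double) -> Double {
    precondition(speedup > 0, "speed-up must be positive")
    return audioSeconds / speedup
}

// One hour of audio at the 210x figure quoted for Parakeet TDT.
let oneHour = 3600.0
let tdtSeconds = estimatedProcessingSeconds(audioSeconds: oneHour, speedup: 210)
print(String(format: "%.1f s", tdtSeconds)) // about 17.1 s
```

Under the same assumption, the 12x figure for streaming ASR would put an hour of audio at roughly five minutes of compute, which is still comfortably faster than real time.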

When to Use Which

Transcription

| Need | Use | Why |
| --- | --- | --- |
| Transcribe recordings/files | Parakeet TDT v3 | Fastest, 25 languages, 210x real-time |
| English-only, best accuracy | Parakeet TDT v2 | 2.1% WER vs 2.5% on LibriSpeech |
| Live captions as user speaks | Parakeet EOU | 160ms chunks, end-of-utterance detection |
| Domain-specific terms (names, jargon) | TDT + CTC vocabulary boosting | 99.3% precision, 85.2% recall on earnings calls |
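
The WER figures above compare a transcript against a reference word by word. The following is a rough sketch of that calculation using word-level edit distance. It is not the benchmark harness behind these numbers, and its normalization (lowercasing and splitting on spaces) is deliberately simplified; `wordErrorRate` is a hypothetical helper, not a FluidAudio function.

```swift
import Foundation

/// Word error rate: (substitutions + deletions + insertions) / reference word count.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = reference.lowercased().split(separator: " ")
    let hyp = hypothesis.lowercased().split(separator: " ")
    precondition(!ref.isEmpty, "WER is undefined for an empty reference")

    // previous[j] holds the edit distance between the reference words
    // consumed so far and the first j hypothesis words.
    var previous = Array(0...hyp.count)
    for i in 1...ref.count {
        var current = [i] + Array(repeating: 0, count: hyp.count)
        for j in 0..<hyp.count {
            let substitution = previous[j] + (ref[i - 1] == hyp[j] ? 0 : 1)
            let deletion = previous[j + 1] + 1   // reference word missing from hypothesis
            let insertion = current[j] + 1       // extra word in hypothesis
            current[j + 1] = min(substitution, deletion, insertion)
        }
        previous = current
    }
    return Double(previous[hyp.count]) / Double(ref.count)
}

// One substituted word out of five reference words gives a WER of 0.2.
let sampleWER = wordErrorRate(
    reference: "the quick brown fox jumps",
    hypothesis: "the quick brown box jumps"
)
print(sampleWER)
```

On this scale, the gap between 2.1% and 2.5% WER is about four extra word errors per thousand reference words.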

Speaker Diarization

| Need | Use | Why |
| --- | --- | --- |
| Best accuracy (post-recording) | Offline pipeline (VBx) | 15% DER, full pyannote-compatible pipeline |
| Real-time “who’s speaking now” | Streaming pipeline | 26% DER at 5s chunks, speaker tracking across chunks |
| Simple 2-4 speaker meetings | Sortformer | Single model, no clustering, 32% DER |
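
For readers new to the metric, DER (diarization error rate) is conventionally the share of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. The sketch below shows only that arithmetic. The type name and the durations are made up for illustration and are not benchmark data.

```swift
/// Error durations, in seconds, accumulated over an evaluation set.
struct DiarizationErrors {
    var missedSpeech: Double         // reference speech the system did not detect
    var falseAlarm: Double           // non-speech the system labeled as speech
    var speakerConfusion: Double     // speech assigned to the wrong speaker
    var totalReferenceSpeech: Double // total speech time in the reference

    /// DER = (missed + false alarm + confusion) / total reference speech.
    var der: Double {
        precondition(totalReferenceSpeech > 0, "reference speech must be non-empty")
        return (missedSpeech + falseAlarm + speakerConfusion) / totalReferenceSpeech
    }
}

// Illustrative numbers only: 90 s of errors over 600 s of reference speech.
let exampleErrors = DiarizationErrors(
    missedSpeech: 30,
    falseAlarm: 18,
    speakerConfusion: 42,
    totalReferenceSpeech: 600
)
print(exampleErrors.der) // 0.15
```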

Voice Activity Detection

| Need | Use | Why |
| --- | --- | --- |
| Segment audio before ASR | Offline segmentation | Clean segments with min/max duration controls |
| Real-time speech detection | Streaming VAD | Per-chunk events with hysteresis |
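
"Per-chunk events with hysteresis" refers to using a higher threshold to enter the speech state than to leave it, so a probability hovering near a single cutoff does not make the detector flicker. The sketch below illustrates the general technique only. The type names and the 0.6 / 0.4 thresholds are assumptions for the example, not FluidAudio's API or defaults.

```swift
enum VadEvent: Equatable {
    case speechStart(chunk: Int)
    case speechEnd(chunk: Int)
}

/// Two-threshold speech detector over per-chunk speech probabilities.
struct HysteresisDetector {
    let onThreshold: Double   // probability required to enter the speech state
    let offThreshold: Double  // probability below which the speech state ends
    private(set) var isSpeaking = false

    init(onThreshold: Double = 0.6, offThreshold: Double = 0.4) {
        precondition(offThreshold <= onThreshold, "off threshold must not exceed on threshold")
        self.onThreshold = onThreshold
        self.offThreshold = offThreshold
    }

    /// Returns an event only when the state changes on this chunk.
    mutating func process(probability: Double, chunk: Int) -> VadEvent? {
        if !isSpeaking && probability >= onThreshold {
            isSpeaking = true
            return .speechStart(chunk: chunk)
        }
        if isSpeaking && probability < offThreshold {
            isSpeaking = false
            return .speechEnd(chunk: chunk)
        }
        return nil
    }
}

var detector = HysteresisDetector()
let probabilities = [0.1, 0.7, 0.5, 0.45, 0.3, 0.5, 0.65]
var events: [VadEvent] = []
for (index, probability) in probabilities.enumerated() {
    if let event = detector.process(probability: probability, chunk: index) {
        events.append(event)
    }
}
print(events)
```

Chunks 2 and 3 (0.5 and 0.45) sit between the two thresholds and keep the speech state alive, while chunk 5 (0.5) is not enough to restart it. A single 0.5 cutoff would have toggled on both.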

Text-to-Speech

| Need | Use | Why |
| --- | --- | --- |
| Highest quality, full generation | Kokoro | 48 voices, SSML support, flow matching |
| Streaming audio (start playing fast) | PocketTTS | ~80ms to first audio, no espeak dependency |

Platform Support

| Platform | Package |
| --- | --- |
| Swift (iOS / macOS) | FluidAudio |
| React Native / Expo | @fluidinference/react-native-fluidaudio |
| Rust / Tauri | fluidaudio-rs |

Showcase

40+ apps use FluidAudio for local speech recognition, speaker diarization, and text-to-speech.

| App | Description |
| --- | --- |
| Voice Ink | Local AI for instant, private transcription with near-perfect accuracy. Uses Parakeet ASR. |
| Spokenly | Mac dictation app for fast, accurate voice-to-text; supports real-time dictation and file transcription. Uses Parakeet ASR and speaker diarization. |
| Slipbox | Privacy-first meeting assistant for real-time conversation intelligence. Uses Parakeet ASR (iOS) and speaker diarization across platforms. |
| Talat | Privacy-focused AI meeting notes app. Featured in TechCrunch. Uses Parakeet ASR. |
| Paraspeech | AI powered voice to text. Fully offline. No subscriptions. |
| OpenOats | Open-source meeting note-taker that transcribes conversations in real time and surfaces relevant notes from your knowledge base. |
| Senko | A very fast and accurate speaker diarization pipeline. A good example for Python integration. |
| macos-speech-server | OpenAI compatible STT/transcription and TTS/speech API server. |
| Whisper Mate | Transcribes movies and audio locally; records and transcribes in real time from speakers or system apps. Uses speaker diarization. |
| BoltAI | Write content 10x faster using parakeet models. |
| Voxeoflow | Mac dictation app with real-time translation. Lightning-fast transcription in over 100 languages. |
| WhisKey | Privacy-first voice dictation keyboard for iOS and macOS. On-device transcription with 12+ languages, AI meeting summaries, and mindmap generation. |
| Summit AI Notes | Local meeting transcription and summarization with speaker identification. Supports 100+ languages. |
| Snaply | Free, Fast, 100% local AI dictation for Mac. |
| Enconvo | AI Agent Launcher for macOS with voice input, live captions, and text-to-speech. |
| Speakmac | Mac app that lets you type anywhere on your Mac using your voice. Fully local, private dictation built on FluidAudio. |
| Starling | Open Source, fully local voice-to-text transcription with auto-paste at your cursor. |
| Altic/Fluid Voice | Lightweight, fully free and Open Source Voice to Text dictation for macOS. |
| SamScribe | Open-source macOS app that captures and transcribes audio from your microphone and meeting apps in real-time. |
| Dictate Anywhere | Native macOS dictation app with global Fn key activation. Dictate into any app with 25 language support. |
| Hex | macOS app that lets you press-and-hold a hotkey to record your voice, transcribe it, and paste into any application. |
| Super Voice Assistant | Open-source macOS voice assistant with local transcription. |
| VoiceTypr | Open-source voice-to-text dictation for macOS and Windows. |
| Ora | Local voice assistant for macOS with speech recognition and text-to-speech. |
| Flowstay | Easy text-to-speech, local post-processing and Claude Code integration for macOS. Free forever. |
| Meeting Transcriber | macOS menu bar app that auto-detects, records, and transcribes meetings with dual-track speaker diarization. |
| Hitoku Draft | A local, private, voice writing assistant on your macOS menu bar. |
| Audite | macOS menu-bar app that records meetings and transcribes them locally into Markdown notes for Obsidian. |
| Muesli | Native macOS dictation and meeting transcription with ~0.13s latency. Automatic speaker diarization. |
| NanoVoice | Free iOS voice keyboard for fast, private dictation in any app. |
| MiniWhisper | Open-source macOS menu bar for quick local voice-to-text with minimal setup. |
| Volocal | Fully local voice AI on iOS. Uses streaming Parakeet EOU ASR and streaming PocketTTS. |
| VivaDicta | Open-source iOS voice-to-text app with system-wide AI voice keyboard. 15+ AI providers, 40+ AI presets. |
| hongbomiao.com | A personal R&D lab that facilitates knowledge sharing. |
| mac-whisper-speedtest | Comparison of different local ASR, including one of the first versions of FluidAudio’s ASR models. |

Requirements

  • macOS 14+ / iOS 17+
  • Swift 5.10+
  • Apple Silicon recommended
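
As a rough sketch, these requirements could be expressed in a Swift Package Manager manifest like the one below. The repository URL, the `branch: "main"` pin, and the `MyTranscriber` target name are assumptions for illustration; check the project's installation docs for the actual coordinates and a tagged version.

```swift
// swift-tools-version:5.10
import PackageDescription

let package = Package(
    name: "MyTranscriber", // hypothetical app name
    platforms: [
        // Minimum OS versions from the requirements list above.
        .macOS(.v14),
        .iOS(.v17),
    ],
    dependencies: [
        // Assumed repository URL and branch; confirm against the official install guide.
        .package(url: "https://github.com/FluidInference/FluidAudio.git", branch: "main"),
    ],
    targets: [
        .target(
            name: "MyTranscriber",
            dependencies: [
                .product(name: "FluidAudio", package: "FluidAudio"),
            ]
        ),
    ]
)
```

Pinning to a released version rather than a branch is usually preferable once you know which release you are targeting.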

Model Conversion

All FluidAudio models are converted through möbius, our open-source model conversion framework. It handles export, numerical validation, and quantization for CoreML and other edge runtimes. See the möbius docs to convert your own models.