Private, local-only desktop app for audio transcription with speaker diarization and AI-powered summarization.
Diaricat is a Windows desktop application that transcribes audio/video files, identifies who said what (speaker diarization), and generates AI-powered summaries — all running locally on your machine. No data ever leaves your computer.
Diaricat is part of a broader vision for local-first AI systems focused on privacy, autonomy, and offline intelligence.
- Accurate transcription powered by Faster Whisper (large-v3 model with CUDA acceleration)
- Speaker diarization using SpeechBrain ECAPA-TDNN embeddings with custom agglomerative clustering
- AI correction & summarization with local LLM inference via llama.cpp, running GGUF models such as Qwen 2.5 7B Instruct (Q4_K_M)
- 100% offline & private — no API keys, no cloud services, no data upload
- Bilingual UI — Spanish and English with one-click toggle
- Multiple export formats — TXT, SRT, DOCX, PDF, JSON
- Modern dark UI — glassmorphism design built with React + Tailwind CSS
Diaricat follows a design language I call Purple Space Glass.
It blends glassmorphism, deep-space aesthetics and soft neon reflections to create interfaces that feel both modern and fluid — almost like interacting with an intelligent system rather than a static tool.
The goal is not just visual appeal, but to make AI systems feel:
- responsive
- ambient
- alive, without being intrusive
This design direction is part of a broader vision where local AI systems are not only powerful and private, but also intuitive and pleasant to use.
```
+----------------------------------------------------+
|                   Desktop Shell                    |
|              (pywebview + .NET/Edge)               |
+----------------------------------------------------+
|                   Frontend (UI)                    |
|        React · TypeScript · Vite · Tailwind        |
|            Radix UI · Lucide · shadcn/ui           |
+----------------------------------------------------+
|                   REST API Layer                   |
|            FastAPI · Uvicorn · Pydantic            |
+--------------+-------------+-----------+-----------+
| Transcription| Diarization | LLM Post- |  Export   |
|   Service    |   Service   |  process  |  Service  |
|  (Whisper)   |(SpeechBrain)|(llama.cpp)|(DOCX/PDF) |
+--------------+-------------+-----------+-----------+
|               Pipeline Orchestrator                |
|       Job queue · Progress · Cancellation          |
+----------------------------------------------------+
```
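The orchestrator layer in the diagram above is, at its core, a staged job runner with progress reporting and cooperative cancellation. A minimal sketch of that pattern (class and method names are hypothetical, not Diaricat's actual API):

```python
import threading
from typing import Callable

class CancelledError(Exception):
    """Raised when a job is cancelled between stages."""

class PipelineJob:
    """Run a fixed sequence of named stages over a payload,
    reporting fractional progress and honoring cancellation."""

    def __init__(self, stages: list[tuple[str, Callable]],
                 on_progress: Callable[[str, float], None]):
        self.stages = stages
        self.on_progress = on_progress
        self._cancel = threading.Event()

    def cancel(self) -> None:
        self._cancel.set()

    def run(self, data):
        total = len(self.stages)
        for i, (name, fn) in enumerate(self.stages):
            if self._cancel.is_set():
                raise CancelledError(name)
            self.on_progress(name, i / total)
            data = fn(data)   # each stage transforms the payload
        self.on_progress("done", 1.0)
        return data
```

Cancellation is cooperative: it takes effect at stage boundaries, which is why long stages (transcription, diarization) would internally check the same flag in a real implementation.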
| Component | Technology |
|---|---|
| Desktop shell | pywebview 5.x (Edge WebView2) |
| Frontend | React 18 + TypeScript + Vite + Tailwind CSS |
| Backend API | FastAPI + Uvicorn |
| ASR (Speech-to-Text) | Faster Whisper (CTranslate2 backend) |
| Speaker Diarization | SpeechBrain ECAPA-TDNN + custom clustering |
| LLM Inference | llama-cpp-python (GGUF models) |
| Packaging | PyInstaller (onedir mode) |
|  | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| RAM | 8 GB | 16 GB+ |
| GPU | — | NVIDIA GPU with 6+ GB VRAM |
| Disk | 5 GB (app + models) | 10 GB |
| Runtime | Edge WebView2 | Edge WebView2 |
- Python 3.11+ (tested with 3.14)
- Node.js 18+ (for frontend)
- NVIDIA CUDA Toolkit 12.x (for GPU acceleration)
- Visual Studio Build Tools 2022 (for building llama-cpp-python)
- Download the latest release from Releases
- Extract the `Diarcat/` folder
- (Optional) Place a GGUF model in `workspace/models/` for AI summaries
- Run `Diarcat.exe`
```bash
# Clone the repository
git clone https://github.com/nia-huck/Diaricat.git
cd Diaricat

# Create Python virtual environment
python -m venv .venv
.venv\Scripts\activate

# Install Python dependencies
pip install -e ".[dev]"

# Install torch with CUDA (optional, for GPU acceleration)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install faster-whisper and speechbrain
pip install faster-whisper speechbrain

# Install llama-cpp-python (requires Visual Studio Build Tools)
pip install llama-cpp-python

# Install frontend dependencies
cd frontend
npm install

# Start frontend dev server
npm run dev

# In another terminal, start the backend
cd ..
python -m diaricat api --host 127.0.0.1 --port 8765
```

```bash
# Build the frontend
cd frontend
npm run build
cd ..

# Run PyInstaller
python -m PyInstaller packaging/diaricat.spec --distpath dist --workpath build --noconfirm

# Output: dist/Diarcat/Diarcat.exe
```

The build uses onedir mode for fast startup (~1 second vs minutes for onefile).
Diaricat uses llama.cpp as the local inference engine for transcript correction and summarization, running GGUF-compatible models. Without a configured model, it falls back to rule-based processing.
- Qwen 2.5 7B Instruct (Q4_K_M) — configured by default for the local post-processing pipeline
- Default model path: `models/qwen2.5-7b-instruct-q4_k_m.gguf`
- Context size: 4096
- Inference engine: llama.cpp
| Model | Size | Min RAM | Quality |
|---|---|---|---|
| Qwen 2.5 1.5B (Q4_K_M) | ~1 GB | 4 GB | Basic |
| Qwen 2.5 3B (Q4_K_M) | ~2 GB | 6 GB | Good |
| Qwen 2.5 7B (Q4_K_M) | ~4.7 GB | 10 GB | Best |
Place the `.gguf` file in `workspace/models/`; the model path and runtime parameters can then be configured from the application Settings screen.
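The model-or-fallback behavior described above might look roughly like this (the function name, prompt, and cleanup rules are illustrative; `Llama` and `create_chat_completion` are llama-cpp-python's API):

```python
import os
import re

def postprocess(text: str,
                model_path: str = "workspace/models/qwen2.5-7b-instruct-q4_k_m.gguf",
                n_ctx: int = 4096) -> str:
    """Correct a transcript with the local LLM if a GGUF model is
    present; otherwise fall back to simple rule-based cleanup."""
    if os.path.exists(model_path):
        # llama-cpp-python loads and runs the GGUF model entirely locally.
        from llama_cpp import Llama
        llm = Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)
        out = llm.create_chat_completion(messages=[
            {"role": "system",
             "content": "Fix transcription errors. Return only the corrected text."},
            {"role": "user", "content": text},
        ])
        return out["choices"][0]["message"]["content"]
    # Rule-based fallback: collapse whitespace and immediate word repetitions.
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text, flags=re.IGNORECASE)
    return text
```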
1. Validation — Verify source file exists and is a supported format
2. Audio normalization — Extract audio, convert to 16 kHz mono WAV via FFmpeg
3. Transcription — Speech-to-text with Faster Whisper (chunked for long audio)
4. Speaker diarization — Identify speakers using ECAPA-TDNN embeddings + agglomerative clustering
5. Segment merge — Align ASR segments with speaker turns
6. Correction (optional) — Fix ASR errors using local LLM
7. Summarization (optional) — Generate structured summary with key points, decisions, and topics
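The segment-merge step above can be sketched as a maximum-overlap assignment between ASR segments and diarized speaker turns (the function name and tuple layout are illustrative, not Diaricat's actual alignment code):

```python
def assign_speakers(asr_segments, speaker_turns):
    """Label each ASR segment with the speaker whose turn overlaps it most.

    asr_segments:  [(start, end, text), ...] in seconds
    speaker_turns: [(start, end, speaker), ...] in seconds
    """
    merged = []
    for a_start, a_end, text in asr_segments:
        best, best_overlap = "UNKNOWN", 0.0
        for t_start, t_end, speaker in speaker_turns:
            # Positive overlap means the intervals intersect.
            overlap = min(a_end, t_end) - max(a_start, t_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        merged.append((a_start, a_end, best, text))
    return merged
```

Real pipelines also have to handle segments that straddle a speaker change; a common refinement is to split the ASR segment at the turn boundary rather than picking a single winner.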
```
Diaricat/
├── src/diaricat/        # Python backend
│   ├── api/             # FastAPI routes and middleware
│   ├── core/            # Orchestrator, job queue, alignment
│   ├── services/        # Transcription, diarization, postprocess, export
│   ├── models/          # Pydantic domain and API models
│   ├── utils/           # Device detection, logging, compatibility
│   ├── desktop.py       # pywebview desktop shell
│   ├── main.py          # CLI entry point
│   └── settings.py      # Configuration management
├── frontend/            # React/TypeScript UI
│   └── src/
│       ├── components/  # UI components (screens, ui primitives)
│       ├── context/     # React context (AppContext, I18nContext)
│       ├── lib/         # API client, i18n translations
│       └── types/       # TypeScript type definitions
├── config/              # Default configuration (YAML)
├── packaging/           # PyInstaller spec and runtime hooks
├── scripts/             # Build scripts
├── tests/               # Unit tests
├── vendor/              # Bundled FFmpeg binaries
└── pyproject.toml       # Project metadata and dependencies
```
Settings are stored in config/default.yaml and can be modified through the Settings screen in the app:
| Setting | Default | Description |
|---|---|---|
| `whisper_model` | `large-v3` | Whisper model size |
| `whisper_compute_type` | `float16` | Compute precision (float16/int8) |
| `diarization_profile` | `quality` | Diarization quality (fast/balanced/quality) |
| `llama_model_path` | `models/qwen2.5-7b-instruct-q4_k_m.gguf` | Path to GGUF model |
| `llama_n_ctx` | `4096` | LLM context window size |
| `device_mode` | `auto` | Device selection (auto/cpu/cuda) |
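The `auto` behavior of `device_mode` can be sketched as follows (a hypothetical helper, not Diaricat's actual device-detection code): explicit values pass through, while `auto` prefers CUDA only when a working torch install reports an available GPU.

```python
def resolve_device(device_mode: str = "auto") -> str:
    """Map the device_mode setting to a concrete compute device."""
    if device_mode in ("cpu", "cuda"):
        return device_mode  # explicit choice wins
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # torch absent: CPU-only install
    return "cpu"
```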
Backend: Python 3.14 · FastAPI · Uvicorn · Pydantic · PyYAML · SpeechBrain · Faster Whisper · CTranslate2 · llama-cpp-python · PyInstaller
Frontend: React 18 · TypeScript · Vite · Tailwind CSS · Radix UI · shadcn/ui · Lucide Icons
AI Models: Whisper large-v3 (ASR) · pyannote/speaker-diarization-3.1 + ECAPA-TDNN embeddings (diarization) · Qwen 2.5 7B Instruct Q4_K_M (correction/summary via llama.cpp)
MIT License — see LICENSE file for details.
Built with privacy in mind. Your audio never leaves your machine.
