Diaricat

Private, local-only desktop app for audio transcription with speaker diarization and AI-powered summarization.



What is Diaricat?

Diaricat is a Windows desktop application that transcribes audio/video files, identifies who said what (speaker diarization), and generates AI-powered summaries — all running locally on your machine. No data ever leaves your computer.

Diaricat is part of a broader vision for local-first AI systems focused on privacy, autonomy, and offline intelligence.

Key Features

  • Accurate transcription powered by Faster Whisper (large-v3 model with CUDA acceleration)
  • Speaker diarization using SpeechBrain ECAPA-TDNN embeddings with custom agglomerative clustering
  • AI correction & summarization with local LLM inference via llama.cpp, running GGUF models such as Qwen 2.5 7B Instruct (Q4_K_M)
  • 100% offline & private — no API keys, no cloud services, no data upload
  • Bilingual UI — Spanish and English with one-click toggle
  • Multiple export formats — TXT, SRT, DOCX, PDF, JSON
  • Modern dark UI — glassmorphism design built with React + Tailwind CSS
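The diarization approach above (speaker embeddings merged by a custom agglomerative clustering) can be sketched in miniature. This is a simplified, hypothetical stand-in for the real ECAPA-TDNN pipeline, not Diaricat's implementation: it greedily merges embedding clusters by centroid cosine similarity until no pair is similar enough.

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def agglomerative_speakers(embeddings, threshold=0.75):
    """Greedy agglomerative clustering: repeatedly merge the most
    similar pair of clusters (centroid cosine similarity) until no
    pair exceeds the threshold. Returns a cluster label per embedding."""
    clusters = [[i] for i in range(len(embeddings))]

    def centroid(idxs):
        dim = len(embeddings[0])
        return [sum(embeddings[i][d] for i in idxs) / len(idxs)
                for d in range(dim)]

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine_sim(centroid(clusters[i]), centroid(clusters[j]))
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break  # remaining clusters are distinct speakers
        i, j = pair
        clusters[i] += clusters[j]
        del clusters[j]

    labels = [0] * len(embeddings)
    for label, idxs in enumerate(clusters):
        for i in idxs:
            labels[i] = label
    return labels
```

The threshold plays the role of the diarization profile: a higher threshold splits more aggressively into distinct speakers, a lower one merges similar voices.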

Design Philosophy

Diaricat follows a design language I call Purple Space Glass.

It blends glassmorphism, deep-space aesthetics, and soft neon reflections to create interfaces that feel both modern and fluid, almost like interacting with an intelligent system rather than a static tool.

The goal is not just visual appeal, but to make AI systems feel:

  • responsive
  • ambient
  • alive, without being intrusive

This design direction is part of a broader vision where local AI systems are not only powerful and private, but also intuitive and pleasant to use.


Architecture

+----------------------------------------------------+
|                   Desktop Shell                    |
|              (pywebview + .NET/Edge)               |
+----------------------------------------------------+
|                   Frontend (UI)                    |
|         React  TypeScript  Vite  Tailwind          |
|            Radix UI  Lucide  shadcn/ui             |
+----------------------------------------------------+
|                   REST API Layer                   |
|             FastAPI  Uvicorn  Pydantic             |
+--------------+-------------+-----------+-----------+
| Transcription| Diarization | LLM Post- |  Export   |
|   Service    |   Service   |  process  |  Service  |
|  (Whisper)   |(SpeechBrain)|(llama.cpp)|(DOCX/PDF) |
+--------------+-------------+-----------+-----------+
|               Pipeline Orchestrator                |
|         Job queue  Progress  Cancellation          |
+----------------------------------------------------+
| Component | Technology |
|---|---|
| Desktop shell | pywebview 5.x (Edge WebView2) |
| Frontend | React 18 + TypeScript + Vite + Tailwind CSS |
| Backend API | FastAPI + Uvicorn |
| ASR (Speech-to-Text) | Faster Whisper (CTranslate2 backend) |
| Speaker Diarization | SpeechBrain ECAPA-TDNN + custom clustering |
| LLM Inference | llama-cpp-python (GGUF models) |
| Packaging | PyInstaller (onedir mode) |
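The orchestrator layer (job queue, progress, cancellation) can be sketched with the standard library. This is an illustrative miniature, not Diaricat's actual API; class and method names are invented:

```python
import queue
import threading

class Job:
    def __init__(self, job_id, stages):
        self.job_id = job_id
        self.stages = stages          # list of callables, run in order
        self.progress = 0.0           # fraction of stages completed
        self.cancelled = threading.Event()
        self.status = "queued"

class Orchestrator:
    """Single-worker job queue with per-job progress and cancellation."""

    def __init__(self):
        self.jobs = queue.Queue()

    def submit(self, job):
        self.jobs.put(job)

    def run_pending(self):
        """Drain the queue, executing each job's stages in order.
        A job cancelled mid-run stops before its next stage."""
        while not self.jobs.empty():
            job = self.jobs.get()
            job.status = "running"
            for i, stage in enumerate(job.stages):
                if job.cancelled.is_set():
                    job.status = "cancelled"
                    break
                stage()
                job.progress = (i + 1) / len(job.stages)
            else:
                job.status = "done"
```

Checking the cancellation flag between stages (rather than inside them) mirrors how a pipeline of coarse steps such as transcription and diarization can bail out cleanly at stage boundaries.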

Requirements

System Requirements

| | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| RAM | 8 GB | 16 GB+ |
| GPU | — | NVIDIA GPU with 6+ GB VRAM |
| Disk | 5 GB (app + models) | 10 GB |
| Runtime | Edge WebView2 | Edge WebView2 |

For Development

  • Python 3.11+ (tested with 3.14)
  • Node.js 18+ (for frontend)
  • NVIDIA CUDA Toolkit 12.x (for GPU acceleration)
  • Visual Studio Build Tools 2022 (for building llama-cpp-python)

Quick Start (Pre-built)

  1. Download the latest release from Releases
  2. Extract the Diarcat/ folder
  3. (Optional) Place a GGUF model in workspace/models/ for AI summaries
  4. Run Diarcat.exe

Development Setup

# Clone the repository
git clone https://github.com/nia-huck/Diaricat.git
cd Diaricat

# Create Python virtual environment
python -m venv .venv
.venv\Scripts\activate

# Install Python dependencies
pip install -e ".[dev]"

# Install torch with CUDA (optional, for GPU acceleration)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install faster-whisper and speechbrain
pip install faster-whisper speechbrain

# Install llama-cpp-python (requires Visual Studio Build Tools)
pip install llama-cpp-python

# Install frontend dependencies
cd frontend
npm install

# Start frontend dev server
npm run dev

# In another terminal, start the backend
cd ..
python -m diaricat api --host 127.0.0.1 --port 8765

Building the Executable

# Build the frontend
cd frontend
npm run build
cd ..

# Run PyInstaller
python -m PyInstaller packaging/diaricat.spec --distpath dist --workpath build --noconfirm

# Output: dist/Diarcat/Diarcat.exe

The build uses onedir mode for fast startup (~1 second vs minutes for onefile).


LLM Models

Diaricat uses llama.cpp as the local inference engine for transcript correction and summarization, running GGUF-compatible models. Without a configured model, it falls back to rule-based processing.

Default / Recommended model

  • Qwen 2.5 7B Instruct (Q4_K_M) — configured by default for the local post-processing pipeline

Runtime details

  • Default model path: models/qwen2.5-7b-instruct-q4_k_m.gguf
  • Context size: 4096
  • Inference engine: llama.cpp

Recommended Models

| Model | Size | Min RAM | Quality |
|---|---|---|---|
| Qwen 2.5 1.5B (Q4_K_M) | ~1 GB | 4 GB | Basic |
| Qwen 2.5 3B (Q4_K_M) | ~2 GB | 6 GB | Good |
| Qwen 2.5 7B (Q4_K_M) | ~4.7 GB | 10 GB | Best |

Place the .gguf file in workspace/models/; the model path and runtime parameters can then be configured from the application's Settings screen.
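The table above maps available RAM to a model tier. As a quick illustration (the helper function is hypothetical; the names and RAM figures come from the table):

```python
# Recommended models, ordered best-first.
# (name, file size in GB, minimum RAM in GB, quality)
RECOMMENDED_MODELS = [
    ("Qwen 2.5 7B (Q4_K_M)", 4.7, 10, "Best"),
    ("Qwen 2.5 3B (Q4_K_M)", 2.0, 6, "Good"),
    ("Qwen 2.5 1.5B (Q4_K_M)", 1.0, 4, "Basic"),
]

def pick_model(available_ram_gb):
    """Return the highest-quality model whose RAM requirement fits,
    or None if even the smallest model does not fit (in which case
    the app falls back to rule-based processing)."""
    for name, _size, min_ram, _quality in RECOMMENDED_MODELS:
        if available_ram_gb >= min_ram:
            return name
    return None
```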


Pipeline Stages

  1. Validation — Verify source file exists and is a supported format
  2. Audio normalization — Extract audio, convert to 16kHz mono WAV via FFmpeg
  3. Transcription — Speech-to-text with Faster Whisper (chunked for long audio)
  4. Speaker diarization — Identify speakers using ECAPA-TDNN embeddings + agglomerative clustering
  5. Segment merge — Align ASR segments with speaker turns
  6. Correction (optional) — Fix ASR errors using local LLM
  7. Summarization (optional) — Generate structured summary with key points, decisions, and topics
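Stage 5 (segment merge) can be sketched as an overlap vote: each ASR segment gets the speaker whose diarization turn overlaps it the most. This is a minimal illustration, not the actual alignment code in core/; the function name and segment layout are invented:

```python
def assign_speakers(asr_segments, speaker_turns):
    """Label each ASR segment with the speaker whose turn overlaps it most.
    asr_segments: list of (start, end, text); speaker_turns: list of
    (start, end, speaker_label). Times are in seconds."""
    def overlap(a, b):
        # Length of the intersection of two (start, end) intervals.
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    merged = []
    for start, end, text in asr_segments:
        best = max(speaker_turns,
                   key=lambda t: overlap((start, end), t),
                   default=None)
        speaker = (best[2] if best and overlap((start, end), best) > 0
                   else "UNKNOWN")
        merged.append({"start": start, "end": end,
                       "speaker": speaker, "text": text})
    return merged
```

ASR segment boundaries rarely coincide exactly with speaker turns, which is why a majority-overlap rule (rather than exact matching) is the natural way to reconcile the two timelines.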

Project Structure

Diaricat/
├── src/diaricat/           # Python backend
│   ├── api/                # FastAPI routes and middleware
│   ├── core/               # Orchestrator, job queue, alignment
│   ├── services/           # Transcription, diarization, postprocess, export
│   ├── models/             # Pydantic domain and API models
│   ├── utils/              # Device detection, logging, compatibility
│   ├── desktop.py          # pywebview desktop shell
│   ├── main.py             # CLI entry point
│   └── settings.py         # Configuration management
├── frontend/               # React/TypeScript UI
│   └── src/
│       ├── components/     # UI components (screens, ui primitives)
│       ├── context/        # React context (AppContext, I18nContext)
│       ├── lib/            # API client, i18n translations
│       └── types/          # TypeScript type definitions
├── config/                 # Default configuration (YAML)
├── packaging/              # PyInstaller spec and runtime hooks
├── scripts/                # Build scripts
├── tests/                  # Unit tests
├── vendor/                 # Bundled FFmpeg binaries
└── pyproject.toml          # Project metadata and dependencies
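Among the export formats handled under src/diaricat/services/, SRT is the simplest to illustrate. A minimal SRT writer sketch (not Diaricat's actual export code; the segment shape matches the merge sketch above):

```python
def to_srt(segments):
    """Render merged segments as SubRip (SRT) text. Each segment is a
    dict with start/end in seconds, a speaker label, and text."""
    def ts(seconds):
        # SRT timestamps are HH:MM:SS,mmm with a comma before milliseconds.
        ms = round(seconds * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    lines = []
    for i, seg in enumerate(segments, 1):
        lines.append(str(i))                                  # cue number
        lines.append(f"{ts(seg['start'])} --> {ts(seg['end'])}")
        lines.append(f"{seg['speaker']}: {seg['text']}")
        lines.append("")                                      # blank separator
    return "\n".join(lines)
```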

Configuration

Settings are stored in config/default.yaml and can be modified through the Settings screen in the app:

| Setting | Default | Description |
|---|---|---|
| whisper_model | large-v3 | Whisper model size |
| whisper_compute_type | float16 | Compute precision (float16/int8) |
| diarization_profile | quality | Diarization quality (fast/balanced/quality) |
| llama_model_path | models/qwen2.5-7b-instruct-q4_k_m.gguf | Path to GGUF model |
| llama_n_ctx | 4096 | LLM context window size |
| device_mode | auto | Device selection (auto/cpu/cuda) |
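The settings above map onto config/default.yaml roughly as follows. This is an illustrative reconstruction from the table, not the literal contents of the file:

```yaml
whisper_model: large-v3
whisper_compute_type: float16      # float16 or int8
diarization_profile: quality       # fast | balanced | quality
llama_model_path: models/qwen2.5-7b-instruct-q4_k_m.gguf
llama_n_ctx: 4096
device_mode: auto                  # auto | cpu | cuda
```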

Tech Stack

Backend: Python 3.14 · FastAPI · Uvicorn · Pydantic · PyYAML · SpeechBrain · Faster Whisper · CTranslate2 · llama-cpp-python · PyInstaller

Frontend: React 18 · TypeScript · Vite · Tailwind CSS · Radix UI · shadcn/ui · Lucide Icons

AI Models: Whisper large-v3 (ASR) · pyannote/speaker-diarization-3.1 + ECAPA-TDNN embeddings (diarization) · Qwen 2.5 7B Instruct Q4_K_M (correction/summary via llama.cpp)


License

MIT License — see LICENSE file for details.


Built with privacy in mind. Your audio never leaves your machine.
