This repository contains a multi-stage pipeline for processing, transcribing, and deduplicating Discord voice session recordings into clean text transcripts. It includes tools for audio capture, filtering, transcription, clustering-based deduplication, and final text output.
The pipeline operates in the following phases:
- Phase 0 – Discord Audio Capture: Captures user audio streams as individual `.wav` files and generates session logs.
- Phase 1 – Audio Validation and Filtering: Filters audio by silence and duration constraints, and rescues bursty utterances with VAD.
- Phase 2 – Whisper Transcription: Transcribes accepted audio files to text using a CTranslate2-based Whisper model.
- Phase 3 – Deduplication by Clustering: Clusters transcriptions and deduplicates based on similarity, canonical form, and scoring.
- Output – A cleaned `.txt` transcript preserving character, flow, and session integrity.
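
As a rough illustration of the Phase 1 idea (reject clips that are too short, too long, or effectively silent), here is a minimal standard-library sketch. It is not the logic in `dedupe_audit.py`: the function names, the reason strings, and the thresholds (`min_s`, `max_s`, `min_rms`) are assumptions chosen for the example, and it only handles 16-bit PCM input.

```python
import math
import struct
import wave


def wav_stats(path):
    """Return (duration_seconds, rms) for a 16-bit PCM .wav file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    count = len(raw) // 2  # number of 16-bit samples across all channels
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    duration = n_frames / rate
    rms = math.sqrt(sum(s * s for s in samples) / count) if count else 0.0
    return duration, rms


def accept(path, min_s=0.5, max_s=30.0, min_rms=200.0):
    """Return (accepted, reason). All thresholds are illustrative guesses."""
    duration, rms = wav_stats(path)
    if duration < min_s:
        return False, "too_short"
    if duration > max_s:
        return False, "too_long"
    if rms < min_rms:
        return False, "silent"
    return True, "ok"
```

Returning a reason alongside the verdict makes it easier to audit why a clip was dropped, which matters when a later step is expected to rescue some of the rejections.
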
| Script | Purpose |
|---|---|
| `index.ts` | Captures Discord voice as per-user `.wav` files |
| `dedupe_audit.py` | Filters raw audio: silence, noise, duplicates, duration |
| `burst_scope.py` | Rescues short sharp utterances from false VAD rejection |
| `transcribe_accepted.py` | Transcribes accepted `.wav` files into enriched JSONL |
| `dedupe_transcript.py` | Deduplicates transcribed JSONL using clustering |
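
To make "deduplicates using clustering" more concrete, the sketch below shows one simple way the idea could work: reduce each line to a canonical form, greedily group lines whose canonical forms are similar, and keep the best-scoring member of each group. This is an assumption-laden illustration, not the algorithm in `dedupe_transcript.py`. The `text` field name, the 0.85 threshold, and the "longest text wins" score are all hypothetical, and `difflib` stands in for whatever similarity measure the real script uses.

```python
import re
from difflib import SequenceMatcher


def canonical(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def cluster(records, threshold=0.85):
    """Greedily group records whose canonical text is similar.

    Each record is compared against the first member of every existing
    cluster; it joins the first cluster that meets the threshold.
    """
    clusters = []
    for rec in records:
        key = canonical(rec["text"])
        for group in clusters:
            rep = canonical(group[0]["text"])
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                group.append(rec)
                break
        else:
            clusters.append([rec])  # no similar cluster found
    return clusters


def dedupe(records, threshold=0.85):
    """Keep one record per cluster: the longest text, earliest on ties."""
    return [
        max(group, key=lambda r: len(r["text"]))
        for group in cluster(records, threshold)
    ]
```

For example, `dedupe([{"text": "Hello there, everyone!"}, {"text": "hello there everyone"}])` keeps only the first record, because both lines share the same canonical form and the first is longer. Clusters come out in first-seen order, so the surviving lines preserve the flow of the session.
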
- Clone the repo and install required Python and Node.js dependencies.
- Configure `.env` with your Discord bot credentials.
- Run each phase in sequence:
  - `index.ts` to capture audio.
  - `dedupe_audit.py` to filter audio.
  - `transcribe_accepted.py` to transcribe.
  - `dedupe_transcript.py` to deduplicate.
- Review the final transcript output.
- Built around faster-whisper with standard CTranslate2 binary releases from PyPI.
- Supports both GPU (CUDA) and CPU transcription paths via runtime flags in `transcribe_accepted.py`.
- Still tested heavily on RTX-class GPUs, but no longer documented as requiring a custom local CTranslate2 build.
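
The GPU and CPU paths mentioned above could look roughly like the following with faster-whisper. This is a sketch under stated assumptions, not the contents of `transcribe_accepted.py`: the helper names, the `device` argument, the `"small"` model size, and the chosen compute types are illustrative, and the script's actual flag names are not shown here. Only `WhisperModel(...)` and `model.transcribe(...)` are faster-whisper calls.

```python
def model_settings(device="auto"):
    """Map a requested device to WhisperModel keyword arguments.

    The compute types are common choices (float16 on CUDA, int8 on CPU),
    picked here for illustration only.
    """
    if device == "cuda":
        return {"device": "cuda", "compute_type": "float16"}
    if device == "cpu":
        return {"device": "cpu", "compute_type": "int8"}
    if device == "auto":
        return {"device": "auto", "compute_type": "default"}
    raise ValueError("unknown device: %r" % (device,))


def transcribe(path, model_size="small", device="auto"):
    """Transcribe one audio file and return a list of segment dicts."""
    # Imported lazily so the settings helper works without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, **model_settings(device))
    segments, _info = model.transcribe(path, beam_size=5)
    # `segments` is a generator; iterating it performs the transcription.
    return [
        {"start": seg.start, "end": seg.end, "text": seg.text.strip()}
        for seg in segments
    ]
```

Keeping the device-to-settings mapping in its own function means the same transcription code serves both paths, and a single command-line option could select between them.
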
`pip install -r requirements.txt` (further dependencies may be required)

CTranslate2 is available on PyPI (for example, `pip index versions ctranslate2` currently reports 4.7.1 and historical releases).

`npm install` (further dependencies may be required)
AI Transparency Statement - Mostly built with the aid of ChatGPT. The author is a sysadmin and project manager with some decades of experience. The author believes he can appropriately supervise the "dev team", but nevertheless wishes to be honest and upfront with anyone who worries about such things.
See also Pipeline Document