Tromador/Discord-Transcription-Stack

Discord Audio Transcript Deduplication Pipeline

This repository contains a multi-stage pipeline for processing, transcribing, and deduplicating Discord voice session recordings into clean text transcripts. It includes tools for audio capture, filtering, transcription, clustering-based deduplication, and final text output.

📚 Overview

The pipeline operates in the following phases:

  1. Phase 0 – Discord Audio Capture
    Captures user audio streams as individual .wav files and generates session logs.

  2. Phase 1 – Audio Validation and Filtering
    Filters audio on silence and duration constraints, and rescues bursty utterances with VAD.

  3. Phase 2 – Whisper Transcription
    Transcribes accepted audio files to text using a CTranslate2-based Whisper model.

  4. Phase 3 – Deduplication by Clustering
    Clusters transcriptions and deduplicates them based on similarity, canonical form, and scoring.

  5. Output – A cleaned .txt transcript preserving character, flow, and session integrity.
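As a rough illustration of the Phase 1 idea, the sketch below classifies a clip by duration and loudness using only the Python standard library. It is not the logic in dedupe_audit.py: the thresholds, the function names wav_stats and classify, and the assumption of 16-bit PCM input are all hypothetical, and VAD-based rescue is left out entirely.

```python
import math
import sys
import wave
from array import array

# Hypothetical thresholds, chosen for illustration only.
MIN_SECONDS = 0.3
MAX_SECONDS = 30.0
MIN_RMS = 200.0  # on a 16-bit scale, where full scale is 32767


def wav_stats(path):
    """Return (duration_seconds, rms) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        samples = array("h", wf.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    duration = n_frames / rate if rate else 0.0
    if not samples:
        return duration, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms


def classify(duration, rms):
    """Label a clip as accepted or give the reason it was rejected."""
    if duration < MIN_SECONDS:
        return "too_short"
    if duration > MAX_SECONDS:
        return "too_long"
    if rms < MIN_RMS:
        return "silent"
    return "accepted"
```

A clip would then be checked with classify(*wav_stats("clip.wav")), where clip.wav is a placeholder file name.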

🛠 Scripts

Script                    Purpose
index.ts                  Captures Discord voice as per-user .wav files
dedupe_audit.py           Filters raw audio: silence, noise, duplicates, duration
burst_scope.py            Rescues short sharp utterances from false VAD rejection
transcribe_accepted.py    Transcribes accepted .wav files into enriched JSONL
dedupe_transcript.py      Deduplicates transcribed JSONL using clustering
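To make the clustering-based deduplication step more concrete, here is a minimal sketch of one way it could work. It is an assumption-laden stand-in, not the algorithm in dedupe_transcript.py: it swaps in difflib.SequenceMatcher for similarity, uses a simple greedy clustering, and "scores" variants by length alone. The threshold and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # hypothetical cut-off, not taken from the repo


def canonical(text):
    """Canonical form: lowercase, no punctuation, single spaces."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def cluster(lines, threshold=SIMILARITY_THRESHOLD):
    """Greedily group lines whose canonical forms are near-identical."""
    clusters = []  # each cluster is a list of original lines
    for line in lines:
        key = canonical(line)
        for members in clusters:
            # Compare against the first member as the cluster representative.
            ratio = SequenceMatcher(None, canonical(members[0]), key).ratio()
            if ratio >= threshold:
                members.append(line)
                break
        else:
            clusters.append([line])
    return clusters


def dedupe(lines, threshold=SIMILARITY_THRESHOLD):
    """Keep one line per cluster, preferring the longest variant."""
    return [max(members, key=len) for members in cluster(lines, threshold)]
```

Clusters are kept in first-seen order, so the surviving lines preserve the flow of the session. A real scorer would likely weigh more than length, for example transcription confidence or timestamps from the JSONL records.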

🚀 Quick Start

  1. Clone the repo and install required Python and Node.js dependencies.
  2. Configure .env with your Discord bot credentials.
  3. Run each phase in sequence:
    • index.ts to capture audio.
    • dedupe_audit.py to filter audio.
    • transcribe_accepted.py to transcribe.
    • dedupe_transcript.py to deduplicate.
  4. Review the final transcript output.
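The sequence in step 3 could be scripted along the following lines. The exact commands are assumptions: the README does not say how index.ts is launched (npx ts-node is a guess) or what arguments each script takes, so treat PHASES as a placeholder to adapt. The capture phase is presumably a long-running bot, so in practice it may be run separately and stopped before the later phases start.

```python
import subprocess
import sys

# Assumed invocations: the README gives no exact commands or arguments.
PHASES = [
    ("capture", ["npx", "ts-node", "index.ts"]),
    ("filter", [sys.executable, "dedupe_audit.py"]),
    ("transcribe", [sys.executable, "transcribe_accepted.py"]),
    ("dedupe", [sys.executable, "dedupe_transcript.py"]),
]


def run_pipeline(phases=PHASES, runner=subprocess.run):
    """Run each phase in order and stop at the first non-zero exit code."""
    for name, cmd in phases:
        print(f"[{name}] {' '.join(cmd)}")
        result = runner(cmd)
        if result.returncode != 0:
            raise SystemExit(
                f"phase '{name}' failed with exit code {result.returncode}"
            )
```

Calling run_pipeline() from the repository root runs all four phases; passing a shorter list, such as PHASES[1:], skips capture when the .wav files already exist.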

⚡ Key Notes

  • Built around faster-whisper with standard CTranslate2 binary releases from PyPI.
  • Supports both GPU (CUDA) and CPU transcription paths via runtime flags in transcribe_accepted.py.
  • Tested most heavily on RTX-class GPUs; the documentation no longer calls for a custom local CTranslate2 build.
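For readers unfamiliar with faster-whisper, the sketch below shows how a GPU or CPU path is typically selected when constructing a model. It is not the code in transcribe_accepted.py: the use_gpu parameter stands in for whatever runtime flags that script actually exposes, and the model size and compute types are illustrative assumptions.

```python
def pick_device(use_gpu):
    """Map a hypothetical GPU on/off switch to faster-whisper settings."""
    if use_gpu:
        return {"device": "cuda", "compute_type": "float16"}
    # int8 is a common choice for CPU inference with CTranslate2.
    return {"device": "cpu", "compute_type": "int8"}


def transcribe(path, model_size="small", use_gpu=False):
    """Transcribe one audio file and return (start, end, text) tuples."""
    # Imported lazily so pick_device works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, **pick_device(use_gpu))
    segments, _info = model.transcribe(path)
    # segments is a generator; iterating it performs the transcription.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```

Calling transcribe("clip.wav", use_gpu=True) would use CUDA with float16, while the default falls back to the CPU; clip.wav is a placeholder file name.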

📦 Installation

Python Dependencies

pip install -r requirements.txt

(further dependencies may be required)

CTranslate2 is available on PyPI (for example, pip index versions ctranslate2 currently reports 4.7.1 and historical releases).
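After installing, one quick way to confirm which versions ended up in the environment is to query the package metadata. This is a small convenience sketch using the standard library; the helper name installed_version is hypothetical, and the package names checked are the ones mentioned in this README.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("ctranslate2", "faster-whisper"):
    print(name, installed_version(name) or "not installed")
```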

Node.js Dependencies

npm install 

(further dependencies may be required)


AI Transparency Statement - Mostly built with the aid of ChatGPT. Author is a sysadmin and project manager with some decades of experience. Author believes he can appropriately supervise the "dev team", nevertheless wishes to be honest and upfront for anyone who worries about such things.


See also Pipeline Document


GPU Accelerated · Backend: CTranslate2 · Transcription: Whisper · Voice Activity Detection · Clustering · Burst Rescue · Audio Format · Session Logs · Python · Node.js · BSD 3-Clause


About

A streamlined stack for Discord audio capture, transcription, and text deduplication.
