jmondaud/audio-transcriber

Audio Transcriber

A Python-based audio transcription tool using Faster Whisper, an optimized implementation of OpenAI's Whisper model. This tool processes audio files with GPU acceleration for fast, accurate transcriptions.

Features

  • GPU-accelerated transcription using NVIDIA CUDA
  • Supports multiple audio formats (MP3, WAV, M4A, FLAC, OGG, WMA, AAC)
  • Automatic organization of audio files and transcripts
  • Uses Whisper's most accurate model (large-v3) by default
  • Efficient processing with faster-whisper and CTranslate2

Requirements

  • Python 3.10-3.13
  • NVIDIA GPU with CUDA support
  • CUDA 12.x
  • cuDNN libraries

Installation

1. Install dependencies

This project uses uv for dependency management:

uv sync

2. Install CUDA libraries

For GPU acceleration, install cuDNN:

uv pip install nvidia-cudnn-cu12

Project Structure

audio-transcriber/
├── .venv/              # Virtual environment (created by uv)
├── audio/              # Input directory - place audio files here
├── transcript/         # Output directory - organized results
├── main.py             # Main transcription script
├── run.sh              # Wrapper script to run with CUDA libraries
├── pyproject.toml      # Project configuration and dependencies
└── .gitignore          # Git ignore rules

Usage

Basic Usage

  1. Place your audio files in the audio/ directory

  2. Run the transcription script:

./run.sh

Important: Always use ./run.sh to run the script. This wrapper sets up the required CUDA library paths before starting the transcription. Running directly with uv run python main.py or python main.py will fail with cuDNN errors.
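The wrapper's job can be sketched in Python: find the cuDNN libraries that pip installed inside the virtual environment and prepend their directory to LD_LIBRARY_PATH before the model loads. This is a minimal sketch of the idea, not the actual contents of run.sh; the function names and the nvidia/cudnn/lib layout are assumptions.

```python
# Sketch of the path setup run.sh performs (names and layout are assumptions).
import os


def cudnn_lib_dir(site_packages: str) -> str:
    """Return the lib/ directory of the pip-installed nvidia-cudnn-cu12 package."""
    return os.path.join(site_packages, "nvidia", "cudnn", "lib")


def prepend_ld_library_path(env: dict, lib_dir: str) -> dict:
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return env
```

run.sh would apply this environment before invoking python main.py, which is why skipping the wrapper leaves cuDNN unresolvable at load time.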

Output Structure

The script processes each audio file and creates an organized folder structure:

transcript/
├── audio_file_1/
│   ├── audio_file_1.mp3
│   └── audio_file_1_transcript.txt
└── audio_file_2/
    ├── audio_file_2.wav
    └── audio_file_2_transcript.txt

Each audio file gets its own folder named after the original file (without extension), containing:

  • The original audio file (moved from audio/)
  • The generated transcript as a text file
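The per-file organization step described above can be sketched as follows. This is an illustrative reimplementation, not the actual code in main.py; the function name and signature are assumptions.

```python
# Sketch of the organization step: move the audio file into its own folder
# under transcript/ and write the transcript next to it.
import shutil
from pathlib import Path


def organize(audio_path: Path, transcript_text: str, out_root: Path) -> Path:
    """Move audio_path into out_root/<stem>/ and write <stem>_transcript.txt beside it."""
    folder = out_root / audio_path.stem
    folder.mkdir(parents=True, exist_ok=True)
    shutil.move(str(audio_path), folder / audio_path.name)
    (folder / f"{audio_path.stem}_transcript.txt").write_text(
        transcript_text, encoding="utf-8"
    )
    return folder
```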

Supported Audio Formats

  • MP3 (.mp3)
  • WAV (.wav)
  • M4A (.m4a)
  • FLAC (.flac)
  • OGG (.ogg)
  • WMA (.wma)
  • AAC (.aac)
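Discovery of input files can be sketched from the format list above, together with the "not hidden files" rule from the troubleshooting section. The function name is an assumption; only the extension set mirrors the documented behavior.

```python
# Sketch of input discovery: non-hidden files in audio/ with a supported
# extension (case-insensitive), sorted for deterministic processing order.
from pathlib import Path

SUPPORTED = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"}


def find_audio_files(audio_dir: Path) -> list[Path]:
    """Return supported, non-hidden audio files in audio_dir."""
    return sorted(
        p
        for p in audio_dir.iterdir()
        if p.is_file()
        and not p.name.startswith(".")
        and p.suffix.lower() in SUPPORTED
    )
```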

Configuration

The script uses the following defaults (can be modified in main.py):

  • Model: large-v3 (Whisper's most accurate model)
  • Device: cuda (GPU acceleration)
  • Compute Type: float16 (balanced speed/accuracy)

Troubleshooting

GPU Not Working

If you see CUDA/cuDNN errors, ensure you have installed the required libraries:

uv pip install nvidia-cudnn-cu12

Verify your GPU is detected:

nvidia-smi

No Audio Files Found

Make sure your audio files are:

  • Placed in the audio/ directory
  • In a supported format (see list above)
  • Not hidden files

Memory Issues

The large-v3 model requires significant VRAM. If you encounter out-of-memory errors, modify main.py to use a smaller model:

model = WhisperModel("medium", device="cuda", compute_type="float16")

Available models, from smallest/fastest to largest/most accurate:

  • tiny, base, small, medium, large-v2, large-v3

License

This project uses the following open-source components:

  • faster-whisper (MIT License)
  • OpenAI Whisper model (MIT License)
