A Python-based audio transcription tool using Faster Whisper, an optimized implementation of OpenAI's Whisper model. This tool processes audio files with GPU acceleration for fast, accurate transcriptions.
Features:

- GPU-accelerated transcription using NVIDIA CUDA
- Supports multiple audio formats (MP3, WAV, M4A, FLAC, OGG, WMA, AAC)
- Automatic organization of audio files and transcripts
- Uses Whisper's best model (large-v3) for maximum accuracy
- Efficient processing with faster-whisper and CTranslate2
Requirements:

- Python 3.10-3.13
- NVIDIA GPU with CUDA support
- CUDA 12.x
- cuDNN libraries
This project uses uv for dependency management:
```bash
uv sync
```

For GPU acceleration, install cuDNN:

```bash
uv pip install nvidia-cudnn-cu12
```

```
audio-transcriber/
├── .venv/           # Virtual environment (created by uv)
├── audio/           # Input directory - place audio files here
├── transcript/      # Output directory - organized results
├── main.py          # Main transcription script
├── run.sh           # Wrapper script to run with CUDA libraries
├── pyproject.toml   # Project configuration and dependencies
└── .gitignore       # Git ignore rules
```
- Place your audio files in the `audio/` directory
- Run the transcription script:

  ```bash
  ./run.sh
  ```

**Important:** Always use `./run.sh` to run the script. This wrapper sets up the required CUDA library paths before starting the transcription. Running directly with `uv run python main.py` or `python main.py` will fail with cuDNN errors.
The script processes each audio file and creates an organized folder structure:
```
transcript/
├── audio_file_1/
│   ├── audio_file_1.mp3
│   └── audio_file_1_transcript.txt
└── audio_file_2/
    ├── audio_file_2.wav
    └── audio_file_2_transcript.txt
```
Each audio file gets its own folder named after the original file (without extension), containing:

- The original audio file (moved from `audio/`)
- The generated transcript as a text file
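The folder layout above can be sketched with plain `pathlib` operations. This is an illustrative sketch, not the actual code in `main.py`; the helper name `organize_output` is hypothetical:

```python
from pathlib import Path
import shutil


def organize_output(audio_path: Path, transcript_text: str, out_root: Path) -> Path:
    """Move one audio file into its own folder and write the transcript beside it."""
    stem = audio_path.stem  # file name without extension
    folder = out_root / stem
    folder.mkdir(parents=True, exist_ok=True)

    # Move the original audio file out of audio/ into its per-file folder
    shutil.move(str(audio_path), folder / audio_path.name)

    # Write the transcript next to it, matching the naming shown above
    transcript_file = folder / f"{stem}_transcript.txt"
    transcript_file.write_text(transcript_text, encoding="utf-8")
    return folder
```

Calling `organize_output(Path("audio/talk.mp3"), text, Path("transcript"))` would produce `transcript/talk/talk.mp3` and `transcript/talk/talk_transcript.txt`.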
Supported audio formats:

- MP3 (.mp3)
- WAV (.wav)
- M4A (.m4a)
- FLAC (.flac)
- OGG (.ogg)
- WMA (.wma)
- AAC (.aac)
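Selecting the supported, non-hidden files from `audio/` comes down to an extension check like the one below (a sketch; the helper name `find_audio_files` is illustrative):

```python
from pathlib import Path

# Extensions matching the supported-formats list above
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"}


def find_audio_files(audio_dir: Path) -> list[Path]:
    """Return supported, non-hidden audio files in audio_dir, sorted by name."""
    return sorted(
        p for p in audio_dir.iterdir()
        if p.is_file()
        and not p.name.startswith(".")           # skip hidden files
        and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Lower-casing the suffix means `TALK.MP3` is picked up as well as `talk.mp3`.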
The script uses the following defaults (they can be modified in `main.py`):

- Model: `large-v3` (Whisper's most accurate model)
- Device: `cuda` (GPU acceleration)
- Compute Type: `float16` (balanced speed/accuracy)
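Putting those defaults together, the core transcription step looks roughly like this (a sketch using the `faster-whisper` API; the helpers `join_segments` and `transcribe_file` are illustrative, not the actual functions in `main.py`):

```python
def join_segments(segments) -> str:
    """Join segment objects (each carrying a .text attribute) into one transcript."""
    return "\n".join(seg.text.strip() for seg in segments)


def transcribe_file(audio_path: str, model_size: str = "large-v3") -> str:
    """Transcribe one audio file. Requires faster-whisper and a CUDA GPU."""
    from faster_whisper import WhisperModel  # lazy import: loading it needs cuDNN

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    # transcribe() returns a generator of segments plus an info object
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)
```

The segments are yielded lazily, so joining them is what actually drives the transcription.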
If you see CUDA/cuDNN errors, ensure you have installed the required libraries:

```bash
uv pip install nvidia-cudnn-cu12
```

Verify your GPU is detected:

```bash
nvidia-smi
```

Make sure your audio files are:

- Placed in the `audio/` directory
- In a supported format (see list above)
- Not hidden files
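You can also check from Python whether CTranslate2 (the backend faster-whisper runs on) can see the GPU, independent of `nvidia-smi`. A small sketch; it returns `None` when the package is not installed:

```python
def cuda_device_count():
    """Number of CUDA devices CTranslate2 can see, or None if it is not installed."""
    try:
        import ctranslate2
    except ImportError:
        return None
    return ctranslate2.get_cuda_device_count()
```

A result of `0` means the library loaded but found no usable GPU, which usually points at a driver or CUDA version mismatch rather than a missing package.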
If you encounter out-of-memory errors, note that the large-v3 model requires significant VRAM. You can modify `main.py` to use a smaller model:

```python
model = WhisperModel("medium", device="cuda", compute_type="float16")
```

Available models, in order of size/accuracy: `tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`.
This project uses the following open-source components:
- faster-whisper (MIT License)
- OpenAI Whisper model (MIT License)