Fast Audio Annotate

Fast Audio Annotate is a lightweight FastHTML interface built for collective audio transcription review.

Workflow

1. Prepare your audio: drop files into the configured folder (default: `audio/`) and, if needed, add a `metadata.json` with extra context.
2. Run it locally: install dependencies with `pip install -r requirements.txt` and launch `python main.py` to serve the app at http://localhost:5001.
3. Draft with Whisper & segments: use the included scripts (for example `segments_audios.py`) to generate initial transcriptions and pre-split your audio into timestamped segments that keep their original timing.
4. Deploy and share: deploy the app on your preferred platform (see deployment options) and share the link so your collaborators can correct and approve clips from anywhere.

The tool keeps track of every contribution, renders waveforms with WaveSurfer.js, and focuses on being simple so anyone can jump in and improve the transcripts. In a typical setup, segments_audios.py is used once to populate the database with short, machine-transcribed audio segments, and the web UI is then used by the community to correct both the text and the segment boundaries.
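To illustrate what "segments that keep their original timing" means, here is a minimal sketch that cuts a recording into consecutive windows whose start and end times are expressed in seconds from the start of the source file. The function name `split_into_segments` and the fixed 30-second window are assumptions made for illustration; `segments_audios.py` may well choose boundaries differently (for example from Whisper timestamps).

```python
from typing import List, Tuple


def split_into_segments(
    duration_s: float, max_len_s: float = 30.0
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs, in seconds relative to the original file.

    Hypothetical helper: fixed-size windows are an assumption, not the
    method used by the project's scripts.
    """
    if duration_s <= 0 or max_len_s <= 0:
        return []
    segments: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        # The last window is clipped to the end of the recording.
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    for start, end in split_into_segments(70.0):
        print(f"{start:6.1f}s -> {end:6.1f}s")
```

Because each pair refers to the original timeline, a reviewer who drags a region handle in the UI is adjusting the same coordinates the segment was created with.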

Installation

Clone and enter the project

```
git clone https://github.com/aastroza/fast_audio_annotate.git
cd fast_audio_annotate
```

Create an isolated environment (optional but recommended)
```
python -m venv .venv
source .venv/bin/activate
```
Install Python dependencies
```
pip install -r requirements.txt
```
Prepare the audio directory
- Place your .wav/.mp3 files under the folder configured by audio_folder (defaults to audio/).
- Optionally create metadata.json (see Configuration) before starting the server so metadata is ingested on boot.

Configuration

You can customize defaults through config.yaml (parsed on startup) or CLI arguments (python main.py --help shows available flags). The most relevant options are:

| Key | Default | Description |
| --- | --- | --- |
| `audio_folder` | `audio` | Directory containing source audio and the local annotations database. For deployed apps, use a full URL to cloud storage (e.g., `https://storage.googleapis.com/your-bucket-name`). |
| `metadata_filename` | `metadata.json` | File automatically scanned to populate clip metadata. |
| `database_url` | `null` | PostgreSQL/Neon URL to use instead of the local SQLite database. |
| `title` / `description` | App defaults | Copy shown in the page header. |
| `whisper_model` | `openai/whisper-large-v3` | Base model used by preprocessing scripts when generating drafts. |
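As a rough sketch of how these options could be combined, the snippet below layers values from the config file and from the command line on top of the documented defaults. The `AppConfig` class, the `resolve_config` helper, and the precedence order (defaults, then file, then CLI) are assumptions for illustration; the actual loader in the project may behave differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical container; field names and defaults follow the table above."""

    audio_folder: str = "audio"
    metadata_filename: str = "metadata.json"
    database_url: Optional[str] = None
    whisper_model: str = "openai/whisper-large-v3"


def resolve_config(
    file_values: Mapping[str, Any], cli_values: Mapping[str, Any]
) -> AppConfig:
    """Merge settings, assuming CLI values override file values.

    `file_values` would typically be the dict parsed from config.yaml;
    `cli_values` the parsed command-line arguments. Unknown keys and
    None values are ignored so that unset flags do not erase settings.
    """
    known = {f.name for f in fields(AppConfig)}
    merged: dict = {}
    for source in (file_values, cli_values):
        for key, value in source.items():
            if key in known and value is not None:
                merged[key] = value
    return AppConfig(**merged)


if __name__ == "__main__":
    cfg = resolve_config(
        {"audio_folder": "clips"},
        {"audio_folder": None, "whisper_model": "openai/whisper-small"},
    )
    print(cfg)
```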

Remote Audio Storage

When deploying the app to a cloud platform (instead of running locally), you'll need to host your audio files in cloud storage and configure CORS (Cross-Origin Resource Sharing) to allow browser access:

Google Cloud Storage Example

Upload your audio files to a Google Cloud Storage bucket

Configure CORS to allow your deployed app's domain to access the files:

```
# Create a cors.json file
echo '[{"origin": ["https://your-app-domain.com"], "method": ["GET"], "responseHeader": ["Content-Type"], "maxAgeSeconds": 3600}]' > cors.json

# Apply CORS configuration to your bucket
gsutil cors set cors.json gs://your-bucket-name
```

Update config.yaml to point to your bucket:

```
audio_folder: "https://storage.googleapis.com/your-bucket-name"
```

Other Storage Providers

Similar CORS configuration is needed for other cloud storage providers:

- AWS S3: Configure CORS in the S3 bucket settings
- Azure Blob Storage: Set CORS rules in the storage account
- Cloudflare R2: Configure CORS policies in the R2 dashboard

Make sure your storage allows GET requests from your deployed app's origin domain.
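One way to check this from a script is to request a file with an `Origin` header and inspect the `Access-Control-Allow-Origin` header in the response. The sketch below uses only the Python standard library; the helper names `fetch_cors_headers` and `cors_allows_origin`, and the example URLs, are placeholders rather than part of the project.

```python
import urllib.request
from typing import Mapping


def fetch_cors_headers(url: str, origin: str) -> dict:
    """Send a GET request with an Origin header and return lowercased headers."""
    request = urllib.request.Request(url, method="GET", headers={"Origin": origin})
    with urllib.request.urlopen(request, timeout=10) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def cors_allows_origin(headers: Mapping[str, str], origin: str) -> bool:
    """Return True if the response headers permit the given origin."""
    normalized = {name.lower(): value for name, value in headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    return allowed in ("*", origin)


if __name__ == "__main__":
    # Placeholder values: replace with a real object URL and your app's origin.
    file_url = "https://storage.googleapis.com/your-bucket-name/example.wav"
    app_origin = "https://your-app-domain.com"
    print(cors_allows_origin(fetch_cors_headers(file_url, app_origin), app_origin))
```

If the check prints `False`, the browser will most likely refuse to load the audio for the waveform even though the file itself is publicly reachable.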

Environment variables

Environment variables can be provided directly or through a .env file (automatically loaded by python-dotenv). Relevant keys:

- `DATABASE_URL`: PostgreSQL connection string used to store clips remotely.
- `NEON_DATABASE_URL`: Alternate name for the same setting when using Neon-hosted Postgres.
- `USER` / `USERNAME`: Optional contributor name fallback that is logged when reviewers skip the explicit form field.

If neither DATABASE_URL nor NEON_DATABASE_URL is set, the application stores everything in annotations.db inside the audio directory.
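The fallback rules above can be sketched as two small functions. The helper names are hypothetical, and the lookup order (`DATABASE_URL` before `NEON_DATABASE_URL`, `USER` before `USERNAME`) is an assumption, since the text does not say which variable wins when both are set.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_database_target(env: Mapping[str, str], audio_folder: str = "audio") -> str:
    """Return a PostgreSQL URL if one is configured, else the local SQLite path."""
    for key in ("DATABASE_URL", "NEON_DATABASE_URL"):  # order is an assumption
        value = env.get(key, "").strip()
        if value:
            return value
    return str(Path(audio_folder) / "annotations.db")


def resolve_contributor(form_value: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Prefer the name typed in the form; fall back to USER / USERNAME."""
    name = (form_value or "").strip()
    if name:
        return name
    for key in ("USER", "USERNAME"):  # order is an assumption
        value = env.get(key, "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    print(resolve_database_target(os.environ))
    print(resolve_contributor("", os.environ))
```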

Interface overview

The review UI is designed to make correcting clips fast while providing useful context:

- Randomized queue: every reviewer receives a random pending clip; empty queues render a friendly "All caught up" message.
- Waveform player: WaveSurfer.js provides playback, zoom, keyboard shortcuts (Space/Q/W), speed controls, and draggable region handles.
- Timing inputs: numeric fields mirror the selected region so you can fine-tune start/end times manually if needed.
- Transcription editor: a multiline text area seeded with the current draft, ready for corrections.
- Contributor credit: reviewers can enter their name, and the sidebar highlights top contributors with contribution counts.
- Metadata panel: contextual information from `metadata.json` (or your database) is rendered alongside the clip so annotators know what they are hearing.
- HTMX actions: the buttons at the bottom save work, mark clips as reviewed, or flag problematic audio without reloading the page.

Data flow and storage

- Each audio file is split into clips and stored in the `clips` table managed by `db_backend.py`.
- By default the app initializes a SQLite database (`annotations.db`) under the audio folder; setting `DATABASE_URL`/`NEON_DATABASE_URL` switches to PostgreSQL while keeping the same schema.
- When reviewers interact with the interface, every action updates the clip record with the latest timestamps, text, reviewer name, and review status.
- Flagging a clip flips its `marked` flag so it leaves the active queue, while completing a clip marks it as `human_reviewed` and ready for export.
- `metadata.json` entries are loaded into the database at startup so additional fields (speaker, language, tags, etc.) can be displayed in the sidebar.
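The clip lifecycle described above can be sketched with the standard `sqlite3` module. Only the table name `clips` and the `marked` and `human_reviewed` flags come from the description; every other column name and all function names are assumptions for illustration, and the real schema in `db_backend.py` may differ.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: only `clips`, `marked`, and `human_reviewed` are documented names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS clips (
    id INTEGER PRIMARY KEY,
    audio_path TEXT NOT NULL,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL,
    text TEXT NOT NULL DEFAULT '',
    username TEXT,
    marked INTEGER NOT NULL DEFAULT 0,
    human_reviewed INTEGER NOT NULL DEFAULT 0
)
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two machine-drafted clips."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO clips (audio_path, start_s, end_s, text) VALUES (?, ?, ?, ?)",
        [("a.wav", 0.0, 4.2, "helo world"), ("a.wav", 4.2, 9.0, "second clip")],
    )
    conn.commit()
    return conn


def next_pending_clip(conn: sqlite3.Connection) -> Optional[sqlite3.Row]:
    """Pick a random clip that is neither flagged nor already reviewed."""
    return conn.execute(
        "SELECT * FROM clips WHERE marked = 0 AND human_reviewed = 0 "
        "ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


def save_clip(conn, clip_id: int, start_s: float, end_s: float, text: str, username: str) -> None:
    """Store the latest timestamps, text, and reviewer name."""
    conn.execute(
        "UPDATE clips SET start_s = ?, end_s = ?, text = ?, username = ? WHERE id = ?",
        (start_s, end_s, text, username, clip_id),
    )
    conn.commit()


def complete_clip(conn, clip_id: int) -> None:
    """Mark a clip as human reviewed so it leaves the queue."""
    conn.execute("UPDATE clips SET human_reviewed = 1 WHERE id = ?", (clip_id,))
    conn.commit()


def flag_clip(conn, clip_id: int) -> None:
    """Flag problematic audio so it leaves the active queue."""
    conn.execute("UPDATE clips SET marked = 1 WHERE id = ?", (clip_id,))
    conn.commit()
```

When `next_pending_clip` returns `None`, the queue is empty, which corresponds to the "All caught up" state in the interface.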
