ASR Worker that uses faster-whisper as the backend, to be used for transcribing AV material from B&G.
This is still a WIP, so it is subject to change.
There are two ways in which the whisper-asr-worker can be tested (on the CPU):

### Docker CPU run

1. Check if Docker is installed
2. Make sure you have the `.env.override` file in your local repo folder
3. In `.env.override`, change `W_DEVICE` from `cuda` to `cpu`
4. Comment out the lines indicated in `docker-compose.yml`
5. Open your preferred terminal and navigate to the local repository folder
6. To build the image, execute the following command:

   ```sh
   docker build . -t whisper-asr-worker
   ```

7. To run the worker, execute the following command:

   ```sh
   docker compose up
   ```

All commands should be run within WSL if on Windows, or within your terminal if on Linux.
### Local run

1. Follow the steps here (under "Adding `pyproject.toml` and generating a `poetry.lock` based on it") to install Poetry and the dependencies required to run the worker
2. Make sure you have the `.env.override` file in your local repo folder
3. In `.env.override`, change `W_DEVICE` from `cuda` to `cpu`
4. Install `ffmpeg`. You can run this command, for example:

   ```sh
   apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
   ```

5. Navigate to `scripts`, then execute the following command:

   ```sh
   ./run.sh
   ```
To run the worker with a CUDA-compatible GPU instead of the CPU, either:

- skip steps 3 & 4 from "Docker CPU run", or
- skip step 3 from "Local run"
(OUTDATED BUT STILL MIGHT BE RELEVANT) To run it using a GPU via Docker, check the instructions from the dane-example-worker.
Make sure to replace `dane-example-worker` in the `docker run` command with `dane-whisper-asr-worker`.
To access the worker and schedule runs, go to the following link: http://localhost:5333/docs.
In there, you have the following options:

- `GET /tasks`: returns a list of all tasks that have run or are currently running/scheduled to run
- `POST /tasks`: schedules a new task to transcribe the input URI and export it to the output URI. The format of a task is the following:
```json
{
  "input_uri": "string",
  "output_uri": "string",
  "status": "CREATED | PROCESSING | DONE | ERROR",
  "id": "string",
  "error_msg": "string",
  "response": {}
}
```
The only field you absolutely need to provide is `input_uri`. The `output_uri` can stay empty, in which case the generated transcripts will be stored locally; the rest of the fields are automatically generated or updated as the task progresses.
- `GET /status`: returns the status of the worker:
  - `503` if the worker is currently executing a task
  - `200` if the worker is available to run new tasks
- `GET /tasks/{task_id}`: returns the task details of the given `task_id`
- `DELETE /tasks/{task_id}`: deletes the task with the given `task_id`
- `GET /ping`: returns `pong` (can be ignored, not relevant to the main functionality of the worker)
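For instance, a task could be scheduled from Python like this. This is a minimal sketch using only the standard library; the endpoint paths and payload fields come from the list above, while the helper names are ours and the worker is assumed to be reachable at the default `http://localhost:5333`:

```python
import json
import urllib.request

WORKER_URL = "http://localhost:5333"  # assumed default address from above


def build_task(input_uri: str, output_uri: str = "") -> dict:
    """Build the minimal task payload; only input_uri is strictly required."""
    return {"input_uri": input_uri, "output_uri": output_uri}


def post_task(task: dict) -> dict:
    """Schedule a new task via POST /tasks and return the created task record."""
    req = urllib.request.Request(
        f"{WORKER_URL}/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_task(task_id: str) -> dict:
    """Fetch the task details via GET /tasks/{task_id}."""
    with urllib.request.urlopen(f"{WORKER_URL}/tasks/{task_id}") as resp:
        return json.load(resp)
```

A run would then post a task with `post_task(build_task(...))` and poll `get_task(created["id"])` until `status` becomes `DONE` or `ERROR`.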
The expected run of this worker (whose pipeline is defined in `asr.py`) should:

- download the input file via `download.py` if it isn't already present in `/data/input/`
- download the model via `model_download.py` if not present
- run `transcode.py` if the input file is a video, to convert it to audio format (though there are plans to remove this and instead use the audio-extraction-worker to extract the audio)
- run `whisper.py` to transcribe the audio and save it in `/data/output/` if a transcription doesn't already exist
- convert Whisper's output to DAAN index format using `daan_transcript.py`
- (optional) transfer the output to an S3 bucket
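The transcode step above only applies to video inputs. A minimal sketch of how that decision could be made — the extension list here is our assumption, not taken from `transcode.py`:

```python
from pathlib import Path

# Hypothetical set of video container extensions that would need transcoding;
# the actual detection logic in transcode.py may differ.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def needs_transcode(input_file: str) -> bool:
    """True if the input looks like a video file that must be converted to audio first."""
    return Path(input_file).suffix.lower() in VIDEO_EXTENSIONS
```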
If you prefer to use your own model that is stored locally, make sure to set `MODEL_BASE_DIR` to the path where the model files can be found.
The pre-trained Whisper model version can be adjusted in the `.env` file by editing the `W_MODEL` parameter. Possible options are:
| Size | Parameters |
|---|---|
| `tiny` | 39 M |
| `base` | 74 M |
| `small` | 244 M |
| `medium` | 769 M |
| `large` | 1550 M |
| `large-v2` | 1550 M |
| `large-v3` | 1550 M |
We recommend version `large-v2`, as it performs better than `large-v3` in our benchmarks.
You can also load your own (custom) model by setting the `W_MODEL` parameter to an S3/HTTP URI.
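To illustrate, loading a model by size name with faster-whisper looks roughly like this. The `WhisperModel` constructor is the library's documented entry point; the helper functions and the `compute_type` choice are our assumptions, not code from this repo:

```python
# Pre-trained sizes from the table above; anything else would be treated
# as a local path or custom model URI.
KNOWN_SIZES = {"tiny", "base", "small", "medium", "large", "large-v2", "large-v3"}


def is_pretrained_size(w_model: str) -> bool:
    """True if W_MODEL names one of the pre-trained Whisper sizes."""
    return w_model in KNOWN_SIZES


def load_model(w_model: str, device: str = "cpu"):
    """Load a model with faster-whisper; the import is deferred so this
    sketch stays importable without the library installed."""
    from faster_whisper import WhisperModel

    # int8 keeps CPU inference light; on GPU you would typically use float16.
    return WhisperModel(w_model, device=device, compute_type="int8")
```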
The parameters used to configure the application can be found in the `.env` file. You will also need to create a `.env.override` file containing secrets related to the S3 connection, which should not be exposed in the `.env` file. The parameters that should be given valid values in `.env.override` are:

- `S3_ENDPOINT_URL`
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
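This README does not show how the worker parses these files, but the intended precedence (values in `.env.override` win over `.env`) can be sketched generically as:

```python
from pathlib import Path


def load_env(*files: str) -> dict:
    """Read simple KEY=VALUE files in order; later files override earlier ones."""
    env: dict = {}
    for name in files:
        path = Path(name)
        if not path.exists():
            continue
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env


# Secrets in .env.override take precedence over the defaults in .env:
# config = load_env(".env", ".env.override")
```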