Users can configure any custom applications that use local GenAI models, to run with ConsumerBench. The process of adding a new application to ConsumerBench is:
- Create a new sub-directory in this folder for the application
- Install the application in the sub-directory
- Implement the
Applicationinterface. (Please see the existing applications, such as DeepResearch:DeepResearch/DeepResearch.py.) - Register the application with ConsumerBench: Please create an instance of the application in
src/scripts/run_consumerbench.py. Look for existing applications and similarly register the new application. - You can then add your own applications to the workflows (specified in
configs/), and the application will be monitored automatically with ConsumerBench
Currently, the ConsumerBench repository contains with 4 applications: Chatbot, DeepResearch, LiveCaptions and Imagegen. We have already added their classes in the corresponding directories.
Following are the steps to install the applications, setup the inference backend with the model and the datasets specified in the paper. While we specify the model and dataset here which are used in the paper, users are free to download their own models and datasets to use with the applications.
Installing application involves setting up llama.cpp server.
cd <repo-dir>/inference_backends/llama.cpp
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_F16=1 -DCMAKE_CUDA_ARCHITECTURES="75" -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc
cd build
make -j32
Chatbot client then directly sends http requests to the llama.cpp server for each request.
Create a new conda environment with python 3.10. Activate the environment.
conda create -n deepresearch python=3.10
conda activate deepresearch
cd DeepResearch/smolagents/examples/open_deep_research
pip install -r requirements.txt
pip install -e ../../.[dev]
Download the Llama-3.2-3B model from huggingface. Note that you may need a huggingface account, and permission to download the gated llama model.
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-f16.gguf
mv Llama-3.2-3B-Instruct-f16.gguf <repo-base>/models/
Create a conda environment with python 3.10. Activate the environment.
conda create -n imagegen python=3.10
conda activate imagegen
pip install -r requirements.txt
pip install diffusers
pip install transformers==4.50.3
Download the stable-diffusion-3.5-large model from huggingface. Note that you may need a huggingface account
git lfs install
git clone https://huggingface.co/tensorart/stable-diffusion-3.5-medium-turbo
mv stable-diffusion-3.5-medium-turbo <repo-base>/models/
Create a conda environment with python 3.10. Activate the environment.
conda create -n whisper python=3.10
conda activate whisper
conda install nvidia::cudnn cuda-version=12
pip install librosa soundfile
pip install faster-whisper
pip install torch torchaudio
pip install transformers
pip install datasets
pip install torchcodec
Download the Whisper-Large-V3-Turbo model from huggingface. Note that you may need a huggingface account
git lfs install
git clone https://huggingface.co/openai/whisper-large-v3-turbo
mv whisper-large-v3-turbo ../models/
LiveCaptions shows live audio captioning. In the paper, in order to simulate live captioning for multiple requests, we store the distil-whisper/earnings21 dataset into wav files, and use each wav file as a single request for this application.
conda activate whisper
cd <repo-base>/applications/LiveCaptions/
python whisper_streaming/generate_wav_dataset.py
python whisper_streaming/split_wav_file.py --input_file ./whisper-earnings21/4320211.wav --output-dir ./whisper-earnings21
Make sure conda path in whisper_online_client.sh and whisper_online_server.sh are setup correctly.
Make sure --warmup-file in whisper_online_server.sh is pointed to a correct warmup audio.
sudo nvidia-cuda-mps-control -d
sudo nvidia-cuda-mps-control
quit