Preliminary implementation of the inference engine for OpenAssistant.
The services of the inference stack are prefixed with `inference-` in the unified compose descriptor. Prior to building those, please ensure that you have Docker's new BuildKit backend enabled. See the FAQ for more info.
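If your Docker version does not enable BuildKit by default, one common way to turn it on for the current shell is via environment variables (recent Docker releases ship with BuildKit enabled out of the box, so this may be unnecessary on your machine):

```shell
# Enable BuildKit for `docker build` and for the compose CLI integration.
export DOCKER_BUILDKIT=1
export COMPOSE_DOCKER_CLI_BUILD=1
```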
To build the services, run:

```shell
docker compose --profile inference build
```

Spin up the stack:

```shell
docker compose --profile inference up -d
```

Tail the logs:

```shell
docker compose logs -f \
    inference-server \
    inference-worker \
    inference-text-client \
    inference-text-generation-server
```

Attach to the text-client, and start chatting:

```shell
docker attach open-assistant-inference-text-client-1
```

Note: In the last step, `open-assistant-inference-text-client-1` refers to the name of the `text-client` container started in step 2.
Note: The compose file contains the bind mounts enabling you to develop on the modules of the inference stack, and the `oasst-shared` package, without rebuilding.
Note: You can spin up any number of workers by adjusting the number of replicas of the `inference-worker` service to your liking.
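As a sketch, the replica count can also be overridden at startup with compose's `--scale` flag, rather than editing the compose file (three workers here is an arbitrary example value):

```shell
# Number of inference-worker replicas to run (arbitrary example value).
WORKER_REPLICAS=3
docker compose --profile inference up -d --scale inference-worker="$WORKER_REPLICAS"
```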
Note: Please wait for the `inference-text-generation-server` service to output `{"message":"Connected"}` before starting to chat.
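To automate that wait, a sketch that blocks until the message appears in the logs (`grep -m1` exits after the first match, which in turn ends the `logs -f` stream):

```shell
# Follow the service's logs until the readiness message shows up.
docker compose --profile inference logs -f inference-text-generation-server \
    | grep -m1 '"message":"Connected"'
```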
Ensure you have tmux installed on your machine and the following packages installed into the Python environment:

- `uvicorn`
- `worker/requirements.txt`
- `server/requirements.txt`
- `text-client/requirements.txt`
- `oasst_shared`
Run the development setup script to start the full development stack:

```shell
cd inference
./full-dev-setup.sh
```

Make sure to wait until the 2nd terminal is ready and says `{"message":"Connected"}` before entering input into the last terminal.
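The script's prerequisites can be checked up front; a small sketch (the command list is an assumption based on the requirements above):

```shell
# Report any missing commands before launching the tmux session.
for CMD in tmux uvicorn; do
    command -v "$CMD" >/dev/null || echo "missing: $CMD"
done
```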
Alternatively, you can run each of the services manually. Run a postgres container:

```shell
docker run --rm -it -p 5432:5432 -e POSTGRES_PASSWORD=postgres --name postgres postgres
```

Run a redis container (or use the one of the general docker compose file):

```shell
docker run --rm -it -p 6379:6379 --name redis redis
```

Run the inference server:

```shell
cd server
pip install -r requirements.txt
DEBUG_API_KEYS='0000,0001,0002' uvicorn main:app --reload
```

Run one (or more) workers:

```shell
cd worker
pip install -r requirements.txt
API_KEY=0000 python __main__.py

# to add another worker, simply run
API_KEY=0001 python __main__.py
```

For the worker, you'll also want to have the text-generation-inference server running:

```shell
docker run --rm -it -p 8001:80 -e MODEL_ID=distilgpt2 \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    --name text-generation-inference ghcr.io/yk/text-generation-inference
```

Run the text client:

```shell
cd text-client
pip install -r requirements.txt
python __main__.py
```

We run distributed load tests using the locust Python package.
```shell
pip install locust
cd tests/locust
locust
```

Navigate to http://0.0.0.0:8089/ to view the locust UI.
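If you prefer to skip the web UI, locust can also run headless; a sketch with arbitrary user counts (the `--host` value assumes the inference server from the manual setup above, on uvicorn's default port 8000):

```shell
# 10 concurrent users, spawned 2 per second, for one minute, no web UI.
USERS=10
SPAWN_RATE=2
locust --headless -u "$USERS" -r "$SPAWN_RATE" --run-time 1m --host http://localhost:8000
```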