Cloudtype deployment repo for a standalone vLLM OpenAI-compatible server.
- Create a new app from this GitHub repository.
- Use `dockerfile` as the build file.
- (Recommended) Attach a persistent volume and mount it to `/data`.
- Set environment variables:
  - `MODEL_NAME` (required), example: `meta-llama/Meta-Llama-3-8B-Instruct`
  - `PORT` (optional), default: `8000`
  - `HF_HOME` (optional), default: `/data/huggingface`
  - `HUGGING_FACE_HUB_TOKEN` (optional): required for gated/private models
- Deploy and copy the app URL.
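The environment variables above, as they might look in Cloudtype's variable editor (values are illustrative; the token line is only needed for gated or private models):

```shell
MODEL_NAME=meta-llama/Meta-Llama-3-8B-Instruct
PORT=8000
HF_HOME=/data/huggingface
# HUGGING_FACE_HUB_TOKEN=hf_...   # uncomment and fill in for gated/private models
```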
Model weights are large (often multiple GB). This repo intentionally does not bake weights into the Docker image.
Instead, vLLM downloads the model at runtime and caches it under `HF_HOME`.
Mounting `/data` as a persistent volume prevents re-downloading the weights on every deploy.
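As a sketch of why the volume matters: the Hugging Face hub cache stores each model in a `models--<org>--<name>` directory under `$HF_HOME/hub`. The model name below is an illustrative example, not a requirement of this repo.

```python
import os

# Where the hub cache would place the example model's weights.
# Assumes the standard hub cache layout under HF_HOME.
hf_home = os.environ.get("HF_HOME", "/data/huggingface")
model = "meta-llama/Meta-Llama-3-8B-Instruct"
cache_dir = os.path.join(hf_home, "hub", "models--" + model.replace("/", "--"))
print(cache_dir)
```

If `/data` is a persistent volume, this directory survives redeploys and the multi-GB download happens only once.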
Set the bot environment variable:
`VLLM_BASE_URL=https://<your-cloudtype-vllm-url>/v1`
Then redeploy the bot service.
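A minimal sketch of how the bot can call the deployed server through its OpenAI-compatible API, using only the standard library. The fallback URL and model name are illustrative assumptions; in the bot, `VLLM_BASE_URL` comes from the environment variable set above.

```python
import json
import os
import urllib.request

# Base URL of the deployed vLLM server (example fallback is a placeholder).
base_url = os.environ.get("VLLM_BASE_URL", "https://my-app.cloudtype.app/v1")

# OpenAI-style chat completions request; model name matches MODEL_NAME.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    base_url.rstrip("/") + "/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment to call a live deployment
```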