This Helm chart lets users deploy multiple serving engines and a router into a Kubernetes cluster.
- Supports running multiple serving engines with different models
- Loads model weights directly from existing PersistentVolumes
Prerequisites:
- A running Kubernetes cluster with GPU support (you can set one up with minikube: https://minikube.sigs.k8s.io/docs/tutorials/nvidia/)
- Helm
To deploy the stack:

```bash
helm install llmstack . -f values-example.yaml
```

To uninstall it, run:

```bash
helm uninstall llmstack
```
See helm/values.yaml for more details.
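As an illustration, a values override file for deploying one model might look like the sketch below. The key names here (`servingEngineSpec`, `modelSpec`, and the fields inside each entry) are assumptions for illustration only; consult helm/values.yaml for the chart's actual schema.

```yaml
# Hypothetical values override -- actual keys are defined in helm/values.yaml.
servingEngineSpec:
  modelSpec:
    # Each entry deploys one serving engine for one model.
    - name: "llama3"                              # engine name (assumption)
      repository: "vllm/vllm-openai"              # serving engine image (assumption)
      tag: "latest"
      modelURL: "meta-llama/Llama-3.1-8B-Instruct"
      replicaCount: 1
      requestGPU: 1                               # GPUs requested per replica
```

Passing such a file with `helm install llmstack . -f <file>` overrides the chart's defaults for that release.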