This Helm chart lets users deploy multiple serving engines and a router into a Kubernetes cluster.
- Supports running multiple serving engines with different models
- Loads model weights directly from existing PersistentVolumes
Prerequisites:
- A running Kubernetes cluster with GPU support (you can set one up with minikube: https://minikube.sigs.k8s.io/docs/tutorials/nvidia/)
- Helm
To deploy the stack:

```bash
helm install llmstack . -f values-example.yaml
```

To uninstall it, run:

```bash
helm uninstall llmstack
```
See helm/values.yaml for more details.
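As an illustration, a values override file for deploying one model might look like the sketch below. The key names here (`servingEngineSpec`, `modelSpec`, and the fields inside each entry) are assumptions for illustration only; consult helm/values.yaml for the chart's actual schema.

```yaml
# Hypothetical values override -- actual keys are defined in helm/values.yaml.
servingEngineSpec:
  modelSpec:
    # Each entry deploys one serving engine for one model.
    - name: "llama3"                              # engine name (assumption)
      repository: "vllm/vllm-openai"              # serving engine image (assumption)
      tag: "latest"
      modelURL: "meta-llama/Llama-3.1-8B-Instruct"
      replicaCount: 1
      requestGPU: 1                               # GPUs requested per replica
```

Passing such a file with `helm install llmstack . -f <file>` overrides the chart's defaults for that release.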