Make the vllm-openai Docker container compatible with HuggingFace Inference Endpoints. Specifically, recent vLLM versions support vision-language models such as Phi-3-vision that Text Generation Inference (TGI) does not yet support, so this repo is useful for deploying VLMs that TGI cannot serve.
This repo was heavily inspired by https://github.com/philschmid/vllm-huggingface, but is simpler because it does not fork from vllm.
- Install dependencies with `poetry install`. If using `poetry` as your environment manager, run `poetry shell` to activate your environment.
- Add a `.env` file in the root directory with `HF_TOKEN` defined as a read/write token from HuggingFace. See `.env.example` for how to format.
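A `.env` file following that format might look like the sketch below; the token value is a placeholder, and `HF_ENDPOINT_URL` is filled in later once the endpoint is deployed:

```
# Read/write HuggingFace access token (placeholder value)
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx

# Added after deployment, once the endpoint URL is known
# HF_ENDPOINT_URL=https://your-endpoint.endpoints.huggingface.cloud
```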
1. View/edit the details in `examples/deploy.py`. It is set up to deploy a HuggingFace Inference Endpoint for the Phi-3-vision model. Once you have set the necessary variables, run `python examples/deploy.py`.
2. Go to the link printed by the `deploy.py` script to watch the endpoint deployment status and to retrieve the inference base URL once deployment finishes.
3. Copy the endpoint URL from step 2 and add the env variable `HF_ENDPOINT_URL` with this copied value. Again, see `.env.example` for how to format.
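A deployment script along these lines can be written with `huggingface_hub.create_inference_endpoint`, pointing the endpoint at the vllm-openai container instead of the default TGI image. This is a minimal sketch, not the repo's actual `examples/deploy.py`: the endpoint name, cloud/instance choices, container image reference, and env variable names in `custom_image` are illustrative assumptions.

```python
import os


def build_endpoint_kwargs() -> dict:
    """Assemble keyword arguments for create_inference_endpoint.

    All values below are illustrative; adjust the repository, hardware,
    and image to match your account and quota.
    """
    return {
        "repository": "microsoft/Phi-3-vision-128k-instruct",
        "framework": "pytorch",
        "task": "custom",
        "accelerator": "gpu",
        "vendor": "aws",          # assumed cloud vendor
        "region": "us-east-1",    # assumed region
        "instance_size": "x1",    # assumed instance size
        "instance_type": "nvidia-a100",  # assumed instance type
        # Swap the default TGI image for the vllm-openai container.
        "custom_image": {
            "health_route": "/health",
            "url": "vllm/vllm-openai:latest",  # illustrative image reference
            "env": {"MODEL_NAME": "microsoft/Phi-3-vision-128k-instruct"},
        },
    }


if __name__ == "__main__":
    from huggingface_hub import create_inference_endpoint

    endpoint = create_inference_endpoint(
        "phi-3-vision-vllm",  # hypothetical endpoint name
        token=os.environ["HF_TOKEN"],
        **build_endpoint_kwargs(),
    )
    # Watch deployment progress in the Inference Endpoints UI.
    print(f"Endpoint created, current status: {endpoint.status}")
```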
- The endpoint you deployed above is OpenAI API compatible, meaning you can use the OpenAI library, and any other library built on top of it, with your endpoint. For an example of how to call inference using your new endpoint, see `examples/inference.py`.
- To run the inference example, run `python examples/inference.py`.
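Because the vllm-openai container exposes the OpenAI-compatible API under `/v1`, an inference call can be sketched with the `openai` client as below. This is an illustration, not the repo's actual `examples/inference.py`; the model id and prompt are assumptions, and the HF token is used as the API key.

```python
import os


def build_request(prompt: str) -> dict:
    """Build the chat-completion payload sent to the endpoint."""
    return {
        "model": "microsoft/Phi-3-vision-128k-instruct",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(
        # vLLM serves the OpenAI-compatible routes under /v1.
        base_url=os.environ["HF_ENDPOINT_URL"].rstrip("/") + "/v1",
        # The HuggingFace token doubles as the API key for a protected endpoint.
        api_key=os.environ["HF_TOKEN"],
    )
    response = client.chat.completions.create(
        **build_request("Describe what Phi-3-vision can do.")
    )
    print(response.choices[0].message.content)
```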