This directory contains production-ready LLM inference deployments optimized for NVIDIA DGX Spark and multi-GPU VMs.
The centerpiece is a multi-model inference gateway that runs multiple vLLM model servers behind an NGINX reverse proxy with HTTPS support.
Features:
- Multiple models served concurrently (GPT-OSS-20B, GPT-OSS-120B, Qwen-30B)
- Unified HTTPS endpoint with path-based routing
- OpenAI-compatible API
- Health monitoring and load balancing
- Support for both DGX Spark (UMA) and multi-GPU VMs
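Path-based routing means the NGINX gateway maps a URL prefix to each model's backend server. A minimal sketch of what such a config looks like, assuming illustrative upstream ports and route prefixes (the actual config shipped with the gateway is authoritative):

```nginx
# Hypothetical backend ports -- adjust to match your deployment.
upstream gpt_oss_20b  { server 127.0.0.1:8001; }
upstream gpt_oss_120b { server 127.0.0.1:8002; }
upstream qwen_30b     { server 127.0.0.1:8003; }

server {
    listen 443 ssl;
    # ssl_certificate / ssl_certificate_key directives omitted for brevity

    # Each prefix forwards to one vLLM server's OpenAI-compatible API.
    location /gpt-oss-20b/  { proxy_pass http://gpt_oss_20b/;  }
    location /gpt-oss-120b/ { proxy_pass http://gpt_oss_120b/; }
    location /qwen-30b/     { proxy_pass http://qwen_30b/;     }
}
```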
Architecture:

┌─────────────────────────────────────────────────────────┐
│                  NGINX Gateway (HTTPS)                  │
│     Port 443 → Path-based routing to model servers      │
└─────────────────────────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
  ┌─────▼─────┐  ┌─────────▼────────┐  ┌──────▼──────┐
  │  GPT-OSS  │  │     GPT-OSS      │  │  Qwen-30B   │
  │    20B    │  │       120B       │  │    Coder    │
  │  (1 GPU)  │  │     (3 GPUs)     │  │  (2 GPUs)   │
  └───────────┘  └──────────────────┘  └─────────────┘
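The GPU split in the diagram can be expressed in Docker Compose with per-service device reservations. A sketch for the 3-GPU service, assuming the `vllm/vllm-openai` image and illustrative device IDs; the other two services follow the same pattern, and the compose file in this directory is authoritative:

```yaml
services:
  gpt-oss-120b:
    image: vllm/vllm-openai:latest
    # Shard the model across the reserved GPUs.
    command: ["--model", "openai/gpt-oss-120b", "--tensor-parallel-size", "3"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1", "2", "3"]   # 3 GPUs, as in the diagram
              capabilities: [gpu]
```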
Prerequisites:

- Hardware: DGX Spark or a multi-GPU VM (3 or more GPUs recommended)
- Software:
- Docker & Docker Compose v2.0+
- NVIDIA Container Toolkit
- CUDA 13.0+
- Access: HuggingFace account with token for gated models
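Before deploying, it helps to confirm the required tools are installed. A quick sanity check (this only verifies the tools are on `PATH`; versions such as Docker Compose v2.0+ and CUDA 13.0+ still need to be confirmed manually, e.g. with `docker compose version` and `nvidia-smi`):

```shell
# Check that the core prerequisites are installed.
for tool in docker nvidia-smi; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```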
Getting Started:

1. Navigate to the vLLM directory: `cd llms/vllm/`
2. Follow the Quick Start guide:
   - Standard Deployment - for DGX Spark or single-model setups
   - VM GPU Deployment - for multi-model VMs
3. Access the gateway: `curl -k https://localhost/v1/models` (the `-k` flag accepts the gateway's self-signed certificate)
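Beyond listing models, the OpenAI-compatible API accepts standard chat-completion requests. A sketch of one such request; the route prefix and model name below are assumptions and should be matched to your gateway's routing configuration:

```shell
# Build an OpenAI-compatible chat-completion payload.
BODY='{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'
echo "$BODY"
# With the stack running, send it to the unified endpoint:
# curl -sk https://localhost/gpt-oss-20b/v1/chat/completions \
#      -H "Content-Type: application/json" -d "$BODY"
```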
Documentation:

- vLLM Multi-Model Gateway Documentation - complete setup, configuration, and usage guide
- vLLM Official Docs - vLLM framework documentation
- OpenAI API Reference - API compatibility reference
For issues or questions:
- Check the vLLM Troubleshooting Guide
- Review vLLM GitHub Issues
- For DGX Spark-specific issues, contact NVIDIA Enterprise Support