[Roadmap] vLLM production stack roadmap for 2025 Q1

This project covers a set of production-oriented modules around vLLM, including the router, autoscaling, observability, KV cache offloading, and framework support (KServe, Ray, etc.).

This document lists the items on our Q1 roadmap. We will keep updating it with related issues, pull requests, and discussions from the [#production-stack](https://vllm-dev.slack.com/archives/C089SMEAKRA) channel in the vLLM Slack.

## Core features
- [ ] (P0) Prefix-cache-aware routing algorithm (#19, #59)
- [x] (P1) Offline batched inference based on [OpenAI offline batching API](https://platform.openai.com/docs/api-reference/batch/create)
  - [x] Part 1: file storage support (#47, #52)
  - [x] Part 2: batched inference API support (#109)
- [x] (P1) Router observability (current QPS, router-side queueing delay, number of pending / prefilling / decoding requests, average prefill / decoding length, etc.) (https://github.com/vllm-project/production-stack/issues/78, #119)
- [ ] (P1) Autoscaling support 
  - [x] Basic autoscaling support based on Prometheus stack (#209)
  - [ ] CRD-based autoscaling
- [ ] (P2) Experimental support for disaggregated prefill
- [x] (P2) Support vLLM v1
- [ ] (P2) Rewrite the router in a more performant language (e.g., Rust or Go) for higher QPS/throughput and lower latency

## CI/CD and packaging
- [x] (P0) Add unit tests to the repo (#24)
- [x] (P0) Add end-to-end tests for the deployment (#30)
- [x] (P0) Automatically release the Helm charts and the router Docker images (#23, #74)
- [x] (P1) Package the router into a separate Python package `vllm-router` (#17)

## OSS-related support
- [x] (P0) Format checker for the code (#35)
- [x] (P2) Issue and PR templates and labels (#93)

---
If an item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.

*Happy vLLMing*!
