[Serve][3/n] Add router queue latency #59233
Merged
abrarsheikh merged 8 commits into master on Dec 16, 2025
Conversation
Signed-off-by: abrar <[email protected]>
Contributor
Code Review
This pull request introduces a new metric, serve_queue_wait_time_ms, to measure the time requests spend in the router queue. The implementation adds logic to record this latency in RequestRouter and FIFOMixin, and includes a new test case to verify the metric's correctness.
My review focuses on improving code maintainability by addressing duplication and ensuring consistency in metric tagging. I've also identified a bug in the new test case where time units are being compared incorrectly.
Key feedback points:
- Refactor duplicated metric recording logic into a helper method.
- Ensure consistent metric tagging for the new histogram.
- Correct the assertion in the new test to use the correct time unit (milliseconds).
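The refactor the review asks for can be sketched roughly as follows. This is an illustrative sketch only: `FakeHistogram` and `record_queue_wait` are hypothetical names, not the actual Ray Serve metric API or the code in `RequestRouter`/`FIFOMixin`.

```python
import time


class FakeHistogram:
    """Stand-in for the serve_queue_wait_time_ms metric object."""

    def __init__(self):
        self.samples = []

    def observe(self, value_ms, tags=None):
        self.samples.append((value_ms, tags))


def record_queue_wait(histogram, enqueue_time_s, tags):
    # Single helper that computes the wait and converts seconds to
    # milliseconds in one place, so each dequeue site records the metric
    # identically and the unit bug flagged in the review is hard to reintroduce.
    wait_ms = (time.time() - enqueue_time_s) * 1000.0
    histogram.observe(wait_ms, tags=tags)
    return wait_ms
```

Both dequeue sites would then call the same helper with the same tag set, which addresses the duplication and the tag-consistency points together.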
harshit-anyscale approved these changes on Dec 15, 2025
akyang-anyscale approved these changes on Dec 16, 2025
kriyanshii pushed a commit to kriyanshii/ray that referenced this pull request on Dec 16, 2025:

fixes ray-project#59218

```python
import asyncio

from ray import serve


@serve.deployment(max_ongoing_requests=1000)
class MyDeployment:
    async def __call__(self) -> int:
        await asyncio.sleep(0.01)
        return 1


app = MyDeployment.bind()
```

Testing with 100 users:

Metric | With Change | Master | Δ (Master – With Change)
-- | -- | -- | --
Requests | 128,197 | 128,136 | –61
Fails | 1,273 | 1,363 | +90
Median (ms) | 170 | 170 | 0
95%ile (ms) | 260 | 260 | 0
99%ile (ms) | 310 | 320 | +10 ms
Average (ms) | 178.66 | 178.99 | +0.33 ms
Min (ms) | 9 | 8 | –1 ms
Max (ms) | 1,171 | 2,995 | +1,824 ms
Average size (bytes) | 0.99 | 0.99 | 0
Current RPS | 538.8 | 558.2 | +19.4
Current Failures/s | 0 | 0 | 0

---------

Signed-off-by: abrar <[email protected]>
Signed-off-by: kriyanshii <[email protected]>
cszhu pushed a commit that referenced this pull request on Dec 17, 2025
zzchun pushed a commit to zzchun/ray that referenced this pull request on Dec 18, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request on Dec 22, 2025
aslonnie pushed a commit that referenced this pull request on Jan 21, 2026:

## Why are these changes needed?

The `test_router_queue_len_metric` test was flaky because the router queue length gauge has a 100ms throttle (`RAY_SERVE_ROUTER_QUEUE_LEN_GAUGE_THROTTLE_S`) that can skip updates when they happen too quickly. When replica initialization sets the gauge to 0 and a request immediately updates it to 1, the second update may be throttled, causing the test to see 0 instead of 1.

## Related issue number

Fixes flaky test introduced in #59233 after #60139 added throttling.

---------

Signed-off-by: Seiji Eicher <[email protected]>
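The failure mode described here can be sketched as a minimal model of a throttled gauge. `ThrottledGauge` is a hypothetical stand-in under assumed behavior (updates inside the throttle window are dropped), not the actual Ray Serve implementation:

```python
import time


class ThrottledGauge:
    """Gauge that skips updates arriving within `throttle_s` of the last one."""

    def __init__(self, throttle_s=0.1):
        self._throttle_s = throttle_s
        self._last_update_time = float("-inf")
        self.value = None

    def set(self, value, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last_update_time < self._throttle_s:
            return False  # update dropped by the throttle
        self._last_update_time = now
        self.value = value
        return True
```

With a 100ms window, an initialization `set(0)` followed ~50ms later by `set(1)` leaves the gauge at 0, which is exactly what the flaky test observed.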
jinbum-kim pushed a commit to jinbum-kim/ray that referenced this pull request on Jan 29, 2026
400Ping pushed a commit to 400Ping/ray that referenced this pull request on Feb 1, 2026
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request on Feb 3, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026