[Serve][3/n] Add router queue latency #59233
Merged
abrarsheikh merged 8 commits into master on Dec 16, 2025
Conversation
Signed-off-by: abrar <[email protected]>
Contributor
Code Review
This pull request introduces a new metric, serve_queue_wait_time_ms, to measure the time requests spend in the router queue. The implementation adds logic to record this latency in RequestRouter and FIFOMixin, and includes a new test case to verify the metric's correctness.
My review focuses on improving code maintainability by addressing duplication and ensuring consistency in metric tagging. I've also identified a bug in the new test case where time units are being compared incorrectly.
Key feedback points:
- Refactor duplicated metric recording logic into a helper method.
- Ensure consistent metric tagging for the new histogram.
- Correct the assertion in the new test to use the correct time unit (milliseconds).
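The refactor the review asks for can be sketched roughly as follows. This is an illustrative sketch only: `FakeHistogram` and `record_queue_wait` are hypothetical names, not the actual Ray Serve metric API or the code in `RequestRouter`/`FIFOMixin`.

```python
import time


class FakeHistogram:
    """Stand-in for the serve_queue_wait_time_ms metric object."""

    def __init__(self):
        self.samples = []

    def observe(self, value_ms, tags=None):
        self.samples.append((value_ms, tags))


def record_queue_wait(histogram, enqueue_time_s, tags):
    # Single helper that computes the wait and converts seconds to
    # milliseconds in one place, so each dequeue site records the metric
    # identically and the unit bug flagged in the review is hard to reintroduce.
    wait_ms = (time.time() - enqueue_time_s) * 1000.0
    histogram.observe(wait_ms, tags=tags)
    return wait_ms
```

Both dequeue sites would then call the same helper with the same tag set, which addresses the duplication and the tag-consistency points together.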
harshit-anyscale approved these changes on Dec 15, 2025
akyang-anyscale approved these changes on Dec 16, 2025
kriyanshii pushed a commit to kriyanshii/ray that referenced this pull request on Dec 16, 2025:

fixes ray-project#59218

```python
import asyncio

from ray import serve


@serve.deployment(max_ongoing_requests=1000)
class MyDeployment:
    async def __call__(self) -> int:
        await asyncio.sleep(0.01)
        return 1


app = MyDeployment.bind()
```

Testing with 100 users:

Metric | With Change | Master | Δ (Master – With Change)
-- | -- | -- | --
Requests | 128,197 | 128,136 | –61
Fails | 1,273 | 1,363 | +90
Median (ms) | 170 | 170 | 0
95%ile (ms) | 260 | 260 | 0
99%ile (ms) | 310 | 320 | +10 ms
Average (ms) | 178.66 | 178.99 | +0.33 ms
Min (ms) | 9 | 8 | –1 ms
Max (ms) | 1,171 | 2,995 | +1,824 ms
Average size (bytes) | 0.99 | 0.99 | 0
Current RPS | 538.8 | 558.2 | +19.4
Current Failures/s | 0 | 0 | 0

---------

Signed-off-by: abrar <[email protected]>
Signed-off-by: kriyanshii <[email protected]>
cszhu pushed a commit that referenced this pull request on Dec 17, 2025
zzchun pushed a commit to zzchun/ray that referenced this pull request on Dec 18, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request on Dec 22, 2025
aslonnie pushed a commit that referenced this pull request on Jan 21, 2026:

## Why are these changes needed?

The `test_router_queue_len_metric` test was flaky because the router queue length gauge has a 100ms throttle (`RAY_SERVE_ROUTER_QUEUE_LEN_GAUGE_THROTTLE_S`) that can skip updates when they happen too quickly. When replica initialization sets the gauge to 0 and a request immediately updates it to 1, the second update may be throttled, causing the test to see 0 instead of 1.

## Related issue number

Fixes flaky test introduced in #59233 after #60139 added throttling.

---------

Signed-off-by: Seiji Eicher <[email protected]>
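The failure mode described here can be sketched as a minimal model of a throttled gauge. `ThrottledGauge` is a hypothetical stand-in under assumed behavior (updates inside the throttle window are dropped), not the actual Ray Serve implementation:

```python
import time


class ThrottledGauge:
    """Gauge that skips updates arriving within `throttle_s` of the last one."""

    def __init__(self, throttle_s=0.1):
        self._throttle_s = throttle_s
        self._last_update_time = float("-inf")
        self.value = None

    def set(self, value, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last_update_time < self._throttle_s:
            return False  # update dropped by the throttle
        self._last_update_time = now
        self.value = value
        return True
```

With a 100ms window, an initialization `set(0)` followed ~50ms later by `set(1)` leaves the gauge at 0, which is exactly what the flaky test observed.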
jinbum-kim pushed a commit to jinbum-kim/ray that referenced this pull request on Jan 29, 2026
400Ping pushed a commit to 400Ping/ray that referenced this pull request on Feb 1, 2026
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request on Feb 3, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026