remote: add gate waiting duration metric #18491

Open
muratclk wants to merge 1 commit into prometheus:main from muratclk:add-gate-wait-duration-metric

muratclk commented Apr 9, 2026

Summary

Add a histogram metric prometheus_remote_read_handler_gate_wait_duration_seconds that tracks how long remote read requests wait for the concurrency gate to allow execution.

Problem

When the remote read concurrency gate is nearly saturated, operators have no visibility into how long requests are queued waiting for a free slot. The existing prometheus_remote_read_handler_queries gauge shows the current query count but does not reveal queuing latency. Without this signal, operators cannot make informed decisions about when to raise --storage.remote.read-concurrent-limit.

Solution

Record the elapsed time between requesting a gate slot and acquiring it, then observe it in a histogram. The measurement is done at the call site in the read handler (ServeHTTP) rather than inside the gate package, keeping the gate as a simple concurrency primitive with no metric dependencies.
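
As a rough illustration of the approach described above, the following stdlib-only Go sketch times a gate acquisition at the call site and hands the result to an observer. The `gate`, `observer`, and `recorder` types and the `acquireTimed` and `demo` helpers are hypothetical stand-ins, not the code in this PR; in the real handler the observer would presumably be the new histogram. Observing only successful acquisitions is an assumption of the sketch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// gate is a hypothetical stand-in for a concurrency gate: a buffered channel
// whose capacity is the maximum number of concurrent holders.
type gate struct{ ch chan struct{} }

func newGate(limit int) *gate { return &gate{ch: make(chan struct{}, limit)} }

// start blocks until a slot is free or ctx is done.
func (g *gate) start(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case g.ch <- struct{}{}:
		return nil
	}
}

// done releases a previously acquired slot.
func (g *gate) done() { <-g.ch }

// observer is a hypothetical one-method interface shaped like a histogram's
// Observe method; values are in seconds.
type observer interface{ Observe(seconds float64) }

// recorder keeps observations in memory. It is not safe for concurrent use
// and exists only so the sketch can be inspected.
type recorder struct{ samples []float64 }

func (r *recorder) Observe(v float64) { r.samples = append(r.samples, v) }

// acquireTimed measures the wait at the call site, leaving the gate itself
// free of any metric dependency. Only successful acquisitions are observed
// (an assumption of this sketch).
func acquireTimed(ctx context.Context, g *gate, o observer) error {
	begin := time.Now()
	if err := g.start(ctx); err != nil {
		return err
	}
	o.Observe(time.Since(begin).Seconds())
	return nil
}

// demo runs two acquisitions against a gate of size 1. The second one has to
// wait until a background goroutine releases the slot after roughly 50ms.
func demo() []float64 {
	g := newGate(1)
	rec := &recorder{}
	ctx := context.Background()

	if err := acquireTimed(ctx, g, rec); err != nil {
		return rec.samples
	}
	go func() {
		time.Sleep(50 * time.Millisecond)
		g.done()
	}()
	if err := acquireTimed(ctx, g, rec); err == nil {
		g.done()
	}
	return rec.samples
}

func main() {
	for i, s := range demo() {
		fmt.Printf("request %d waited %.3fs\n", i+1, s)
	}
}
```

Because the timing wraps the call rather than living inside the gate, other users of a gate are unaffected and the gate needs no new parameters.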

The histogram uses both classic buckets (prometheus.DefBuckets) and native histogram configuration for forward compatibility.
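
To make the classic-bucket choice concrete, here is a small stdlib-only Go sketch of how wait durations would fall into cumulative `le` buckets. The `defBuckets` slice copies the values of `prometheus.DefBuckets` from client_golang; `cumulativeCounts` and the sample waits are hypothetical and only imitate classic histogram semantics. The native histogram configuration is not illustrated.

```go
package main

import "fmt"

// defBuckets copies the upper bounds (in seconds) of prometheus.DefBuckets.
var defBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// cumulativeCounts returns, for each upper bound, how many waits (in seconds)
// were less than or equal to it. The final element is the +Inf bucket, which
// counts every observation.
func cumulativeCounts(waits []float64) []int {
	counts := make([]int, len(defBuckets)+1)
	for _, w := range waits {
		for i, le := range defBuckets {
			if w <= le {
				counts[i]++
			}
		}
		counts[len(defBuckets)]++ // +Inf bucket
	}
	return counts
}

func main() {
	// Hypothetical gate waits: 2ms, 30ms, 700ms, and 12s.
	counts := cumulativeCounts([]float64{0.002, 0.03, 0.7, 12})
	for i, c := range counts {
		if i < len(defBuckets) {
			fmt.Printf("le=%g\t%d\n", defBuckets[i], c)
		} else {
			fmt.Printf("le=+Inf\t%d\n", c)
		}
	}
}
```

Note that with these bounds any wait longer than 10 seconds is only visible in the +Inf bucket, which may matter when choosing alert thresholds.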

How this helps

  • Alerting: operators can alert when p99 gate wait exceeds a threshold, indicating the concurrency limit needs attention.
  • Dashboarding: wait duration trends over time reveal whether the system is approaching saturation.
  • Capacity planning: correlating gate wait with query volume helps size the concurrency limit correctly.

Fixes #11365

[ENHANCEMENT] Remote Read: Add `prometheus_remote_read_handler_gate_wait_duration_seconds` histogram metric to track how long remote read requests wait for a concurrency gate slot.

Add a histogram metric prometheus_remote_read_handler_gate_wait_duration_seconds
that tracks how long remote read requests wait for the concurrency gate.

When the gate's concurrency limit is nearly saturated, operators currently have
no way to know that requests are being held up waiting for a free slot. This
metric makes that wait visible so operators can decide whether to increase the
concurrency limit.

The metric is recorded at the call site in the read handler rather than in the
gate package itself, keeping the gate as a simple concurrency primitive and
avoiding unnecessary API changes.

Fixes prometheus#11365

Signed-off-by: muratclk <[email protected]>
@bboreham
Member

Please explain how this PR relates to #17024, #18098, #18355, #18378, #18450.

Development

Successfully merging this pull request may close these issues.

Gate needs a waiting duration metric
