
feat: Add waiting duration metric to query gate#18378

Open
AdeshDeshmukh wants to merge 1 commit into prometheus:main from AdeshDeshmukh:add-gate-waiting-metric


@AdeshDeshmukh

Fixes #11365

The query gate limits concurrent requests but we had no visibility into how long requests wait when the limit is hit.

This adds a histogram metric to track waiting duration, so operators can see if the gate is becoming a bottleneck and whether they need to increase the concurrency limit.

The metric is named 'prometheus_query_gate_waiting_duration_seconds' and uses standard histogram buckets. Waiting time is measured from when Start() is called until the request acquires a gate slot.

The PR includes tests covering normal operation, context cancellation, and metric recording.


Signed-off-by: Test User <[email protected]>
AdeshDeshmukh force-pushed the add-gate-waiting-metric branch from 4b17969 to 92241cf on March 26, 2026 16:58
@ogulcanaydogan (Contributor) left a comment


Hi @AdeshDeshmukh — I also have an open PR for this issue (#18355).

A few observations on this approach:

  1. Global metric via promauto: The histogram is a package-level singleton, which means it can't be customized per caller and is harder to test (can't verify observations through a test registry). #18355 uses the prometheus.Registerer pattern (like util/notifications) so the caller controls naming and registration.

  2. No New() signature change: This keeps backward compat, but it also means the metric is always registered — even if the gate is used in a context where metrics aren't wanted.

  3. Metric naming: prometheus_query_gate_waiting_duration_seconds assumes the gate is only used for queries. The remote read handler also uses it, so a more generic name (or caller-provided prefix) might be better.
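To illustrate these points, here is a small sketch in which the caller supplies the metric subsystem and a registry, and a nil registry disables the metric entirely. `toyRegistry`, `gateWaitingMetric` and the subsystem values "query" and "remote_read" are hypothetical; `fqName` mirrors how client_golang's `prometheus.BuildFQName` joins namespace, subsystem and name. A real implementation would accept a `prometheus.Registerer` rather than this toy map.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty parts with "_", mirroring client_golang's
// prometheus.BuildFQName: an empty name yields "".
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

// toyRegistry is a hypothetical stand-in for prometheus.Registerer: it only
// tracks metric names and rejects duplicates.
type toyRegistry map[string]bool

func (r toyRegistry) Register(name string) error {
	if r[name] {
		return fmt.Errorf("metric %q already registered", name)
	}
	r[name] = true
	return nil
}

// gateWaitingMetric registers and returns a caller-scoped metric name, or
// returns "" when reg is nil (metrics disabled for this gate).
func gateWaitingMetric(reg toyRegistry, subsystem string) (string, error) {
	if reg == nil {
		return "", nil
	}
	name := fqName("prometheus", subsystem, "gate_waiting_duration_seconds")
	return name, reg.Register(name)
}
```

With subsystem "query" this reproduces the name proposed in the PR, while a second caller such as the remote read handler would get its own distinct metric instead of sharing a package-level singleton.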

Happy to collaborate on converging the approaches — the core logic (measure time.Since(start) in Start()) is the same in both PRs.



Development

Successfully merging this pull request may close these issues.

Gate needs a waiting duration metric
