remote: add gate waiting duration metric #18491
Open
muratclk wants to merge 1 commit into prometheus:main from
Add a histogram metric prometheus_remote_read_handler_gate_wait_duration_seconds that tracks how long remote read requests wait for the concurrency gate.

When the gate's concurrency limit is nearly saturated, operators currently have no way to know that requests are being held up waiting for a free slot. This metric makes that wait visible so operators can decide whether to increase the concurrency limit.

The metric is recorded at the call site in the read handler rather than in the gate package itself, keeping the gate as a simple concurrency primitive and avoiding unnecessary API changes.

Fixes prometheus#11365

Signed-off-by: muratclk <[email protected]>
Summary
Add a histogram metric prometheus_remote_read_handler_gate_wait_duration_seconds that tracks how long remote read requests wait for the concurrency gate to allow execution.

Problem

When the remote read concurrency gate is nearly saturated, operators have no visibility into how long requests are queued waiting for a free slot. The existing prometheus_remote_read_handler_queries gauge shows current query count but doesn't reveal queuing latency. Without this signal, operators cannot make informed decisions about when to increase the --storage.remote.read-concurrency-limit.

Solution

Record the elapsed time between requesting a gate slot and acquiring it, then observe it in a histogram. The measurement is done at the call site in the read handler (ServeHTTP) rather than inside the gate package, keeping the gate as a simple concurrency primitive with no metric dependencies.

The histogram uses both classic buckets (prometheus.DefBuckets) and native histogram configuration for forward compatibility.

How this helps
Fixes #11365
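
For reviewers who want to see the shape of the call-site measurement described under Solution, here is a minimal standalone sketch. It is not the code in this PR: to stay dependency-free it uses a channel-based stand-in for the gate and a hypothetical waitRecorder type in place of a Prometheus histogram, and the names gate, waitRecorder, handle, and runDemo are all invented for illustration. Observing the wait even when the context is cancelled is also an assumption of the sketch.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// gate is a stand-in concurrency gate: a buffered channel used as a semaphore.
type gate struct{ ch chan struct{} }

func newGate(n int) *gate { return &gate{ch: make(chan struct{}, n)} }

// Start blocks until a slot is free or ctx is done.
func (g *gate) Start(ctx context.Context) error {
	select {
	case g.ch <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Done releases a slot acquired by Start.
func (g *gate) Done() { <-g.ch }

// waitRecorder is a hypothetical stand-in for a histogram.
// It only tracks the observation count and the sum in seconds.
type waitRecorder struct {
	mu    sync.Mutex
	count uint64
	sum   float64
}

func (r *waitRecorder) Observe(seconds float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.count++
	r.sum += seconds
}

func (r *waitRecorder) Snapshot() (uint64, float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.count, r.sum
}

// handle measures the gate wait at the call site, so the gate itself
// carries no metric dependency. The wait is observed whether or not
// the slot was acquired (an assumption of this sketch).
func handle(ctx context.Context, g *gate, rec *waitRecorder, work func()) error {
	start := time.Now()
	err := g.Start(ctx)
	rec.Observe(time.Since(start).Seconds())
	if err != nil {
		return err
	}
	defer g.Done()
	work()
	return nil
}

// runDemo sends two requests through a gate with a single slot.
// The second request has to wait until the first one finishes.
func runDemo() (uint64, float64) {
	g := newGate(1)
	rec := &waitRecorder{}
	ctx := context.Background()
	holding := make(chan struct{})

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_ = handle(ctx, g, rec, func() {
			close(holding) // signal that the only slot is now taken
			time.Sleep(50 * time.Millisecond)
		})
	}()

	<-holding
	_ = handle(ctx, g, rec, func() {})
	wg.Wait()
	return rec.Snapshot()
}

func main() {
	count, sum := runDemo()
	fmt.Printf("observations=%d total_wait_seconds=%.3f\n", count, sum)
}
```

With a single slot, the second request's wait dominates the recorded sum, which is the kind of queuing signal the new histogram is meant to expose.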