feat: add Prometheus collector for DERP server expvar metrics #22583
Merged
Create a prometheus.Collector that bridges the tailscale derp.Server's expvar-based stats to Prometheus metrics with namespace coder, subsystem wsproxy_derp. Handles counters, gauges, labeled metrics (nested metrics.Set for drop reasons, packet types, etc.), and the average queue duration (converted from ms to seconds). Register the collector in the wsproxy server after derpServer creation.
Add Prometheus metrics tracking active DERP websocket connections and bytes relayed through the wsproxy:
- coder_wsproxy_derp_websocket_active_connections (gauge)
- coder_wsproxy_derp_websocket_bytes_total (counter, direction=read|write)

Implementation adds a DERPWebsocketMetrics hook struct and countingConn wrapper in tailnet/, and a new WithWebsocketSupportAndMetrics function that instruments the websocket connection lifecycle. The existing WithWebsocketSupport function delegates to the new one with nil metrics.
…rs.NewExpvarCollector
Removes the hand-rolled enterprise/wsproxy/derpmetrics package and uses
the prometheus client library's NewExpvarCollector instead. This bridges
the same DERP server expvar stats to Prometheus with less code to maintain.
Metrics are now exposed as coder_wsproxy_derp{metric="<key>"} instead of
individual named metrics. Grafana dashboard queries updated accordingly.
- Rename expvar key from "wsproxy_derp" to "derp" to match coderd
- Rename sync.Once variable to expDERPOnce with clearer comment
- Move DERP metrics collector into enterprise/wsproxy/metrics.go
- Revert tailnet/derp.go changes (remove WithWebsocketSupportAndMetrics)
- Remove tailnet/derp_metrics.go (websocket byte counting was redundant with the DERP server expvar bytes_received/bytes_sent counters)
- Remove unused collectors import from wsproxy.go
Moves the DERP expvar-to-Prometheus collector to tailnet/ so it can be
shared between coderd and wsproxy. Registers it on both Prometheus
registries. Resolves the existing TODO in coderd/coderd.go.
Metric name is now coder_derp{metric="..."} for both coderd and wsproxy.
Adds --prometheus-enable --prometheus-address=127.0.0.1:2113 to the local wsproxy started by develop.sh --use-proxy, so DERP metrics can be verified during development.
…etrics
The generic collectors.NewExpvarCollector exported everything as untyped metrics under a single name with a label, losing counter/gauge type info and dropping nested metrics entirely. Replace with a custom DERPExpvarCollector that:
- Properly types counters (bytes_received_total, packets_sent_total, etc.) and gauges (connections, clients_local, etc.)
- Iterates nested metrics.Set for labeled counters (packets_dropped by reason, packets_received by kind, tcp_rtt by bucket)
- Uses standard Prometheus naming (coder_derp_* prefix, _total suffix)
- Accepts *derp.Server directly instead of relying on global expvar state
Add TestWorkspaceProxyDERPMetrics to verify the DERPExpvarCollector is registered during wsproxy startup, mirroring the existing TestDERPMetrics in coderd. Also fix expvar.Publish guards in both coderd and wsproxy to check expvar.Get before publishing. The sync.Once per package was insufficient when both coderd and wsproxy run in the same test process, as both attempt to publish under the same "derp" key.
Documentation Check: Updates Needed (automated review via Coder Tasks)
Inline newDERPDesc wrapper to direct prometheus.NewDesc calls so the metricsdocgen scanner can discover them via static AST analysis. Add tailnet to the scanner's scanDirs list. Regenerate generated_metrics and prometheus.md docs.
Remove the expvar HTTP handler and the expvar.Publish call from wsproxy. The DERP metrics are now exported via the Prometheus collector, making the unauthenticated expvar endpoint unnecessary. coderd's /debug/expvar remains (it's behind authenticated routes).
Rename all DERP Prometheus metrics from coder_derp_* to coder_derp_server_* for clearer namespacing. Regenerate generated_metrics and prometheus.md docs.
deansheather approved these changes on Mar 5, 2026
This PR does three things:
I generated this with mux, but I also manually tested that the metrics were emitted properly.