feat: add Prometheus collector for DERP server expvar metrics#22583

Merged
sreya merged 16 commits into main from jon/wsproxy-metrics
Mar 6, 2026

@sreya
Collaborator

@sreya sreya commented Mar 3, 2026

This PR does three things:

  • Exports the DERP expvars on the pprof endpoint
  • Exports the expvar metrics as Prometheus metrics in both coderd and wsproxy
  • Updates our tailscale dependency to pick up a fix I also had to make to avoid a data race

I generated this with mux, but I also manually tested that the metrics are emitted properly.

sreya added 10 commits March 3, 2026 22:09
Create a prometheus.Collector that bridges the tailscale derp.Server's
expvar-based stats to Prometheus metrics with namespace coder, subsystem
wsproxy_derp. Handles counters, gauges, labeled metrics (nested
metrics.Set for drop reasons, packet types, etc.), and the average
queue duration (converted from ms to seconds).

Register the collector in the wsproxy server after derpServer creation.

Add Prometheus metrics tracking active DERP websocket connections and
bytes relayed through the wsproxy:

- coder_wsproxy_derp_websocket_active_connections (gauge)
- coder_wsproxy_derp_websocket_bytes_total (counter, direction=read|write)

Implementation adds a DERPWebsocketMetrics hook struct and countingConn
wrapper in tailnet/, and a new WithWebsocketSupportAndMetrics function
that instruments the websocket connection lifecycle. The existing
WithWebsocketSupport function delegates to the new one with nil metrics.
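
For illustration, a byte-counting connection wrapper of the kind this commit describes could look roughly like the sketch below. It uses only the standard library; the field names, the `relayOnce` helper, and the use of `net.Pipe` are assumptions made for the example and are not the code from tailnet/.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync/atomic"
)

// countingConn wraps a net.Conn and tallies the bytes that pass through it.
// Field names are hypothetical.
type countingConn struct {
	net.Conn
	bytesRead    atomic.Int64
	bytesWritten atomic.Int64
}

// Read forwards to the wrapped connection and records how many bytes arrived.
func (c *countingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	c.bytesRead.Add(int64(n))
	return n, err
}

// Write forwards to the wrapped connection and records how many bytes were sent.
func (c *countingConn) Write(p []byte) (int, error) {
	n, err := c.Conn.Write(p)
	c.bytesWritten.Add(int64(n))
	return n, err
}

// relayOnce sends payload across an in-memory pipe whose two ends are both
// wrapped, and returns the bytes counted on the write side and the read side.
func relayOnce(payload []byte) (written, read int64) {
	if len(payload) == 0 {
		return 0, 0
	}
	a, b := net.Pipe()
	defer a.Close()
	defer b.Close()

	writer := &countingConn{Conn: a}
	reader := &countingConn{Conn: b}

	// net.Pipe is synchronous, so the write must run concurrently with the read.
	done := make(chan struct{})
	go func() {
		defer close(done)
		_, _ = writer.Write(payload)
	}()

	buf := make([]byte, len(payload))
	_, _ = io.ReadFull(reader, buf)
	<-done // wait until the writer has recorded its count

	return writer.bytesWritten.Load(), reader.bytesRead.Load()
}

func main() {
	w, r := relayOnce([]byte("hello derp"))
	fmt.Printf("written=%d read=%d\n", w, r)
}
```

In a real server the two counters would feed a counter metric with a direction label rather than being returned directly. A later commit in this PR removes the wrapper as redundant with the DERP server's own byte counters.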
…rs.NewExpvarCollector

Removes the hand-rolled enterprise/wsproxy/derpmetrics package and uses
the prometheus client library's NewExpvarCollector instead. This bridges
the same DERP server expvar stats to Prometheus with less code to maintain.

Metrics are now exposed as coder_wsproxy_derp{metric="<key>"} instead of
individual named metrics. Grafana dashboard queries updated accordingly.

- Rename expvar key from "wsproxy_derp" to "derp" to match coderd
- Rename sync.Once variable to expDERPOnce with clearer comment
- Move DERP metrics collector into enterprise/wsproxy/metrics.go
- Revert tailnet/derp.go changes (remove WithWebsocketSupportAndMetrics)
- Remove tailnet/derp_metrics.go (websocket byte counting was redundant
  with the DERP server expvar bytes_received/bytes_sent counters)
- Remove unused collectors import from wsproxy.go

Moves the DERP expvar-to-Prometheus collector to tailnet/ so it can be
shared between coderd and wsproxy. Registers it on both Prometheus
registries. Resolves the existing TODO in coderd/coderd.go.

Metric name is now coder_derp{metric="..."} for both coderd and wsproxy.

Adds --prometheus-enable --prometheus-address=127.0.0.1:2113 to the
local wsproxy started by develop.sh --use-proxy, so DERP metrics can
be verified during development.
…etrics

The generic collectors.NewExpvarCollector exported everything as untyped
metrics under a single name with a label, losing counter/gauge type info
and dropping nested metrics entirely.

Replace with a custom DERPExpvarCollector that:
- Properly types counters (bytes_received_total, packets_sent_total, etc.)
  and gauges (connections, clients_local, etc.)
- Iterates nested metrics.Set for labeled counters (packets_dropped by
  reason, packets_received by kind, tcp_rtt by bucket)
- Uses standard Prometheus naming (coder_derp_* prefix, _total suffix)
- Accepts *derp.Server directly instead of relying on global expvar state
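
The typing and nested-set handling listed above can be sketched with the standard library alone. The example below walks an `expvar.Map`, assigns a counter or gauge kind to each entry, and flattens one level of nesting into labeled samples. It is a sketch under assumptions: the `counter_`/`gauge_` key prefixes, the `sample` type, and the demo keys are hypothetical, and the real collector emits Prometheus metrics rather than a slice.

```go
package main

import (
	"expvar"
	"fmt"
	"strings"
)

// sample is a hypothetical flattened metric: a name, a kind, an optional
// label value (for entries that came from a nested map), and a number.
type sample struct {
	Name  string
	Kind  string // "counter", "gauge", or "untyped"
	Label string
	Value float64
}

// classify derives a metric name and kind from an expvar key.
// The "counter_" and "gauge_" prefixes are an assumed naming convention.
func classify(key string) (name, kind string) {
	switch {
	case strings.HasPrefix(key, "counter_"):
		return "coder_derp_" + strings.TrimPrefix(key, "counter_") + "_total", "counter"
	case strings.HasPrefix(key, "gauge_"):
		return "coder_derp_" + strings.TrimPrefix(key, "gauge_"), "gauge"
	default:
		return "coder_derp_" + key, "untyped"
	}
}

// numeric extracts a float64 from the expvar types that carry numbers.
func numeric(v expvar.Var) (float64, bool) {
	switch x := v.(type) {
	case *expvar.Int:
		return float64(x.Value()), true
	case *expvar.Float:
		return x.Value(), true
	}
	return 0, false
}

// flatten visits every entry of m. Plain numbers become unlabeled samples;
// a nested map becomes one labeled sample per inner key.
func flatten(m *expvar.Map) []sample {
	var out []sample
	m.Do(func(kv expvar.KeyValue) {
		name, kind := classify(kv.Key)
		if nested, ok := kv.Value.(*expvar.Map); ok {
			nested.Do(func(inner expvar.KeyValue) {
				if v, ok := numeric(inner.Value); ok {
					out = append(out, sample{Name: name, Kind: kind, Label: inner.Key, Value: v})
				}
			})
			return
		}
		if v, ok := numeric(kv.Value); ok {
			out = append(out, sample{Name: name, Kind: kind, Value: v})
		}
	})
	return out
}

// demoStats builds a small stats map with made-up keys and values.
func demoStats() *expvar.Map {
	dropped := new(expvar.Map).Init()
	dropped.Add("queue_full", 2)

	stats := new(expvar.Map).Init()
	stats.Add("counter_bytes_received", 10)
	stats.Add("gauge_current_connections", 3)
	stats.Set("counter_packets_dropped", dropped)
	return stats
}

func main() {
	for _, s := range flatten(demoStats()) {
		fmt.Printf("%s kind=%s label=%q value=%g\n", s.Name, s.Kind, s.Label, s.Value)
	}
}
```

Because the map is passed in as an argument, the sketch does not depend on global expvar state, which mirrors the last bullet above.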
Add TestWorkspaceProxyDERPMetrics to verify the DERPExpvarCollector is
registered during wsproxy startup, mirroring the existing TestDERPMetrics
in coderd.

Also fix expvar.Publish guards in both coderd and wsproxy to check
expvar.Get before publishing. The sync.Once per package was insufficient
when both coderd and wsproxy run in the same test process, as both
attempt to publish under the same "derp" key.
@coder-tasks
Contributor

coder-tasks bot commented Mar 3, 2026

Documentation Check

Updates Needed

  • docs/admin/integrations/prometheus.md - Regenerate the Prometheus metrics reference to include the 25 new coder_derp_* metrics.

Automated review via Coder Tasks

@sreya sreya force-pushed the jon/wsproxy-metrics branch 2 times, most recently from dd70b73 to 4922902 Compare March 4, 2026 01:01
Inline newDERPDesc wrapper to direct prometheus.NewDesc calls so the
metricsdocgen scanner can discover them via static AST analysis. Add
tailnet to the scanner's scanDirs list. Regenerate generated_metrics
and prometheus.md docs.
@sreya sreya force-pushed the jon/wsproxy-metrics branch from 4922902 to c344a9e Compare March 4, 2026 01:02
sreya added 2 commits March 4, 2026 01:02
Remove the expvar HTTP handler and the expvar.Publish call from wsproxy.
The DERP metrics are now exported via the Prometheus collector, making
the unauthenticated expvar endpoint unnecessary. coderd's /debug/expvar
remains (it's behind authenticated routes).
@sreya sreya force-pushed the jon/wsproxy-metrics branch 2 times, most recently from d2708ac to 3ae9841 Compare March 4, 2026 02:46
@sreya sreya requested a review from deansheather March 4, 2026 02:54
Rename all DERP Prometheus metrics from coder_derp_* to
coder_derp_server_* for clearer namespacing. Regenerate
generated_metrics and prometheus.md docs.
@sreya sreya merged commit 6c44de9 into main Mar 6, 2026
28 checks passed
@sreya sreya deleted the jon/wsproxy-metrics branch March 6, 2026 07:58
@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2026