Ensure that all products correctly expose metrics #747

@sbernauer

Description

Note

In SDP 25.7 we finished the initial rollout of our Listener operator.

Ensure all Stackable operators correctly expose Prometheus metrics. During the listener rollout (hdfs → kafka → all other operators), we established a pattern for how metrics should be exposed (metrics service, labels, etc.; see below), but we're not sure whether we followed these practices consistently.

Context:

This issue is to make sure all products correctly expose metrics according to the decision mentioned above.

Tasks

For every product, check that:

  1. A metrics service exists
    • It has the label prometheus.io/scrape=true
    • It has the corresponding annotations prometheus.io/scheme, prometheus.io/port and prometheus.io/path
    • It only exposes the metrics port, no data
    • The port is called metrics
  2. No other service exposes a port metrics
  3. No other service has a prometheus.io/scrape=true label
  4. All metrics services have an app.kubernetes.io/name value that is appropriate for the service in question (listener is not appropriate; kafka, for example, would be).
    • This is important, as this label is carried over into the Prometheus metrics.
    • Services created by the listener-operator have "wrong" labels, e.g. app.kubernetes.io/name=listener. This is not good, but out of scope for this issue.
  5. The Pod declares the metrics port (if possible; the port number may clash with another port such as HTTP, which k8s doesn't like for some reason)
  6. JMX Exporter: Check <role>.yaml in docker images
    • Do they still work properly? E.g. for Kafka we use 2.0.0
    • Are any updates / improvements available?
    • Are all metrics there or do we lose any due to filtering etc.?
  7. The monitoring stack still collects the metrics out of the box and uses native metrics wherever possible. This was originally done in demos#284 ("Update monitoring stack to SDP 25.7 and scrape all products"); for this issue we only need to make sure we don't break it.
  8. Ideally all of the products work with a single ServiceMonitor similar to https://github.com/stackabletech/demos/blob/b72cee51ef8231c583bde26dde0bd5ab60d2381e/stacks/monitoring/prometheus-service-monitors.yaml#L171-L220
  9. Documentation is updated with the new names and any changes made
  10. Release notes for breaking changes including migration paths have been written
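
Checks 1 to 4 above could be automated roughly as follows. This is only a sketch, not an existing Stackable tool: it assumes the Services of one product are already available as plain dicts (e.g. parsed from `kubectl get svc -o json`), and it treats every Service carrying the prometheus.io/scrape=true label as a metrics Service. As a result, a data Service that wrongly carries the label (check 3) shows up as a port violation. The function names, the example Service name and the port number are hypothetical.

```python
from typing import Any

SCRAPE_LABEL = "prometheus.io/scrape"
NAME_LABEL = "app.kubernetes.io/name"
REQUIRED_ANNOTATIONS = (
    "prometheus.io/scheme",
    "prometheus.io/port",
    "prometheus.io/path",
)
METRICS_PORT_NAME = "metrics"


def is_metrics_service(service: dict[str, Any]) -> bool:
    """Assumption: a Service is a metrics Service iff it has the scrape label."""
    labels = service.get("metadata", {}).get("labels") or {}
    return labels.get(SCRAPE_LABEL) == "true"


def check_services(services: list[dict[str, Any]]) -> list[str]:
    """Return human-readable violations for the Services of one product."""
    problems: list[str] = []

    # Check 1: at least one metrics Service exists.
    if not any(is_metrics_service(s) for s in services):
        problems.append(f"no Service with label {SCRAPE_LABEL}=true found")

    for svc in services:
        meta = svc.get("metadata", {})
        name = meta.get("name", "<unnamed>")
        labels = meta.get("labels") or {}
        annotations = meta.get("annotations") or {}
        ports = svc.get("spec", {}).get("ports") or []
        port_names = [p.get("name") for p in ports]

        if is_metrics_service(svc):
            # Check 1: required annotations are present.
            for key in REQUIRED_ANNOTATIONS:
                if key not in annotations:
                    problems.append(f"{name}: missing annotation {key}")
            # Checks 1 and 3: only the metrics port, and it is called "metrics".
            if port_names != [METRICS_PORT_NAME]:
                problems.append(
                    f"{name}: expected exactly one port named 'metrics', "
                    f"found {port_names}"
                )
            # Check 4: the name label must identify the product, not the listener.
            if labels.get(NAME_LABEL) in (None, "listener"):
                problems.append(f"{name}: {NAME_LABEL} is missing or 'listener'")
        elif METRICS_PORT_NAME in port_names:
            # Check 2: no other Service exposes a port called "metrics".
            problems.append(
                f"{name}: Service without scrape label exposes a port named 'metrics'"
            )

    return problems


# Hypothetical example of a Service that passes all checks.
GOOD_SERVICE = {
    "metadata": {
        "name": "kafka-broker-default-metrics",
        "labels": {SCRAPE_LABEL: "true", NAME_LABEL: "kafka"},
        "annotations": {
            "prometheus.io/scheme": "http",
            "prometheus.io/port": "9606",
            "prometheus.io/path": "/metrics",
        },
    },
    "spec": {"ports": [{"name": "metrics", "port": 9606}]},
}

if __name__ == "__main__":
    for problem in check_services([GOOD_SERVICE]):
        print(problem)
```

Running `check_services` over the `items` list of a namespace's Service dump would give a quick per-product overview; checks 5 to 10 still need manual review.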

Products to check/fix

Consolidate Monitoring Stack
