AliceO2Group/Monitoring

Monitoring

The Monitoring module injects user-defined metrics and monitors the process. It supports multiple backends, protocols, and data formats.

Table of contents

  1. Installation
  2. Getting started
  3. Advanced features
  4. System monitoring and server-side backends installation and configuration

Installation

Click here if you don't have aliBuild installed

  • Compile Monitoring and its dependencies via aliBuild
aliBuild build Monitoring --defaults o2-dataflow
  • Load the environment for Monitoring (in the alice directory)
alienv load Monitoring/latest

Getting started

Monitoring instance

Get an instance from MonitoringFactory by passing the backend URI(s) as a parameter (comma-separated if more than one). The factory is accessible from the o2::monitoring namespace.

#include <MonitoringFactory.h>
using namespace o2::monitoring;
std::unique_ptr<Monitoring> monitoring = MonitoringFactory::Get("backend[-protocol]://host:port[/verbosity][?query]");

See the table below to find URIs for supported backends:

Backend name | Transport   | URI backend[-protocol] | URI query                          | Default verbosity
No-op        | -           | no-op                  | -                                  | -
InfluxDB     | UDP         | influxdb-udp           | -                                  | info
InfluxDB     | Unix socket | influxdb-unix          | -                                  | info
InfluxDB     | StdOut      | influxdb-stdout        | -                                  | info
InfluxDB     | Kafka       | influxdb-kafka         | Kafka topic                        | info
InfluxDB 2.x | HTTP        | influxdbv2             | org=ORG&bucket=BUCKET&token=TOKEN  | info
ApMon        | UDP         | apmon                  | -                                  | info
StdOut       | -           | stdout, infologger     | [Prefix]                           | debug
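
To make the URI layout concrete, here is a minimal standalone sketch that splits a URI of the form backend[-protocol]://host:port[/verbosity][?query] into its parts. It is not the library's own parser: BackendUri and parseBackendUri are hypothetical names, the host and port in the usage note are made-up values, and the sketch does not handle Unix socket paths, which contain slashes themselves.

```cpp
#include <string>

// Hypothetical holder for the URI components (not a library type).
struct BackendUri {
  std::string scheme;     // backend[-protocol], e.g. "influxdb-udp"
  std::string authority;  // host:port, may be empty
  std::string verbosity;  // empty when not given
  std::string query;      // empty when not given
};

// Splits uri into its components. Returns false when "://" is missing
// or the scheme is empty; out is only meaningful when true is returned.
bool parseBackendUri(const std::string& uri, BackendUri& out) {
  const std::string separator = "://";
  const std::string::size_type schemeEnd = uri.find(separator);
  if (schemeEnd == std::string::npos || schemeEnd == 0) {
    return false;
  }
  out.scheme = uri.substr(0, schemeEnd);
  std::string rest = uri.substr(schemeEnd + separator.size());

  // Everything after '?' is the query component.
  out.query.clear();
  const std::string::size_type queryStart = rest.find('?');
  if (queryStart != std::string::npos) {
    out.query = rest.substr(queryStart + 1);
    rest.erase(queryStart);
  }

  // Everything after the first '/' is treated as the verbosity.
  out.verbosity.clear();
  const std::string::size_type slash = rest.find('/');
  if (slash != std::string::npos) {
    out.verbosity = rest.substr(slash + 1);
    rest.erase(slash);
  }

  out.authority = rest;
  return true;
}
```

For example, parsing "influxdb-udp://localhost:8089/debug" with this sketch yields the scheme influxdb-udp, the authority localhost:8089, and the verbosity debug, with an empty query.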

Metrics

A metric consists of five parameters:

  • name - metric name
  • values - vector of value and value name pairs
  • timestamp - time of creation
  • verbosity - metric "severity"
  • tags - metric metadata represented as map

Parameter name | Type                                     | Required | Default
name           | string                                   | yes      | -
values         | map<string, int/double/string/uint64_t>  | no/1     | -
timestamp      | time_point<system_clock>                 | no       | current time
verbosity      | Enum (Debug/Info/Prod)                   | no       | Verbosity::Info
tags           | map                                      | no       | host and process names

A metric can be constructed by providing the required parameters, a value and a metric name. The value name then defaults to "value":

Metric{10, "name"}

Values

By default, a metric can be created with zero or one value (in the latter case the value name is set to "value"). Additional values can be added using the .addValue method, so the following two metrics are identical:

Metric{10, "name"}
Metric{"name"}.addValue(10, "value")
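
The equivalence can be illustrated with a toy model. This is only a sketch of the behavior described above, not the library's Metric class: SketchMetric is a hypothetical type that stores integer values only and ignores timestamps, verbosity, and tags.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for Metric; models only the name and integer values.
struct SketchMetric {
  std::string name;
  std::vector<std::pair<std::string, int> > values;

  // Metric without any value yet.
  explicit SketchMetric(const std::string& metricName) : name(metricName) {}

  // Metric with one value; the value name defaults to "value".
  SketchMetric(int value, const std::string& metricName) : name(metricName) {
    values.push_back(std::make_pair(std::string("value"), value));
  }

  // Appends a named value and returns *this so that calls can be chained.
  SketchMetric& addValue(int value, const std::string& valueName) {
    assert(!valueName.empty());  // a value needs a name
    values.push_back(std::make_pair(valueName, value));
    return *this;
  }
};

// Two sketch metrics are equal when their names and values match.
inline bool operator==(const SketchMetric& a, const SketchMetric& b) {
  return a.name == b.name && a.values == b.values;
}
```

In this model, SketchMetric(10, "name") compares equal to SketchMetric("name").addValue(10, "value"), mirroring the two constructions shown above.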

Tags

Each metric can be tagged with any number of predefined tags. To do so, use the addTag(tags::Key, tags::Value) or addTag(tags::Key, unsigned short) method. The latter allows assigning a numeric value to a tag.

Metric{10, "name"}.addTag(tags::Key::Subsystem, tags::Value::QC)

See the example: examples/2-TaggedMetrics.cxx.

Sending metric

Pass the metric object to the send method as an r-value reference (a temporary, as in the calls below):

send({10, "name"})
send(Metric{20, "name"}.addTag(tags::Key::CRU, 123))
send(Metric{"throughput"}.addValue(100, "tx").addValue(200, "rx"))

See how it works in the example: examples/1-Basic.cxx.

Advanced features

Metric verbosity

There are three verbosity levels (the same as for backends): Debug, Info, and Prod. The default is Verbosity::Info; it can be overwritten using Metric::setDefaultVerbosity(verbosity). To overwrite verbosity on a per-metric basis, use the third, optional parameter of the metric constructor:

Metric{10, "name", Verbosity::Prod}

A metric is sent only if its verbosity is at or above the backend's verbosity; e.g. a backend with /info verbosity accepts Info and Prod metrics only.
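
The matching rule can be sketched as a simple comparison. The snippet below is a standalone illustration, assuming the levels are ordered Debug < Info < Prod as the /info example implies; SketchVerbosity and shouldSend are hypothetical names, not part of the library.

```cpp
#include <cassert>

// Levels in ascending order, mirroring Debug, Info, Prod.
enum class SketchVerbosity { Debug = 0, Info = 1, Prod = 2 };

// A backend accepts a metric whose level is at or above the backend's level.
inline bool shouldSend(SketchVerbosity backend, SketchVerbosity metric) {
  return static_cast<int>(metric) >= static_cast<int>(backend);
}

// Self-check for the case from the text: an /info backend
// accepts Info and Prod metrics and rejects Debug metrics.
inline void checkInfoBackend() {
  assert(!shouldSend(SketchVerbosity::Info, SketchVerbosity::Debug));
  assert(shouldSend(SketchVerbosity::Info, SketchVerbosity::Info));
  assert(shouldSend(SketchVerbosity::Info, SketchVerbosity::Prod));
}
```

Under the same assumption, a /debug backend accepts every metric and a /prod backend accepts Prod metrics only.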

Buffering metrics

To avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment. This feature is controlled with the following two methods:

monitoring->enableBuffering(const std::size_t maxSize)
...
monitoring->flushBuffer();

enableBuffering takes the maximum buffer size as its parameter. When the buffer gets full, all values are flushed automatically.

See how it works in the example: examples/10-Buffering.cxx.
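
The flush-when-full behavior can be sketched with a small standalone class. This is an illustration only: SketchBuffer is a hypothetical type, the "sent" list stands in for a backend write, and flushing exactly when the size reaches maxSize is an assumption about the trigger condition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy buffer: collects entries and hands them over in one batch.
class SketchBuffer {
 public:
  explicit SketchBuffer(std::size_t maxSize) : mMaxSize(maxSize), mFlushCount(0) {
    assert(maxSize > 0);  // a zero-sized buffer could never hold an entry
  }

  // Stores one entry; flushes automatically once maxSize entries are held.
  void push(const std::string& entry) {
    mPending.push_back(entry);
    if (mPending.size() >= mMaxSize) {
      flush();
    }
  }

  // Moves all pending entries to the sent list; does nothing when empty.
  void flush() {
    if (mPending.empty()) {
      return;
    }
    mSent.insert(mSent.end(), mPending.begin(), mPending.end());
    mPending.clear();
    ++mFlushCount;
  }

  std::size_t pending() const { return mPending.size(); }
  std::size_t sent() const { return mSent.size(); }
  std::size_t flushCount() const { return mFlushCount; }

 private:
  std::size_t mMaxSize;
  std::size_t mFlushCount;
  std::vector<std::string> mPending;
  std::vector<std::string> mSent;
};
```

With a maximum size of 3, the first two pushes stay pending, the third triggers an automatic flush, and a later manual flush() sends whatever has accumulated since.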

Calculating derived values

The module can calculate derived values. To do so, use the optional DerivedMetricMode mode parameter of the send method:

send(Metric&& metric, [DerivedMetricMode mode])

Two modes are available:

  • DerivedMetricMode::RATE - rate between two consecutive values,
  • DerivedMetricMode::INCREMENT - sum of all passed values.

The derived value is generated only from the first value of the metric. It is added to the same metric with the value name suffixed with _rate or _increment, respectively.

See how it works in the example: examples/4-RateDerivedMetric.cxx.
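
The arithmetic behind the two modes can be sketched as follows. This standalone snippet is not the library's implementation: SketchDerived is a hypothetical type, and it assumes that the rate is the change in value divided by the elapsed time in seconds and that the first sample, which has no predecessor, yields 0.

```cpp
#include <cassert>

// Running state needed to derive values from a stream of samples.
struct SketchDerived {
  bool hasPrevious;
  double previousValue;
  double previousTime;  // seconds
  double total;

  SketchDerived() : hasPrevious(false), previousValue(0.0), previousTime(0.0), total(0.0) {}

  // RATE: change in value divided by the seconds elapsed since the
  // previous sample. Returns 0 for the very first sample.
  double rate(double value, double timeSeconds) {
    double result = 0.0;
    if (hasPrevious) {
      const double elapsed = timeSeconds - previousTime;
      assert(elapsed > 0.0);  // samples must be strictly ordered in time
      result = (value - previousValue) / elapsed;
    }
    hasPrevious = true;
    previousValue = value;
    previousTime = timeSeconds;
    return result;
  }

  // INCREMENT: sum of all values passed so far.
  double increment(double value) {
    total += value;
    return total;
  }
};
```

For instance, samples of 10 at t = 0 s and 30 at t = 2 s give a rate of (30 - 10) / 2 = 10 per second, while incrementing by 5 and then by 7 gives 5 and then 12.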

Global tags

Global tags are added to each metric sent using a given monitoring instance. Two tags, hostname and name (process name), are set as global by default.

You can add your own global tag by calling addGlobalTag(std::string_view key, std::string_view value) or addGlobalTag(tags::Key, tags::Value).

Process monitoring

This feature provides the basic performance status of the process. Note that it runs in a separate thread (without a mutex).

enableProcessMonitoring([interval in seconds]);

The following metrics are generated every interval:

  • cpuUsedPercentage - percentage of a single core used over the time interval
  • involuntaryContextSwitches - involuntary context switches over the time interval
  • memoryUsagePercentage - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage (Linux only)
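
As a worked example of the memory metric, the ratio can be computed as shown below. This is a sketch of the arithmetic only: the function is a hypothetical helper, and how the library obtains the resident set size and the physical memory size is not covered here.

```cpp
#include <cassert>

// Resident set size as a percentage of physical memory.
// Both arguments must use the same unit (for example KiB).
inline double memoryUsagePercentage(double residentSet, double physicalMemory) {
  assert(physicalMemory > 0.0);
  return 100.0 * residentSet / physicalMemory;
}
```

A process with a 2 GiB resident set on a machine with 16 GiB of physical memory would report 12.5.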

StdOut backend output format

[METRIC] <name>,<type> <values> <timestamp> <tags>

The prefix ([METRIC]) can be changed using the URI query component.

Regex verbosity policy

Overwrite metric verbosity using a regular expression:

Metric::setVerbosityPolicy(Verbosity verbosity, const std::regex& regex)
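
One way to picture such a policy is sketched below. This is a standalone illustration under stated assumptions, not the library's code: PolicyLevel and SketchPolicy are hypothetical names, the regex is assumed to be applied to the metric name, and a full match (std::regex_match) is assumed rather than a partial one.

```cpp
#include <regex>
#include <string>

// Stand-in for the Debug/Info/Prod levels.
enum class PolicyLevel { Debug, Info, Prod };

// Hypothetical policy: metric names matching the pattern get the policy level.
class SketchPolicy {
 public:
  SketchPolicy() : mEnabled(false), mLevel(PolicyLevel::Info) {}

  // Installs the policy, replacing any previous one.
  void set(PolicyLevel level, const std::regex& pattern) {
    mEnabled = true;
    mLevel = level;
    mPattern = pattern;
  }

  // Returns the policy level when the whole name matches the pattern,
  // otherwise the metric's own level.
  PolicyLevel resolve(const std::string& metricName, PolicyLevel ownLevel) const {
    if (mEnabled && std::regex_match(metricName, mPattern)) {
      return mLevel;
    }
    return ownLevel;
  }

 private:
  bool mEnabled;
  PolicyLevel mLevel;
  std::regex mPattern;
};
```

With a policy of Debug for the pattern "cpu.*", a metric named cpuUsedPercentage resolves to Debug in this sketch, while memoryUsagePercentage keeps its own level.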

System monitoring, server-side backends installation and configuration

This guide explains manual installation. For Ansible deployment, see the AliceO2Group/system-configuration GitLab repo.
