Enable the Datadog-Ceph integration to:
- Track disk usage across storage pools
- Receive service checks in case of issues
- Monitor I/O performance metrics
The Ceph check is included in the Datadog Agent package, so you don't need to install anything else on your Ceph servers.
Edit the file ceph.d/conf.yaml in the conf.d/ folder at the root of your Agent's configuration directory.
See the sample ceph.d/conf.yaml for all available configuration options:
init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true # only if the ceph binary needs sudo on your nodes

If you enabled use_sudo, add a line like the following to your sudoers file:
dd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
Available for Agent >6.0
- Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

  logs_enabled: true

- Next, edit ceph.d/conf.yaml by uncommenting the logs lines at the bottom. Update the logs path with the correct path to your Ceph log files.

  logs:
    - type: file
      path: /var/log/ceph/*.log
      source: ceph
      service: <APPLICATION_NAME>
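Before pointing the Agent at a log path, it can help to confirm that the glob matches files and that they are readable. The following is a minimal sketch, not part of the integration: the helper name check_log_path is hypothetical, and the default pattern is the example path from the configuration above. Readability is evaluated for the user running the script, so run it as the Agent's user (dd-agent is assumed here, as in the sudoers example) for a meaningful result.

```python
import glob
import os


def check_log_path(pattern="/var/log/ceph/*.log"):
    """Return (matched, unreadable) file lists for a log path glob.

    Hypothetical helper for illustration; the Agent does not ship it.
    """
    # Expand the glob roughly the way a file-tailing log config would.
    matched = sorted(glob.glob(pattern))
    # os.access checks permissions for the current user only.
    unreadable = [path for path in matched if not os.access(path, os.R_OK)]
    return matched, unreadable
```

An empty first list suggests the path in ceph.d/conf.yaml needs adjusting; a non-empty second list suggests a file permission issue.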
Run the Agent's status subcommand and look for ceph under the Checks section.
See metadata.csv for a list of metrics provided by this integration.
Note: If you are running Ceph Luminous or later, you will not see the metric ceph.osd.pct_used.
The Ceph check does not include any events.
ceph.overall_status:
The Datadog Agent submits a service check for each of Ceph's host health checks.
In addition to this service check, the Ceph check also collects a configurable list of health checks for Ceph Luminous and later. By default, these are:
ceph.osd_down:
Returns OK if your OSDs are all up. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.osd_orphan:
Returns OK if you have no orphan OSD. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.osd_full:
Returns OK if your OSDs are not full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.osd_nearfull:
Returns OK if your OSDs are not near full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pool_full:
Returns OK if your pools have not reached their quota. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pool_near_full:
Returns OK if your pools are not near reaching their quota. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pg_availability:
Returns OK if there is full data availability. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pg_degraded:
Returns OK if there is full data redundancy. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pg_degraded_full:
Returns OK if there is enough space in the cluster for data redundancy. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pg_damaged:
Returns OK if there are no inconsistencies after data scrubbing. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pg_not_scrubbed:
Returns OK if the PGs were scrubbed recently. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.pg_not_deep_scrubbed:
Returns OK if the PGs were deep scrubbed recently. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.cache_pool_near_full:
Returns OK if the cache pools are not near full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.too_few_pgs:
Returns OK if the number of PGs is above the min threshold. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.too_many_pgs:
Returns OK if the number of PGs is below the max threshold. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.object_unfound:
Returns OK if all objects can be found. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.request_slow:
Returns OK if requests are taking a normal time to process. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
ceph.request_stuck:
Returns OK if requests are taking a normal time to process. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
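All of the health-based service checks above follow the same rule: OK when the health check is not reported, WARNING when its severity is HEALTH_WARN, and CRITICAL otherwise. The sketch below illustrates that rule only; it is not the integration's source code. It assumes the health data has the shape of a mapping from health check code (for example OSD_DOWN) to an entry with a "severity" field, and that a service check name is "ceph." plus the lowercased code, which is inferred from the names listed above. The function names are hypothetical.

```python
# Datadog service check status codes.
OK, WARNING, CRITICAL = 0, 1, 2


def service_check_name(health_code):
    """Map a Ceph health check code to a service check name (assumed naming)."""
    return "ceph." + health_code.lower()


def service_check_status(health_checks, health_code):
    """Apply the OK / WARNING / CRITICAL rule described above.

    health_checks is assumed to look like:
        {"OSD_DOWN": {"severity": "HEALTH_WARN"}, ...}
    """
    entry = health_checks.get(health_code)
    if entry is None:
        # The health check is not being raised, so the condition is healthy.
        return OK
    if entry.get("severity") == "HEALTH_WARN":
        return WARNING
    # Any other severity is treated as critical.
    return CRITICAL
```

For example, with health_checks = {"OSD_DOWN": {"severity": "HEALTH_WARN"}}, the status for OSD_DOWN is WARNING and the status for OSD_FULL is OK.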
Need help? Contact Datadog support.
