Ceph Integration

Overview

Enable the Datadog-Ceph integration to:

Track disk usage across storage pools
Receive service checks in case of issues
Monitor I/O performance metrics

Setup

Installation

The Ceph check is included in the Datadog Agent package, so you don't need to install anything else on your Ceph servers.

Configuration

Edit the file ceph.d/conf.yaml in the conf.d/ folder at the root of your Agent's configuration directory. See the sample ceph.d/conf.yaml for all available configuration options:

init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true               # only if the ceph binary needs sudo on your nodes

If you enabled use_sudo, add a line like the following to your sudoers file:

dd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
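
To show how `ceph_cmd` and `use_sudo` relate to the sudoers line, here is a minimal Python sketch of how a command line could be assembled from one instance. The helper `build_ceph_command`, the `--format json` arguments, and the `sudo -n` flag are assumptions made for illustration; this is not the Agent's actual code.

```python
def build_ceph_command(instance, subcommand):
    """Return the argv list for one ceph CLI call (hypothetical helper)."""
    # Fall back to the default documented in the sample configuration.
    ceph_cmd = instance.get("ceph_cmd") or "/usr/bin/ceph"
    argv = [ceph_cmd, *subcommand, "--format", "json"]
    if instance.get("use_sudo", False):
        # -n makes sudo fail instead of prompting for a password, so a
        # missing NOPASSWD rule shows up as an error rather than a hang.
        argv = ["sudo", "-n", *argv]
    return argv


instance = {"ceph_cmd": "/path/to/your/ceph", "use_sudo": True}
print(build_ceph_command(instance, ["status"]))
```

Because the sudoers entry names the binary without arguments, it covers every subcommand invoked through that path. The path in the sudoers line should match `ceph_cmd` exactly.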

Log collection

Available for Agent >6.0

Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
```
  logs_enabled: true
```
Next, edit ceph.d/conf.yaml by uncommenting the logs lines at the bottom. Update the logs path with the correct path to your Ceph log files.
```
  logs:
    - type: file
      path: /var/log/ceph/*.log
      source: ceph
      service: <APPLICATION_NAME>
```
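
A quick way to confirm that the `path` pattern matches files the Agent can read is a small Python check like the sketch below. The helper `readable_logs` is hypothetical and not part of the integration; for a meaningful result, run it as the user the Agent runs as (assumed here to be `dd-agent`).

```python
import glob
import os


def readable_logs(pattern):
    """Split files matching a glob pattern into (readable, unreadable) lists."""
    readable, unreadable = [], []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match the pattern
        if os.access(path, os.R_OK):
            readable.append(path)
        else:
            unreadable.append(path)
    return readable, unreadable


ok, denied = readable_logs("/var/log/ceph/*.log")
print("readable:", ok)
print("permission denied:", denied)
```

If both lists are empty, the pattern matches nothing and the `path` value needs adjusting.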
Restart the Agent.

Validation

Run the Agent's status subcommand and look for ceph under the Checks section.

Data Collected

Metrics

See metadata.csv for a list of metrics provided by this integration.

Note: If you are running Ceph Luminous or later, the ceph.osd.pct_used metric is not reported.

Events

The Ceph check does not include any events.

Service Checks

ceph.overall_status:
The Datadog Agent submits a service check for each of Ceph's host health checks.

In addition to this service check, the Ceph check also collects a configurable list of health checks for Ceph luminous and later. By default, these are:

ceph.osd_down:
Returns OK if your OSDs are all up. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.osd_orphan:
Returns OK if you have no orphan OSDs. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.osd_full:
Returns OK if your OSDs are not full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.osd_nearfull:
Returns OK if your OSDs are not near full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pool_full:
Returns OK if your pools have not reached their quota. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pool_near_full:
Returns OK if your pools are not near reaching their quota. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pg_availability:
Returns OK if there is full data availability. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pg_degraded:
Returns OK if there is full data redundancy. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pg_degraded_full:
Returns OK if there is enough space in the cluster for data redundancy. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pg_damaged:
Returns OK if there are no inconsistencies after data scrubbing. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pg_not_scrubbed:
Returns OK if the PGs were scrubbed recently. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.pg_not_deep_scrubbed:
Returns OK if the PGs were deep scrubbed recently. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.cache_pool_near_full:
Returns OK if the cache pools are not near full. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.too_few_pgs:
Returns OK if the number of PGs is above the min threshold. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.too_many_pgs:
Returns OK if the number of PGs is below the max threshold. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.object_unfound:
Returns OK if all objects can be found. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.request_slow:
Returns OK if requests are taking a normal time to process. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.

ceph.request_stuck:
Returns OK if requests are taking a normal time to process. Otherwise, returns WARNING if the severity is HEALTH_WARN, else CRITICAL.
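
All of the health checks above follow the same rule, which can be sketched in Python as shown below. This is an illustration under stated assumptions, not the check's source code: it assumes the Ceph health report is a JSON object with a `checks` map keyed by health code (for example `OSD_DOWN`), that each entry carries a `severity` field, and that the service check name is the lowercased health code prefixed with `ceph.`. The sample report is made up for the example.

```python
# Datadog service check status values.
OK, WARNING, CRITICAL = 0, 1, 2


def service_check_statuses(health, check_names):
    """Map each service check name to a status from a parsed health report."""
    active = health.get("checks", {})
    statuses = {}
    for name in check_names:
        # Assumed naming: "ceph.osd_down" corresponds to health code "OSD_DOWN".
        code = name.split(".", 1)[1].upper()
        entry = active.get(code)
        if entry is None:
            statuses[name] = OK  # health code not raised by the cluster
        elif entry.get("severity") == "HEALTH_WARN":
            statuses[name] = WARNING
        else:
            statuses[name] = CRITICAL  # any other severity, such as HEALTH_ERR
    return statuses


# Illustrative report, not real cluster output.
sample_health = {
    "checks": {
        "OSD_DOWN": {"severity": "HEALTH_WARN"},
        "PG_DAMAGED": {"severity": "HEALTH_ERR"},
    }
}
names = ["ceph.osd_down", "ceph.pg_damaged", "ceph.pool_full"]
print(service_check_statuses(sample_health, names))
```

With this sample, ceph.osd_down maps to WARNING, ceph.pg_damaged to CRITICAL, and ceph.pool_full to OK because its health code is absent from the report.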

Troubleshooting

Need help? Contact Datadog support.
