An end-to-end setup for monitoring NVMe SSD health using smartctl, Prometheus, and Grafana. Well suited to dedicated servers (Hetzner, OVH, etc.).

This repository helps you:
- ✅ Check NVMe SSD health manually using `smartctl`
- ✅ Collect SMART metrics with Prometheus via `smartctl_exporter`
- ✅ Trigger alerts for early SSD failure detection
- ✅ Visualize health metrics in Grafana
- ✅ Replace disks before they die unexpectedly
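
The manual check from the first item can be sketched as below. This is a minimal sketch, assuming the drive is exposed as `/dev/nvme0` (adjust for your system) and the usual text layout of `smartctl -a` for NVMe devices. The helper names `nvme_health_fields` and `nvme_check` are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Keep only the NVMe health fields most relevant to wear and failure
# from `smartctl -a` text output read on stdin.
nvme_health_fields() {
    grep -E '^(Critical Warning|Available Spare|Percentage Used|Media and Data Integrity Errors):'
}

# Run smartctl against a device and print only the filtered fields.
# The default device path /dev/nvme0 is an assumption.
nvme_check() {
    dev="${1:-/dev/nvme0}"
    sudo smartctl -a "$dev" | nvme_health_fields
}
```

Calling `nvme_check /dev/nvme0` prints just those lines. For the overall pass/fail verdict alone, `sudo smartctl -H /dev/nvme0` is enough.
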

Tools used:

- `smartctl` from `smartmontools`
- `smartctl_exporter` (Go-based Prometheus exporter)
- Prometheus
- Grafana
| Folder | Contents |
|---|---|
| `grafana/` | Grafana dashboard JSON with all SSD panels |
| `prometheus/` | Prometheus alert rules (`smartctl_alerts.yaml`) |
| `systemd/` | Ready-to-use `smartctl_exporter` systemd unit file |

- Install smartmontools:

      sudo apt install smartmontools

- Clone and build smartctl_exporter:

      git clone https://github.com/prometheus-community/smartctl_exporter
      cd smartctl_exporter
      go build

- Install the exporter binary:

      sudo cp smartctl_exporter /usr/local/bin/

- Use the systemd unit from `systemd/`. Enable it with:

      sudo systemctl daemon-reload
      sudo systemctl enable smartctl_exporter
      sudo systemctl start smartctl_exporter

- Configure Prometheus:
  - Add the job config from `prometheus/`
  - Add the alert rules from `smartctl_alerts.yml` to `rule_files`

- Import the Grafana dashboard:
  - Use the file `grafana/ssd_dashboard.json`
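
Once the steps above are done, a quick sanity check is to confirm that the exporter is serving metrics and to compare your Prometheus job against a minimal one. The sketch below assumes the exporter listens on its default port 9633 on localhost and uses the job name `smartctl`. Both are assumptions, and the job config in `prometheus/` remains the authoritative version. The helper names are hypothetical.

```shell
#!/bin/sh
# Count sample lines whose metric name starts with "smartctl_" in
# Prometheus text-format output read on stdin (comment lines are skipped).
count_smartctl_metrics() {
    grep -c '^smartctl_'
}

# Fetch metrics from the exporter and count the smartctl_* samples.
# The default URL (localhost, port 9633) is an assumption.
exporter_metric_count() {
    url="${1:-http://localhost:9633/metrics}"
    curl -fsS "$url" | count_smartctl_metrics
}

# Print a minimal scrape job for comparison with the one in prometheus/.
# Job name and target are illustrative assumptions.
print_scrape_job() {
    cat <<'EOF'
scrape_configs:
  - job_name: smartctl
    static_configs:
      - targets: ['localhost:9633']
EOF
}
```

If the exporter can read the drives, `exporter_metric_count` should print a number greater than zero. A count of zero or a `curl` error suggests checking the service with `systemctl status smartctl_exporter`.
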