nagios_smart_logs

A Nagios plugin to monitor for errors reported by SMART to alert on failing hard drives.

I'd long used the good old check_ide_smart plugin from Nagios itself, but it relies on SMART itself detecting and reporting problems correctly, and it doesn't check the SMART logs for logged errors, failed self-tests, etc.

I've seen a few drives failing and throwing I/O errors, all the while SMART happily proclaimed:

SMART overall-health self-assessment test result: PASSED

Similarly, check_ide_smart reported no problems:

[dave@devvps:~]$ sudo /usr/lib/nagios/plugins/check_ide_smart -n /dev/sdb
OK - Operational (18/18 tests passed)

I can no longer trust it alone.

So this plugin requests the SMART error log (using smartctl -l error) and raises an alert if any errors are found:

[dave@devvps:~]$ ./nagios_smart_log -d /dev/sdb
CHECKSMARTLOG CRITICAL - 5 SMART errors, latest: Error 13 occurred at disk power-on lifetime: 29769 hours (1240 days + 9 hours)

Far more useful.

Consider check_scsi_smart instead

Update Feb 2024: I was having problems with check_ide_smart on a drive in an external enclosure: running it directly worked fine, but when run via Nagios it failed with CRITICAL - SMART_CMD_ENABLE.

I couldn't easily see what was going on, and a Google search turned up several people with the same problem, including:

Icinga #2725
monitoring-plugins#1104
Debian #690760

The Debian report suggested the problem was fixed in the Wheezy and Jessie packages, but I was still seeing it on my Debian 12 (Bookworm) box with monitoring-plugins package version 2.3.3-5+deb12u2.

I saw check_scsi_smart recommended as a more modern alternative to check_ide_smart; it works a treat for me, and it can also monitor the SMART logs, removing the need for this plugin.

So, I'd recommend you give it a go first, and props to @spjmurray for it.