If you are looking for a dead simple way to monitor ALL hardware on Dell servers, look no further than IPMI on the iDRAC cards.
The basic steps to get Nagios polling the iDRAC card for hardware sensors:
- Boot the server with a keyboard and monitor attached.
- Enter setup by pressing F2 in the BIOS POST.
- Select the iDRAC configuration.
- Configure basic networking to get the iDRAC card online.
Once the iDRAC card is online, go to the web interface by entering the URL:
http://your-idrac.example.com/
Under the network settings, configure the IPMI settings.
In the User Configuration, create a new user with IPMI User privileges. The examples below use a user named lom and request the user privilege level (-L user).
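Before moving on to Nagios, it can help to confirm that the iDRAC actually answers IPMI requests over the network. The following is a minimal sketch, assuming FreeIPMI's ipmi-sensors is installed on the machine you run it from. The ipmi_reachable function and the IPMI_SENSORS override are illustrative names, not part of any tool; the -h, -u, -p and -l options are the same ones that appear in the plugin's own ipmi-sensors call later in this post.

```shell
#!/bin/sh
# Sketch: check that the iDRAC responds to IPMI over LAN.
# IPMI_SENSORS is an illustrative override; by default the FreeIPMI
# ipmi-sensors binary found in PATH is used.
IPMI_SENSORS=${IPMI_SENSORS:-ipmi-sensors}

ipmi_reachable() {
    # $1 = iDRAC address, $2 = IPMI user, $3 = IPMI password
    if "$IPMI_SENSORS" -h "$1" -u "$2" -p "$3" -l user >/dev/null 2>&1; then
        echo "IPMI OK on $1"
    else
        echo "IPMI FAILED on $1"
        return 1
    fi
}

# Example with the placeholder values used in this post:
# ipmi_reachable 192.168.28.12 lom lom-password
```

If this reports a failure, recheck the IPMI network settings and the user's privileges on the iDRAC before debugging anything on the Nagios side.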
Nagios setup:
In this example the Nagios server is running Ubuntu Server 14.04.x LTS. I will assume Nagios Core and the plugins are already installed.
You can try the stock check_ipmi_sensor script that ships in /usr/lib/nagios/plugins, but I could not get it to work. It always failed with the following error:
jemurray@nagios:~$ /usr/lib/nagios/plugins/check_ipmi_sensor -H 192.168.28.12 -U lom -P lom-password -L user
ipmi_ctx_open_outofband: session timeout
-> Execution of ipmimonitoring failed with return code 1.
-> ipmimonitoring was executed with the following parameters:
/usr/sbin/ipmi-sensors -h 192.168.28.12 -u lom -p lom-password -l user --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors
When I downloaded the latest plugin from http://www.thomas-krenn.com/en/wiki/IPMI_Sensor_Monitoring_Plugin, everything worked just fine. The -x options exclude individual sensors by ID:
jemurray@selleck:~/check_ipmi_sensor_v3-96ed86b$ ./check_ipmi_sensor -H 192.168.28.12 -U lom -P lom-password -L user -x 55 -x 84 -x 86 -x 87
IPMI Status: OK | 'Fan1 RPM'=2400.00 'Fan2 RPM'=2400.00 'Fan3 RPM'=2280.00 'Fan4 RPM'=2280.00 'Fan5 RPM'=2160.00 'Fan6 RPM'=2160.00 'Inlet Temp'=22.00 'Exhaust Temp'=31.00 'Temp'=35.00 'Temp'=39.00 'Current 1'=0.60 'Current 2'=0.20 'Voltage 1'=208.00 'Voltage 2'=208.00 'Pwr Consumption'=112.00
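The output above follows the usual Nagios plugin layout: the text before the `|` is the status line, and everything after it is performance data as 'label'=value pairs. As a rough sketch of how to pull those pairs apart on the command line, the snippet below uses a hypothetical helper called split_perfdata; it only handles plain non-negative numbers like the ones shown here.

```shell
#!/bin/sh
# split_perfdata is a hypothetical helper name used only for this sketch.
split_perfdata() {
    # Drop everything up to and including the first '|', then print
    # one 'label'=value pair per line.
    printf '%s\n' "${1#*|}" | grep -o "'[^']*'=[0-9.]*"
}

# Shortened copy of the plugin output shown above.
split_perfdata "IPMI Status: OK | 'Fan1 RPM'=2400.00 'Inlet Temp'=22.00 'Pwr Consumption'=112.00"
```

Each sensor ends up on its own line, which makes it easy to eyeball the readings or feed them into another tool.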
Copy the newly downloaded plugin to /usr/local/bin.
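One way to do the copy is with install, which copies the file and sets its permissions in a single step. This is only a sketch: install_plugin is a hypothetical helper, and mode 0755 is an assumption chosen so that the user Nagios runs as can execute the plugin.

```shell
#!/bin/sh
# install_plugin is a hypothetical helper for this sketch.
install_plugin() {
    src=$1                          # path to the downloaded check_ipmi_sensor
    dest_dir=${2:-/usr/local/bin}   # destination directory, defaults to /usr/local/bin

    # Copy the file and make it executable for everyone in one step.
    install -m 0755 "$src" "$dest_dir/check_ipmi_sensor"
}

# Example, run as root from the unpacked plugin directory:
# install_plugin ./check_ipmi_sensor
```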
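Create the following Nagios command definition. The key thing to note here is the $_HOSTIPMI_IP$ macro: Nagios fills it in from the custom _ipmi_ip variable set in the host definition, so the check targets the iDRAC address rather than the server's own address: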
# Example Syntax: ./check_ipmi_sensor -H 192.168.0.120 -U lom -P lom-nagios-password -L user -x 32 -x 52 -x 82
define command{
    command_name    check_ipmi_sensor
    command_line    /usr/local/bin/check_ipmi_sensor -H $_HOSTIPMI_IP$ -U $ARG1$ -P $ARG2$ -L user $ARG3$
}
Create the following Nagios host and service definitions:
define host {
    host_name                       statseeker.example.com
    alias                           statseeker.example.com
    address                         192.168.28.49
    _ipmi_ip                        192.168.28.12
    parents                         dc-eps-0.example.com
    hostgroups                      server
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    check_command                   check-host-alive
    max_check_attempts              10
    notification_interval           0
    notification_period             24x7
    notification_options            d,u,r
    contact_groups                  systems
    icon_image                      base/linux40.png
    statusmap_image                 base/linux40.png
    icon_image_alt                  Server
}
define service {
    service_description             IPMI-statseeker.example.com
    host_name                       statseeker.example.com
    check_command                   check_ipmi_sensor!lom!my-lom-password!-x 55 -x 84 -x 86 -x 87
    notification_interval           0
    active_checks_enabled           1
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    is_volatile                     0
    check_period                    24x7
    normal_check_interval           5
    retry_check_interval            1
    max_check_attempts              4
    notification_period             24x7
    notification_options            w,u,c,r
    contact_groups                  systems
}
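To see how these definitions fit together, here is a simplified sketch of the substitution Nagios performs at check time. The check_command value is split on `!`: the first field picks the command, and the remaining fields become $ARG1$, $ARG2$ and $ARG3$, while $_HOSTIPMI_IP$ comes from the host's _ipmi_ip variable. The expand_check function is a hypothetical helper written for illustration, not something Nagios provides.

```shell
#!/bin/sh
# expand_check is a hypothetical helper that mimics, in a simplified way,
# how Nagios fills in the macros of the command_line. It is not part of Nagios.
expand_check() {
    check_command=$1   # check_command value from the service definition
    ipmi_ip=$2         # _ipmi_ip value from the host definition

    # Split on '!': field 1 is the command name, fields 2-4 stand in
    # for $ARG1$, $ARG2$ and $ARG3$.
    old_ifs=$IFS
    IFS='!'
    set -f                   # no filename globbing while splitting
    set -- $check_command
    set +f
    IFS=$old_ifs

    echo "/usr/local/bin/check_ipmi_sensor -H $ipmi_ip -U $2 -P $3 -L user $4"
}

expand_check 'check_ipmi_sensor!lom!my-lom-password!-x 55 -x 84 -x 86 -x 87' 192.168.28.12
```

The printed line is the command Nagios would end up running for this service. Running it by hand is a quick way to tell a problem in the definitions apart from a problem with the plugin or the iDRAC.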