How to Configure Nagios on a Raspberry

There are host templates and service templates. Host templates control settings for host objects, and service templates control settings for service objects. A host object in Nagios is something that you want to monitor. A service object is a service that is running on a host. These definitions live in .cfg files. The nagios.cfg file in /usr/local/nagios/etc is the main config file, and it pulls in the object cfg files in /usr/local/nagios/etc/objects.
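As a rough sketch of how the main file pulls in the object files, nagios.cfg uses cfg_file lines for individual files and cfg_dir lines for whole directories. The first two paths below are the sample files from a source install; raspberries.cfg and the servers directory are names I made up for illustration.

```cfg
# Sample object files referenced from /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Hypothetical file holding your own host and service definitions
cfg_file=/usr/local/nagios/etc/objects/raspberries.cfg

# Alternatively, load every .cfg file found in a directory (hypothetical path)
cfg_dir=/usr/local/nagios/etc/servers
```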

I like to start small. Get the stock localhost stuff working first. Once you have good cfgs for a single server, you can branch out and add more servers. Best practice is to set as many of the settings as possible via templates so that later changes are centralized. Create your hosts. Group them into hostgroups and then monitor services by hostgroup. Hostgroups also make the “Host Groups” screen a nice default screen to look at.
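A hostgroup can be sketched like this. The hostgroup name and member names match the hosts used later in this post; the alias text is just an assumption, and every member has to exist as a host definition of its own.

```cfg
define hostgroup {
    hostgroup_name          raspberries
    alias                   Raspberry Pi servers      ; assumed display name
    members                 salt,trans,minecraft,openhabian
}
```

Instead of listing members here, each host definition can carry a hostgroups line naming the group. Either approach should work, so pick one and stick with it.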

A little foresight and planning can save hours. I try to cut and paste from the examples to create my own definitions. Only create one new cfg file at a time. You can check your work with this command:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Read the verification results closely. They can be very misleading. Usually the first error found is the culprit, and the errors that follow it are often just fallout from that one.

“use” is awesome in a definition. It causes inheritance: if you’ve got a definition that is close to what you need, use takes all of that object’s attributes and gives them to your new definition. Read through templates.cfg. You’ll see the generic-host definition get “use”d by the Linux and Windows templates, each of which adds more detail to the basic generic-host object. So, following their lead, I copied the linux-server definition into a new one called raspberry. Then I can define each of my raspberries as a raspberry with the use directive.

Nagios Raspberry Template

define host {
     name                            raspberry
     use                             generic-host
     check_period                    24x7
     check_interval                  5
     retry_interval                  1
     max_check_attempts              10
     check_command                   check-host-alive
     notification_period             workhours
     notification_interval           120
     notification_options            d,u,r
     contact_groups                  admins
     register                        0
 }

I know this is a template because the last line (register 0) tells Nagios not to register this object as a real host. After I’ve defined the host template, I define a host. Because I’ve set so many things up already in the template, I only need to define a few things on my host object.

define host {
     use                     raspberry
     host_name               salt
     alias                   salt
     address                 192.168.1.51
     check_interval          15
 }

Notice I’m referencing the host template from within the host object via the use directive. Any setting that appears on both the host and the template takes its value from the host. So if I define check_interval on both the host object (15) and the template (5), the host object’s setting overrides the template’s, and the check_interval will be 15. Maybe I decide, in the future, to make this the standard for all my raspberries. Easy fix. Remove the check_interval line from this host definition and change the value in the raspberry template!
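After that move, the two definitions might look roughly like this. It is a sketch built from the definitions above, with the other template settings elided rather than changed.

```cfg
define host {
    name                    raspberry
    use                     generic-host
    check_interval          15          ; was 5, now applies to every raspberry
    ; ... remaining template settings stay as they were ...
    register                0
}

define host {
    use                     raspberry
    host_name               salt
    alias                   salt
    address                 192.168.1.51
    ; no check_interval here, so salt inherits 15 from the template
}
```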

The same basic concept goes for services. The way I do it is to have a template for every type of host I’m going to configure. Routers get one template, servers get another, and switches get their own. The only things that I configure on the host objects themselves are name, IP, template, and parents. Parents are usually switches because if they die, they take the connected hosts with them. If you set things up like this, you may later be monitoring a few hundred devices of the same type and need to change, say, the check interval for all of them. If you’re using templates correctly, that can be as simple as making a single change on a template.
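Here is a minimal sketch of both ideas: a service template, and a host that sets only name, IP, template, and parents. The names raspberry-service and switch1, the interval values, and the 192.168.1.52 address are assumptions for illustration, and switch1 would need its own host definition.

```cfg
# Hypothetical service template layered on the stock generic-service
define service {
    name                    raspberry-service
    use                     generic-service
    check_interval          10          ; assumed value
    retry_interval          2           ; assumed value
    register                0           ; template only, not a real service
}

# A host object that sets only name, IP, template, and parents
define host {
    use                     raspberry
    host_name               minecraft
    alias                   minecraft
    address                 192.168.1.52    ; assumed address
    parents                 switch1         ; hypothetical switch host
}
```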

Backwards

It works in the other direction, as well. Say I have a definition for a service check that runs on all the servers.

define service {
     use                     generic-service           ; Name of service template to use
     hostgroup_name          raspberries
     host_name               !trans,!salt,!minecraft,!openhabian
     service_description     HTTP
     check_command           check_http
     _port_number            80
     notifications_enabled   0
 }

It checks that port 80 is accepting requests on every server in the raspberries hostgroup. Four of my servers in that group don’t have a web server, so the test would fail on them. Since most of the raspberries do provide web services, I check for HTTP across the whole hostgroup but skip the hosts listed with the ! prefix in the host_name line.

Changing Defaults in Nagios

You might notice the “_port_number” line. Many of the web servers run on nonstandard ports: 8080, 9001, 81, etc. You can handle this by adding a _port_number custom variable and specifying the web port in the service definition. Once you’ve got the object definitions squared away, you’ll also need to reference that variable in the command definition. Nagios exposes a service’s custom variable as a macro named $_SERVICE plus the variable name in upper case. My HTTP check in commands.cfg looks like this:

define command {
   command_name    check_http 
   command_line    $USER1$/check_http -I $HOSTADDRESS$ -p $_SERVICEPORT_NUMBER$ $ARG1$
 }
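To show the custom variable in use, here is a sketch of a second service for a web server on a nonstandard port. The host name webpi and the port 8080 are assumptions for illustration. With the command definition above, the check for this service should expand to -p 8080.

```cfg
define service {
    use                     generic-service
    host_name               webpi           ; hypothetical host
    service_description     HTTP 8080
    check_command           check_http
    _port_number            8080            ; becomes $_SERVICEPORT_NUMBER$
}
```

One thing to watch: with this command definition, any service that uses check_http without setting _port_number leaves the macro empty, so it is probably safest to set the variable on every HTTP service or in a service template.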

RTFM

Templates have inheritance and this can be tricky. Read the object inheritance section of the Nagios Core documentation.

One way to use inheritance is to use multiple templates to get granular in the settings. Perhaps you want to use one template for your host check intervals, and another for your alert settings. You can have as many cfg files as you like, just make sure you reference them in nagios.cfg or they won’t get noticed when the process starts.
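A sketch of that split might look like the following. The template names raspberry-checks and raspberry-alerts are made up, and the values are copied from the raspberry template above. When more than one template is listed in use, Nagios gives precedence to the one listed first if they set the same directive.

```cfg
# Hypothetical template holding only check settings
define host {
    name                    raspberry-checks
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    register                0
}

# Hypothetical template holding only alert settings
define host {
    name                    raspberry-alerts
    notification_period     workhours
    notification_interval   120
    notification_options    d,u,r
    contact_groups          admins
    register                0
}

# A host that pulls from both, plus the stock generic-host
define host {
    use                     raspberry-checks,raspberry-alerts,generic-host
    host_name               salt
    alias                   salt
    address                 192.168.1.51
}
```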

Another good reference is this doc on how the configuration files on the backend relate to each other. It is a helpful one to keep on hand.
