There are host templates and service templates. Host templates control settings for host objects, and service templates control settings for service objects. A host object in Nagios is something that you want to monitor. A service object is a service that is running on a host. These definitions live in .cfg files. The nagios.cfg file in /usr/local/nagios/etc is the main config file, and it pulls in the object cfg files in /usr/local/nagios/etc/objects.
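As a rough sketch of how the main file pulls in the object files, nagios.cfg uses cfg_file and cfg_dir lines. The stock file names below match a default source install as far as I know; raspberries.cfg and the raspberries directory are hypothetical names I made up for illustration.

```cfg
# Excerpt in the style of /usr/local/nagios/etc/nagios.cfg

# Load individual object files one at a time.
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Hypothetical file holding my own definitions.
cfg_file=/usr/local/nagios/etc/objects/raspberries.cfg

# Or load every .cfg file found under a directory (hypothetical path).
cfg_dir=/usr/local/nagios/etc/raspberries
```

A file that is not listed with cfg_file, and does not sit under a cfg_dir directory, is never read.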
I like to start small. Get the stock localhost stuff working first. Once you have good cfgs for a single server, you can branch out and add some more servers. Best practice is to set as many of the settings as possible via templates so changes are centralized later on down the road. Create your hosts. Group them into hostgroups and then monitor services by hostgroup. Hostgroups also make the “Host Groups” screen a nice default screen to look at.
A little foresight and planning can save hours. I try to cut and paste from the examples to create my own definitions. Only create one new cfg file at a time. You can check your work with this command:
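A hostgroup definition might look something like this minimal sketch. The group name and the member names are taken from the examples later in this post; the alias text is made up, and every member has to exist as a host object somewhere in your cfg files.

```cfg
define hostgroup {
    hostgroup_name  raspberries
    alias           Raspberry Pi servers    ; illustrative alias
    members         salt,trans,minecraft,openhabian
}
```

Any service that sets hostgroup_name raspberries is then applied to every member of the group.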
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Read the verification results closely. They can be very misleading. Usually the first error found is the culprit. It just gets worse from there.
“use” is awesome in a definition. use causes inheritance, so if you’ve got something close, use takes all of that defined object and gives those attributes to your new definition. Read through templates.cfg. You’ll see the generic-host definition get “use”d by the linux-server object and the windows-server object, each one adding more detail to the basic generic-host object. So, following their lead, I copied the linux-server definition into my new one called raspberry. Then, for each of my raspberries, I can define it as a raspberry with the use directive.
Nagios Raspberry Template
define host {
    name                    raspberry
    use                     generic-host
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    notification_period     workhours
    notification_interval   120
    notification_options    d,u,r
    contact_groups          admins
    register                0
}
I know this is a template because the last line tells Nagios not to register this object as a real host. After I’ve defined the host template I then define a host. But, because I’ve set so many things up already in the template, I only need to define a few things on my Host object.
define host {
    use             raspberry
    host_name       salt
    alias           salt
    address         192.168.1.51
    check_interval  15
}
Notice I’m referencing the host template from within the host object via the use directive. When a setting appears on both the host and the template, the value on the host wins. So if I define check_interval on both the host object (15) and the template (5), the host object’s setting overrides the template’s, and check_interval ends up as 15. Maybe I decide, in the future, to make this the standard for all my raspberries. Easy fix. Move the check_interval line out of this host definition and into the raspberry template, replacing the 5 that is already there!
The same basic concept goes for services. The way I do it is to have a template for every type of host I’m going to configure. Routers get one template, servers get another, and switches have their own. The only things that I configure on the host objects themselves are name, IP, template, and parents. Parents are usually switches because if they die, they take the connected hosts with them. If you set things up like this, you may later find yourself monitoring a few hundred devices of the same type and needing to change, say, the check interval for all of them. If you’re using templates correctly, that can be as simple as making a single change on a template.
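Here is a rough sketch of both ideas: a service template built the same way as the host template, and a host that names its parent switch. The names raspberry-service and switch1 are hypothetical, the interval values are only illustrative, and switch1 would need its own host definition elsewhere.

```cfg
define service {
    name                    raspberry-service   ; hypothetical template name
    use                     generic-service
    check_interval          10
    retry_interval          2
    max_check_attempts      3
    notification_period     workhours
    contact_groups          admins
    register                0                   ; template only, not a real service
}

define host {
    use         raspberry
    host_name   salt
    alias       salt
    address     192.168.1.51
    parents     switch1     ; hypothetical switch host, defined elsewhere
}
```

With parents set, Nagios can report salt as unreachable rather than down when switch1 itself is down.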
Backwards
It works in the other direction, as well. Say I have a definition for a service check that runs on all the servers.
define service {
    use                     generic-service     ; Name of service template to use
    hostgroup_name          raspberries
    host_name               !trans,!salt,!minecraft,!openhabian
    service_description     HTTP
    check_command           check_http
    _port_number            80
    notifications_enabled   0
}
It checks that port 80 is accepting requests on every server in the raspberries hostgroup. Four of my servers in that group don’t have a web server, so the check would fail on them. Since most of the raspberries do provide web services, I check for HTTP across the whole hostgroup but skip the hosts listed with the ! prefix on the host_name line.
Changing Defaults in Nagios
You might notice the “_port_number” line. Many of the web servers run on nonstandard ports: 8080, 9001, 81, etc. You can handle this with a custom variable: add a _port_number line to the object definition and give it the web port. In the example above it is set on the service, so Nagios exposes it as the $_SERVICEPORT_NUMBER$ macro; set on a host instead, it would be $_HOSTPORT_NUMBER$. Once you’ve got the object definitions squared away, you’ll also need to reference that macro in the command definition. My HTTP check in command.cfg looks like this:
define command {
    command_name    check_http
    command_line    $USER1$/check_http -I $HOSTADDRESS$ -p $_SERVICEPORT_NUMBER$ $ARG1$
}
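If the port belongs to the host rather than to the check, a variation is to put the custom variable on the host and read it through the host macro. This is only a sketch under assumptions: the host webpi, its address, and the command name check_http_hostport are all hypothetical, and webpi is assumed not to be in the raspberries hostgroup, so it does not also pick up the port 80 check.

```cfg
define host {
    use             raspberry
    host_name       webpi           ; hypothetical host
    alias           webpi
    address         192.168.1.60    ; hypothetical address
    _port_number    8080            ; custom variable, exposed as $_HOSTPORT_NUMBER$
}

define command {
    command_name    check_http_hostport     ; hypothetical command name
    command_line    $USER1$/check_http -I $HOSTADDRESS$ -p $_HOSTPORT_NUMBER$ $ARG1$
}

define service {
    use                     generic-service
    host_name               webpi
    service_description     HTTP on custom port
    check_command           check_http_hostport
}
```

This way a single service definition can cover hosts that listen on different ports, since each host carries its own value.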
RTFM
Templates have inheritance and this can be tricky. Read the Nagios documentation on object inheritance.
One way to use inheritance is to use multiple templates to get granular in the settings. Perhaps you want to use one template for your host check intervals, and another for your alert settings. You can have as many cfg files as you like, just make sure you reference them in nagios.cfg or they won’t get noticed when the process starts.
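As a sketch of that split, here is the raspberry template from earlier broken into two templates, one for check settings and one for alert settings. The names raspberry-checks and raspberry-alerts are hypothetical, and this is an alternative to the single raspberry template, not something to load alongside the earlier salt definition.

```cfg
define host {
    name                    raspberry-checks    ; hypothetical: check settings only
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check-host-alive
    register                0
}

define host {
    name                    raspberry-alerts    ; hypothetical: alert settings only
    notification_period     workhours
    notification_interval   120
    notification_options    d,u,r
    contact_groups          admins
    register                0
}

define host {
    use         raspberry-checks,raspberry-alerts
    host_name   salt
    alias       salt
    address     192.168.1.51
}
```

When two templates in the use list set the same directive, the one listed first takes precedence, so order matters.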
Another good reference is this doc for how the configuration files on the backend relate to each other. I’m including this because it can be a helpful doc to have on hand for reference.