Service Aggregation
May 25, 2009
What is service aggregation?

If you have many checks on a host, your display can get very confusing. Check_MK allows you to aggregate groups of services into new services. This is similar to, but not quite the same as, what check_multi does. Aggregation in Check_MK creates additional services on additional artificial hosts. For each host that has aggregated services, Check_MK automatically creates a second host: the summary host. Its name is that of the normal host with -s appended. All aggregated services of a host are mapped to its summary host. The state of an aggregated service is that of the worst underlying service.

How aggregations can be used

Compact visualization of overall state

Aggregations allow you to create a compact and concise overview of hosts with many services. You might want to define a service group with only your summary hosts, so that you can visualize the complete state of those hosts on one screen.

High-level notification

Because summary hosts and aggregated services are ordinary hosts and services from Nagios' point of view, you can use them for notifications. Because Nagios notifications are triggered by state changes, an aggregated service sends out a notification when the first of its underlying services becomes critical. No further notifications are triggered when further underlying services become critical.

Configuration of service aggregations

Service aggregation is controlled by the variable service_aggregations. For each aggregated service you define a name (used as the Nagios service description), a list of hosts for which the aggregation is valid, and finally one regular expression pattern for the underlying services that should be aggregated. The list of host names may optionally be preceded by a host tag list.
Consider the following example:

main.mk

    service_aggregations = [
      ( "Logfiles",    ALL_HOSTS, "LOG "),
      ( "Services",    ALL_HOSTS, "service"),
      ( "Filesystems", [ "linux" ], ALL_HOSTS, "fs_" ),
      ( "Other",       ALL_HOSTS, "" ),
    ]

This example defines four aggregated services: Logfiles, Services, Filesystems and Other.
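To make the structure of these entries more tangible, here is a minimal Python sketch of how such a list could be evaluated. It is not Check_MK's actual code. The helpers normalize and aggregated_name are hypothetical, and the sketch assumes that the first matching entry wins, that a host must carry every tag in the tag list, and that the pattern is matched from the start of the service description (re.match). None of these details is stated on this page.

```python
import re

ALL_HOSTS = ["@all"]  # placeholder standing in for Check_MK's ALL_HOSTS

service_aggregations = [
    ("Logfiles", ALL_HOSTS, "LOG "),
    ("Services", ALL_HOSTS, "service"),
    ("Filesystems", ["linux"], ALL_HOSTS, "fs_"),
    ("Other", ALL_HOSTS, ""),
]


def normalize(entry):
    """Return (name, tags, hosts, pattern); the tag list is optional."""
    if len(entry) == 3:
        name, hosts, pattern = entry
        return name, [], hosts, pattern
    name, tags, hosts, pattern = entry
    return name, tags, hosts, pattern


def aggregated_name(hostname, host_tags, description):
    """Hypothetical helper: name of the first entry that applies, else None."""
    for entry in service_aggregations:
        name, tags, hosts, pattern = normalize(entry)
        # Assumption: the host must carry all tags listed in the entry.
        if not all(tag in host_tags for tag in tags):
            continue
        if hosts is not ALL_HOSTS and hostname not in hosts:
            continue
        # Assumption: the pattern is anchored at the start of the description.
        if re.match(pattern, description):
            return name
    return None


print(aggregated_name("localhost", ["linux"], "fs_/data"))
print(aggregated_name("localhost", [], "fs_/data"))
```

Under these assumptions the empty pattern of the Other entry matches every service description, so it acts as a catch-all for everything the earlier entries did not claim, including filesystem checks on hosts without the linux tag.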
That's all you have to do. Now try out the check with a dry run:

    root@linux# check_mk -nv localhost
    CPU load OK - 0.11
    Number of threads OK - 58 threads
    fs_/ OK - 24.0% used (0.8 of 3.7 GB), (levels at 80.0/90.0)
    fs_/data CRIT - 73.0% used (0.1 of 0.1 GB), (levels at 60.0/70.0)
    Disk IO read OK - 0.0MB/s (in last 50 secs)
    Disk IO write OK - 0.0MB/s (in last 50 secs)
    CPU utilization OK - user: 0%, system: 1%, wait: 0%
    LOG /var/log/kern.log OK - no old or new error messages
    LOG /var/log/messages OK - no old or new error messages
    Memory used OK - 3.2% of RAM (21 MB) used by processes
    NIC eth0 counters OK - Receive: 0.00 MB/sec - Send: 0.00 MB/sec
    Aggregates Services:
    Filesystems fs_/data CRIT - 73.0% used (0.1 of 0.1 GB), (levels at 60.0/70.0)
    Logfiles OK - 2 services OK
    Other OK - 7 services OK
    OK - Agent Version 1.0.28, processed 11 host infos

As you can see, three aggregated services have been created. The service Filesystems is critical, because one of the underlying filesystems is critical.

After check_mk -U and restarting Nagios you'll see both the real and the summary host in Nagios.
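The result of the dry run can be reproduced by hand. The following Python sketch is only an illustration of the "worst underlying state" rule, not Check_MK's implementation. The grouping of services is copied from the dry run, the function aggregate is hypothetical, and only the states OK, WARN and CRIT are ranked, because the position of UNKNOWN in the ordering is not stated here.

```python
# Ranking used for this sketch only; UNKNOWN is deliberately left out.
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}

# (aggregated service, underlying service, state) as shown in the dry run
dry_run = [
    ("Other", "CPU load", "OK"),
    ("Other", "Number of threads", "OK"),
    ("Filesystems", "fs_/", "OK"),
    ("Filesystems", "fs_/data", "CRIT"),
    ("Other", "Disk IO read", "OK"),
    ("Other", "Disk IO write", "OK"),
    ("Other", "CPU utilization", "OK"),
    ("Logfiles", "LOG /var/log/kern.log", "OK"),
    ("Logfiles", "LOG /var/log/messages", "OK"),
    ("Other", "Memory used", "OK"),
    ("Other", "NIC eth0 counters", "OK"),
]


def aggregate(rows):
    """Group states by aggregated service and keep the worst one.

    Returns a dict mapping each aggregated service to a tuple of
    (worst state, number of underlying services).
    """
    states = {}
    for aggr, _service, state in rows:
        states.setdefault(aggr, []).append(state)
    return {
        aggr: (max(members, key=SEVERITY.get), len(members))
        for aggr, members in states.items()
    }


summary = aggregate(dry_run)
for aggr in sorted(summary):
    worst, count = summary[aggr]
    print("%-12s %s (%d underlying services)" % (aggr, worst, count))
```

No service on this host matches the pattern of the Services entry, which is why only three of the four configured aggregations show up.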
Nagios configuration of summary hosts

Your aggregated services and summary hosts do not automatically have the same host groups, service groups or contact groups as the real hosts and services. Those are configured separately by special configuration variables:
Notes:
Dynamic aggregated services via pattern extraction

Sometimes you want to aggregate services not by their type but by the instance of software they belong to. You might want to have one aggregated service for each ORACLE instance running on a machine, which should summarize all logfiles, tablespaces and so on belonging to that instance. This can be done by enclosing the part of the service description that identifies the instance in parentheses within the pattern and using %s in the name of the aggregated service. The following example groups together all services that begin with NIC, followed by the name of a network interface and, after another space, the type of the service:

    service_aggregations = [
      ( "NIC_%s", ALL_HOSTS, "^NIC ([^ ]+) .*"),
    ]

Suppose you have the following underlying services:

    NIC eth0 counters
    NIC eth0 link
    NIC eth0 parameter
    NIC eth1 counters
    NIC eth1 link
    NIC eth1 parameter

Then you'll get the two aggregated services NIC_eth0 and NIC_eth1, each consisting of three underlying services. Our second example does a very similar thing with a more complex regular expression:

    service_aggregations = [
      ( "DB %s", ALL_HOSTS, "^(LOG.*alert_|DB_|Tablespace )([^+][^._]+).*" ),
    ]

Examples of services that get aggregated to DB K51:

    LOG oraclealert_K51.trans.log
    DB_K51
    Tablespace K51.M1
    Tablespace K51.M2

Notes:
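As a rough illustration of the extraction described above, the following Python sketch applies both patterns to the example service descriptions. The function aggregated_name is hypothetical and not part of Check_MK. The sketch assumes that when a pattern contains several groups, the last one is substituted for %s. That assumption is inferred from the DB example, where the result DB K51 corresponds to the second group, and is not confirmed elsewhere on this page.

```python
import re


def aggregated_name(name_template, pattern, description):
    """Hypothetical helper: derive the aggregated service name, or None."""
    match = re.match(pattern, description)
    if match is None:
        return None
    groups = match.groups()
    if groups and "%s" in name_template:
        # Assumption: with several groups, the last one names the instance.
        return name_template % groups[-1]
    return name_template


nic = ("NIC_%s", "^NIC ([^ ]+) .*")
db = ("DB %s", "^(LOG.*alert_|DB_|Tablespace )([^+][^._]+).*")

services = [
    "NIC eth0 counters", "NIC eth0 link", "NIC eth0 parameter",
    "NIC eth1 counters", "NIC eth1 link", "NIC eth1 parameter",
    "LOG oraclealert_K51.trans.log", "DB_K51",
    "Tablespace K51.M1", "Tablespace K51.M2",
]

# Collect the underlying services of each aggregated service.
aggregated = {}
for description in services:
    for template, pattern in (nic, db):
        name = aggregated_name(template, pattern, description)
        if name is not None:
            aggregated.setdefault(name, []).append(description)
            break

for name in sorted(aggregated):
    print(name, "<-", aggregated[name])
```

In the DB pattern the first group only selects the kind of service, while the second group captures the instance name up to the first dot or underscore.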
Showing the state of Check_MK

As of version 1.1.0, the exit status of the active Check_MK check can also be shown in the summary host. To activate this feature, simply set:

main.mk

    aggregate_check_mk = True

You will see a new service Check_MK in each summary host showing the output and status of the Check_MK check of the real host.

Multiline output

New in 1.1.6: When aggregating services, Check_MK usually just outputs OK - 7 services OK if everything is OK, and the texts of the non-OK services otherwise. As of version 1.1.6 it is possible to switch to an alternative output format. If you set

    aggregation_output_format = "multiline"

then Check_MK will use the Nagios "multiline" plugin output format. The first line of output shows the number of services in the various states, e.g. 6 services OK, 1 service CRIT. That line is shown as the plugin output in the GUI. The additional lines show the detailed information of each individual underlying service, even those in an OK state. They are displayed by the GUI in the service details. The long plugin output is available in the Nagios macro $LONGSERVICEOUTPUT$ and in Multisite via the column Long output of check plugin (multiline).

Note: From version 1.1.9i1 on, the default setting is "multiline".

Further configuration

Don't want to append -s to your host names? No problem. The pattern for summary host names can be configured via aggr_summary_hostname. Put a string there containing exactly one %s, which will be replaced with the real host name. The default setting is:

    aggr_summary_hostname = "%s-s"

You can change the suffix to -SUMMARY with:

    aggr_summary_hostname = "%s-SUMMARY"
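Two of the settings in this section can be illustrated with a short Python sketch. It is not taken from Check_MK. The functions summary_hostname and multiline_output are hypothetical, and both the order in which states are listed and the layout of the detail lines are assumptions made for illustration. Only the shape of the first line (for example 6 services OK, 1 service CRIT) and the %s substitution in aggr_summary_hostname come from the text.

```python
STATE_ORDER = ["OK", "WARN", "CRIT", "UNKNOWN"]  # display order assumed here

aggr_summary_hostname = "%s-s"  # default pattern for summary host names


def summary_hostname(hostname):
    """Apply the summary host name pattern to a real host name."""
    return aggr_summary_hostname % hostname


def multiline_output(services):
    """Build a summary line plus one detail line per underlying service.

    services is a list of (description, state, text) tuples.
    """
    counts = {}
    for _description, state, _text in services:
        counts[state] = counts.get(state, 0) + 1
    parts = [
        "%d service%s %s" % (counts[s], "" if counts[s] == 1 else "s", s)
        for s in STATE_ORDER
        if s in counts
    ]
    # Assumed detail layout: "<description>: <state> - <text>"
    details = ["%s: %s - %s" % row for row in services]
    return "\n".join([", ".join(parts)] + details)


example = [
    ("fs_/", "OK", "24.0% used (0.8 of 3.7 GB)"),
    ("fs_/data", "CRIT", "73.0% used (0.1 of 0.1 GB)"),
]
print(summary_hostname("localhost"))
print(multiline_output(example))
```

Nagios treats the first line as the regular plugin output and the remaining lines as long output, which is why the per-service details only appear in the service details view.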