MK Livestatus
Required version: 1.1.0
February 2, 2010
How to access Nagios status data

Accessing status data today

The classical way of accessing the current status of your hosts and services is by reading and parsing the file status.dat, which Nagios creates at regular intervals. The update interval is configured via status_update_interval in nagios.cfg. A typical value is 10 seconds. As your installation grows, you might have to increase this value in order to reduce CPU usage and disk I/O. The Nagios web interface uses status.dat for displaying its data.

Parsing status.dat is not very popular amongst developers of addons, so many use another approach: NDO. This is a NEB module that is loaded directly into the Nagios process and sends out all status updates via a UNIX socket to a helper process. That helper process creates SQL statements and updates various tables in a MySQL or PostgreSQL database. This approach has several advantages over status.dat:
Unfortunately, however, NDO also has some severe shortcomings:
The Future

Since version 1.1.0, Check_MK offers a completely new approach for accessing status and also historic data: Livestatus. Just like NDO, Livestatus makes use of the Nagios Event Broker API and loads a binary module into your Nagios process. But unlike NDO, Livestatus does not actively write out data. Instead, it opens a socket through which data can be retrieved on demand. The socket allows you to send a request for hosts, services or other pieces of data and get an immediate answer. The data is read directly from Nagios' internal data structures; Livestatus does not create its own copy of that data. Beginning with version 1.1.2 you are also able to retrieve historic data from the Nagios log files via Livestatus.

This is not only a stunningly simple approach, but also an extremely fast one. Some advantages are:
At the same time, Livestatus provides its own query language that is simple to understand, offers most of the flexibility of SQL and, in some cases, even more. Its protocol is fast and lightweight and does not need a binary client. You can even access it from the shell without any helper software.

The Present

Livestatus is still a young technology, but many addons already support Livestatus as a data source or even propose it as their default. Here is an (incomplete) list of addons with Livestatus support:
Please mail us if you think something should be added to this list.

Setting up and using Livestatus

Automatic setup

The typical way to set up Livestatus is simply to answer yes when asked by the Check_MK setup. It is important that you have installed all tools needed for compiling C++ programs. Those are at least:
The script setup.sh compiles a module called livestatus.o and copies it into /usr/lib/check_mk (if you didn't change that path). It also adds two lines to your nagios.cfg, which are needed for loading the module. After that you just need to restart Nagios, and a UNIX socket with the name live should appear in the same directory as your Nagios command pipe.

Manual setup

There are several situations in which a manual setup is preferable, for example:
For manually setting up Livestatus, you can download the source code independently of Check_MK at the download page. Unpack the tarball at a convenient place and change to the newly created directory:

root@linux# wget 'http://www.mathias-kettner.de/download/mk-livestatus-1.1.6p1.tar.gz'
root@linux# tar xzf mk-livestatus-1.1.6p1.tar.gz
root@linux# cd mk-livestatus-1.1.6p1

Now let's compile the module. Livestatus uses a standard configure script and is thus compiled with ./configure && make.

user@host> ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for g++... g++
checking for C++ compiler default output file name... a.out
checking whether the C++ compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes

... and so on, until:

configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands

If you are running on a multicore CPU you can speed up compilation by adding -j 4 or -j 8 to make:

user@host> make -j 8
g++ -DHAVE_CONFIG_H -I. -I.. -I../nagios -fPIC -g -O2 -MT livestatus_so-AndingFil...
g++ -DHAVE_CONFIG_H -I. -I.. -I../nagios -fPIC -g -O2 -MT livestatus_so-ClientQue...
g++ -DHAVE_CONFIG_H -I. -I.. -I../nagios -fPIC -g -O2 -MT livestatus_so-Column.o ...
g++ -DHAVE_CONFIG_H -I. -I.. -I../nagios -fPIC -g -O2 -MT livestatus_so-ColumnsCo...
g++ -DHAVE_CONFIG_H -I. -I.. -I../nagios -fPIC -g -O2 -MT livestatus_so-ContactsC...
g++ -DHAVE_CONFIG_H -I. -I.. -I../nagios -fPIC -g -O2 -MT livestatus_so-CustomVar...
g++ -DHAVE_CONFIG_H -I. -I.. -I../nagios -fPIC -g -O2 -MT livestatus_so-CustomVar...
... and so on.

After successful compilation, make install will install a single file named livestatus.o into /usr/local/lib/mk-livestatus and the small program unixcat into /usr/local/bin (as usual, you can change paths with the standard options to configure):

root@linux# make install
Making install in src
make[1]: Entering directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1/src'
make[2]: Entering directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1/src'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
/usr/bin/install -c 'unixcat' '/usr/local/bin/unixcat'
test -z "/usr/local/lib/mk-livestatus" || /bin/mkdir -p "/usr/local/lib/mk-livestatus"
/usr/bin/install -c -m 644 'livestatus.so' '/usr/local/lib/mk-livestatus/livestatus.so'
ranlib '/usr/local/lib/mk-livestatus/livestatus.so'
/bin/sh /d/nagvis-dev/src/mk-livestatus-1.1.6p1/install-sh -d /usr/local/lib/mk-livestatus
/usr/bin/install -c livestatus.o /usr/local/lib/mk-livestatus
rm -f /usr/local/lib/mk-livestatus/livestatus.so
make[2]: Leaving directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1/src'
make[1]: Leaving directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1/src'
make[1]: Entering directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1'
make[2]: Entering directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1'
make[1]: Leaving directory `/d/nagvis-dev/src/mk-livestatus-1.1.6p1'

Your last task is to load livestatus.o into Nagios.
Add the following two lines to nagios.cfg in order to tell Nagios to load the module and to send all status update events to it:

nagios.cfg
broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/lib/nagios/rw/live
event_broker_options=-1

The only mandatory argument is the complete path to the UNIX socket that Livestatus shall create (/var/lib/nagios/rw/live in our example). Please change it if needed. It is probably best to put it into the same directory as the Nagios pipe. Just as Nagios does with its pipe, Livestatus creates the socket with the permissions 0660. If the directory containing the socket has the SGID bit set (chmod g+s), then the socket will be owned by the same group as the directory.

After setting up Livestatus, either by setup.sh or manually, restart Nagios. Two things should now happen:
nagios.log
[1256144866] livestatus: Version 1.1.6p1 initializing. Socket path: '/var/lib/nagios/rw/live'
[1256144866] livestatus: Created UNIX control socket at /var/lib/nagios/rw/live
[1256144866] livestatus: Opened UNIX socket /var/lib/nagios/rw/live
[1256144866] livestatus: successfully finished initialization
[1256144866] Event broker module '/usr/local/lib/mk-livestatus/livestatus.o' initialized successfully.
[1256144866] Finished daemonizing... (New PID=5363)
[1256144866] livestatus: Starting 10 client threads
[1256144866] livestatus: Entering main loop, listening on UNIX socket

Options for nagios.cfg

Livestatus understands several options, which can be added to the line beginning with broker_module:
Here is an example of how to add parameters:

nagios.cfg
broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/run/nagios/rw/live debug=1

Using Livestatus

Once your Livestatus module is set up and running, you can use its UNIX socket for retrieving live status data. Every relevant programming language on Linux has a way to open such a socket. We will show how to access the socket with the shell and with Python. Other programming languages are left as an exercise to the reader.

Accessing Livestatus with the shell

A UNIX socket is very similar to a named pipe, but has two important differences:
Livestatus ships a small utility called unixcat, which can communicate over a UNIX socket. It sends all data it reads from stdin into the socket and writes all data coming from the socket to stdout. The following command shows how to send a command to the socket and retrieve the answer, in this case a table of all of your hosts:

root@linux# echo 'GET hosts' | unixcat /var/lib/nagios/rw/live
acknowledged;action_url;address;alias;check_command;check_period;checks_enabled;contacts;in_check_period;in_notification_period;is_flapping;last_check;last_state_change;name;notes;notes_url;notification_period;scheduled_downtime_depth;state;total_services
0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;Acht;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301430;Acht;;;24X7;0;0;7
0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;DREI;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301431;DREI;;;24X7;0;0;1
0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;Drei;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301435;Drei;;;24X7;0;0;4

If you get that output, everything is working fine and you might want to continue reading with the chapter The Livestatus Query Language.

Accessing Livestatus with Python

Access from within Python does not need an external tool. The following example shows how to send a query, retrieve the answer and parse it into a Python table (a list of lists). After installing Check_MK you will find this program in the directory /usr/share/doc/check_mk:

live.py
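#!/usr/bin/env python3
#
# Sample program for accessing the Livestatus Module
# from a python program
import socket

socket_path = "/var/lib/nagios/rw/live"

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(socket_path)

# Write command to socket (sockets carry bytes, not text)
s.sendall(b"GET hosts\n")

# Important: Close sending direction. That way
# the other side knows we are finished.
s.shutdown(socket.SHUT_WR)

# Now read the answer until Livestatus closes the connection.
# A single recv() may return only part of a large answer.
chunks = []
while True:
    chunk = s.recv(65536)
    if not chunk:
        break
    chunks.append(chunk)
s.close()

# CSV output keeps the encoding of the Nagios config files,
# so undecodable bytes are replaced instead of raising an error.
answer = b"".join(chunks).decode("utf-8", errors="replace")

# Parse the answer into a table (a list of lists)
table = [line.split(';') for line in answer.split('\n')[:-1]]
print(table)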
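Without a Columns: header, the first row of the answer holds the column names, as the shell example above shows. A small helper can turn the table built by live.py into a list of dictionaries keyed by column name. The function name rows_as_dicts is hypothetical and not part of live.py, and the sample data is a cut-down, illustrative version of the hosts answer:

```python
def rows_as_dicts(table):
    """Convert a parsed Livestatus answer into a list of dicts.

    Assumes the first row holds the column names, which is what
    Livestatus sends when the query has no Columns: header.
    All values stay strings, exactly as split from the CSV output.
    """
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


if __name__ == "__main__":
    # Hypothetical answer, shaped like the 'GET hosts' output shown
    # earlier but reduced to three columns.
    table = [
        ["name", "address", "state"],
        ["Acht", "127.0.0.1", "0"],
        ["DREI", "127.0.0.1", "0"],
    ]
    for host in rows_as_dicts(table):
        print(host["name"], host["address"], host["state"])
```

Keeping the values as strings mirrors what the CSV output actually contains. Converting numbers such as state is left to the caller, who knows which columns it asked for.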
LQL - The Livestatus Query Language

LQL - pronounced "Liquel" as in "liquid" - is a simple language for telling Livestatus what data you want and how it should be formatted. It does much the same as SQL, but in a different, simpler way. Its syntax resembles (but is not compatible with) HTTP. Each query consists of:
All keywords including GET are case sensitive. Lines are terminated by single linefeeds (no <CR>). The current version of Livestatus implements three tables (more will follow later):
As in an SQL database, all tables consist of a number of columns. If you query a table without any parameters, you retrieve all available columns in alphabetical order. The first line of the answer contains the names of the columns. Please note that the available columns will change from version to version. Thus you must not depend on a certain order of the columns. Example: Retrieve all contacts:

query_1
GET contacts

Selecting which columns to retrieve

When you write an application using Livestatus, you probably need the information from selected columns only. Add the header Columns: to select which columns to retrieve. This also defines the order of the columns in the answer. The following example retrieves just the columns name and alias:

query_2
GET contacts
Columns: name alias

If you want to test this with unixcat, a simple way is to put your query into a text file query and read that in using <:

root@linux# unixcat < query /var/lib/nagios/rw/live
check_mk;check_mk dummy contact
hh;Harri Hirsch

As you might have noticed from this example: if you use Columns:, then no column headers will be output. You do not need them, as you have specified them yourself. That makes parsing simpler.

Filters

An important concept of Livestatus is its ability to filter data for you. This is not only more convenient than retrieving all data and selecting the relevant lines yourself, it is also much faster. Remember that Livestatus has direct access to all of Nagios' internal data structures and can access them with the speed of native C.

Filters are added by using Filter: headers. Such a header has three arguments: a column name, an operator and a reference value, all separated by spaces. The reference value, being the last one in the line, may contain spaces. Example:

query_3
GET services
Columns: host_name description state
Filter: state = 2

That query gets all services with the current state 2 (critical).
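Since every header has to sit on its own line, ending in a single linefeed, it is easy to produce broken queries when gluing strings together in application code. The following is a minimal sketch of a query builder that covers only the GET, Columns: and Filter: headers introduced so far; the function build_query is hypothetical, not part of Livestatus or Check_MK:

```python
def build_query(table, columns=None, filters=None):
    """Assemble an LQL query with one header per line.

    table   -- table name, e.g. "services"
    columns -- optional list of column names (Columns: header)
    filters -- optional list of (column, operator, value) tuples; each
               becomes its own Filter: header, so all must match
    """
    lines = [f"GET {table}"]
    if columns:
        lines.append("Columns: " + " ".join(columns))
    for column, operator, value in filters or []:
        line = f"Filter: {column} {operator}"
        if str(value) != "":
            line += f" {value}"  # the reference value may contain spaces
        lines.append(line)
    # A linefeed inside a value would start a new, unintended header.
    if any("\n" in line for line in lines):
        raise ValueError("LQL header lines must not contain linefeeds")
    # Lines end with a single linefeed, never with CR LF.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_query("services",
                      columns=["host_name", "description", "state"],
                      filters=[("state", "=", 2)]), end="")
```

Run as a script, this prints the three lines of query_3. Rejecting embedded linefeeds is a deliberate design choice: if a value came from user input, a linefeed would let it inject arbitrary extra headers into the query.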
If you add more Filter: headers, you will see only data passing all of your filters. The next example outputs all critical services which are currently within their notification period:

query_4
GET services
Columns: host_name description state
Filter: state = 2
Filter: in_notification_period = 1

The following eight operators are available:
Notes:
All operators can be negated by prefixing them with a !. Since !< would be the same as >=, this only makes sense for the equality operators:
Regular expression matching

The operators ~ and ~~ do a match using POSIX extended regular expressions, such as those used by egrep. Some Linux distributions ship a manpage for those (man 7 regex). Livestatus always does a substring match. That means that the pattern may match anywhere in the text. You can use the anchors ^ and $ for matching the beginning or the end of the text. The following filter finds all services beginning with fs:

Filter: description ~ ^fs

Matching lists

Some columns do not contain numbers or texts, but lists of objects. An example of that is the column contacts of hosts or services. It contains all contacts assigned to the data object by Nagios. Currently there is only one operator defined on all lists: >=, which means "contains". That way it is easy to restrict the output of hosts or services to those items a certain contact is assigned to:

query_5
GET services
Columns: host_name description state contacts
Filter: contacts >= harri

Lists of host names also define the operator =, but currently only for comparison with the empty string. That way, you can select objects with (or without) an empty list. The following query gets all hosts that do not have parents:

GET hosts
Columns: name
Filter: parents =

Matching attribute lists

Version 1.1.4 of Livestatus gives you access to the list of modified attributes of hosts, services and contacts. That way, you can query which attributes have been changed dynamically by the user and thus differ from the attributes configured in the Nagios object files. These new columns come in two variants: modified_attributes and modified_attributes_list. The first variant outputs an integer number representing a bitwise combination of Nagios' internal numbers. The second variant outputs a list of attribute names, such as notifications_enabled and active_checks_enabled. When you define a Filter, both column variants are handled in exactly the same way and both allow using the number or the list representation.
They allow the following operators (and the negation of those):
Example 1: Find all hosts with modified attributes:

GET hosts
Columns: host_name modified_attributes_list
Filter: modified_attributes != 0

Example 2: Find hosts where notifications have been actively disabled:

GET hosts
Columns: host_name modified_attributes_list
Filter: modified_attributes ~ notifications_enabled
Filter: notifications_enabled = 0

Example 3: Find hosts where active or passive checks have been tweaked:

GET hosts
Columns: host_name modified_attributes_list
Filter: modified_attributes ~~ active_checks_enabled,passive_checks_enabled

Combining Filters with And, Or and Negate

By default a dataset must pass all filters in order to be output. In cases where alternatives are desirable, you can combine a number of filters with a logical "or" operation by using the header Or:. That header takes an integer number X as argument and combines the last X filters into a new filter using an "or" operation. The following example selects all services which are in state 1 or in state 3:

GET services
Filter: state = 1
Filter: state = 3
Or: 2

The next example shows all services which are within a scheduled downtime or which are on a host with a scheduled downtime:

GET services
Filter: scheduled_downtime_depth > 0
Filter: host_scheduled_downtime_depth > 0
Or: 2

It is also possible to combine filters with an And operation. This is only necessary if you want to group filters together before "or"-ing them. Suppose you want to get all services that are either critical and acknowledged, or OK. This is how to do it:

GET services
Filter: state = 2
Filter: acknowledged = 1
And: 2
Filter: state = 0
Or: 2

The And: 2 header first combines the first two filters into one new filter. That filter is then "or"ed with the third filter.

In version 1.1.11i2 the new header Negate: was introduced. It logically negates the most recent filter.
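The stack discipline behind Filter:, And:, Or: and Negate: can be illustrated with a toy evaluator. This is a hypothetical sketch over plain Python dictionaries, not how Livestatus implements filtering: it understands only the =, > and ~ operators, and it approximates POSIX extended regular expressions with Python's re.search, whose syntax is similar but not identical:

```python
import re

# A hypothetical subset of LQL operators. Row values are compared as
# strings for "=" and "~" and as integers for ">".
OPERATORS = {
    "=": lambda actual, ref: str(actual) == ref,
    ">": lambda actual, ref: int(actual) > int(ref),
    # Unanchored search, like Livestatus' substring match.
    "~": lambda actual, ref: re.search(ref, str(actual)) is not None,
}


def compile_filters(header_lines):
    """Fold Filter:/And:/Or:/Negate: header lines into one predicate."""
    stack = []
    for line in header_lines:
        keyword, _, arg = line.partition(":")
        arg = arg.strip()
        if keyword == "Filter":
            column, op, ref = (arg.split(" ", 2) + [""])[:3]
            check = OPERATORS[op]
            stack.append(lambda row, c=column, f=check, r=ref: f(row[c], r))
        elif keyword in ("And", "Or"):
            count = int(arg)
            if count > len(stack):
                raise ValueError(f"{keyword}: {count} needs {count} filters on the stack")
            # Pop the last `count` filters and push their combination.
            operands = stack[len(stack) - count:]
            del stack[len(stack) - count:]
            combine = all if keyword == "And" else any
            stack.append(lambda row, fs=operands, comb=combine: comb(f(row) for f in fs))
        elif keyword == "Negate":
            inner = stack.pop()
            stack.append(lambda row, f=inner: not f(row))
        else:
            raise ValueError(f"unsupported header: {line!r}")
    # Whatever is left on the stack must all pass (implicit And).
    return lambda row: all(f(row) for f in stack)


if __name__ == "__main__":
    services = [
        {"description": "CPU load", "state": 2, "acknowledged": 1},
        {"description": "Disk /", "state": 2, "acknowledged": 0},
        {"description": "Memory", "state": 0, "acknowledged": 0},
    ]
    # "critical and acknowledged, or OK", as in the query above
    match = compile_filters(["Filter: state = 2", "Filter: acknowledged = 1",
                             "And: 2", "Filter: state = 0", "Or: 2"])
    print([s["description"] for s in services if match(s)])
    # prints ['CPU load', 'Memory']
```

The same function accepts the Negate: example that follows: the two ~ filters are or-ed, and the result is inverted.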
The following example outputs all hosts that have neither an a nor an o in their name:

GET hosts
Filter: name ~ a
Filter: name ~ o
Or: 2
Negate:

Stats and Counts

Why counting?

SQL has a statement "SELECT COUNT(*) FROM ..." which counts the number of rows matching certain criteria. LQL's Stats: header allows something similar. In addition, it can retrieve several counts at once. The Stats: header has the same syntax as Filter: but a different meaning: instead of filtering the objects, it counts them. As soon as at least one Stats: header is used, no data rows are output anymore. Instead, one single row of data is output with one column for each Stats: header, showing the number of rows matching its criteria. The following example outputs the numbers of services which are OK, WARN, CRIT or UNKNOWN:

query_6
GET services
Stats: state = 0
Stats: state = 1
Stats: state = 2
Stats: state = 3

An example output looks like this:

user@host> unixcat /var/lib/nagios/rw/live < query_6
4297;13;9;0

Do you want to restrict the output to services to which the contact harri is assigned? No problem, just add a Filter: header:

query_7
GET services
Stats: state = 0
Stats: state = 1
Stats: state = 2
Stats: state = 3
Filter: contacts >= harri

Combining with And and Or

Just like the Filter: headers, the Stats: headers can be combined with "and" and "or" operations. Note that they form their own stack. You combine them with StatsAnd: and StatsOr:. Here is a somewhat more complex query that scans all services of the hosts in the host group windows which are within their notification period and are not within a host or service downtime. It computes seven counts:
GET services
Filter: host_groups >= windows
Filter: scheduled_downtime_depth = 0
Filter: host_scheduled_downtime_depth = 0
Filter: in_notification_period = 1
Stats: last_hard_state = 0
Stats: last_hard_state = 1
Stats: acknowledged = 0
StatsAnd: 2
Stats: last_hard_state = 1
Stats: acknowledged = 1
StatsAnd: 2
Stats: last_hard_state = 2
Stats: acknowledged = 0
StatsAnd: 2
Stats: last_hard_state = 2
Stats: acknowledged = 1
StatsAnd: 2
Stats: last_hard_state = 3
Stats: acknowledged = 0
StatsAnd: 2
Stats: last_hard_state = 3
Stats: acknowledged = 1
StatsAnd: 2

In version 1.1.11i2 the new header StatsNegate: was introduced. It takes no arguments and logically negates the most recent Stats: filter.

Grouping

Letting Livestatus count items is nice and fast. But in our examples so far the answer was restricted to one line of numbers for a predefined set of filters. In some situations you want to get statistics for each object from a certain set. You might want to display a list of hosts and, for each of those hosts, the number of services which are OK, WARN, CRIT or UNKNOWN. In such situations you can add the Columns: header to your query. There is a simple yet powerful idea behind it: you specify a list of columns of your table. The stats are computed and output separately for each different combination of values of those columns.

The following query counts the number of services in the various states for each host in the host group windows:

GET services
Filter: host_groups >= windows
Stats: state = 0
Stats: state = 1
Stats: state = 2
Stats: state = 3
Columns: host_name

The output looks like this:

winhost01;7;0;0;0
winhost02;7;0;1;0
srvabc44;7;0;1;0
srvabc45;2;0;1;0
termsv1;7;0;1;0
termsv2;3;0;1;1

As you can see, an additional column was prepended to the output holding the value of the group column.

Here is another example that counts the total number of services grouped by the check command (the dummy filter expression is always true, so each service is counted).
query
GET services
Stats: state != 9999
Columns: check_command

Here is an example output of that query:

root@linux# unixcat < query /var/lib/nagios/rw/live
check-mk;14
check-mk-dummy;12
check-mk-inventory;14
check-mk-ping;2
check_mk-cpu.loads;2
check_mk-cpu.threads;2
check_mk-df;11
check_mk-diskstat;24
check_mk-ifoperstatus;7
check_mk-kernel.util;12
check_mk-local;6
check_mk-logwatch;4
check_mk-mem.used;13
check_mk-netctr.combined;12
check_mk-netif.link;2
check_mk-netif.params;2

A third example shows another way of counting the total number of services grouped by their states, without an explicit Stats: header for each state:

query
GET services
Stats: state != 9999
Columns: state

And the output:

root@linux# unixcat < query /var/lib/nagios/rw/live
0;113
1;2
2;28

In that example none of the services was in the state UNKNOWN. Hence no count for that state was output.

One last note about grouping: the current implementation allows only columns of the types string or int to be used for grouping. Also, you are limited to one group column. Note: prior to version 1.1.10 there was the header StatsGroupBy: instead of Columns:. That header is deprecated, though still working.

Sum, Minimum, Maximum, Average, Standard Deviation

Starting from version 1.1.2 Livestatus supports some basic statistical operations. They allow you, for example, to query for the average check execution time or the standard deviation of the check latency of all checks. These operations use one of the keywords sum, min, max, avg, std, suminv or avginv.
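To make the semantics of grouping and of these keywords concrete, here is a small Python sketch that computes the same kind of numbers from rows already held in memory. It is an illustration only, and all function names are hypothetical. std is assumed to be the population standard deviation, since Livestatus may well divide differently. aggregate_perf_data anticipates the performance data aggregation described further below; it only handles simple space-separated name=value;... items and formats results with eight decimals and sorted metric names so that they resemble the example output there:

```python
import math
import re
from collections import defaultdict


def aggregate(values, op):
    """Apply one Stats keyword (sum, min, max, avg, std, suminv, avginv)."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("no values to aggregate")
    n = len(values)
    if op == "sum":
        return sum(values)
    if op == "min":
        return min(values)
    if op == "max":
        return max(values)
    if op == "avg":
        return sum(values) / n
    if op == "std":
        # Population standard deviation (divides by n); an assumption,
        # the text above does not say how Livestatus computes it.
        mean = sum(values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if op == "suminv":
        return sum(1.0 / v for v in values)
    if op == "avginv":
        return sum(1.0 / v for v in values) / n
    raise ValueError(f"unknown Stats keyword: {op}")


def grouped_stats(rows, group_column, op, value_column):
    """Mimic 'Stats: <op> <value_column>' together with 'Columns: <group_column>'."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row[value_column])
    return {key: aggregate(vals, op) for key, vals in groups.items()}


def aggregate_perf_data(perf_strings, op):
    """Aggregate space-separated 'name=value;...' items metric by metric."""
    per_metric = defaultdict(list)
    for perf in perf_strings:
        for item in perf.split():
            name, _, rest = item.partition("=")
            number = re.match(r"-?[0-9.]+", rest)
            if number:
                per_metric[name].append(number.group())
    return " ".join(f"{name}={aggregate(vals, op):.8f}"
                    for name, vals in sorted(per_metric.items()))


if __name__ == "__main__":
    cpu = ["user=7.594;;;; system=5.814;;;; wait=0.923;;;;",
           "user=6.934;;;; system=6.244;;;; wait=0.890;;;;"]
    print(aggregate_perf_data(cpu, "avg"))
    # prints system=6.02900000 user=7.26400000 wait=0.90650000
    # check_interval in minutes -> expected checks per minute
    print(aggregate([1, 5, 5], "suminv"))
```

The suminv call illustrates the check_interval reasoning given below: intervals of 1, 5 and 5 minutes add up to 1 + 0.2 + 0.2, roughly 1.4 checks per minute.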
The following query outputs the minimum, maximum and average check execution time of all service checks which are in state OK:

query
GET services
Filter: state = 0
Stats: min execution_time
Stats: max execution_time
Stats: avg execution_time

As with the "normal" Stats: headers, the output can be grouped by one column, for example by the host_name:

query
GET services
Filter: state = 0
Stats: min execution_time
Stats: max execution_time
Stats: avg execution_time
Columns: host_name

New in version 1.1.13i1 are the aggregation functions suminv and avginv. They compute the sum or the average of the inverse of the values. For example, the inverse of the check_interval of a service is the number of times it is checked per minute. The suminv over all services is the total number of checks that should be executed per minute, if no checks are being delayed.

Performance Data

As of version 1.1.11i2, MK Livestatus supports aggregation of Nagios performance data. Performance data is additional information output by checks, formatted as a string like user=6.934;;;; system=6.244;;;; wait=0.890;;;;. If you create a Stats query using sum, min, max, avg or std on several services with compatible performance data, then Livestatus will aggregate these values into a new performance data string. Look at the following examples. First, a query of two services without aggregation:

query
GET services
Filter: description ~ CPU utilization
Columns: perf_data

Let's assume it produces the following output:

user=7.594;;;; system=5.814;;;; wait=0.923;;;;
user=6.934;;;; system=6.244;;;; wait=0.890;;;;

Here is the same query, which aggregates the data using the average:

query
GET services
Filter: description ~ CPU util
Stats: avg perf_data

This is the result:

system=6.02900000 user=7.26400000 wait=0.90650000

Output formatting and Character encoding

CSV output

Livestatus supports three output formats: CSV, JSON and Python. The default is CSV.
Datasets are separated by linefeeds (ASCII 10), fields are separated by semicolons (ASCII 59), list elements (such as in contacts) are separated by commas (ASCII 44) and combinations of host name and service description are separated by a pipe symbol (ASCII 124). In order to avoid problems with the default field separator, the semicolon, appearing in values (such as performance data), it is possible to replace the separator characters with other symbols. This is done by specifying four integer numbers after the Separators: header. Each of those is the ASCII code of a separator in decimal. The four numbers mean:
It is even possible to use non-printable characters as separators. The following example uses bytes with the values 0, 1, 2 and 3 as separators:

GET hosts
Separators: 0 1 2 3

JSON output

You can get your output in JSON format if you add the header OutputFormat: json, as in the following example:

GET hosts
Columns: name address state
OutputFormat: json

Like CSV, JSON is a text-based format, and it is valid JavaScript code. Furthermore, the JSON output produced by Livestatus is also completely valid Python code and can be parsed with the Python function eval(). In order to avoid redundancy and keep the overhead as low as possible, the output is not formatted as a list of objects (with key/value pairs), but as a list of lists (JSON speaks of arrays).

Python output

The Python format is very similar to JSON, but not 100% compatible. One difference is that in Python all Unicode strings are prefixed with a small u. When you write extensions in Python, you can select the python output format and simply parse the response with eval(). This does not introduce a dependency on JSON and is also faster, because it uses the built-in parser, which is written in native C:

GET hosts
Columns: name address state
OutputFormat: python

Character encoding

Livestatus outputs data that in most cases originates from the Nagios configuration files (the object configuration). Nagios does not impose any restrictions on how these files have to be encoded (UTF-8, Latin-1, etc.). If you select CSV output, then Livestatus simply returns the data as it is contained in the configuration files, with the same encoding. When using JSON or Python, however, non-ASCII characters need to be escaped and properly encoded. Up to version 1.1.11i1, Livestatus automatically detects 2-byte UTF-8 sequences and assumes all other non-ASCII characters to be Latin-1 encoded.
While this works well for western languages and to a certain degree "auto-detects" the encoding, it does not support languages using characters other than those in Latin-1. Even the € symbol does not work. As of version 1.1.11i2, Livestatus' behaviour is configurable with the option data_encoding and now defaults to UTF-8 encoding. Three different settings are valid:
Column headers

By default MK Livestatus outputs the names of all columns as the first line of the output if there is no Columns: header in your query. With the header ColumnHeaders: you can explicitly switch column headers on or off. The output of the following query will include column headers:

GET hosts
Columns: name alias address state
ColumnHeaders: on

Limiting the number of datasets

The Limit: header allows you to limit the number of datasets being output. Since MK Livestatus currently does not support sorting, you'll have to live with the Nagios-internal natural sorting of objects. Hosts, for example, are sorted according to their host names, just as in the standard CGIs. The following example will output just the first 10 hosts:

GET hosts
Limit: 10

Please note that the Limit: header is also applied when doing Stats. I'm not sure if there is any use for that, but that's the way MK Livestatus behaves. The following example will count how many of the first 10 hosts are up:

GET hosts
Stats: state = 0
Limit: 10

If using filters, the Limit: header limits the number of datasets actually being output. The following query outputs the first 10 hosts which are down:

GET hosts
Filter: state = 1
Limit: 10

Authorization

Since version 1.1.3, Livestatus supports addon developers by helping to implement authorization. You can let Livestatus decide whether a certain contact may see data or not. This is very simple to use. All you need to do is add an AuthUser: header to your query with the name of a Nagios contact as its single argument. If you do that, Livestatus will only output data for which that contact is a contact, either directly or via a contact group. Example:

GET services
Columns: host_name description contacts
AuthUser: harri

In certain cases it would be possible to replace AuthUser with a Filter header. But that does not work (precisely) in all situations.
Configuration

If your addon uses AuthUser, then the administrator has a way to configure authorization details via nagios.cfg, and thus can do this uniformly across all addons using Livestatus. Currently two configuration options are available. Both can be set either to strict or loose:
Tables supporting AuthUser

The following tables support the AuthUser header (others simply ignore it): hosts, services, hostgroups, servicegroups and log. The log table applies AuthUser only to entries of the log classes 1 (host and service alerts), 3 (notifications) and 4 (passive checks). All other classes are not affected.

Limitations

Currently the AuthUser header only controls which rows of data are output and has no impact on list columns, such as the column groups in the table services. This means that this column also lists service groups for which the contact might not be a contact. This might be changed in a future version of Livestatus.

Waiting

Starting with version 1.1.3, Livestatus has a new and still experimental feature: Waiting. Waiting allows developers of addons to delay the execution of a query until a certain condition becomes true or a Nagios event happens. This allows the implementation of a new class of features in addons, for example:
All that can be implemented without polling, and in a very simple way. All you have to do is make use of some new query headers:
The following triggers are available for the WaitTrigger-Header:
Examples

Retrieve log messages since a certain timestamp, but wait until at least one new log message has appeared:

GET log
Filter: time >= 1265062900
WaitTrigger: log

The same, but do not wait longer than 2 seconds:

GET log
Filter: time >= 1265062900
WaitTrigger: log
WaitTimeout: 2000

Retrieve the complete data about the host xabc123, but wait until its state is 2 (UNREACHABLE):

GET hosts
WaitObject: xabc123
WaitCondition: state = 2
WaitTrigger: state
Filter: host_name = xabc123

Get data about the service Memory used on host xabc123 as soon as it has been checked some time after 1265062900:

GET services
WaitObject: xabc123 Memory used
WaitCondition: last_check > 1265062900
WaitTrigger: check
Filter: host_name = xabc123
Filter: description = Memory used

Compensating timezone differences

When doing multi-national distributed monitoring with Livestatus you might have to deal with situations where your monitoring servers are running in different time zones. In a correct setup all servers have the same system time and differ only in their configured time zones. You can check this by running the following command on each monitoring server:

user@host> date +%s

That command should output the same value on all servers. If not, you've probably set your system to a wrong time zone.

MK Livestatus can help to compensate for the time difference in such situations. If you add the header Localtime: 1269886384 to your query with your current local time (the output of date +%s) as an argument, Livestatus will compare its local time against that of the caller and convert all timestamps accordingly. Please note that Livestatus assumes that a difference in time is not due to clock inaccuracy but due to time zone differences. The delta time computed for compensating will be rounded to the nearest half hour.

Response Header

If something in your request is not valid or another error occurs, a message is printed into the logfile of Nagios.
If you want to write an API that displays error messages to the user, you need information about errors as part of the response. You can get this behaviour with the header ResponseHeader. It can be set to off (default) or to fixed16:
GET hosts
ResponseHeader: fixed16
Other types of response headers might be implemented in future versions. The fixed16 header has the advantage that it is exactly 16 bytes long. That makes it easy to program an API: you simply read in 16 bytes and need not scan for a newline or the like. Here is a complete example session with response headers activated:
user@host> unixcat /var/lib/nagios/rw/live
GET hirni
ResponseHeader: fixed16
404 43
Invalid GET request, no such table 'hirni'
The fixed16 response header has the following format:
These are the possible values of the status code:
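A client can consume the fixed16 framing without scanning for delimiters. The following Python sketch assumes the 16 bytes are laid out as a three-digit status code, one space, an 11-byte space-padded length field and a linefeed (3 + 1 + 11 + 1 = 16), which is consistent with the 404 example session; the function names are hypothetical and not part of any Livestatus library.

```python
import socket
from typing import Tuple


def parse_fixed16(header: bytes) -> Tuple[int, int]:
    """Split a 16-byte fixed16 header into (status_code, body_length).

    Assumed layout: 3-digit status code, one space, an 11-byte
    space-padded length field, and a trailing linefeed.
    """
    if len(header) != 16 or not header.endswith(b"\n"):
        raise ValueError(f"malformed fixed16 header: {header!r}")
    status = int(header[0:3])
    length = int(header[4:15].strip())  # drop the padding spaces
    return status, length


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed before full response")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_response(sock: socket.socket) -> str:
    """Read one fixed16-framed response; raise if the status is not 200."""
    status, length = parse_fixed16(recv_exactly(sock, 16))
    body = recv_exactly(sock, length).decode("utf-8")
    if status != 200:
        # For errors the body is the plain-text message, never JSON.
        raise RuntimeError(f"Livestatus error {status}: {body.strip()}")
    return body
```

Because the body length is known before reading it, read_response() can be called repeatedly on the same socket when KeepAlive: on is set, with no need to wait for the connection to close.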
The response contains the queried data only if the status code is 200. In all other cases the response contains the error message, and the length field gives the length of that message including the trailing linefeed. The error message is not JSON-encoded, even if you requested JSON in the OutputFormat header.
Keep alive (persistent connections)
MK Livestatus allows you to keep a connection open and reuse it for several requests. In order to do that you need to add the following header:
KeepAlive: on
In that case Livestatus keeps the connection open after sending its response and waits for a new query. You will probably also want to activate a response header, since only that allows you to determine the exact length of the response (without KeepAlive you can simply read until end of file):
KeepAlive: on
ResponseHeader: fixed16
Please note that keeping a connection open permanently occupies resources within the Nagios process. In the current version Livestatus is limited to ten parallel persistent connections. This differs from the usual practice with persistent database connections, which are kept open across requests. For web applications, the recommended approach is to keep the Livestatus connection open only while handling the current request and to close it once the complete result page has been rendered. Keeping it any longer is not worthwhile: bringing up a database connection is a much more costly operation than connecting to MK Livestatus.
Access to Logfiles
Since version 1.1.1 Livestatus provides transparent access to your Nagios logfiles, i.e. nagios.log and the rotated files in archives (you might have defined an alternative directory in nagios.cfg). Livestatus keeps an index over all log files and remembers which period of time is kept in which log file. Please note that Livestatus does not depend on the names of the log files (while Nagios does). That way, Livestatus has no problem if the log file rotation interval is changed. The Livestatus table log is your access to the logfiles.
Every log message is represented by one row in that table.
Performance issues
If your monitoring system has been running for a couple of years, the number of log files and entries can get very large. Each Livestatus query to the table log can potentially scan all historic files (although an in-memory cache tries to avoid reading files again and again). It is thus crucial that you use Filter: in order to restrict:
If you set no filter on the column time, then all logfiles will be loaded - regardless of other filters you might have set. Setting a filter on the column class restricts the types of messages loaded from disk. The following classes are available:
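To make the filtering advice concrete, the following Python sketch assembles a log query that always bounds the column time and can optionally restrict the column class. The function name and its default column list are illustrative assumptions, and the class numbers in the test are placeholders; take the real values from the class list of your Livestatus version. Since consecutive Filter: lines are combined with AND, the sketch adds an Or: header to accept any of several classes.

```python
from typing import Iterable, Optional


def build_log_query(since: int, until: Optional[int] = None,
                    classes: Iterable[int] = (),
                    columns: Iterable[str] = ("time", "class", "message")) -> str:
    """Build a GET log query limited by time and, optionally, by class."""
    lines = ["GET log", "Columns: " + " ".join(columns)]
    # A lower (and ideally upper) bound on 'time' lets Livestatus
    # skip log files that lie completely outside the range.
    lines.append(f"Filter: time >= {since}")
    if until is not None:
        lines.append(f"Filter: time < {until}")
    class_list = list(classes)
    for cls in class_list:
        lines.append(f"Filter: class = {cls}")
    if len(class_list) > 1:
        # Combine only the class filters with OR; the time filters stay ANDed.
        lines.append(f"Or: {len(class_list)}")
    # An empty line terminates the query.
    return "\n".join(lines) + "\n\n"
```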
RRD Files of PNP4Nagios
New in 1.1.9i3: In order to improve the integration between Multisite and PNP4Nagios, Livestatus introduces the new column pnpgraph_present in the tables hosts and services (and all other tables containing host_ or service_ columns). That column can have three possible values:
Livestatus cannot detect the base directory of your RRD files automatically, so you need to configure it with the module option pnp_path:
nagios.cfg
broker_module=/usr/local/lib/mk-livestatus/livestatus.o pnp_path=/var/lib/pnp4nagios/perfdata /var/lib/nagios/rw/live
event_broker_options=-1
To determine whether a PNP graph is available, Livestatus checks for the existence of PNP's .xml file. A note for OMD users: OMD automatically configures this option correctly in etc/mk-livestatus/nagios.cfg. You need at least a daily snapshot of 2010-12-17 or later to use the new feature.
Expansion of macros
Nagios allows you to embed macros within your configuration. For example, it is common to embed $HOSTNAME$ and $SERVICEDESC$ into your action_url or notes_url when configuring links to a graphing tool. As of version 1.1.1 Livestatus supports expansion of macros in several columns of the tables hosts and services. Those columns, for example notes_url_expanded, bear the same name as the unexpanded columns with the suffix _expanded. Macro expansion in Nagios is very complex, and unfortunately the Nagios code for it is not thread safe. Livestatus therefore has its own implementation of macros, which does not support all features of Nagios, but (nearly) all that are needed by visualization addons. Livestatus supports the following macros:
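Purely to illustrate what macro expansion does, here is a minimal Python sketch that substitutes $NAME$ macros from a dictionary and leaves unknown macros untouched. It is not Livestatus's implementation and ignores Nagios features such as on-demand macros; the function name and the sample URL in the test are hypothetical.

```python
import re
from typing import Mapping

# A macro is an upper-case name (letters and underscores) between dollar signs.
_MACRO = re.compile(r"\$([A-Z_]+)\$")


def expand_macros(template: str, values: Mapping[str, str]) -> str:
    """Replace $NAME$ with values[NAME]; keep macros with no known value."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return _MACRO.sub(substitute, template)
```

With Livestatus you would normally not do this yourself: requesting notes_url_expanded instead of notes_url returns the already expanded string.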
Remote access to Livestatus via SSH or xinetd
Livestatus via SSH
Livestatus currently does not provide a TCP socket. Another (and more secure) way of accessing the unix socket remotely is SSH. The following example sends a query via SSH. The only privilege the remote user needs is write access to the unix socket:
user@host> ssh < query nagios@10.0.0.14 "unixcat /var/lib/nagios/rw/live"
ZWEI;NIC eth0 link;2
ZWEI;NIC eth0 parameter;2
Zwei;NIC eth0 link;2
Zwei;NIC eth0 parameter;2
laptop;Check_MK;2
laptop;Interface eth5;2
laptop;Interface eth6;2
laptop;Interface eth7;2
localhost;FILES_in_/bin;2
Livestatus via xinetd
Using xinetd and unixcat you can bind the socket of Livestatus to a TCP socket. Here is an example configuration for xinetd:
/etc/xinetd.d/livestatus
service livestatus
{
type = UNLISTED
port = 6557
socket_type = stream
protocol = tcp
wait = no
# limit to 100 connections per second. Disable 3 secs if above.
cps = 100 3
# set the number of maximum allowed parallel instances of unixcat.
# Please make sure that this value is at least as high as
# the number of threads defined with num_client_threads in
# etc/mk-livestatus/nagios.cfg
instances = 500
# limit the maximum number of simultaneous connections from
# one source IP address
per_source = 250
# Disable TCP delay, makes connection more responsive
flags = NODELAY
user = nagios
server = /usr/bin/unixcat
server_args = /var/lib/nagios/rw/live
# configure the IP address(es) of your Nagios server here:
# only_from = 127.0.0.1 10.0.20.1 10.0.20.2
disable = no
}
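With the xinetd service running, any language with TCP sockets can query Livestatus without unixcat or netcat. The following Python sketch is one way to do it; the function name, the 10-second timeout and the splitting into lists are assumptions for illustration. It relies on the default output format, in which rows are separated by newlines and columns by semicolons, as in the SSH example output.

```python
import socket
from typing import List


def livestatus_tcp_query(host: str, port: int, query: str,
                         timeout: float = 10.0) -> List[List[str]]:
    """Send one query over TCP and return the rows of the default output."""
    # Make sure the query is terminated by an empty line.
    query = query.rstrip("\n") + "\n\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8"))
        # Half-close the connection: nothing more will be sent.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    text = b"".join(chunks).decode("utf-8")
    # One line per row, columns separated by ';'.
    return [line.split(";") for line in text.splitlines() if line]
```

For example, livestatus_tcp_query("10.10.0.141", 6557, "GET status\n") would talk to the port configured in the xinetd example.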
You can then access the socket, for example, with netcat:
user@host> netcat 10.10.0.141 6557 < query_6
4297;13;9;0
Timeouts
In version 1.1.7i3 the handling of timeouts has changed. There are now two configurable timeouts which protect Livestatus from broken clients hanging on the line forever (remember that the maximum number of parallel connections is configurable but limited):
A Livestatus connection is always in one of two states. In the first, Livestatus is waiting for a query. This is the case just after the client has connected, but also in KeepAlive mode after a response has been sent. The client then has at most idle_timeout ms to start the next query. The default is 300000 (300 seconds, i.e. 5 minutes). If a client is idle for longer than that, Livestatus simply closes the connection. As soon as the first byte of a query has been read, Livestatus enters the second state, "reading query", and uses a much shorter timeout: the query_timeout. Its default value is 10000 (10 seconds). If the client does not complete the query within this time, it is regarded as dead and the connection is closed. Both timeout values can be configured as Nagios module options in nagios.cfg. A timeout can be disabled by setting its value to 0. But be warned: broken clients can then hang connections forever and thus block Livestatus threads.
Sending commands via Livestatus
MK Livestatus supports sending Nagios commands. This is very similar to the Nagios command pipe, but especially useful when accessing a Nagios instance via a remote connection. You send a command with the basic request COMMAND, followed by a space and the command line in exactly the same syntax as needed for the Nagios pipe. No further header fields are required or allowed. Livestatus keeps the connection open after a command and waits for further commands or GET requests, so it behaves like GET with KeepAlive: set to on. That way you can send a bunch of commands in one connection, just as with the pipe. Here is an example of sending a command from the shell via unixcat:
root@linux# echo "COMMAND [$(date +%s)] START_EXECUTING_SVC_CHECKS" \
| unixcat /var/lib/nagios/rw/live
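The same COMMAND request can also be sent from a program. This Python sketch (the function names are hypothetical) formats each command with the current Unix timestamp, as the shell example does with $(date +%s), and writes several commands over a single connection to the Unix socket, relying on Livestatus keeping the connection open after each command. The socket path /var/lib/nagios/rw/live is the one used throughout this document and may differ on your system.

```python
import socket
import time
from typing import Iterable, Optional


def format_command(command: str, timestamp: Optional[int] = None) -> str:
    """Return one COMMAND line in Nagios pipe syntax, ending in a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"COMMAND [{timestamp}] {command}\n"


def send_commands(socket_path: str, commands: Iterable[str]) -> None:
    """Send several commands over one connection to the Livestatus socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        for command in commands:
            sock.sendall(format_command(command).encode("utf-8"))
    # Leaving the 'with' block closes the connection.
```

Usage: send_commands("/var/lib/nagios/rw/live", ["START_EXECUTING_SVC_CHECKS"]).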
Just as with GET, a query is terminated either by closing the connection or by sending a newline. COMMAND automatically implies keep alive and behaves like GET with KeepAlive set to on. That way you can mix GET and COMMAND queries in one connection.
Timeperiod transitions
Version 1.1.9i3 introduces a small new feature that does not really have anything to do with status queries, but is very helpful for creating availability reports and was easy to implement in Livestatus (due to its timeperiod cache). Each time a timeperiod changes from active to not active or vice versa, an entry is written to the Nagios logfile. At start of Nagios the initial states of all timeperiods are also logged. This looks like this:
nagios.log
[1293030953] TIMEPERIOD TRANSITION: 24x7;-1;1
[1293030953] TIMEPERIOD TRANSITION: none;-1;0
[1293030953] TIMEPERIOD TRANSITION: workhours;-1;1
When a transition occurs, one line is logged (here the state changed from 1 (in) to 0 (out)):
[1293066460] TIMEPERIOD TRANSITION: workhours;1;0
With that information it is later possible to determine which timeperiods were active when an alert happened. That way you can make availability reports reflect only certain time periods.
Stability and Performance
Stability
While early versions of MK Livestatus experienced some stability issues (not unusual for evolving software), nowadays it can be considered rock solid. There are no known problems with performance, crashes or a hanging Nagios, as long as two important requirements are fulfilled:
Performance
Livestatus is sparing with your CPU and disk resources. In fact, it does no disk IO at all as long as the table log is not accessed, which needs read access to the Nagios log files. CPU is only consumed while queries are actually being processed, and even for large queries we are talking about microseconds rather than milliseconds of CPU time. Furthermore, Livestatus does not block Nagios during the execution of a query but runs fully in parallel, and it scales to all available CPU cores if necessary.
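As a sketch of how the TIMEPERIOD TRANSITION entries from the timeperiod section can be evaluated, the following Python function replays them in log order and reports which timeperiods were active at a given moment, for example the timestamp of an alert. The function name is hypothetical, and where the lines come from (nagios.log itself or the Livestatus table log) is left open; it assumes the lines are in chronological order, as they are in nagios.log.

```python
import re
from typing import Dict, Iterable

# [timestamp] TIMEPERIOD TRANSITION: name;old_state;new_state
_TRANSITION = re.compile(
    r"^\[(\d+)\] TIMEPERIOD TRANSITION: ([^;]+);(-?\d+);(-?\d+)$")


def active_timeperiods(log_lines: Iterable[str], at: int) -> Dict[str, bool]:
    """Return each timeperiod's state at time 'at' (True = active).

    Assumes log_lines are in chronological order.
    """
    state: Dict[str, bool] = {}
    for line in log_lines:
        match = _TRANSITION.match(line.strip())
        if not match:
            continue  # not a timeperiod transition entry
        if int(match.group(1)) > at:
            break  # later entries cannot affect the state at 'at'
        name, new_state = match.group(2), int(match.group(4))
        state[name] = new_state == 1
    return state
```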