Monitoring Windows with Check_MK
November 09, 2011
The Windows Agent

Check_MK provides its own agent for monitoring Windows hosts: check_mk_agent.exe. This agent is installed as a Windows service and has several advantages over NSClient++ (besides the fact that it supports Check_MK, of course):
Where to find

The Windows agent is installed into a directory that is configurable during the setup of Check_MK. When using OMD you will find the agent in the directory share/check_mk/agents/windows below your site directory.

Installation of the agent

1.1.13i3: The Windows agent installer

New in 1.1.13i3: There is now an installer available for the Windows agent. To install the Windows agent using the new installer, simply download the Windows installer or copy check-mk-agent-<version>.exe from the agent directory of your Check_MK installation to your Windows host and start it. The installer will ask you where to install the agent and whether or not it should install and start the Windows service. The default options should fit most users' needs, so you can simply click through the installer. By default the agent is installed to %PROGRAMFILES%\check_mk. Once the installation has finished, the service has been created and started, and you can directly start monitoring your Windows host.

The installer supports some command line arguments, which are:
Manual Installation of the agent

If you do not want to use the agent installer, you can also install the Windows agent manually. The installation is easy. Just copy check_mk_agent.exe to a convenient directory on your Windows host and call it from a command shell with the option install:

C:\some\directory\> check_mk_agent.exe install

This will install a new Windows service called Check_MK_Agent. This service can be started with the Windows service manager or simply by entering:

C:\some\directory\> net start check_mk_agent

Test

If the agent is running properly, you should be able to connect from the Nagios host to TCP port 6556 on the Windows host. You can test this, for example, with telnet:

user@host> telnet windowshost 6556
<<<check_mk>>>
Version: 1.1.13i1
AgentOS: windows
WorkingDirectory: C:\some\directory
ConfigFile: C:\some\directory\check_mk.ini
AgentDirectory: C:\some\directory
PluginsDirectory: C:\some\directory\plugins
LocalDirectory: C:\some\directory\local
OnlyFrom: 0.0.0.0/0
<<<uptime>>>
12227
<<<df>>>
C:\ NTFS 62902472 18363880 44538592 30% C:\

Integration into check_mk

The integration of Windows hosts is nothing special and goes the usual way: add the host to all_hosts in main.mk and run the inventory with check_mk -I. After that update your Nagios configuration files with check_mk -U and restart Nagios:
Trying out the agent without installing it

The check_mk_agent allows you to try it out without installing it as a service. The simplest way is to call it with the option test. This does not open a TCP socket but simply outputs all current data to your console:

C:\some\directory\> check_mk_agent.exe test
<<<check_mk>>>
Version: 1.0.29rc
<<<df>>>
C:\ NTFS 18434584 6559144 11875440 36% C:\
<<<ps>>>
[System Process]
System
smss.exe
csrss.exe
winlogon.exe

Another option is to start the agent with the option adhoc. Now it will open TCP port 6556 and handle requests just like the service. It will do so until you abort it by pressing Control-C:

C:\some\directory\> check_mk_agent.exe adhoc
Listening for TCP connections on port 6556
Close window or press Ctrl-C to exit

Information provided by the agent

As of version 1.1.12 the agent provides access to the following data:
Furthermore checks can be added by making use of external plugins written in VBS, command or other languages. Currently we ship the following plugins:
The agent can be configured to output arbitrary Windows performance counters. Check_MK currently only extracts disk throughput, CPU usage and MS Exchange queues. Further checks can be implemented without any changes to the agent.

Configuring Checks in main.mk

Most of the items to be checked are found by the inventory function. If you want to autodetect processes and services as well, some configuration in main.mk is needed.

Processes

With respect to processes, the output of the Windows agent is compatible with that of the Linux and UNIX agents. Please refer to "How to monitor processes".

Services

In order to monitor services you first need to determine which services are of interest to you. The easiest way is to look at the raw output of the agent and look for the section <<<services>>>. You can use check_mk -d for this:

user@host> check_mk -d winhostxy | fgrep -A 10 '<<

The first column of the output is the exact internal name of the service. Let's say you want to check whether ALG is running on host winhostxy. Then put the following line into your checks variable:

main.mk
checks = [
    ( 'winhostxy', 'services', 'ALG', None ),
    # some other checks...
]

If you have a larger number of Windows hosts, defining for each host which services you expect is tedious and error-prone work. Check_MK helps you by providing an inventory mechanism for services. All you have to do is to provide a list of relevant services. This list is global and needs to be defined only once in main.mk in the variable inventory_services. When check_mk scans a Windows host during the inventory, it looks for such relevant services and automatically creates a check for each one found running. Let's assume that the services TSMListener, Httpd and TapiSrv should always be monitored if found running on a machine.
All you have to do is to add the following to your main.mk:

main.mk
inventory_services = [ 'TSMListener', 'Httpd', 'TapiSrv' ]

At the next inventory all hosts where those services run will be detected and checks will be created automatically.

Note: From version 1.1.11i2 on, inventory_services is much more flexible. It allows:
Please refer to the check manual of the services check for details.

Eventlog Monitoring

The Windows agent sends output that is fully compatible with that of the Logwatch extension of the Linux/UNIX agent and is thus handled in the same way. For the sake of simplicity there are nevertheless some differences:
What does this mean in detail? When the agent is started (most probably at boot time of the host), it seeks to the current end of the Eventlogs and waits there for new records. Only records appearing while the agent is running are sent to Nagios. If the agent is stopped and started again, it could theoretically miss some messages. As the agent is running permanently, this should not be a practical problem, though.

Since the agent is completely configuration-less, it does no specific filtering of events. It simply looks for messages of type Warning or Error. If such a message is seen, the complete check interval is declared relevant and the agent sends all messages of that logfile that appeared since the previous check to Nagios, even those of type Information. This gives the administrator more context information about the problem at hand on the Nagios server.

If you want to suppress some messages or reclassify them from Warning to Critical or vice versa, you can define a message filter in main.mk. This is done by setting the variable logwatch_patterns. This is a Python dictionary with a key for each logfile. The value is a list of pairs:

main.mk
logwatch_patterns = {
    'System': [
        ( 'W', 'sshd' ),
        ( 'W', 'rebooting.*system' ),
        ( 'C', 'path link down' ),
        ( 'I', 'ORA-4711' )
    ],
    'Application': [
        ( 'W', 'crash.exe' ),
        ( 'C', 'ssh' ),
        ( 'I', 'test.*failed' )
    ]
}
All patterns for a logfile are executed from first to last. The first match wins. The entry ( 'W', 'sshd' ) reclassifies all messages containing sshd to Warning. There are three possible types:
Note that the patterns are regular expressions. Thus the entry ( 'I', 'test.*failed' ) reclassifies all messages containing the word test followed later by the word failed. Messages that do not match any pattern retain their classification from the agent. Messages that are classified as context messages by the agent are never reclassified.

Host specific filtering of messages

As of version 1.0.37 of check_mk, host specific message filtering is supported. That means that your reclassification in logwatch_patterns can depend on the host where the message has been found. Host specific patterns include either a host list, or a host tag list and a host list, as first elements of the entry. This works much like many other configuration variables. Please read more about host tags for details on that. The following example makes some of the patterns of the example above host specific:

main.mk
logwatch_patterns = {
    'System': [
        # reclassify only on host abc123
        ( ["abc123"], 'W', 'sshd' ),
        # the following holds for all hosts
        ( 'C', 'path link down' ),
        # reclassify message to "ignore" on all hosts with the tag "test"
        ( ["test"], ALL_HOSTS, 'I', 'ORA-4711' )
    ],
    'Application': [
        # Do not reclassify on host "testhost"
        ( ["!testhost"], 'W', 'crash.exe' ),
        # make ssh critical on "dmz" hosts that do not have the tag "test"
        ( ["dmz", "!test"], ALL_HOSTS, 'C', 'ssh' ),
        # this is for all hosts again
        ( 'I', 'test.*failed' )
    ]
}
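
The first-match-wins evaluation of plain (not host specific) entries can be illustrated with a short Python sketch. This is not Check_MK's actual implementation: the function name reclassify is hypothetical, the use of re.search is an assumption based on the statement that patterns match messages "containing" the text, and the "." marker for context messages is likewise an assumption.

```python
import re

def reclassify(patterns, level, message):
    """Return the level of a message after applying the first matching pattern.

    patterns: list of (new_level, regex) pairs for one logfile.
    level:    classification assigned by the agent.
    """
    # Assumption: "." marks a context message, which is never reclassified.
    if level == ".":
        return level
    for new_level, regex in patterns:
        # Assumption: a pattern matches if it occurs anywhere in the message.
        if re.search(regex, message):
            return new_level  # first match wins, later patterns are ignored
    # No pattern matched: keep the classification from the agent.
    return level

system_patterns = [
    ("W", "sshd"),
    ("W", "rebooting.*system"),
    ("C", "path link down"),
    ("I", "ORA-4711"),
]

if __name__ == "__main__":
    print(reclassify(system_patterns, "C", "sshd: connection refused"))
    print(reclassify(system_patterns, "W", "fan speed low"))
```

Because the list is scanned in order, a broad pattern placed early shadows more specific patterns that follow it.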
1.1.13i1: Advanced agent configuration

New in 1.1.13i1: The eventlog monitoring of the Windows agent can now be configured. For each eventlog you can decide which messages should be sent to Check_MK. By default all eventlogs are processed and messages of the types warning or critical (or security failures) are sent. If you create a file called check_mk.ini in the agent directory, you can configure which eventlogs and which levels to process. Here is an example:

check_mk.ini
[logwatch]
# From the Application log send only critical messages
logfile application = crit
# From the Security log send all messages
logfile security = all
# Do not process other event logs at all
logfile * = off
Note: When a logfile is set to all, informational messages are sent as well. As long as you do not reclassify them via logwatch_patterns, those messages will nevertheless not trigger any alarm. Setting a logfile to off disables processing of that eventlog altogether. Reading application specific eventlogs can have an impact on the stability of the agent if the application has bugs in its eventlog implementation.

1.1.11i1: Performance Counters, monitoring MS Exchange

New in 1.1.11i1: Several Windows checks are based on Performance Counters. These are special objects provided by the Windows operating system that contain information about throughput, queue lengths, latencies and other numbers of the system and of applications like MS Exchange. Performance counters are grouped into Counter Objects. Within the operating system each object has a unique ID. Unfortunately, IDs for applications (like MS Exchange) are not fixed but vary from server to server. In the registry there is a translation between those numbers and names, but the names are in the local installation language and thus not portable either. This is very sad, and it makes some configuration necessary if you want to make use of all of the agent's features.

One good thing, however, is that some basic counter objects seem to have fixed IDs. This is at least the case for the counters needed for monitoring the CPU utilization and the disk throughput. As of version 1.1.11i1 Check_MK ships ready-to-use checks for
If you want to make use of the MS Exchange checks, you first have to determine the ID of the counter object MSExchangeTransport Queues. In order to find it, first open a command box (DOS box) and dump the complete counter information into the file counters.ini:

C:\> lodctr /s:counters.ini

Now you can view that file with Notepad or use find on Windows, which does essentially the same as grep on Linux, and look for MSExchangeTransport Queues:

C:\> find "MSExchangeTransport Queues" counters.ini

---------- COUNTERS.INI
[PERF_MSExchangeTransport Queues]
10332=MSExchangeTransport Queues

If you prefer analysing that file under Linux, you first need to convert it to UTF-8 (it is UTF-16 little-endian encoded!). This can be done on Linux with:

user@host> recode UTF-16LE..UTF-8 counters.ini

In this example the ID of the counter object is 10332. This ID can now be configured in check_mk.ini. Create this file in the same directory where check_mk_agent.exe is installed, with the following content:

check_mk.ini
[winperf]
counters = 10332:msx_queues
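
The lookup of the counter ID in counters.ini can also be scripted instead of using find or recode. The following Python sketch assumes that the file contains lines of the form ID=Name, as in the find output above, and that it starts with a byte order mark. Both helper names are hypothetical and not part of Check_MK.

```python
import re

def find_counter_id(counters_text, object_name):
    """Return the numeric ID of a counter object, or None if it is not listed."""
    # Look for a line such as "10332=MSExchangeTransport Queues".
    pattern = re.compile(
        r"^(\d+)=" + re.escape(object_name) + r"\s*$", re.MULTILINE
    )
    match = pattern.search(counters_text)
    return int(match.group(1)) if match else None

def load_counters_file(path):
    """Read a file written by lodctr /s: and return its text."""
    # Assumption: the file starts with a byte order mark, so the generic
    # "utf-16" codec detects little-endian; otherwise use "utf-16-le".
    with open(path, encoding="utf-16") as handle:
        return handle.read()

if __name__ == "__main__":
    sample = "[PERF_MSExchangeTransport Queues]\n10332=MSExchangeTransport Queues\n"
    print(find_counter_id(sample, "MSExchangeTransport Queues"))
```

On a real host the text would come from load_counters_file("counters.ini"), and the returned number is the one to put before the colon in the counters setting.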
Now restart the agent. When you retrieve the agent output from your monitoring host, you should now get an additional section <<<winperf_msx_queues>>> with content similar to this:

cmk -d YOURHOST
<<<winperf_msx_queues>>>
12947263852.75 10332
1 instances: _total
2 0 rawcount
4 0 rawcount
6 0 rawcount
8 0 rawcount
10 0 rawcount
12 0 rawcount
14 0 rawcount
16 0 rawcount
18 895 rawcount
20 895 counter
22 0 rawcount
24 0 counter

After that, when you inventorize the host with cmk -I, new checks for several MS Exchange mail queues should appear. For details please consult the man page of winperf_msx_queues.

Extending the Windows agent

Plugins and local checks

As of version 1.1.7i3, the Check_MK agent for Windows can be extended with local checks and plugins, just like the Unix agents. Local checks are (usually simple) scripts or programs that perform self-written checks and compute the results directly on the target machines. Plugins are scripts or programs that output agent sections similar to those built into the agent. Several such scripts are shipped together with the agent and are found in the subdirectory plugins of the agent directory. In order to use such plugins you need to:
One example of such a plugin is wmicchecks.bat, which uses wmic in order to output a list of processes with their resource consumption:

wmicchecks.bat
@echo off
echo ^<^<^<wmic_process:sep^(44^)^>^>^>
wmic process get name,pagefileusage,virtualsize,workingsetsize,usermodetime,kernelmodetime /format:csv

In order to make use of that agent information, your installation of Check_MK needs a check that can process that data. The checks needed for the shipped plugins are part of Check_MK. A separate tutorial explains how to write your own checks.

WARNING: Windows' concept of launching other programs as subprocesses is sometimes hard to grasp for people used to Unix-like operating systems. So please make sure that nothing lies around in your local or plugins directory that is not executable or that opens a window when it is executed. In particular, when a text file is there, Windows will open notepad.exe as a subprocess of the agent on the Windows console. As long as Notepad is running, the agent hangs and cannot even be killed and restarted. The solution in such a case is (in order to avoid a reboot):
As of version 1.1.13i1 the agent no longer executes all files in local and plugins, but first checks the file name extension. By default all files except those with the suffix txt or dir are executed. If you want, you can specify an explicit list of extensions to execute. This is done in the optional file check_mk.ini in the [global] section:

check_mk.ini
[global]
# Execute only files with the following extensions
execute = bat exe vbs ps1
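
The extension rule can be summarised in a few lines of Python. This is only a sketch of the behaviour described here, not the agent's code: the function name should_execute is hypothetical, and comparing extensions case-insensitively is an assumption.

```python
from pathlib import PureWindowsPath

# Suffixes that are skipped when no explicit list is configured.
DEFAULT_SKIPPED = {"txt", "dir"}

def should_execute(filename, execute=None):
    """Decide whether a file in local or plugins would be run.

    execute: list of extensions from the execute setting, or None if unset.
    """
    # Assumption: extensions are compared without regard to case.
    extension = PureWindowsPath(filename).suffix.lstrip(".").lower()
    if execute is None:
        # Default: run everything except the skipped suffixes.
        return extension not in DEFAULT_SKIPPED
    # Explicit list: run only the listed extensions.
    return extension in {entry.lower() for entry in execute}

if __name__ == "__main__":
    configured = "bat exe vbs ps1".split()
    for name in ("wmicchecks.bat", "readme.txt", "helper.py"):
        print(name, should_execute(name), should_execute(name, configured))
```

With the explicit list from the example, a file such as helper.py would be skipped, whereas under the default rule it would be executed.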
Note: These settings are not honored by MRPE (see below).

MRPE

As of version 1.1.13i1 the Check_MK agent for Windows finally supports MRPE, MK's remote plugins executor. This allows you to run classical Nagios plugins locally on a Windows host and gives you access to hundreds of ready-to-use checks that circulate on the Internet. MRPE on Windows does not use a configuration file of its own but uses check_mk.ini. Put the plugins to execute into the section [mrpe] in the following format:

check_mk.ini
[mrpe]
# Run classical Nagios plugins. The word before the command
# line is the service description for Nagios. Use backslashes
# in Windows-paths.
check = Dummy mrpe\check_crit
check = IP_Configuration mrpe\check_ipconfig 1.2.3.4
check = Whatever c:\myplugins\check_whatever -w 10 -c 20
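
To make the line format explicit, here is a small Python sketch that splits such a check line into the service description (the first word) and the command line (the rest). The function name parse_mrpe_line is hypothetical, and the agent's own parser may treat whitespace differently.

```python
def parse_mrpe_line(line):
    """Split an [mrpe] line into (service_description, command_line).

    Returns None for comments and for lines that are not check entries.
    """
    key, _, value = line.partition("=")
    if key.strip() != "check":
        return None
    # The first word is the service description, the rest is the command.
    description, _, command = value.strip().partition(" ")
    return description, command.strip()

if __name__ == "__main__":
    lines = [
        r"# Run classical Nagios plugins.",
        r"check = Dummy mrpe\check_crit",
        r"check = Whatever c:\myplugins\check_whatever -w 10 -c 20",
    ]
    for entry in lines:
        print(parse_mrpe_line(entry))
```

Splitting at the first space only works because the service description is a single word.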
The check_crit dummy is shipped with Check_MK and can be used for your tests. It simply fails with a Nagios state of CRITICAL and outputs one line of text. Please note that, just like in the Unix version of MRPE, the service description must not contain any spaces.

General agent configuration

Restricting Access

As of version 1.1.9i1, the Check_MK agent for Windows allows access to be restricted based on IP addresses, just like xinetd for Linux. If you want to make use of that feature, you need to create a configuration file check_mk.ini, which must be in the same directory as the agent. The restriction is configured by the variable only_from in the section [global]:

check_mk.ini
[global]
only_from = 127.0.0.1 10.0.0.0/16 192.168.56.17
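
How such an only_from list could be matched against a client address is sketched below with Python's ipaddress module. This only illustrates the idea and is not the agent's code: the function name is_allowed is hypothetical, and an empty list is treated as "allow everyone", as stated in this section.

```python
import ipaddress

def is_allowed(client_ip, only_from):
    """Check a client address against a space-separated only_from value."""
    entries = only_from.split()
    if not entries:
        # Nothing configured: all clients are allowed.
        return True
    address = ipaddress.ip_address(client_ip)
    # A plain address is treated as a network containing exactly one host.
    return any(
        address in ipaddress.ip_network(entry, strict=False)
        for entry in entries
    )

if __name__ == "__main__":
    setting = "127.0.0.1 10.0.0.0/16 192.168.56.17"
    for client in ("10.0.3.4", "10.1.0.1", "192.168.56.17"):
        print(client, is_allowed(client, setting))
```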
You may add up to 32 IP addresses or networks in slash notation. If you do not configure only_from or leave it empty, all clients are allowed. Make sure that you restart the agent after any change in the configuration file:

C:\some\directory\> net stop check_mk_agent
C:\some\directory\> net start check_mk_agent

Please note that the agent cannot prevent a TCP connection from being established, because it cannot check the IP address of the remote site before the connection is accepted. If a disallowed remote host has connected, the agent immediately closes the connection. The client thus sees a successful connection but will read zero data. This is exactly the way xinetd behaves. You can also define matching scopes in the "Windows Firewall with Advanced Security" or in IP Security rules.

Sections to execute

As of version 1.1.13i1 the agent supports configuring which sections should be executed. This is done in the [global] section with the configuration option sections. Example:
[global]
sections = check_mk uptime ps df mem
In order for Check_MK to get general information about the agent, you should always enable the section check_mk.

Host specific configuration

Depending on how you manage your Windows servers, you might need host-specific configuration settings in check_mk.ini. As of version 1.1.13i1 Check_MK offers a method for keeping settings for different hosts in a single check_mk.ini that can be distributed to all of your hosts. The basis for this is the network hostname of the server. Let's look at the following example:
[winperf]
host = zmsex?? zexchange*
counters = 10332:msx_queues
The directive counters = 10332:msx_queues will only be executed on hosts with a name beginning with zexchange or consisting of zmsex and two further characters. The wildcards * and ? can be used as for filenames. Any number of patterns can be added after host =. This restriction holds until the next host line or the end of the section:
[winperf]
# Counters for some selected hosts
host = zmsex0? zexchange*
counters = 10332:msx_queues
# Counters for some other hosts
host = zmsex1?
counters = 10320:msx_queues
[logwatch]
# This will be executed for all hosts:
logfile * = off
You can also switch off the restriction by setting a filter for *:
[logwatch]
# Logfiles for hosts ending in prod:
host = *prod
logfile security = warn
# This holds for all hosts:
host = *
logfile application = off
Note: The match for the hostname is done case-insensitively.
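
The host filtering rules can be summarised in a short Python sketch that lists the settings active for a given hostname. It is an illustration under stated assumptions, not the agent's parser: the function name active_settings is hypothetical, and fnmatch additionally understands [...] character classes, which the agent may not support.

```python
from fnmatch import fnmatchcase

def active_settings(ini_text, hostname):
    """Return the (section, key, value) settings that apply to hostname."""
    hostname = hostname.lower()  # the match is case-insensitive
    section, active, result = None, True, []
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new section starts without any host restriction.
            section, active = line[1:-1], True
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "host":
            # A host line replaces the previous restriction in this section.
            active = any(
                fnmatchcase(hostname, pattern.lower())
                for pattern in value.split()
            )
        elif active:
            result.append((section, key, value))
    return result

EXAMPLE_INI = """
[winperf]
host = zmsex0? zexchange*
counters = 10332:msx_queues
host = zmsex1?
counters = 10320:msx_queues
[logwatch]
logfile * = off
"""

if __name__ == "__main__":
    for name in ("ZMSEX12", "zexchange01", "fileserver"):
        print(name, active_settings(EXAMPLE_INI, name))
```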