check_mk magic dust for ignoring stuff ;)


I have promised the people on irc (#check_mk on freenode) that I’d try to re-figure how to do the magic stuff with ignoring checks and services.

This isn’t perfectly documented for a few reasons:

  • It’s right in the FAQ, but people don’t realize they’re looking at the solution even when they see it. (The example is not very general, and you have to grok the power of python and check_mk to understand it is indeed applicable to all checks. In practice you can only understand it if you already know it.)
  • The power users understand the features, or maybe they even had us add that feature, and they’re the ones who give the most input for documentation. (Take that as a hint: ask on the mailing list if you have read some part of the documentation 3-4 times, tried for weeks and aren’t getting further.)
  • If you’re working with check_mk all day, then this is something you simply use.
  • It’s hard to explain.

Today I remembered the way it should work because the issue at hand made something “click” inside my head. Good thing to finally be back to simply typing down check_mk syntax. I get to do this far too seldom now, and had been trying to find how we had done this for like 5 months now. I didn’t have time to sit down and test, and I didn’t want to ask because finding out for myself has advantages when doing the documentation for this.

I figure it would help to do some explanations and a sketch explaining the check syntax down there but I don’t know a good graphics program for this, also it’s almost midnight.

If someone has a good tool for doing comic-like text comments or knows how the awesome sketchbook for google’s chrome launch was made, then I’d be glad to hear. Otherwise I’ll try using my girlfriend’s Wacom tablet some time soon.

Here is a snippet from irc with a few lessons:

23:21 < k0rupted> any tips on easily excluding all /*/.snapshot NFS mounts? I
can do /example/.snapshot with
inventory_df_exclude_mountpoints. But not sure if its
possible to exclude anything with .snapshot
23:24 < darkfaded> err yes that just a regex thing
23:24 < darkfaded> but i *do not* really understand regex
23:24 < darkfaded> (my boss would laugh now)
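Since “just a regex thing” is not much help if regex isn’t your thing either, here is a minimal sketch in plain Python. The mount points are invented for illustration, and whether check_mk anchors a pattern at the start of the string or searches anywhere in it is an assumption you should check before relying on either pattern.

```python
import re

# Hypothetical mount points, invented for illustration only.
mountpoints = [
    "/example/.snapshot",
    "/vol/home/.snapshot",
    "/vol/home",
    "/snapshot_tools",
]

# Exactly one leading path component followed by /.snapshot,
# roughly the /*/.snapshot shape from the question.
one_level = re.compile(r"^/[^/]+/\.snapshot$")

# Any path that has a .snapshot component anywhere in it.
# The dot is escaped so it only matches a literal dot.
any_level = re.compile(r"(^|/)\.snapshot(/|$)")

one_level_hits = [m for m in mountpoints if one_level.search(m)]
any_level_hits = [m for m in mountpoints if any_level.search(m)]
```

The first pattern only catches snapshots directly below a top-level directory; the second one also catches deeper ones, which is probably closer to “anything with .snapshot”.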

LESSON 1:

Check_MK is open source, and when dealing with syntax questions, it can help to look at the parser.

Let’s look at the check_mk source and find out what we don’t remember:

23:26 < darkfaded> i’ll tell you where to look
23:26 < darkfaded> go to check_mk git
23:26 < darkfaded> then to modules
23:26 < darkfaded> then view check_mk.py
23:26 < darkfaded> look for anything with ignore
23:26 < darkfaded> then see ignored_services and ignored_checks
23:27 < darkfaded> i think here we need to make a definition that says

LESSON 2:

The non-comic, non-graphical explanation right here and now:

23:28 < darkfaded> ignored_services += [ ( [ "filer", "!snapmirror_filer_DCA" ],
nfs_mounts, <regex that matches on any string containing
snapshot> ) ]

ignored_services is a list (of ignored checks, obviously). (It might be ignored_checks or ignored_services; I’m not sure and I don’t have time to check that now.) += means we’ll append something to that list (making it longer by one more ignored check). If we used = then we’d be overwriting it and that would be fail. 🙂 Since it is a list, the thing we’re appending must be put into [].
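The difference between += and =, and why the [] matter, can be shown with plain Python lists. This is only a sketch: the entries are placeholder values, not real check_mk rules.

```python
# Stand-in for a list that earlier config has already filled.
ignored_services = ["rule from an earlier file"]

# += extends the existing list, so the earlier rule survives.
ignored_services += ["new rule"]
after_append = list(ignored_services)

# = rebinds the name to a brand new list, so the earlier rule is lost.
ignored_services = ["new rule"]
after_assign = list(ignored_services)

# Why the tuple has to be wrapped in []:
rule = (["filer"], "nfs_mounts", "snapshot")

wrapped = []
wrapped += [rule]      # one entry: the tuple itself

unwrapped = []
unwrapped += rule      # three entries: the tuple's items get spliced in
```

Forgetting the [] does not raise an error, it just quietly adds three unrelated items to the list, which is the kind of mistake that is hard to spot later.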

The thing we’re appending is a tuple – indicated by ()

This tuple is a standard check definition, just as if inventory did not exist.

It consists of 3 pieces (items):

The tags, or hostname it should apply to: filer, but not snapmirror_filer. This is a list, too. And you should also look and find out about the magically cool “ALL_HOSTS” there. (But don’t mix it up with the all_hosts = [] list!)

The name of the check concerned by this.

And finally the “service description”, which has to be based on the description given by inventory.
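To make the three pieces concrete, here is a toy matcher in plain Python. It is NOT check_mk’s own code: the function names, the “every plain entry must be present, no ! entry may be” rule and the example service descriptions are all assumptions made for illustration, and the tuple simply mirrors the three-item layout described above. Check the real layout against the check_mk documentation before copying it.

```python
import re

# The three pieces as described above. "nfs_mounts" is written as a string
# so the sketch runs on its own; the regex is an assumption.
rule = (["filer", "!snapmirror_filer_DCA"], "nfs_mounts", r"\.snapshot")


def host_matches(wanted, host_names_and_tags):
    """Toy rule: every plain entry must be present, no '!' entry may be."""
    for entry in wanted:
        if entry.startswith("!"):
            if entry[1:] in host_names_and_tags:
                return False
        elif entry not in host_names_and_tags:
            return False
    return True


def is_ignored(rule, host_names_and_tags, check_name, description):
    """Return True if all three pieces of the rule match this service."""
    wanted, rule_check, pattern = rule
    return (
        host_matches(wanted, host_names_and_tags)
        and rule_check == check_name
        and re.search(pattern, description) is not None
    )


# Hypothetical service descriptions, as inventory might have named them.
snap = "NFS mount /vol/home/.snapshot"
home = "NFS mount /vol/home"
```

With this, is_ignored(rule, {"filer"}, "nfs_mounts", snap) is true, while the same call for a host that also carries snapmirror_filer_DCA, or for the plain /vol/home mount, is false.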

A side note:

Like many people I also used to think that the stuff in etc/check_mk/conf.d was a config file. Well sorry, it’s a lie.

In practice, that’s just where we persist the internal python data structures that are used to represent the config in python syntax, so that they can be reloaded when we need to precompile the checks and generate the nagios config.
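What “python data structures in python syntax” means can be sketched in a few lines. This is a simplified model, not check_mk’s actual loader: the file contents and the starting namespace are invented, and the only assumption taken from the text is that the files are evaluated as Python.

```python
# Invented file contents, standing in for something under conf.d.
mk_source = '''
ignored_services += [ ( [ "filer" ], "nfs_mounts", "snapshot" ) ]
'''

# Names the program defines before any config is read (hypothetical).
config = {"ignored_services": []}

# Run the file contents as Python inside that namespace.
exec(mk_source, config)

loaded = config["ignored_services"]
```

In this model the += in the file works without any setup because the list already exists in the namespace before the file is run, and after loading the program simply holds a normal Python list of tuples.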

Some people might remember sendmail and m4, and there’s not thaaaaaat much difference. With power comes a syntax that is 100% representative of what the program knows.

If you wonder whether that is the best way of doing things, do like I did. Try to imagine a normal, human language config file that allows the same granularity. It would be a nightmarish mess of many thousand lines for even a smaller scale setup.

Right, actually I’d seen that somewhere …

OMD[wartungsfenster]:~$ wc -l etc/nagios/conf.d/check_mk_*
 2458 etc/nagios/conf.d/check_mk_objects.cfg
  314 etc/nagios/conf.d/check_mk_templates.cfg
 2772 total
OMD[wartungsfenster]:~$ wc -l etc/check_mk/main.mk etc/check_mk/conf.d/*mk
   3 etc/check_mk/main.mk
  20 etc/check_mk/conf.d/mysqlmon.mk
   3 etc/check_mk/conf.d/wa_check_settings.mk
  74 etc/check_mk/conf.d/wa_processes.mk
 100 total

(And that is with check_mk making heavy use of a few advanced nagios syntax tricks; otherwise it’d probably be even more. I never wanna go back there…)

LESSON 3:

When you read the docs, also look at the FAQ 🙂 (Just kidding. As I explained above, the FAQ alone ain’t gonna do the trick. I also needed a long time to make the connection from “one thing in the FAQ that works for me” to “woah, THAT’S how it works!”)


23:29 < darkfaded> have a look at the last entry in the check_mk faq
23:29 < darkfaded> i asked mathias the same thing 2 years ago thats where the
entry is from

LESSON 4:

Don’t lag out of IRC at the wrong time.

23:29 -!- k0rupted [~k0rupted@ec2-50-16-198-57.compute-1.amazonaws.com] has
quit [Ping timeout: 245 seconds]
23:30 < darkfaded> lol now he dced

Please note: the key is to mix the inventory options, check parameter dictionaries and services definitions.

But that’s something we do consulting for! 🙂

http://mathias-kettner.de/nagios_support.html <<<- free off-hours commercial

(this would have taken 15 minutes less if I had put it in confluence instead of wordpress, because brokenness of wordpress editor > brokenness of confluence editor)
