check_mk Xen agent update


I’ve done a cleanup of the xen check for nagios / check_mk. The old one did not correctly handle VMs that were down and would confuse your main check_mk active check.

You can find the current version at: https://bitbucket.org/darkfader/nagios/src/8c95fbe779f2

This is now sorted out and working very well, even including “unstarted” VMs. I did most of the testing using libvirtd, though, and have now disabled it to investigate run time etc. The local agent plugin could still use a lot of work too. My script skeleton only takes 0.059s to run on a very slow host, but every call to “xm” takes about 0.55 seconds.

It seems to be less a Python issue than plain slowness of the Xenstore, which was well documented in the following post to the xen-devel list. It seems Daniel tried twice to wake people up, but without success. I’ll try to verify the xenstore performance issue and try using the ramdisk hack, too.

http://xen.1045712.n5.nabble.com/Revisiting-XenD-XenStored-performance-scalability-issues-td2504870.html

This is what it looks like on a 1.5GHz box:

[root@davexh0001 ~]#   time for i in `seq 1 1000` ; do virsh list > /dev/null  ; done
real    29m36.072s
user    0m8.509s
sys     0m15.545s
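
As a rough back-of-the-envelope check on those numbers, the sketch below converts the `time` output above into per-call figures. The helper name `to_seconds` is only an illustration and not part of the plugin.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '29m36.072s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


CALLS = 1000  # iterations of the `virsh list` loop above

real = to_seconds("29m36.072s")
cpu = to_seconds("0m8.509s") + to_seconds("0m15.545s")  # user + sys

per_call = real / CALLS   # wall-clock seconds per call
cpu_share = cpu / real    # fraction of wall-clock time spent on the CPU

print(f"wall clock per virsh call: {per_call:.3f}s")
print(f"CPU time per virsh call:   {cpu / CALLS:.3f}s")
print(f"share of time on CPU:      {cpu_share:.1%}")
```

That works out to roughly 1.8 seconds of wall-clock time per call, with only about 1.4% of it spent on the CPU. The rest is waiting, which fits the idea that the bottleneck sits outside the calling script.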

On the “making things better” track, I’ve also written to the xen-devel list with a lot of questions which will hopefully help me implement the next few features.

6 thoughts on “check_mk Xen agent update”

  1. Hi Florian. I feel I must be doing something wrong with your xen check plugin. The output in the check_mk_agent report just looks like this:
    <<>>
    vm Usage: running
    vm List running
    vm -l, running
    vm --label running
    mem 16382 6830

    This is on a Centos 5.5 host with one running guest. Seen this before?
    Thanks
    Dan

    • Yes,

      this is a known issue: the CentOS Xen version is very old, and I wrote / tested the plugin on OracleVM.
      CentOS doesn’t have the Xen option that allows filtering the output with --state=running.

      For now, just remove --state=running from the line. This isn’t the full fix iirc; I’m not sure if it would pick up states like crashed or paused…
      I’ll upload the version we’re using on CentOS tonight, then you’ll be able to compare.
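
One way to avoid depending on --state=running at all would be to read the State column of a plain `xm list` and decode the flags in the script. The following is only a sketch of that idea, not the code from the plugin; the sample output, the domain names and the function name `parse_states` are made up for illustration, and it assumes every row has the usual six columns.

```python
# Hypothetical `xm list` output; names and numbers are invented.
SAMPLE = """\
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1024     4     r-----   1234.5
web01                                        1  2048     2     -b----    567.8
db01                                         2  4096     2     --p---     12.3
"""

# Flag letters as printed in the State column of `xm list`.
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}


def parse_states(xm_list_output):
    """Map each domain name to the list of state flags that are set."""
    states = {}
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a regular domain row (assumption: six columns)
        name, flags = fields[0], fields[-2]
        states[name] = [STATE_FLAGS[f] for f in flags if f in STATE_FLAGS]
    return states


for name, flags in parse_states(SAMPLE).items():
    print(name, ",".join(flags) or "no flag set")
```

Because the state is decoded locally, paused or crashed domains are still listed and can be reported as such instead of silently disappearing.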

  2. With the new plugin, I get a list of running VMs, with stopped ones not visible. There is also the Host Memory Used item that aggregates all VM memory. Is that the correct behaviour?
    Thanks, it’s useful info.

  3. Yes,

    time I’ll package the plugin so it includes the man page 😉
    As to the memory functions:
    It’s supposed to plot the total memory, the amount used by dom0 and the sum of what all VMs (except dom0) use.
    That way I separate the data for “overhead” (dom0 and hypervisor) and “payload” (user VMs).
    Once the 2 separate graphs work, the next thing will be plotting separate RRDs for the VMs:
    their respective resource usage versus the system total.
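
The split described above could be sketched roughly as follows. This is an illustration under assumptions, not the plugin's actual code: the sample `xm list` output and the function name `memory_split` are invented, the total is passed in by hand (here the 16382 MB from the agent output quoted earlier), and every row is assumed to have the usual six columns.

```python
# Hypothetical `xm list` output; names and numbers are invented.
SAMPLE = """\
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1024     4     r-----   1234.5
web01                                        1  2048     2     -b----    567.8
db01                                         2  4096     2     --p---     12.3
"""


def memory_split(xm_list_output, total_mb):
    """Return total, dom0 and summed guest memory in MB."""
    dom0_mb = 0
    guests_mb = 0
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a regular domain row (assumption: six columns)
        name, mem_mb = fields[0], int(fields[-4])  # Mem is the 4th column from the right
        if name == "Domain-0":
            dom0_mb += mem_mb      # "overhead" side
        else:
            guests_mb += mem_mb    # "payload" side
    return {"total": total_mb, "dom0": dom0_mb, "guests": guests_mb}


print(memory_split(SAMPLE, total_mb=16382))
```

Keeping dom0 and the guest sum as separate values is what allows them to be graphed independently later on.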

    I’ll be at the xen hackathon next month just to find out how to get these counters. I still expect it to be very tricky to get the right data out of xen.
