The Open Monitoring Distribution (OMD) lets you run multiple “sites”, each consisting of configurable elements: a Nagios (or Icinga, Shinken, Check_MK Microcore) instance, an Apache webserver and other tools.
Each site can be started/stopped individually, allowing you to take them offline for maintenance or have them in a cluster for failover.
The main apache on a system uses reverse proxies to let you access the “sites” and has always been able to tell you if a site wasn’t started at the moment.
This is done via a 503 ErrorDocument handler in the file “apache-own.conf”. It’s a nice feature, but it has a huge drawback if, like me, you run a kiosk-mode browser to show the monitoring dashboard on a TV or tablet.
Once that page is displayed you’re out. The error page is static and never reloads, so you’ll never see that the site is back up.
I know 3 cases where this commonly becomes an issue:
- Bootup of Nagios server with local terminal
- Cluster failovers
- Apache dies
The second one is the most annoying:
- You have a GUI displaying valid info.
- One of the servers has a problem and it triggers a failover.
- Autorefresh kicks in and you get dropped to the 503 page.
- The cluster failover finishes.
- But nothing gets you back in.
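The sequence above can be sketched as a toy simulation. This is only an illustration, not anything OMD ships: the function name `simulate_kiosk`, the tick-based timeline and the page labels are all made up for this example, and it simplifies things by treating every reload as happening on the same clock.

```python
def simulate_kiosk(site_up, error_page_reloads):
    """Return what a kiosk browser shows at each reload tick.

    site_up: list of booleans, one per tick (True = site is reachable).
    error_page_reloads: whether the 503 page reloads itself.
    Both parameters are hypothetical, chosen for this sketch only.
    """
    shown = []
    page = "dashboard"
    for up in site_up:
        if page == "dashboard" or error_page_reloads:
            # The page on screen triggers a reload; Apache answers
            # according to the current state of the site.
            page = "dashboard" if up else "503"
        # Otherwise the static 503 page simply stays on screen.
        shown.append(page)
    return shown


# Site is up, goes down for two ticks during failover, then comes back.
timeline = [True, False, False, True, True]
print(simulate_kiosk(timeline, error_page_reloads=False))
print(simulate_kiosk(timeline, error_page_reloads=True))
```

With a static error page the kiosk stays on "503" for the rest of the timeline, even though the site has recovered.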
Now, the fix is so easy you won’t believe it:
In apache-own.conf of your site, change this line:
ErrorDocument 503 "<h1>OMD: Site Not Started</h1>You need to start this site in order to access the web interface."
to this:
ErrorDocument 503 "<META HTTP-EQUIV=\"refresh\" CONTENT=\"30\"><h1>OMD: Site Not Started</h1>You need to start this site in order to access the web interface."
The added meta refresh tag makes the browser reload the error page every 30 seconds, so it gets back to the dashboard on its own once the site is up again.
Restart the system apache (/etc/init.d/apache2 restart for most of us) and it’ll work.
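If you have several sites to patch, the edit could be scripted. The following is a minimal sketch, assuming the ErrorDocument line in your apache-own.conf starts exactly as quoted in this post; the helper name `add_refresh` and its `seconds` parameter are invented for this example, and the file path is something you have to supply yourself.

```python
# Start of the stock 503 line, as quoted in this post (an assumption:
# your file may differ in spacing or wording).
OLD = 'ErrorDocument 503 "<h1>OMD: Site Not Started</h1>'
# Same line with a meta refresh tag in front of the heading.
NEW = ('ErrorDocument 503 "<META HTTP-EQUIV=\\"refresh\\" '
       'CONTENT=\\"{seconds}\\"><h1>OMD: Site Not Started</h1>')


def add_refresh(conf_text, seconds=30):
    """Return conf_text with a meta refresh added to the 503 page.

    Text that does not contain the stock line is returned unchanged,
    so running this twice on the same file does no harm.
    """
    return conf_text.replace(OLD, NEW.format(seconds=seconds))


sample = ('ErrorDocument 503 "<h1>OMD: Site Not Started</h1>'
          'You need to start this site in order to access the web interface."\n')
print(add_refresh(sample), end="")

# To patch a real file, read it, pass the text through add_refresh()
# and write it back, for example with pathlib.Path.read_text() and
# Path.write_text() on the path to your site's apache-own.conf.
```

Keep a backup of the original file and check the result by hand before restarting Apache.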
I tried to develop a dev mindset, but found I like it when stuff really works.