The Myth of Nines?

In the Open Source Tool Chains for Cloud Computing presentation from OSCON 2010, one slide about system availability is labeled “the myth of nines”. The title, and even more so the footnotes, try to show that 6 nines availability is not really possible, and that monitoring at 5-minute intervals might present a picture that differs from the true availability. The worst argument they give is something like “no administrator can react within 5 minutes”.

Up to that slide I was reading the thing with keen interest. Right now I’m just sad about how a lack of information can lead people into spreading FUD.

A few simple counterpoints to the 6 nines story:

  • No one said the service could not be clustered active/active.
  • kexec was not built to take dumps with kdump; it was built to restart an upgraded or half-crashed Linux within one or two seconds.
  • Errors can be handled automatically. (Besides, any good admin will be on a problem within a minute, counting from the coffee maker to having run basic performance data gathering on the affected system.)
  • There was a good example on the Nagios forums one day: a guy from a German stock exchange came along with some questions about availability monitoring. They generate their SLA records on a few tens of thousands of leased lines with per-second granularity.
  • There are a few industries where 5 and 6 nines availability is reality, and has been for years already.
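
To put numbers on this: 6 nines leaves roughly 31.5 seconds of downtime per year, and 5 nines about 5.26 minutes. The sketch below is my own back-of-the-envelope illustration, not anything from the slides. It computes that budget and shows why a 5-minute check interval cannot even see a two-second restart while a per-second check can. The function names, the one-day observation window and the outage position are assumptions made up for the example.

```python
# Back-of-the-envelope availability arithmetic (illustrative sketch only).

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored
SECONDS_PER_DAY = 24 * 60 * 60         # 86,400


def downtime_budget_seconds(nines, period_seconds=SECONDS_PER_YEAR):
    """Allowed downtime for an availability with `nines` nines (5 -> 99.999%)."""
    unavailability = 10.0 ** -nines
    return period_seconds * unavailability


def observed_availability(outage_seconds, period_seconds, interval_seconds):
    """Availability as seen by a monitor that checks once per interval.

    `outage_seconds` is a set of second offsets at which the service was down.
    The monitor only looks at offsets 0, interval, 2 * interval, ...
    """
    checks = range(0, period_seconds, interval_seconds)
    up = sum(1 for t in checks if t not in outage_seconds)
    return up / len(checks)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {downtime_budget_seconds(n):.1f} s of downtime per year")

    # Hypothetical two-second restart somewhere between two 5-minute checks.
    outage = {100, 101}
    coarse = observed_availability(outage, SECONDS_PER_DAY, 300)
    fine = observed_availability(outage, SECONDS_PER_DAY, 1)
    print(f"5-minute checks see {coarse:.6%}, per-second checks see {fine:.6%}")
```

So the slide has a point that 5-minute polling paints a different picture, but that is an argument for finer-grained measurement, not against the existence of 6 nines.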

Dear cloud guys, your lack of knowledge about real HA does not qualify you to decide that it doesn’t exist.


