- Two data replicas must never end up on the same node, even if the bricks live on different filesystems on that host “under” the Gluster volumes.
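A minimal sketch of how that constraint could be expressed at volume-creation time (hostnames and brick paths are placeholders): with “replica 2”, GlusterFS groups consecutive bricks in the list into replica sets, so listing bricks in node-alternating order keeps the two copies on different hosts.

```shell
# Hypothetical 2-node layout; node1/node2 and the brick paths are placeholders.
# With "replica 2", each consecutive pair of bricks forms one replica set,
# so alternating hosts ensures the two copies never share a node.
gluster volume create vol-mirrored replica 2 \
    node1:/bricks/b1 node2:/bricks/b1 \
    node1:/bricks/b2 node2:/bricks/b2
gluster volume start vol-mirrored
```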
- A node should be able to fail, and the storage layer should handle the failure in under 1 minute.
- A node should be able to rejoin after a failure and ‘heal’.
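The rejoin-and-heal step could be driven and monitored from the CLI along these lines (a sketch; “vol-mirrored” is a placeholder volume name):

```shell
# After the failed node is back, trigger and watch self-heal.
gluster volume heal vol-mirrored                    # kick off healing of pending entries
gluster volume heal vol-mirrored info               # list entries still needing heal
gluster volume heal vol-mirrored info split-brain   # show anything that can't heal automatically
```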
- Different volumes should be defined according to the speed / RAID level of the underlying FS.
(e.g. for spotcloud customers it might make sense to just run a RAID0 setup, since a “broken” VM could simply be re-instanced with no data lost. They also get no SLA, which means the prices will be too low to justify unwanted redundancy. On the other hand, I don’t like the idea of having any unmirrored data: no RAID means a single disk failure can trash the system, affecting all the other stuff that is mirrored. 🙂)
- A node should be able to lose its connectivity and recover from that in less than 2 minutes.
(each VM will be set up with a udev rule that sets /sys/block/&lt;dev&gt;/device/timeout to 120 s. Allow 20 seconds for failure detection, e.g. InfiniBand link-up, plus some reserve; the rest of the time is for GlusterFS to do its thing.)
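A sketch of what the two halves of that budget could look like. The udev match keys are assumptions that depend on the guest’s block driver, and `network.ping-timeout` is the Gluster volume option governing how long a client waits before declaring a brick dead (the 20 s value mirrors the detection window above, and “vol-mirrored” is a placeholder name):

```shell
# /etc/udev/rules.d/60-block-timeout.rules  (sketch -- match keys may need
# adjusting for the guest's actual disk driver)
# Raise the SCSI command timeout so the guest rides out a storage hiccup:
# ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", ATTR{device/timeout}="120"

# On the Gluster side, bound failure detection (assumption: 20 s matches the
# detection window described above; the default is 42 s):
gluster volume set vol-mirrored network.ping-timeout 20
```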