GlusterFS requirements #2


GlusterFS:

  • Two replicas of the same data must never be on the same node, even if the bricks “under” the Gluster volumes sit on different file systems on that host.
  • A node should be able to fail, and the storage layer should handle the error in less than 1 minute.
  • A node should be able to rejoin after a failure and ‘heal’.
  • Different volumes should be defined according to the speed / RAID level of the underlying FS.

(e.g. for spotcloud customers it might make sense to just run a RAID 0 setup, as a “broken” VM could be reinstantiated with no data lost. Plus they’re not getting any SLA, which means the prices will be far too low to offer redundancy nobody asked for. On the other hand, I don’t like the idea of having any unmirrored data. No RAID just means a disk failure can trash your system, affecting all the other stuff that is mirrored 🙂 )
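The first requirement is easy to get wrong by hand, so here is a minimal sketch of a check for it. It assumes bricks are written as `host:/path` and that each consecutive run of `replica` bricks forms one replica set, which is how the brick order is interpreted when a replicated volume is created. The function names, hostnames and brick paths are made up for illustration.

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets: each consecutive run of
    `replica` bricks in the list forms one set."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def colocated_sets(bricks, replica):
    """Return every replica set in which two bricks share a host."""
    bad = []
    for group in replica_sets(bricks, replica):
        # The host is everything before the first colon of "host:/path".
        hosts = [brick.split(":", 1)[0] for brick in group]
        if len(set(hosts)) < len(hosts):
            bad.append(group)
    return bad


# Hypothetical layouts: two nodes, each with a fast and a slow file system.
good = ["node1:/bricks/fast/b1", "node2:/bricks/fast/b1",
        "node1:/bricks/slow/b1", "node2:/bricks/slow/b1"]
bad = ["node1:/bricks/fast/b1", "node1:/bricks/slow/b1",
       "node2:/bricks/fast/b1", "node2:/bricks/slow/b1"]

print(colocated_sets(good, 2))
print(colocated_sets(bad, 2))
```

The second layout is exactly the case the requirement forbids: each replica pair lives on two different file systems, but on the same node, so one node failure takes out both copies.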

  • A node should be able to lose its connectivity and recover from that in less than 2 minutes.

(Each VM will be set up with a udev rule that sets /sys/block/<dev>/device/timeout to 120 s. Allow 20 seconds for failure detection, e.g. InfiniBand link-up, plus some reserve; the rest of the time is for GlusterFS to do its thing.)
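To make the time budget concrete, here is a rough sketch of the arithmetic and of the sysfs write the udev rule would end up doing. This is not the udev rule itself. The function names are hypothetical, and the `sysfs_root` parameter is only there so the write can be tried against a scratch directory instead of the real /sys.

```python
from pathlib import Path

DISK_TIMEOUT_S = 120      # guest disk timeout from the text
DETECTION_RESERVE_S = 20  # failure detection, link-up and some reserve


def gluster_budget(disk_timeout_s=DISK_TIMEOUT_S, reserve_s=DETECTION_RESERVE_S):
    """Seconds left for GlusterFS to recover before the guest's disk times out."""
    budget = disk_timeout_s - reserve_s
    if budget <= 0:
        raise ValueError("reserve uses up the whole disk timeout")
    return budget


def set_disk_timeout(dev, seconds=DISK_TIMEOUT_S, sysfs_root="/sys"):
    """Write the timeout to <sysfs_root>/block/<dev>/device/timeout.

    Needs root when pointed at the real /sys.
    """
    path = Path(sysfs_root) / "block" / dev / "device" / "timeout"
    path.write_text(f"{seconds}\n")
    return path


print(gluster_budget())  # 120 - 20 = 100
```

So with these numbers GlusterFS has to notice the dead peer and get I/O moving again within 100 seconds, otherwise the guests start seeing disk errors.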
