A summary of the biggest issues with GlusterFS, to keep track of them as (or if) they get resolved. Unlike some other reviews, this one is based on actually testing GlusterFS for some time.
The overall perspective is about hosting Xen VM images on Gluster.
Please understand I’m not here to flame Gluster (see the end of the article), but to point out current issues.
- FUSE performance (see below)
- Documentation (it has to be rewritten by someone experienced. Right now it sucks!)
- Config file handling (you would think GlusterFS could *distribute* its own configs, and for software like this it would be helpful to keep the config files in sync automatically)
- Messages (one should not have to rely on a mount option to set the debug level and log file, and there should not be a total lack of syslog messages)
- Failover – come on, hasn’t anyone noticed it should be possible to fail over from InfiniBand to Ethernet?
- Failover times – reading the mailing lists, these are at best hard to pin down. One would expect guaranteed failover times of 30, 60 or 90 s – not blocking for 30 minutes, and not juggling parameters whose multiples somehow determine the actual time it takes.
- Expansion / migration of RAID level (distributed volumes should allow adding replication; this is less a technical issue than a design one)
- In 3.0, only volumes created by glusterfs-volgen.sh are supported, but that script is so limited it can only create one volume; you end up merging the configs by hand in vi, which, given the lack of documentation, is no fun at all
- In 3.0, support for distributed+replicated volumes was temporarily dropped. Just like that. How did I find out? By a side note in a mailing-list reply. It seems they don’t have a proper dev cycle right now; otherwise there wouldn’t be a 3.0 release while one of the three storage modes is unsupported.
- Move away from file-level storage to chunk-level storage; the current behaviour wastes performance and scalability. Sure, having everything consistent on the nodes is nice, but if that means 100 MB/s instead of 2000 MB/s, it doesn’t matter as much. If GlusterFS fails me, I have an unwanted service outage, and I won’t care that the current design lets me start things without GlusterFS. I don’t want it to fail, period.
- And a personal wishlist item: fuck FUSE. Make it a kernel module! Keep FUSE only as an optional mode for the not-performance-minded. The supplied library for bypassing the FS layer might do the trick; I just don’t know yet whether it works for Xen, too. What I do know is that I saw a performance drop from 130 MB/s (FS I/O) to 100 MB/s (GlusterFS I/O) with a default setup – locally!
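On the failover-time point: in a 3.0-era client volfile the blocking behaviour is governed by timeout options on the protocol/client translator. A hedged fragment (option names are from memory of the 2.x/3.x volfile format and may differ between versions; values are illustrative). If I recall correctly, the default frame-timeout of 1800 s is exactly where worst-case 30-minute blocking can come from:

```
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host node1
  option remote-subvolume brick
  # how long a dead server keeps the mount hanging before failover
  option ping-timeout 10
  # per-operation timeout; the default (1800 s = 30 min!) is far too long
  option frame-timeout 90
end-volume
```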
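To illustrate the messages/logging complaint above: the only way to get useful diagnostics is to pass log options when starting the client. A sketch, assuming a 3.0-era setup with a client volfile at /etc/glusterfs/client.vol (paths here are examples, not from my setup):

```shell
# Mount with an explicit log level and log file -- nothing goes to
# syslog, so without these options you are flying blind.
glusterfs --volfile=/etc/glusterfs/client.vol \
          --log-level=DEBUG \
          --log-file=/var/log/glusterfs/client.log \
          /mnt/gluster
```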
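The glusterfs-volgen limitation is visible from its usage: one invocation generates the full set of server and client volfiles for exactly one volume (hostnames and export paths below are made up):

```shell
# Generates volfiles for ONE replicated volume. Want a second volume?
# You get to merge the generated configs by hand in an editor.
glusterfs-volgen --name vmstore --raid 1 \
    node1:/export/vmstore node2:/export/vmstore
```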
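The local 130 → 100 MB/s drop mentioned in the last bullet can be reproduced with a crude dd comparison (both paths are assumptions; /data is a local filesystem, /mnt/gluster a GlusterFS FUSE mount served from the same box):

```shell
# Raw local filesystem write (the ~130 MB/s case)
dd if=/dev/zero of=/data/testfile bs=1M count=1024 conv=fdatasync

# Same write through the local GlusterFS FUSE mount (the ~100 MB/s case)
dd if=/dev/zero of=/mnt/gluster/testfile bs=1M count=1024 conv=fdatasync
```

Not a rigorous benchmark, but enough to show the FUSE overhead even with no network involved.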
So, where does that leave me, given that I wanted to base most of my business on Gluster?
I will still take them up on their great offer of a free evaluation including installation support, maybe even including 3 or 4 dedicated GlusterFS nodes. So far I’ve only done small tests to get a basic knowledge of Gluster, and have been sourcing hardware for my “better” test box.
From what I can gather, mine will be the biggest GlusterFS setup in production in Germany. For the next 1–2 years I will be a very dedicated user, but in the longer run it will be a race between GlusterFS and Ceph. Ceph is officially not at all stable so far, but seems to have a superior design. It even brings along Hadoop support for its EC2-compatible offering.
So the question is: where is GlusterFS going in the next few years?