Back in November I had built the GlusterFS binaries using the Oracle VM SDK. Since then I've gotten the InfiniBand link up and built a FUSE kernel module for Oracle VM.
I have now started doing the first live tests.
I’m posting this here so I don’t clutter the OTN forums thread (see: http://forums.oracle.com/forums/thread.jspa?messageID=9394644 ) with my test results –
[root@waxh0005 ~]# lsmod | grep fuse
fuse                   43796  2
[root@waxh0005 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              3046384    961508   1927632  34% /
tmpfs                   293976         0    293976   0% /dev/shm
glusterfs#192.168.100.106:/test-volume
                     825698944  44585088 739170816   6% /data/export
This is built from 4 x 200GB LVs on 2 Oracle VM hosts. Each one sits on its own consumer-grade 1.5TB SATA disk (we’re talking lab here after all 🙂).
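For reference, the bricks are just plain logical volumes with a local filesystem on top. A minimal sketch of how such a brick could be prepared – volume group name, LV name and mount point are placeholders, not my actual setup:

# placeholder names - VG, LV and mount point are made up for illustration
lvcreate -L 200G -n gluster_brick1 vg_data
mkfs.ext3 /dev/vg_data/gluster_brick1
mkdir -p /data/brick1
mount /dev/vg_data/gluster_brick1 /data/brick1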
Well, OK, I wouldn’t care about using normal SATA disks on prod systems either. The higher failure rate is easily compensated for if you build fault-tolerant storage in the first place, and if you make sure your storage delivers more IOPS than your users need – why not?
Most people just badly fail on the second part and give you a 3-spindle software RAID5. The CPU overhead has been shown to be negligible in lab tests, they say. Pity enough that real systems aren’t run in a lab and will be using their CPU cycles for something else[*] than RAID.
[root@waxh0005 export]# dd if=/dev/zero of=lala bs=1024k count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 12.0661 seconds, 174 MB/s
Performance is still a bit flaky: for reading & writing, 80MB/s is the lower boundary, 180MB/s the average, and it tops out at about 200MB/s.
I have set it up as a 4-way striped volume without mirroring, using the default “gluster” tool for volume creation. This is smaller than the smallest supported setup, which would be 4 hosts.
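Roughly, creating such a striped volume over RDMA with the gluster CLI looks like this – hostnames and brick paths here are placeholders, not my actual ones:

# placeholder hostnames/paths - 2 hosts with 2 bricks each, striped 4 ways over RDMA
gluster volume create test-volume stripe 4 transport rdma \
    host1:/data/brick1 host1:/data/brick2 \
    host2:/data/brick1 host2:/data/brick2
gluster volume start test-volume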
Performance-wise I think there is about a 30% increase possible by tuning readahead etc., and another 90% by adding mirroring. This will be done this or next weekend when I get time to stuff in more disks, and after that I’ll also re-run the tests using SSDs.
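Just to sketch what I mean by readahead tuning: the obvious knobs are the block-device readahead on the brick disks and the gluster volume options. The values below are guesses for illustration, not tested settings:

# illustration only - device name and values are guesses, not tuned numbers
blockdev --setra 4096 /dev/sdb
gluster volume set test-volume performance.cache-size 256MB
gluster volume set test-volume performance.io-thread-count 16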
About running Xen VMs: it is somewhat documented that one has to disable direct I/O on the lower FS layers, which is not possible for me right now as I built the RPMs from a version that had a bug in that command. I’ll have to make new ones.
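For those wondering, disabling direct I/O is just a mount option on the native client – once working RPMs are in place, it should look roughly like this (mount point and volume as above):

# disable direct I/O on the FUSE mount of the native glusterfs client
mount -t glusterfs -o direct-io-mode=disable 192.168.100.106:/test-volume /data/export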
Also interesting is traffic monitoring for RDMA – I don’t know much about this yet, but it makes some sense that the counters for “ib0” on a Linux host will only show the metadata traffic, while everything else just passes by via RDMA. Time for a check_mk InfiniBand plugin?
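If I get around to it, the starting point would probably be the hardware port counters exposed by the HCA in sysfs rather than the ib0 interface stats. A rough check_mk local-check sketch – device name and port number are assumptions for my lab boxes:

#!/bin/sh
# rough sketch of a check_mk local check - HCA device name and port are assumptions
DEV=mthca0; PORT=1
CNT=/sys/class/infiniband/$DEV/ports/$PORT/counters
# port_xmit_data / port_rcv_data count in 4-byte words and include RDMA traffic
XMIT=$(cat $CNT/port_xmit_data)
RCV=$(cat $CNT/port_rcv_data)
echo "0 IB_${DEV}_port${PORT} xmit_data=$XMIT|rcv_data=$RCV OK - raw HCA port counters"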
[*]Yes, that goes for you iSCSI folks just the same.