GlusterFS Oracle VM update


Back in November I had built the GlusterFS binaries using the Oracle VM SDK; since then I've moved on, gotten the InfiniBand link up, and built a FUSE kernel module for Oracle VM.

I've now started doing the first live tests.

I’m posting this here so I don’t clutter the OTN forums thread (see: http://forums.oracle.com/forums/thread.jspa?messageID=9394644 ) with my test results:

[root@waxh0005 ~]# lsmod | grep fuse
fuse                   43796  2

[root@waxh0005 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              3046384    961508   1927632  34% /
tmpfs                   293976         0    293976   0% /dev/shm
glusterfs#192.168.100.106:/test-volume
                     825698944  44585088 739170816   6% /data/export
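For reference, that last mount is what you get from the plain FUSE client mount; something along these lines (exact options depend on the build, so take this as a sketch rather than my literal command):

mount -t glusterfs 192.168.100.106:/test-volume /data/export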

This is built from 4 200GB LVs on 2 Oracle VM hosts. Each is on its own consumer-grade 1.5TB SATA disk (we’re talking lab here after all 🙂).
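In case anyone wants to reproduce the brick layout: roughly one LV per disk, formatted and mounted as a brick directory. Volume group and mount names below are just placeholders, not my exact setup:

# one brick LV per SATA disk, two per host (names are placeholders)
lvcreate -L 200G -n gluster_brick1 vg_sata1
mkfs.ext3 /dev/vg_sata1/gluster_brick1
mkdir -p /bricks/brick1
mount /dev/vg_sata1/gluster_brick1 /bricks/brick1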

Well, OK, I wouldn’t mind using normal SATA disks on prod systems either. The higher failure rate is easily compensated if you build fault-tolerant storage in the first place, and if you ensure your storage delivers more IOPS than your users need – why not?

Most people just badly fail on the second part and give you 3-spindle software RAID5. The CPU overhead has been shown to be negligible in lab tests, they say. Pity that real systems aren’t run in a lab and will be using their CPU for something else[*] than RAID.

[root@waxh0005 export]# dd if=/dev/zero of=lala bs=1024k count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 12.0661 seconds, 174 MB/s

Performance is still a bit flaky: for reading & writing, 80MB/s is the lower boundary, 180MB/s the average, and it tops out at about 200MB/s.
I have set it up as a 4-way stripe volume without mirroring, using the default “gluster” tool for volume creation. This is smaller than the smallest supported setup, which would be 4 hosts.
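Roughly, the creation looked like this (the brick paths and the second host’s IP are placeholders, and I’m going from memory on the exact ordering of options):

# 4-way stripe over RDMA, two bricks per host (paths/IPs are placeholders)
gluster volume create test-volume stripe 4 transport rdma \
    192.168.100.105:/bricks/brick1 192.168.100.105:/bricks/brick2 \
    192.168.100.106:/bricks/brick1 192.168.100.106:/bricks/brick2
gluster volume start test-volume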
Performance-wise I think there is about a 30% increase possible by tuning readahead etc. and another 90% by adding mirroring. This will be done this or next weekend when I have time to stuff in more disks, and after that I’ll also re-run the tests using SSDs.
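The tuning I have in mind is mostly the standard volume options; something along these lines (option names as in the current gluster docs, the values are just starting guesses to be benchmarked, not recommendations):

# candidate tunables – values are guesses, benchmark before trusting them
gluster volume set test-volume performance.cache-size 256MB
gluster volume set test-volume performance.write-behind-window-size 4MB
gluster volume set test-volume performance.read-ahead on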
As for running Xen VMs, it’s somewhat documented that one has to disable direct I/O on the lower FS layers, which is not possible for me right now, as I built the RPMs from a version that had a bug in that option. I’ll have to make new ones.
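For completeness, the mount option in question should look like this on the FUSE client side; I can’t verify it with my current RPMs, so treat it as untested:

# disable direct I/O in the FUSE mount so file-backed Xen disks work (untested with my build)
mount -t glusterfs -o direct-io-mode=disable 192.168.100.106:/test-volume /data/export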

Also interesting is traffic monitoring for RDMA – I don’t know much about this yet, but it makes some sense that the counters for “ib0” on a Linux host will only show the metadata traffic, while everything else just passes by using RDMA. Time for a check_mk InfiniBand plugin?
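If I get around to such a plugin, the data would probably come from the HCA port counters rather than the ib0 netdev stats, since those count all traffic on the port. On Linux they’re exposed in sysfs; a sketch (the device name mlx4_0 is an assumption, and the counters are in units of 32-bit words, hence the *4):

# read the IB port counters that also cover RDMA traffic (sysfs layout may differ per stack)
P=/sys/class/infiniband/mlx4_0/ports/1/counters
echo "rx bytes: $(( $(cat $P/port_rcv_data) * 4 ))"
echo "tx bytes: $(( $(cat $P/port_xmit_data) * 4 ))"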

[*] Yes, that goes for you iSCSI folks just the same.
