Over the course of the weekend I’ve switched one of my VM hosts to the newer recommended layout for NodeWeaver – this uses ZFS, with compression enabled.
First, let me admit some things you might find amusing:
- I found I had forgotten to add back one SSD after my disk+l2arc experiment
- I found I had one of the two nodes plugged into its 1ge ports, instead of using the 10ge ones.
The switchover looked like this:
- pick a storage volume to convert and write down its actual block device and mountpoint
- tell LizardFS I’m going to disable it (prefix it with a * and kill -1 the chunkserver)
- Wait a bit
- tell LizardFS to forget about it (prefix with a # and kill -1 the chunkserver)
- ssh into the setup menu
- select ‘local storage’ and pick the now unused disk, assign it to be a ZFS volume
- quit the menu after successful setup of the disk
- kill -1 the chunkserver to enable it
- it’ll be visible in the dashboard again, and you’ll also see it’s a ZFS mount.
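The LizardFS part of the steps above can be sketched as shell commands. This is a minimal sketch, assuming the chunkserver’s disk list lives in an hdd.cfg file (the real path varies by install, e.g. /etc/lizardfs/hdd.cfg); it edits a local demo copy rather than a live config, and the SIGHUP lines are shown commented out:

```shell
# stand-in for the chunkserver's hdd.cfg (real path varies by install)
CFG=hdd.cfg.demo
printf '/srv0\n/srv1\n/srv2\n' > "$CFG"

# step 1: prefix the volume with '*' to mark it for removal,
# then SIGHUP the chunkserver so it re-reads the config
sed -i 's|^/srv1$|*/srv1|' "$CFG"
# kill -1 "$(pidof mfschunkserver)"

# step 2 (after re-replication has caught up): prefix with '#'
# so the chunkserver forgets the disk entirely, and SIGHUP again
sed -i 's|^\*/srv1$|#/srv1|' "$CFG"
# kill -1 "$(pidof mfschunkserver)"

cat "$CFG"
```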
Compression was automatically enabled (lz4).
I’ve so far only looked at the re-replication speed and disk usage.
Running on 1ge I only got around 117MB/s (one of the nodes is on LACP and the switch can’t do dst+tcp port hashing, so all traffic ends up in one channel).
Running on 10ge I saw replication network traffic go up to 370MB/s.
Disk I/O was lower since compression had already kicked in, and the savings have been vast.
```
[root@node02 ~]# zfs get all | grep -w compressratio
srv0  compressratio  1.38x  -
srv1  compressratio  1.51x  -
srv2  compressratio  1.76x  -
srv3  compressratio  1.53x  -
srv4  compressratio  1.57x  -
srv5  compressratio  1.48x  -
```
I’m pretty sure ZFS also re-sparsified all sparse files; net usage on some of the storage volumes went down from 670GB to around 150GB.
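For reference, sparseness is easy to see by comparing a file’s apparent size with the blocks it actually allocates. A quick sketch (the filename is arbitrary):

```shell
# create a 1 GiB file that allocates (almost) no blocks on disk
truncate -s 1G sparse.img

# apparent size vs. space actually allocated:
ls -lh sparse.img   # apparent size: 1 GiB
du -h  sparse.img   # allocated: close to zero
```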
I’ll probably share screenshots once the other node is also fully converted and rebalancing has happened.
Another thing I told a few people was that I ripped out the 6TB HDDs once I found that $customer’s OpenStack cloud performed slightly better than my home setup.
Consider that solved. 😉
```
vfile:~$ dd if=/dev/zero of=blah bs=1024k count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.47383 s, 434 MB/s
```
(this is a network-replicated write…)
Make sure to enable the noop disk scheduler before you do that.
Alternatively, if there are multiple applications on the server (i.e. a container-hosting VM), use the deadline disk scheduler with the nomerges option set. That matters. Seriously 🙂
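Both scheduler tweaks are runtime sysfs writes. A sketch, assuming the device is sda (substitute your own; newer blk-mq kernels name the equivalents `none` and `mq-deadline`):

```shell
# list available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler

# dedicated VM disk: noop ('none' on blk-mq kernels)
echo noop | sudo tee /sys/block/sda/queue/scheduler

# shared server: deadline ('mq-deadline' on blk-mq) plus merge disabling
echo deadline | sudo tee /sys/block/sda/queue/scheduler
echo 2        | sudo tee /sys/block/sda/queue/nomerges   # 2 = disable all merges
```

These settings don’t survive a reboot, so persist them via your distro’s usual mechanism (udev rule, kernel cmdline, or tuned profile).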
Happy hacking and goodbye