Losing data with pvmove, then automagically moving LVM volumes between VGs

First, some background on how I ended up copying volumes between VGs:

I was replacing my Xen host’s two disk drives (one 500GB, one 1.5TB) with a pair of WD Caviar Green 1.5TB drives to make it even quieter than it already is.

Of course the new disks were supposed to be in a raid1 setup.

This turned into my second disaster with pvmove, and this time I had real data loss.

What happened?

I had set up one of the new 1.5TB disks as a degraded raid1 in a USB case, because the system has only two SATA ports.

Then I added the resulting /dev/md1 device to the “vgxen2” volume group and started moving LVs with pvmove. Roughly 50GB in, it turned out that the disk was faulty. The layering between md, LVM and the kernel sucks big time, so at some point I got “rejecting IO to dead device” and more errors of that kind.
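
For reference, the setup described above boils down to roughly the following commands. This is a minimal dry-run sketch, not the exact commands I typed: the function name plan_degraded_raid1_migration and all device and LV names in the usage line are hypothetical placeholders, and the function only prints the commands so they can be reviewed first. The literal word `missing` tells mdadm to leave the second RAID1 slot empty, and `pvmove -n` restricts each move to the extents of one LV.

```shell
# Dry-run sketch: print (not run) the commands for moving LVs onto a
# degraded RAID1. The function name and all device names are hypothetical.
plan_degraded_raid1_migration() {
    local newdisk=$1 md=$2 vg=$3 oldpv=$4
    shift 4                      # remaining arguments are LV names
    # two-device RAID1 with the second member given as "missing" (degraded)
    echo "mdadm --create $md --level=1 --raid-devices=2 $newdisk missing"
    echo "pvcreate $md"          # turn the md device into an LVM PV
    echo "vgextend $vg $md"      # add it to the existing volume group
    local lv
    for lv in "$@"; do
        # move only the extents of this LV from the old PV to the md device
        echo "pvmove -n $lv $oldpv $md"
    done
}
```

For example, `plan_degraded_raid1_migration /dev/sdc1 /dev/md1 vgxen2 /dev/sda2 web mail` prints five commands, which could be piped to `sh` once they look right.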

The procedure to fix it:

pvmove --abort (partially fails)

vgreduce --removemissing (doesn’t really work, because it can’t deal with the temporary volumes created by pvmove; someone reported this back in 2006… lol)

identify the LVM metadata archive written *before* the pvmove by grepping in /etc/lvm/archive/

edit out the section for the volume that was being copied (I had a backup!)

vgcfgrestore -f /tmp/lvmdata_fixed vgxen2

vgchange -a y vgxen2 (still getting the errors; vgchange is not properly implemented. vgchange -a y is supposed to reactivate the VG even if it’s already active. Same for vgscan. Guess why the original came with -v and -p options and why you had to move the lvmtab away? And of course there’s no lvlnboot command to sync userland and kernel. Gosh, I so FUCKING HATE Linux LVM2. So many bugs and design flaws. If I could afford the power consumption of my HP-UX boxes, one of them would become the fileserver.)

Well. How to fix it? Ah. A reboot.
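
The “grep in /etc/lvm/archive/” step can be scripted. A minimal sketch, assuming the usual archive naming `<vg>_<NNNNN>-<random>.vg` and that each archive carries a `description = "Created *before* executing '...'"` line, which is what LVM2 writes there (check your own files). The function name find_pre_pvmove_archives is hypothetical. Newer LVM2 versions can also list the archives and their descriptions with `vgcfgrestore --list vgxen2`.

```shell
# Sketch: print the metadata archives of a VG that were written just
# before a pvmove ran. find_pre_pvmove_archives is a hypothetical name.
find_pre_pvmove_archives() {
    local vg=$1 dir=${2:-/etc/lvm/archive}
    # -F: match the text literally (the asterisks are not a regex here)
    # -l: print only the names of matching files
    # Glob order is sorted, so with zero-padded sequence numbers the
    # last file printed is normally the most recent one.
    grep -lF "Created *before* executing 'pvmove" "$dir/$vg"_*.vg 2>/dev/null
}
```

The chosen file would then be copied to /tmp/lvmdata_fixed, edited as described, and restored with `vgcfgrestore -f /tmp/lvmdata_fixed vgxen2`.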

Oh, err, and this is why I’m copying my data over instead of using pvmove.

So let’s get to the actual script:

# script for offline copying of lvm volume group contents
# free to use, but no warranties / liabilities accepted.
# usage: $0 /dev/OLDVG /dev/NEWVG   (shut down the VMs using these LVs first)

OLDVG=${1:?usage: $0 /dev/OLDVG /dev/NEWVG}
NEWVG=${2:?usage: $0 /dev/OLDVG /dev/NEWVG}

for lv in "$OLDVG"/*; do
    LVNAME=$(basename "$lv")
    # first check if we already have an existing copy, because I manually copied the large ones.
    if [ ! -e "$NEWVG/$LVNAME" ]; then
        # figure out the LV size in MB ourselves, because lvdisplay rounds and changes the unit.
        # "Current LE" is the number of logical extents; the factor 4 assumes the default
        # 4 MB extent size (check "PE Size" in vgdisplay for your VG).
        NUMLE=$(lvdisplay "$lv" | awk '/Current LE/ {print $3}')
        LVSZ=$(( NUMLE * 4 ))

        # now we have the LV size and can create the target LV (-L defaults to MB);
        # skip this LV if that fails, otherwise dd would write a plain file into /dev.
        lvcreate -L "$LVSZ" -n "$LVNAME" "$NEWVG" || continue
        echo "copying $LVSZ MB for $LVNAME from $OLDVG to $NEWVG"
        # I had to set the block size because the USB bridge (or something) turned
        # 300 input I/Os into 5000 output I/Os; bs= sets both ibs and obs.
        # If you want a progress counter or ETA display, split the dd operation
        # into several chunks and time them.
        dd if="$OLDVG/$LVNAME" of="$NEWVG/$LVNAME" bs=512k &&
            echo "lv copy ok" &&
            echo "lvremove -f $OLDVG/$LVNAME" >> /tmp/lvremovescript
    fi
done

echo "copies are completed; if no major errors occurred, you can now remove the old LVs using /tmp/lvremovescript"
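
The comment in the script about splitting dd to get a progress counter or ETA could look like this. A minimal sketch, not the script’s actual code: copy_with_progress is a hypothetical helper that copies in chunks using dd’s skip/seek/count and extrapolates the remaining time from the average rate so far. It uses bs=1M so the chunk arithmetic works in MB; the script uses bs=512k because of the USB bridge, so adjust if that matters on your hardware.

```shell
# Sketch: copy SRC to DST in CHUNK_MB-sized pieces with dd and print
# progress plus a rough ETA after each piece. TOTAL_MB is the LV size in
# MB (e.g. $LVSZ from the script). copy_with_progress is a hypothetical name.
copy_with_progress() {
    local src=$1 dst=$2 total_mb=$3 chunk_mb=${4:-1024}
    local done_mb=0 count start elapsed eta
    start=$(date +%s)
    while [ "$done_mb" -lt "$total_mb" ]; do
        # the last chunk may be smaller than chunk_mb
        count=$chunk_mb
        if [ $((done_mb + count)) -gt "$total_mb" ]; then
            count=$((total_mb - done_mb))
        fi
        # skip/seek/count are counted in blocks of bs (1 MB here);
        # conv=notrunc keeps dd from truncating a regular-file target
        dd if="$src" of="$dst" bs=1M skip="$done_mb" seek="$done_mb" \
            count="$count" conv=notrunc 2>/dev/null || return 1
        done_mb=$((done_mb + count))
        elapsed=$(( $(date +%s) - start ))
        # remaining time, extrapolated from the average rate so far
        eta=$(( (total_mb - done_mb) * elapsed / done_mb ))
        printf '%d/%d MB copied, %ds elapsed, ETA %ds\n' \
            "$done_mb" "$total_mb" "$elapsed" "$eta" >&2
    done
}
```

Inside the loop it would replace the single dd call, e.g. `copy_with_progress "$OLDVG/$LVNAME" "$NEWVG/$LVNAME" "$LVSZ" 2048`. On current GNU coreutils, `dd status=progress` prints a running byte count without any of this, though it gives no ETA.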

This is running smoothly now, and once it’s done I can move on to the next item on the checklist:

The old 500GB disk will be attached via USB and hold my backup VM, which will be dual-bootable between Xen and real “iron”.
So if disaster strikes, I can plug my backup server into any available system that can boot from USB, and the backups will be right there.
Until then I’ll enjoy the benefits of running it as a Xen VM: it can even stay suspended during all the “normal” hours of the day and only be brought up for the actual backup runs.

These options have existed for years now; it’s time they saw some more use.

