A primer on risking data loss with pvmove

So, I’ll admit it: my dom0 had 128MB of RAM and no swap space.
It was meant to be somewhat embedded, and that used to be just fine.

I had to move roughly 300GB using pvmove.

It worked quite OK until, at some point, it suddenly ran out of RAM?!

Now pvmove will neither recover on the next call nor abort; it’s just stuck with no working state.

And this is on a current system, yet the behaviour is very similar to what an IRC friend ran into two years ago.

Loading vgxen-lvxensave table
 Suppressed vgxen-lvxensave identical table reload.
 Resuming vgxen-lvxensave (253:17)
 Found volume group "vgxen"
 Loading vgxen-pvmove0 table
 Suppressed vgxen-pvmove0 identical table reload.
 Loading vgxen-lv_lab_cent--lfw_swap table
 Suppressed vgxen-lv_lab_cent--lfw_swap identical table reload.
 Resuming vgxen-lv_lab_cent--lfw_swap (253:19)
 Found volume group "vgxen"
 Loading vgxen-pvmove0 table

A look into dmesg shows the issue:

HighMem per-cpu: empty
Free pages:        1964kB (0kB HighMem)
Active:9754 inactive:8 dirty:0 writeback:0 unstable:0 free:491 slab:2894 mapped-file:1270 mapped-anon:8577 pagetables:411
DMA free:652kB min:172kB low:212kB high:256kB active:7508kB inactive:0kB present:16384kB pages_scanned:1101018 all_unreclaimable? yes
lowmem_reserve[]: 0 0 120 120
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 120 120
Normal free:1312kB min:1316kB low:1644kB high:1972kB active:31536kB inactive:4kB present:122880kB pages_scanned:2104924 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 652kB
DMA32: empty
Normal: 2*4kB 5*8kB 1*16kB 3*32kB 0*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1312kB
HighMem: empty
1313 pagecache pages
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap  = 0kB
Total swap = 0kB
Free swap:            0kB
34816 pages of RAM
0 pages of HIGHMEM
17953 reserved pages
2055 pages shared
0 pages swap cached
0 pages dirty
0 pages writeback
1270 pages mapped
2894 pages slab
411 pages pagetables
Out of memory: Killed process 8899 (pvmove).
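
To put those page counts into more familiar units, here is a small Python sketch. It assumes 4 kB pages, which matches the report itself (free:491 pages against "Free pages: 1964kB"). The labels are my own shorthand for the dmesg fields, and the script only converts numbers copied from the output above; it says nothing about which process was holding the memory.

```python
# Assumption: 4 kB pages, consistent with "free:491" vs "Free pages: 1964kB" above.
PAGE_KB = 4


def pages_to_mib(pages, page_kb=PAGE_KB):
    """Convert a page count from the OOM report into MiB."""
    return pages * page_kb / 1024


# Page counts copied from the dmesg output above; labels are informal.
report = {
    "total RAM": 34816,
    "reserved": 17953,
    "free": 491,
    "anonymous (mapped-anon)": 8577,
    "slab": 2894,
    "page cache": 1313,
}

for label, pages in report.items():
    print(f"{label:<24} {pages:>6} pages  {pages_to_mib(pages):7.1f} MiB")
```

So at the moment the OOM killer fired, less than 2 MiB was free and roughly 33.5 MiB was anonymous memory.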

So, why am I here bitching if I configured too little RAM?

Well, because the root cause was a memory leak in pvmove: it had already worked for quite a few volumes, even bigger ones, without a problem, and then it suddenly sucked up all the RAM.

If we consider that the PE size in this VG was only 4MB, we can be quite sure it didn’t run out of space for data it was supposed to keep in RAM. Nothing more than one 4MB extent plus the bitmap for the volume in question should have to be in RAM, and that might add up to about 20MB for a bigger volume…
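
As a rough sanity check on that estimate, here is a back-of-the-envelope sketch in Python. It assumes one bit of bookkeeping per physical extent, which is only my guess at what "the bitmap" would be and not a description of how LVM2 actually tracks a move. It also treats the whole 300GB as a single volume measured in binary units. The function names are made up for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def extent_count(volume_bytes, pe_bytes):
    """Number of physical extents needed to cover a volume (rounded up)."""
    return -(-volume_bytes // pe_bytes)


def bitmap_bytes(extents):
    """Size of a one-bit-per-extent bitmap, in bytes (rounded up).

    Assumption: one bit per extent; LVM2's real bookkeeping may differ.
    """
    return -(-extents // 8)


pe_size = 4 * MIB        # PE size of this VG
volume = 300 * GIB       # worst case: all the moved data as one volume

extents = extent_count(volume, pe_size)
print("extents:", extents)
print("bitmap bytes:", bitmap_bytes(extents))
print("one extent buffer (MiB):", pe_size // MIB)
```

Under those assumptions the bookkeeping is a few kilobytes on top of a single 4MB extent, so even a generous 20MB guess leaves a lot of headroom.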

The next thing I’m just disgusted by is this whole “let’s do it in userspace” stuff. pvmove should do its job via some API to the kernel LVM driver; then the OOM killer would have caused no harm.

And, lastly, it is obviously low-quality code, considering it can resume only in theory; in practice I’ll have to pray, hard-reset, boot to runlevel one, retry pvmove --abort, and then pray some more.

I WISH the Sistina people had invested some more time to really understand logical volume management when they “reimplemented” the HP-UX LVM, so there wouldn’t be so many points where LVM2 still breaks apart.

After all, the pvmove command worked there about 12 years ago without any RAM or CPU pressure, even on 64MB boxes…
