No Xen kernel
Yeah, never forget this when upgrading from an older Alpine Linux release: Xen itself has moved into the xen-*-hypervisor package. I didn’t catch that during the update, so I ended up with no hypervisor on the system.
Xen 4.6.0 crashes on boot
My experience: no, you don’t need to compile a debug Xen kernel and toolstack. No, you don’t need a serial console. No, you don’t need to attach it.
What you need is Google, and a search for whatever random regression you hit.
In this case, setting dom0_mem makes Xen crash instead of putting the remaining memory into the unused pool: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810070
4.6.1 fixes that, but it isn’t in Alpine Linux stable yet.
So what I did was enable autoballoon in /etc/xen/xl.conf. That’s one of the worst things you can ever do. It slows down VM startup, has NO benefit at all and, as far as I know, also drives memory fragmentation to the max. Which is lovely, especially since Xen doesn’t even detect this NUMA machine as NUMA, thanks to IBM’s chipset “magic”.
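If you go this route, it is worth double-checking which balloon mode xl actually ends up in. Below is a minimal Python sketch with hypothetical helper names. It assumes the autoballoon values documented in xl.conf(5): "on", "off", and the default "auto", which means on unless dom0_mem appears on the Xen command line. It also assumes you feed it the xen_commandline value that `xl info` prints. The parser is deliberately naive and only handles simple KEY="VALUE" lines.

```python
def parse_xl_conf(text: str) -> dict[str, str]:
    """Naively parse KEY="VALUE" / KEY=VALUE lines from an xl.conf-style file.

    Everything after '#' is treated as a comment, so values containing '#'
    are not supported by this sketch.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def dom0_mem_set(xen_cmdline: str) -> bool:
    """True if the Xen command line (e.g. xen_commandline from `xl info`) has dom0_mem=."""
    return any(token.startswith("dom0_mem=") for token in xen_cmdline.split())


def effective_autoballoon(xl_conf_text: str, xen_cmdline: str) -> bool:
    """Resolve xl's autoballoon setting; "auto" is assumed to be the default."""
    mode = parse_xl_conf(xl_conf_text).get("autoballoon", "auto")
    if mode == "on":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        # auto: balloon dom0 down only if dom0_mem was NOT given to Xen
        return not dom0_mem_set(xen_cmdline)
    raise ValueError(f"unexpected autoballoon value: {mode!r}")
```

Usage would be something like `effective_autoballoon(open("/etc/xen/xl.conf").read(), cmdline)`, with cmdline copied from the xen_commandline line of `xl info`.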
CPU affinity settings got botched
I had used a combination of vCPU pinning and scheduling settings to keep power management working while dedicating two cores to dom0 for good performance. Normally, dom0 vCPU pinning alone leaves you with a problem:
dom0 is a good citizen and only uses the first two cores. But all other VMs may still use those cores too, which breaks some of the benefits…
So what you’d do is use settings like this:
memory = 8192
maxmem = 16384
name = "xen49.xenvms.de"
vcpus = 4
cpus = [ "^0,^1" ]
That tells Xen this VM can have 4 virtual CPUs, but they’ll never be scheduled on the first two cores (the dom0 ones).
Yeah, except that in Xen 4.6 no VM will boot with that.
The good news: the plain string (non-array) syntax works:
cpus = "^0,^1,2-7"
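To see what such a CPU list expands to, here is a small Python sketch of the syntax as I understand it from xl.cfg: comma-separated entries, N or N-M ranges, and a ^ prefix to exclude. It is not xl’s parser. In particular, it applies all exclusions after all inclusions, so entry order doesn’t matter here, which may differ from what xl does.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand an xl-style CPU list such as "^0,^1,2-7" into a set of CPU numbers.

    Entries are N or N-M; a leading ^ marks an exclusion. This sketch collects all
    inclusions and all exclusions first and then subtracts, so entry order is ignored.
    """
    include: set[int] = set()
    exclude: set[int] = set()
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        target = include
        if entry.startswith("^"):
            target = exclude
            entry = entry[1:]
        if "-" in entry:
            first, last = (int(n) for n in entry.split("-", 1))
            target.update(range(first, last + 1))
        else:
            target.add(int(entry))
    return include - exclude
```

With this sketch, "^0,^1,2-7" expands to CPUs 2 through 7, while an exclusion-only list such as "^0,^1" expands to an empty set because nothing was included in the first place. Spelling out the positive range, as in the working string, avoids relying on any implicit “all CPUs” starting point.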
IBM’s certificates for this old clunker have expired. So how do you get access?
Oh, and if the server is still sitting in the BIOS setup from configuring the RSA module, it’ll NEVER display anything, even if you reset the ASM. You need to reset the server once more; otherwise you just get a white screen. The recommended fix is to reinstall the firmware, which also gets you that reboot.
My network config also stopped working. That’s the one part of Alpine I’d like to change: drop that challenged “interfaces” file from Debian in favor of something friendlier to VLANs, bridges and tunnels.
If you use it day to day, it might “look” just fine, but that’s only because you never go out and compare it to anything else.
Bringing up a bridge interface was broken because one sysctl knob was no longer supported. The scripts tried to turn off ebtables through it, that didn’t work, and so, hey, what should we do? Why not just stop bringing up any of the configured interfaces and completely mess up the IP stack?
I mean, what would make more sense than the WORST possible outcome?
If this were a cluster that had lost quorum, I’d even agree. But you can bet the Linux kids would proudly run that into a split brain.
I’ll paste the actual config once I’ve figured out how to take WordPress out of idiot mode and edit the HTML. Because, of course, pasting into a <PRE> section is utterly fucked up.
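What I would have expected instead: skip the missing knob, complain, and keep going. Here is a minimal Python sketch of that behavior, assuming the standard /proc/sys layout where dots in a sysctl name map to directories. The post doesn’t say which knob disappeared, so the bridge-netfilter names used here are only illustrations, and set_sysctl_if_present is a hypothetical helper, not part of any Alpine script.

```python
import logging
from pathlib import Path

log = logging.getLogger("bridge-setup")


def set_sysctl_if_present(name: str, value: str, root: Path = Path("/proc/sys")) -> bool:
    """Write value to a sysctl knob, e.g. "net.bridge.bridge-nf-call-iptables".

    If the knob does not exist (module not loaded, option removed), log a
    warning and return False instead of failing, so the caller can carry on
    bringing up the remaining interfaces.
    """
    # Naive dot-to-slash mapping; breaks for names with dots in a component
    # (e.g. VLAN interfaces like eth0.100 under net.ipv4.conf).
    path = root.joinpath(*name.split("."))
    if not path.is_file():
        log.warning("sysctl %s not available, skipping", name)
        return False
    path.write_text(f"{value}\n")
    return True
```

Against the real /proc/sys this needs root. The root parameter exists so the logic can be tried against a scratch directory first.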
I removed my bonding config from this to make debugging easier. The key points were removing the echo calls and making the pre-up / post-up parts more reliable.
auto lo
iface lo inet loopback

auto br0
iface br0 inet static
    pre-up brctl addbr br0
    pre-up ip link set dev eth0 up
    post-up brctl addif br0 eth0
    address your_local_ip
    netmask subnet_mask
    broadcast subnet_bcast
    gateway your_gw
    hostname waxh0012
    post-down brctl delif br0 eth0
    post-down brctl delbr br0

# 1 Gbit to the outside
auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    down ip link set $IFACE down
Xen VMs not booting
This annoyance only hits Alpine. Some VMs get stuck at the grub menu until you press Enter. Something about grub parsing the extlinux.conf doesn’t work, most likely the “timeout” or “default” lines.
And then, of course, there’s the idiotic (hd0) default versus (hd0,0) from the grub-compat wrapper.
I don’t think you can possibly shout too loudly at any of the people involved, since all of this falls into the “why care?” class of issues.
(“Oh just use PV-Grub” … except that has another bunch of issues…)
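Since I suspect the TIMEOUT or DEFAULT lines, a quick sanity check can at least flag the obvious candidates. This is a hypothetical Python helper, not the actual extlinux parser used by grub or pygrub. It only checks that DEFAULT names an existing LABEL and that TIMEOUT (counted in tenths of a second by extlinux) is a plain integer. Whether either condition is what makes grub stop at the menu is an assumption.

```python
def check_extlinux(text: str) -> list[str]:
    """Return warnings about the DEFAULT and TIMEOUT lines of an extlinux.conf.

    Only two things are checked: DEFAULT must name an existing LABEL, and
    TIMEOUT (in tenths of a second) must be a plain non-negative integer.
    """
    labels: list[str] = []
    default = None
    timeout = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        arg = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "LABEL":
            labels.append(arg)
        elif keyword == "DEFAULT":
            default = arg
        elif keyword == "TIMEOUT":
            timeout = arg

    warnings: list[str] = []
    if default is not None and default not in labels:
        warnings.append(f"DEFAULT {default!r} does not match any LABEL {labels}")
    if timeout is None:
        warnings.append("no TIMEOUT line")
    elif not timeout.isdigit():
        warnings.append(f"TIMEOUT {timeout!r} is not a plain integer")
    return warnings
```

For example, a config whose DEFAULT points at a menu module such as menu.c32 instead of one of its LABELs produces a single warning. That doesn’t prove it is the culprit, but it tells you where to look first.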
Normally I no longer bother reporting all the broken stuff I run into. It has simply become too much; I would spend about two-thirds of my day on it. But since this concerns super-recent releases of Alpine and Xen (and even some Debian), I figured I’d save people some of the PITA I spent my last few hours on.
When I can, I dump these fixes into my Confluence at this URL:
Adminspace – Fixlets
I also try really hard to not rant there 🙂
Natanael Copa reached out and let me know that the newer bridge packages on Alpine include a lot more helper scripts. With those, the icky settings in the config above would no longer have been needed.
Another thing you can do is:
post-up my-command || echo "didnt work"
You really shouldn’t … need to do that, but it helps.
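For the record, `cmd || echo ...` works because the shell line then exits 0 whenever cmd fails, so whatever runs the hook never sees the failure. Here is the same idea sketched in Python with a hypothetical run_hooks helper: run every hook, report failures, and keep going instead of aborting.

```python
import subprocess
import sys


def run_hooks(commands: list[list[str]]) -> list[list[str]]:
    """Run each hook command; report and collect failures instead of aborting,
    much like appending `|| echo "didnt work"` to every post-up line."""
    failed: list[list[str]] = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, check=False)
        except OSError as exc:  # e.g. the command does not exist at all
            print(f"didnt work: {cmd[0]}: {exc}", file=sys.stderr)
            failed.append(cmd)
            continue
        if result.returncode != 0:
            print(f"didnt work: {' '.join(cmd)} (exit {result.returncode})", file=sys.stderr)
            failed.append(cmd)
    return failed
```

Returning the failed commands rather than just printing them keeps the decision about what to do next (retry, alert, ignore) with the caller.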