Next on this channel

Instead of a New Year’s resolution* I’ve looked into what to work on next. Call it milestones, achievements, whatever.

  • I’ve already cleaned up my platform; it’s based on Alpine Linux now.
  • IPv6 switchover is going well but not a prime concern. Much stuff just works and other stuff is heavily broken, so it’s best to not rush into a wall.
  • Bacula: I’ve invested a lot of time in my backup management routine again. This paid off and made clear it was stupid to decide per system which VMs to back up and which not. If you want reliable backups, just back up everything and be done with it. Side quests still available: splitting catalogs, and wondering why there is no real operations manual. (Rename a client? Move a client’s backup catalogs? All this stuff is still at a level that I’d call grumpy, backwards cleverness: “You can easily do that with a script” – yeah, but how come the prime open source backup tool doesn’t bring along routine features that are handled elsewhere with ONE keypress? F2 to the rescue.)
  • cfengine: This will be my big thing over the next 3 weeks, at home and on holiday. Same goal: getting a really good grip on it. Over the last years I’ve tried Puppet, liked but not used Chef, and glanced at Salt. Then I skipped all of them and decided that Ansible was good for the easy stuff, and for the not-so-easy stuff I want the BIG (F) GUN, aka cfengine.
  • Ganeti & job scheduling: While cleaning up the hosting platform I noticed I’d missed a whole topic automation-wise: scheduling VM migrations, snapshots etc. ahead of time. A friend is pushing me towards Ganeti, and it sure fills a lot of the gaps I currently see, but it doesn’t scale up to the larger goal (OpenNebula virtual datacenters). I’ll see if there is a reasonable way of picking pieces out of Ganeti. Still, the automation topic stays unresolved. There is still no powerful OSS scheduler; the existing ones are all aimed at HPC clusters, which is very easy turf compared to enterprise scheduling. So it seems I’ll need to come up with something really basic that does the job.
  • Confluence: That’s an easier topic. I’m working on completing my Confluence foo so that I’ll be able to make my own templates and work with it really fast.

What else…. oh well that was quite a bit already 😉

Otherwise I’ve been late (in deciding) for one project that would have been a lovely start, and turned down two others because they were abroad. Being at the start of this new (self-employed) career I’m feeling a reasonable amount of panic. Over the weekends it usually turns into a less reasonable one 😉

But I’m also cheerful about being able to focus on building up my skills, and I still think it was the right decision. I went into the whole Unix line of work by accident but have loved it ever since, because “if it doesn’t work you know it’s your fault” – a maybe stale, yet basically bug-less environment where you concentrate on issues that come up in the interactions of large systems instead of bug after bug after bug. (See the platform makeover above: switching to Alpine Linux has been so much fun for the same reason.)

My website makeover is in progress and I’m happy to know I’ll visually arrive in this decade-2.0 with it soon.


Of course I’ve also run into some bugs in DTC while trying to auto-install a WordPress setup. DTC is the reason for the last Debian system I’m keeping. Guess who is REALLY at risk of being deprecated now 😉

*not causing doomsday worked for 2012 though


Ready to serve

The hardware in this picture is uncommon even for a tech geek like me.

Opteron 6100, 8GB DIMMs

I remember when we upgraded a cell in an HP SuperDome to 96GB in 2004, and now this amount of memory is here on my desk.
The two Opterons (12-core, 2.3 or 2.4GHz) would of course love 4 more DIMMs with this, to be in fully interleaved quad-channel mode. (That way they can hit 50GB/s memory throughput, which is far beyond any normal PC standards and nothing you’ll need at home any time soon…)


Tomorrow these will all move into their new home, and very soon (I hope) this will be the home of a few very lucky VPS customers, since Xen *does* of course allow “bursting” into unused memory. That means the first guy will start his 1GB contract with 45ish GB (NUMA beware) until the second guy moves in.

I hope this concept will not end in users trying to kick each other off the server to get more resources? 🙂



Slice your Xen with automatic resource limits

I uploaded scripts to limit IO and CPU resource usage for Xen hosts, written as part of my project black magic (the unkillable Xen host). You can find the scripts in the bitbucket repo below /usr/local/bin, or you can simply check it out using hg clone

Some example output:

waxh0002:~# xm sched-credit
Name                                ID Weight  Cap
Domain-0                             0   2048  400
xen01                               49    256  130
xen02                               35    256   65
xen03                               15    256   65
xen04                               16    256   65
xen05                               85    256  260
xen06                               37    256   65
xen07                                7    256   65
xen08                               84    256  260
xen09                               80    256  195
xen10                               82    256  130

This uses long-existing but hardly used capabilities and gives you the power to somewhat “cap” the resources used by abusive users, or even to establish a safe baseline beforehand. I assume a lot more research & testing will be needed by Xen users to expose its powerful features, and I hope this can be a good starting point.
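
To make the idea a bit more tangible, here is a minimal Python sketch. It is not one of the scripts from the repo. It only parses output shaped like the table above and builds the matching `xm sched-credit` call (the `-d`, `-w` and `-c` options for domain, weight and cap, as I know them from the xm man page). The constant `CAP_PER_VCPU = 65` and the helper names are my own assumptions for illustration; the caps in the table just happen to be multiples of 65.

```python
# Hypothetical helper, not part of the published scripts.
SAMPLE = """\
Name                                ID Weight  Cap
Domain-0                             0   2048  400
xen01                               49    256  130
xen02                               35    256   65
"""

# Assumption: allow 65% of one physical core per VCPU.
CAP_PER_VCPU = 65


def parse_sched_credit(text):
    """Turn `xm sched-credit` table output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) != 4 or fields[0] == "Name":
            continue
        name, dom_id, weight, cap = fields
        rows.append({"name": name, "id": int(dom_id),
                     "weight": int(weight), "cap": int(cap)})
    return rows


def cap_command(domain, vcpus, weight=256):
    """Build (but do not run) the xm call that sets weight and cap."""
    cap = vcpus * CAP_PER_VCPU
    return ["xm", "sched-credit", "-d", domain,
            "-w", str(weight), "-c", str(cap)]


if __name__ == "__main__":
    for row in parse_sched_credit(SAMPLE):
        print(row)
    print(" ".join(cap_command("xen05", vcpus=4)))
```

The command is only built as a list, so you can review it before handing it to `subprocess.run` on a real host.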

Comments / testing / patches appreciated 🙂

Are you really done virtualizing – or did you stop at the start?

My train of thought is still rolling. I think the usual virtualization setups both in Enterprise and hosting environments have stopped short of where they should be.

I’ll go as far as saying

  • On-Demand provisioning is stupid (creates massive process and labor overheads)
  • On-Demand provisioning is unnecessary (if centralized management is in place)
  • NPIV actually proves I’m right (if you think it through)

This is what the announcement said for the last station:

  • VM as container
  • Create all VMs right away
  • Including the “outer rim”, i.e. firewall rules, mac address, SAN zoning
  • Select storage arrays based on space / performance parameters so your SAN can scale out (instead of looking beautifully centralized and symmetrical on paper)
  • Prepare VLANs and VPN access already
  • No need to actually allocate / activate things yet
  • Create Nagios, Backup config for all VMs. Just turn off their checks / backup runs.

Have a pool of VMs ready for assignment

  • Yes, you can thin provision them
  • But don’t do it on everything or you’ll see your performance go to hell
  • These have the OS installed and are waiting for policy updates (that will customize them as needed)

When a customer books the VM

  • Automatically select a pooled or prepared VM for him
  • No VM provisioning should be left to be done
  • Activate the customer dependent parts of the policy
  • If not using a pool VM, boot up into the OS installer and then let the policy do its work
  • The policy should now also enable the infrastructure items (Nagios, Backups, OS Installation)
  • Storage assignment as per the customer’s choice
  • Extra IP / VLAN / VPN assignment as per the customer’s choice (from a pool)
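
The booking steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `VM` class, the `book_vm` function and the step names are hypothetical and not taken from any real panel or policy engine. The point it shows is that booking only flips switches on things that already exist.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VM:
    """A VM that was created up front; all names here are hypothetical."""
    name: str
    os_installed: bool            # True for pooled VMs, False for prepared ones
    customer: Optional[str] = None
    monitoring: bool = False      # Nagios config exists, checks are off
    backups: bool = False         # backup config exists, runs are off


def book_vm(vms: List[VM], customer: str) -> Tuple[VM, List[str]]:
    """Assign an existing VM to a customer and list the policy steps."""
    free = [vm for vm in vms if vm.customer is None]
    if not free:
        # Nothing is provisioned on demand; running dry is a planning issue.
        raise RuntimeError("no prepared VM left")
    # Prefer a pooled VM that already has its OS installed.
    vm = next((v for v in free if v.os_installed), free[0])
    steps = []
    if not vm.os_installed:
        steps.append("boot into OS installer")
        vm.os_installed = True
    vm.customer = customer
    vm.monitoring = True
    vm.backups = True
    steps += ["apply customer policy",
              "enable Nagios checks",
              "enable backup runs"]
    return vm, steps


if __name__ == "__main__":
    inventory = [VM("vm001", os_installed=False), VM("vm002", os_installed=True)]
    vm, steps = book_vm(inventory, "alice")
    print(vm.name, steps)
```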

Most people get that last one right, some don’t. I know one guy who hacked together his automated VLAN/IP pool allocation system within a day, along with the rest of his customer panel, and I’ve seen (multiple) places where they use Excel files to manage IP addresses. And some (also multiple ones) that actually have several out-of-sync systems for managing the same IP.
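
An IP pool really is a one-day job. Here is a minimal sketch using Python’s standard `ipaddress` module; the `IPPool` class and its method names are made up for illustration, and a real one would keep its state in a database instead of in memory. What matters is that there is exactly one place that knows who owns an address.

```python
import ipaddress


class IPPool:
    """Single source of truth for one subnet (hypothetical example class)."""

    def __init__(self, cidr):
        # hosts() leaves out the network and broadcast addresses.
        self._free = list(ipaddress.ip_network(cidr).hosts())
        self._used = {}  # address -> owner

    def allocate(self, owner):
        """Hand out the lowest free address and record who got it."""
        if not self._free:
            raise RuntimeError("pool exhausted")
        addr = self._free.pop(0)
        self._used[addr] = owner
        return addr

    def release(self, addr):
        """Return an address to the pool; unknown addresses are an error."""
        addr = ipaddress.ip_address(addr)
        if addr not in self._used:
            raise KeyError(f"{addr} is not allocated")
        del self._used[addr]
        self._free.append(addr)
        self._free.sort()

    def owner_of(self, addr):
        """Look up the owner, or None if the address is free."""
        return self._used.get(ipaddress.ip_address(addr))


if __name__ == "__main__":
    pool = IPPool("192.0.2.0/29")   # documentation range, 6 usable hosts
    first = pool.allocate("alice")
    second = pool.allocate("bob")
    print(first, second, pool.owner_of(first))
```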

This is where I think the issue with on-demand arises. If your processes have issues, you’re making them worse this way, because you will be making changes in the “Excel management” tool (in fact: all changes related to phasing in a new system) each time, one by one, for each VM. It won’t scale like an assembly line because you don’t have dedicated people doing the exact same thing all day. Instead you get the typical context-switch behaviour and the insane overhead of constantly distracted admins doing manual labor (whereas an automated bulk change on 10000 VMs will take them only 1-10 times as long as a manual change on 1 VM).

If “done right”, you’ll plan and run all your assignments at the start and get a higher-quality result, too.