Ceph Training Overview


Someone on #ceph asked about training in Europe / major cities there.

So what I did was google the s*** out of “Ceph Training”…

I mean, I had a browse around to see who’s currently offering any, partly as research (“do I wanna do that again?”) and partly because I think alternatives are good.

 

Here are the offerings I found, along with some comments.

All of them cover the basics pretty nicely now, meaning you get to look at CRUSH, make sure you understand PGs reasonably well, and most will let you do maintenance tasks like OSD add / remove…
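To give an idea of what that means in practice, the OSD removal drill in such a class looks roughly like this (ID 7 / osd.7 is just an example, and the exact sequence depends on the Ceph release you’re taught on):

ceph osd out 7                # stop placing data on it, then wait for the rebalance
# stop the OSD daemon on its host, then:
ceph osd crush remove osd.7   # take it out of the CRUSH map
ceph auth del osd.7           # drop its key
ceph osd rm 7                 # and finally remove the OSD entry itself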

I didn’t only look at classes in German, but it seems the interest in Ceph in Germany is just pretty high. Rightfully so:-)

Ceph Trainings


CEPH – Cluster- und Dateisysteme

(obviously a German-language class)

https://www.medienreich.de/training/ceph-cluster-und-dateisysteme

They have lots of references for this course, and it clearly covers CephFS. They offer a flexible duration, so you can choose how deeply some topics will be covered. They also go over common mistakes, which is very nice for the trainees.

One add-on they do is re-exporting, i.e. HA NFS or VMs in RADOS etc. Surely helpful, but with that one make sure you either cut that part very short (like 2 hours) to only get the overview, or stretch it to 2 days of its own. Clustering isn’t a part of your toolkit, it’s a new toolkit you have to learn. If you cut it short, you end up worse off. And make no mistake, Ceph(FS) will be your new SPOF, so make sure you get in very close touch with it.

One thing I’d also recommend is not to do the class with just a 3-node setup if you take a longer one. 3 nodes is really nice for your first days with Ceph but the reliability and performance are completely unrelated to what you see in a larger setup.

hastexo – “Get in touch”

https://www.hastexo.com/services/training/

I found some past feedback from one of their classes; it seems to have been very good.

Also keep in mind they’re among the really long-time Ceph users; they’ve hung out in #ceph at least as long as I have, and that is now, what, 6 years?

Different from pure trainers or people who run around bragging about Ceph but aren’t even part of the community, hastexo has also spent years delivering Ceph setups large and small.

The only grudge I have with them is that they recommended consumer Samsung SSDs in a Ceph intro for iX magazine. That wasn’t funny; I met people who thought that was buying advice for anything serious. Ignoring the fact that any power outage could potentially fizzle all the journal SSDs ain’t something you do. But the author probably just tried to be nice and save people some money in their labs.

Anyway, due to their large number of installations, hastexo is the very best choice if your company is likely to have a few special requirements; let’s say you’re a company that might test with 8 boxes but later run 500+, and you want real-world experience and advice for scalability, even in your special context.

Afaik they’re in Germany, but they’re real IT people; as far as I can tell any of them speaks perfectly fluent English:-)

Seminar Ceph

http://www.seminar-experts.de/seminare/ceph/

This is just a company re-selling trainings someone else is doing.

The trainer seems to have a good concept though, adding in benchmarking and spending a lot of time on the pros/cons of fuse vs kernel for different tasks.
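For context, “fuse vs kernel” here is about the two ways of mounting CephFS; a minimal sketch, with the monitor address and paths as placeholders:

# kernel client:
mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# FUSE client:
ceph-fuse -m 10.0.0.1:6789 /mnt/cephfs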

This is the course you should take if you’ll be “the ceph guy” in your company and need to fight and win on your own.

Nothing fuzzy, no OpenStack or Samba “addons”. Instead you learn about Ceph to the max. I love that.

Price isn’t low even for 4 days, but I see the value in this, and in-house training generally ain’t cheap.

There’s also a “streaming” option which comes in cheaper, but a Ceph class without a lab is mostly useless. The page also doesn’t say anything about the trainer, so no idea if he’d do it in a language other than German.

Red Hat Ceph Storage Architecture and Administration

http://www.flane.de/en/course/redhat-ceph125

Seriously, no. This is all about OpenStack. You can take this course if you have some extra time on top to learn Ceph in depth, or if you’re the OpenStack admin who does some Ceph on the side and you aren’t the only guy around.

It can also be partially interesting if you have other ideas for using the RADOS Gateway.

 

Merrymack Ceph Training

http://www.ceph-training.com/index.html

A web-based / video-based training. Price-wise this beats them all if you just have 1-2 attendees and no prior knowledge.

Probably a very good refresher if your Ceph knowledge is dated or if you want to learn at your own pace. That way you can spend a lot more time in the lab, which is rather nice.

If you have a few people on the team the price goes up and you should really negotiate a price.

Personally I’d prefer something with a trainer who looks at your test setup and tells you “try it like this and it’ll work”, but $499 is hard to argue with if you’ve got some spare time to do the lab chores.

I think this is the actual launch info of the course:

https://www.linkedin.com/pulse/i-just-launched-on-demand-ceph-training-course-donald-talton

 

No longer available

Ceph 2-day workshop at Heise / iX magazine

It was a bit expensive for 2 days with up to 15 people.

http://www.heise.de/ix/meldung/iX-Workshop-zum-Dateisystem-Ceph-2466563.html

Nice as a get-to-know thing, but I would not recommend it as the only training before going into a prod deployment.

 

MK Linux Storage & LVM

That’s the original first Ceph training, the one I used to do:-)

Ceph was covered on the final day of the class, because back then you wouldn’t find enough people to come around just for a Ceph training😉

But it’s not offered by them any longer. As far as I know the interest was always a little bit too low since this hardcore storage stuff seems to have a different audience than the generic Linux/Bash/Puppet classes do.

 

Summary

Which one would I recommend?

“Seminar Ceph” from that reseller would be for storage admins who need to know their Ceph cluster as well as a seasoned SAN admin knows their VMAX etc. It’s also the best choice for people at IT shops who need to support Ceph in their customer base. You’ll be better off really understanding all parts of the storage layer; you might get your life sued away if you lose some data.

Go to hastexo if you generally know about Ceph, you’ve already read the Ceph paper and some more current docs, and your team is strong enough to basically set it up on your own at scale – so not “we can install that on 5 servers with Ansible” but “we’ve brought up new applications at the scale of 100s of servers often enough, thank you”. You’d be able to strengthen some areas with them and benefit from their implementation experience.

Take the online Ceph Training if you want something quick and cheap and are super eager to tinker around and learn all the details. You’ll end up at the same level as with the pro training but need more time to get there.

Myself?

I’ve still got no idea if I should do another training. I looked at all their outlines and they looked OK. Some more CRUSH rebuilds to flex my fingers, and add/remove/admin-socketify all the things:-) So that would be fine with a week of prep and slides.

Training is a lot more fun than anything else I do, too.

But to be honest, the other stuff I’m doing isn’t done yet and is also pretty cool, with 1000s of servers and so on.

With the next iteration of my website (www.florianheigl.me) I’ll be adding classes and a schedule.

Some upgrades are special


 

No Xen kernel

Yeah, never forget when upgrading from an older Alpine Linux that Xen itself moved into the xen-*-hypervisor package. I didn’t catch that on update, and so I had no hypervisor left on the system.

Xen 4.6.0 crashes on boot

My experience: No, you don’t need to compile a debug Xen kernel + toolstack. No, you don’t need a serial console. No, you don’t need to attach it.

You need Google, and to search for whatever random regression you hit.

In this case, if you set dom0_mem, it will crash instead of putting the memory in the unused pool: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810070

4.6.1 fixes that but isn’t in Alpine Linux stable so far.

So what I did was enable autoballoon in /etc/xen/xl.conf. That’s one of the worst things you can do, ever. It slows down VM startup, has NO benefit at all, and as far as I know also increases memory fragmentation to the max. Which is lovely, especially considering Xen doesn’t detect this NUMA machine as one, thanks to IBM’s chipset “magic”.
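For reference, the workaround is this single line; the dom0_mem example below is what I’d rather go back to once 4.6.1 is available, with the value obviously depending on your box:

# /etc/xen/xl.conf
autoballoon="on"

# preferred once the regression is gone, on the Xen command line in the bootloader:
# dom0_mem=4096M,max:4096M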

CPU affinity settings got botched

I had used a combination of the vCPU pinning / scheduling settings to make sure power management works, all while dedicating 2 cores to dom0 for good performance. Normally with dom0 vCPU pinning you’ve got a problem:

dom0 is being a good citizen, only using the first two cores. But all other VMs may also use those, breaking some of the benefits…

So what you’d do was have settings like this

 

memory = 8192
maxmem = 16384
name   = "xen49.xenvms.de"
vcpus  = 4
cpus = [ "^0,^1" ]

That tells Xen this VM can have 4 virtual CPUs, but they’ll never be scheduled on the first two cores (the dom0 ones).

Yeah, except in Xen 4.6 no VM can boot like that.
The good news is that a non-array syntax works:

cpus = "^0,^1,2-7"
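For completeness, the dom0 half of that pinning doesn’t live in the guest configs but on the hypervisor command line; roughly like this, with the core count depending on your box:

# appended to the Xen entry in the bootloader:
dom0_max_vcpus=2 dom0_vcpus_pin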

 

IBM RSA2

IBM’s certificates for this old clunker are expired. Solution to access?

Use Windows.

Oh, and if the server is sitting in the BIOS screen from configuring the RSA module, it’ll NEVER display anything, even if you reset the ASM. You need to reset the server once more, otherwise you get a white screen. The recommended fix is to reinstall the firmware, which also gets you that reboot.

 

Alpine Linux

My network config also stopped working. That’s the one part I’d like to change in Alpine – not using that challenged “interfaces” file from Debian, but instead something that is more friendly to working with VLANs, Bridges and Tunnels.

If you’re used to it day-to-day it might “look” just fine but that’s because you don’t go out and compare to something else.

Bringing up a bridge interface was broken because one sysctl knob was no longer supported. The script tried to turn off ebtables there, that didn’t work, and so, hey, what should we do? Why not just stop bringing up any of the configured interfaces and completely mess up the IP stack?

I mean, what else would make sense to do than the WORST possible outcome?

If this were a cluster with a lost quorum I’d even agree. But you can bet the Linux kids will run that into a split brain with pride.

 

I’ll paste the actual config once I’ve found out how to take WordPress out of idiot mode and edit HTML. Since, of course, pasting into a <PRE> section is utterly fucked up.

 

I removed my bonding config from this to be able to debug more easily. But the key points were removing the echo calls and making the pre-up / post-up parts more reliable.

auto lo
iface lo inet loopback

auto br0
iface br0 inet static
    pre-up brctl addbr br0
    pre-up ip link set dev eth0 up
    post-up brctl addif br0 eth0
    address your_local_ip
    netmask subnet_mask
    broadcast subnet_bcast
    gateway your_gw
    hostname waxh0012
    post-down brctl delif br0 eth0
    post-down brctl delbr br0
           
                    
# 1 gbit to the outside
auto eth0             
iface eth0 inet manual      
    up ip link set $IFACE up    
    down ip link set $IFACE down

Xen VMs not booting

This is an annoying thing with Alpine only. Some VMs just require you to press enter because they’re stuck at the grub menu. It’s something in grub’s parsing of extlinux.conf that doesn’t work, most likely the “timeout” or “default” lines.
And of course there’s the idiotic (hd0) default vs (hd0,0) from the grub-compat wrapper.
I think you can’t possibly shout too loudly at any of the people involved, since this all falls into the “why care?” class of issues.
(“Oh just use PV-Grub” … except that has another bunch of issues…)

Normally I don’t bother any more with reporting all the broken stuff I run into. It’s just gotten too much; basically I would spend 2/3 of my day on that. But since this concerns a super-recent release of Alpine & Xen (and even some Debian), I figured I’d save people some of the PITA I spent my last hours on.
When I’m able to, I dump them into my Confluence at this URL:
Adminspace – Fixlets

I also try really hard to not rant there:-)

Update:
Nathanael Copa reached out to me and let me know that the newer bridge packages on Alpine include a lot more helper scripts. That way the icky settings from the config would not have been needed any more.
Another thing one can do is this:

post-up my-command || echo "didnt work"

you should totally not … need to do that, but it helps.

Time for 2016


Hi everyone.

 

I just thought a post was in order after having gone dark for quite a long time.

I’d been home sick for almost a month. First I’d snapped my back very badly and then I caught a bad flu. This completely ruined the month I had meant to spend on posting things here. Lying on your back is BORING and painkillers make you too numb to do anything useful.

In between I’ve also done a few fun projects and been to OpenNebulaConf (Oct), the Chaos Communication Congress (Dec) and Config Management Camp (Feb), and each time I came home with some nice ideas to throw around.

To be honest though, the highlight of the last months was watching Deadpool.

If you can handle some completely immature humor and the good old ultra-violence, go watch it.

 

For this year, there will be EuroBSDCon and OpenNebulaConf yet again.

One great thing about OpenNebula is the extremely friendly community. Compared to *any* other conf I’ve been to, which are all pretty darn hostile and bro-ish, OpenNebula is such a nice community, and I really hope the others will start trying to match that at some point.

Custom NFS options for XenServer


Just had to spend half a night to move a XenServer lab behind a firewall.

Now, the servers are behind a NAT firewall, but my ISO repositories and the test NFS SR are not.

The two steps to get a solution were:

  • enable insecure mounting of the shares, because the NATting scrambles the ports
  • use TCP instead of UDP for the mount (both sketched below in plain NFS terms)
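Outside of the XenServer storage manager, those two settings look roughly like this in plain NFS terms (export path and addresses are made up):

# on the NFS server, /etc/exports: "insecure" accepts source ports >1023, which is what NAT produces
/export/xen-sr  192.168.1.0/24(rw,no_root_squash,insecure)

# and a manual test mount from a client, forcing TCP:
mount -t nfs -o proto=tcp 10.0.0.5:/export/xen-sr /mnt/test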

The right hints came from this mailing list thread:

http://www.gossamer-threads.com/lists/xen/users/289442

So what I did was patch the NFS storage manager on both nodes.

In that thread: Kelvin Vanderlip and someone at Softlayer; the internet is rather small at times.

The nicest thing: 100MB/s read throughput… I’m more than surprised!


This traffic comes into the Debian PV domU on XenServer via XenServer’s Open vSwitch. The XenServers are, like the pfSense, only VMs running on ESXi. So next it passes the pfSense firewall doing NAT, two of the VMware vSwitches, 3 real gigabit switches and Open vSwitch again until it finally hits the file server.

Except for the ISO repository, which comes re-exported from a LizardFS share.

Lost BNX2 Broadcom BCM5708 drivers after Ubuntu upgrade


Hey everyone,

This feels so important that I’d rather leave a post here to save you the same troubles.

Networking nightmare:

On Ubuntu 14 LTS you’ll need not just the non-free firmware package, but also this one:

“linux-image-extra-3.13.0-63-generic 3.13.0-63.103                        amd64        Linux kernel extra modules for version 3.13.0 on 64 bit x86 SMP”

Otherwise you won’t have much more than the stock e1000e around for networking, meaning your servers may be missing some NICs. This was extremely hard to figure out because at *first* my /lib/modules/3.13.0-63 included the bnx2 and bnx2x modules. Amusingly, I found it would still boot 3.13.0-61. After the install of the -63 kernel the modules were gone. It seems there’s some stupid trimming hook.

Installing the linux-image-extra package made the modules stick and I have all 4 NICs back.
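For the record, the check and fix came down to something like this (a sketch; the kernel version is whatever you’re booting):

# is the driver actually there for the running kernel?
find /lib/modules/$(uname -r) -name 'bnx2*'
# if not, pull the matching extra-modules package and load the driver:
apt-get install linux-image-extra-$(uname -r)
modprobe bnx2x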

Lost monitoring site:

A really nice feature is how OMD integrates with most distros by just having a zzz-omd.conf that includes the per-site config files. Now, funnily enough, this has been in /etc/apache2/conf.d for years. Ubuntu 14.04 doesn’t read that anymore; it only handles /etc/apache2/conf-enabled. Which is more aligned with the Debian way of things (not that I enjoy it, but at least it’s consistent), but HELL, why do you need to suddenly change it after you already fucked up?

I was looking for proxy module issues / misconfiguration for ages until I decided to just add random crap to config files and see if it would break Apache. No, it didn’t. After that some greps verified conf.d isn’t read any longer. It’s beyond me why they don’t at least move the contents over.
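The fix itself is a one-liner once you know where to look; roughly, with the Apache 2.4 tooling on 14.04 and the file name as OMD ships it:

mv /etc/apache2/conf.d/zzz-omd.conf /etc/apache2/conf-available/zzz-omd.conf
a2enconf zzz-omd
service apache2 reload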

SAN nightmare:

One more thing that caused the server to not even boot:

Ubuntu has no concept of loading local disk drivers before SAN drivers. It scanned a lot of SAN LUNs, hit udev rule bugs by running a bad inquiry on each of them and then finally hit the root device mount timeout.

Root was a local SAS-attached SATA disk.

The drivers in this case were mptsas and mptfc. You get the idea, yes? Alphabetically, “FC” comes before “SAS”. And no, I don’t think Ubuntu is commonly used with SAN boot plus local disk…

I’m pretty sure once the devs notice the issue they’ll go with a highly professional solution like in /etc/grub.d, i.e. 10-mptsas and 40-mptfc. So clever:)

Anyway, to sort that out (the whole thing is sketched below):

Blacklist the mptfc module in /etc/modprobe.d/blacklist.conf

Fire up update-initramfs -u

Load it again from /etc/rc.local. Of course this also means you can’t really use your SAN LUNs anymore.
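Put together, the workaround looks about like this (a sketch; paths are the Ubuntu 14.04 ones):

# keep mptfc out of the initramfs so mptsas (the local disk) wins the race:
echo "blacklist mptfc" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u

# and in /etc/rc.local, before the "exit 0" line:
modprobe mptfc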

No, there’s no hook in their ramdisk framework to change the order. I don’t know what else to do about it.

If it weren’t for the udev issue, adding a script for the modprobe in init-bottom might work (prior to lvm and multipath).

But also, I don’t have unlimited time in my life to properly fix shit like this. I searched for a few hours and haven’t found anything that comes close to a clean solution for the udev or init order problem. And it’s not my box, so even an update-grub hook etc. just wouldn’t cut it.

If my friend whose server this is needs a more long-term fix, it would be to use an HBA from a vendor with an initial letter later than “M”, so in that case, switch to QLogic.

In summary, I think this story covers all there is to say about working with Ubuntu 14 on a server.

How to break LizardFS…


To start with:
MooseFS and LizardFS are the most forgiving, fault-tolerant filesystems I know.
I’ve been working with Unix/storage systems for many years; I like running things stable.

What does running a stable system mean to you?

To me it means I’ve taken something to its breaking point and learned exactly how it behaves at that point. Suffice to say, I then don’t let it get to that point again.
That, put very bluntly, means stable operation.
If we were dealing with real science and real engineering there would be a sheet of paper indicating tolerances. But the IT world isn’t like that, so we need to find out ourselves.

I’d done a lot of very mean tests, first to MooseFS and later to LizardFS.
My install is currently spread over 3 Zyxel NAS and 1 VM. Most of the data is on the Zyxel NAS (running Arch), one of which also has a local SSD using EnhanceIO to drive down latencies and CPU load. The VM is on Debian.
The mfsmaster is running on a single Cubietruck board that can only just handle the compute load.

The setup is sweating and has handled a few migrations between hardware and software setups.
And, this is the point: it has been operating rock-solid for over a year.

How I finally got to the breaking point.
A few weeks back I migrated my Xen host to Open vSwitch. I’m using LACP over two GigE ports; they both serve a bunch of VLANs to the host. The reason for switching was to get sFlow exports and also the cool feature of directly running .1q VLANs into virtual machines.

After the last OS upgrade (the system had been crashing *booh*) I had some Open vSwitch bug for about a week or two.
Any network connection would initially not work, i.e. every ping command would drop the first packet and then work.

In terms of my shared filesystem, this affected only the Debian VM on the host, which only held 1TB of data.
I’ve got most of my data at goal: 3, meaning two of the copies were not on that VM.
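(For anyone not using these filesystems: the goal is the per-file replica count, set with the standard tools, something like the line below; the path is just an example from my tree.)

mfssetgoal -r 3 /mfsmount/cluster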

Now see for yourself:


root@cubie2 /mfsmount # mfsgetgoal cluster/www/vhosts/zoe/.htaccess
cluster/www/vhosts/zoe/.htaccess: 3

root@cubie2 /mfsmount # mfscheckfile cluster/www/vhosts/zoe/.htaccess
cluster/www/vhosts/zoe/.htaccess:
chunks with 0 copies: 1

I don’t understand how this happened.

  • The bug affected one of four mfs storage nodes
  • the file(s) had a goal of 3
  • the file wasn’t touched from the OS ever during that period.

Finally, don’t do an mfsfilerepair on a file with 0 copies left. I was very blonde – but it also doesn’t matter:)

Amazon and us Germans?


we germans are actually at our most honest on saturday mornings at half past eleven.
that’s when we’re standing in line at the post office to pick up our amazon parcel.
we’re sleepy and feel a deep-seated fear.
somewhere one of those upright prussians could be lying in wait for us.
and call us to account for why we were actually still asleep until just now.

german angst.

Your network is probably owned – What to do next?


I’ll try to summarize my thoughts after the pretty shocking 31C3 talk.

The talk was this one: Reconstructing .Narratives.

This trip to 31C3 was meant to be a normal educational excursion but it is now just depressing. The holes the NSA & friends rip into the networks we are looking after are so deep it’s hard to describe.

Our democratic governments are using the gathered data for KILL LISTS of people, even assigning a “kill value”, as in how many people it is legitimate to kill if it helps the matter. This is something I can’t yet fit into my head. The political and technical aspects are covered on Spiegel.de.

Note that the info there will be extended in 3 weeks since there will be another drop of info regarding malware aspects.

Personally, I’m not feeling well just from what I heard there, and I’m grateful they didn’t get around to the malware list.

Now I’ll go ahead on the tech side and talk about what you should consider; we NEED to clean up our networks.

This is not a check list. It is a list to start from.

Your admin workstation:

  • Buy a new one. Install Qubes as per https://qubes-os.org/
  • If your box runs it nicely, submit it to their HCL.
  • I talked to Joanna before this shaking talk, and I’ll write about my “interview” at a later time.
  • Use the TOR VM or another box with Tails for your FW downloads
  • I wish coreboot were actually usable; if you can help on that end, please do.

Point of Administration MATTERS

  • IPSEC VPN with preshared keys: Not safe
  • IPSEC VPN: Should be safe?
  • PPTP VPN: (Obviously) Not safe
  • SSH: VERY VERY questionable
  • ISDN Callback: Sorry, that was only safe before IP was the standard. And maybe not then

So basically, if your servers aren’t in the cloud but in your basement, THAT IS A GOOD THING.

Really sorry but it has to be said.

Re-keying:

  • wipe your ssh host keys and regenerate them (see the sketch after this list)
  • Don’t use less than 4k keys.
  • include the routers and other networking equipment.
  • Drop ALL your admin keys
  • Regenerate them monthly
  • Be prepared to re-key once we find out what SSH ECDSA-style option is actually safe
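A minimal sketch of the host-key part, assuming a Debian-ish box and sticking to 4096-bit RSA until the curve question is settled:

# wipe and regenerate the SSH host keys (RSA only on purpose, see above)
rm /etc/ssh/ssh_host_*key*
ssh-keygen -t rsa -b 4096 -N "" -f /etc/ssh/ssh_host_rsa_key
service ssh restart
# afterwards, re-verify and distribute the new fingerprints via a trusted channel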

SSH adjustments are now described very well at the following github url:
stribika – Secure Secure Shell

Passwords:

change passwords!

This sounds funny and old, but since any connection you have ever made might get decrypted at a later time, you should consider all of them compromised.
I think it would also be a good thing[tm] to have separate passwords on the first line of jump hosts from those on the rest of the systems.

Yes, keys seem safer. But I’ve been talking about passwords, which includes issues like keystroke timing attacks on password-based logins to systems further down the line.
The same of course applies to public keys, i.e. don’t overly enjoy agent forwarding. I’d rather not allow my “jump host login” key on the inner ring of systems.

Password management:

It seems the tool from Bruce Schneier (Password Safe) is rather safe; I’d move away from the “common” choices like KeePassX.

Info / Download: https://www.schneier.com/passsafe.html

Firmware:

Make BIOS reflashing a POLICY.

Random number generators:

Expect that you will need to switch them; personally I THINK you should immediately drop the comforts of haveged.

GnuPG

It was recommended more than one time.

Start using it more and more, putting more stuff in it than you’d have done till today.

Switches and routers:

Your network is NOT your friend.

  • IP ACLs are really a good thing to consider, and they piss off intruders
  • A good tool to set ACLs globally on your hardware is Google’s capirca. Find it at https://code.google.com/p/capirca/. Shorewall etc. is more on the “nice for a host” level. We have come a long way with host-based firewalls, but…
  • Think harder about how to secure your whole network. And how to go about replacing parts of it.

We can’t be sure which of our active LAN components are safe; your WAN probably IS NOT.

Clients

We really need to have PFS (perfect forward secrecy) far more widespread.

Talk it over with your clients: how much ongoing damage is acceptable for helping the helpless XP users?

Guest WIFI

Do NOT run a flat home network.

Additions welcome, comment if you know something to *advance* things.

OpenStacking


OpenStack

Notes from an IBM OpenStack workshop I attended.

  • I haven’t seen a single thing that was exciting once you’ve seen and used multiple solutions.

Many sensible features (e.g. deployment hosts like Oracle VM had) are only just being added, and *lol* with the same naive approach. (Oh, let’s add a dedicated server for this. Oh, let’s run all deployment jobs in parallel there, so they trash the disks and we cripple the benefit it could have brought.)

  • I haven’t seen a single thing that is done much better than what’s in OpenNebula (and there it would be much much more efficient to work with)
  • There is a chance that OpenStack with its different components will be better at doing one thing and doing that one thing well: from what I’ve seen it has a lot fewer issues than OpenNebula “when something goes wrong”, but on the other hand everything is buried under a pile of APIs and services anyway.

So, from a bird’s eye view: what you can do can hardly go wrong, but you also can’t really do a lot. Especially for people coming from VMware, the current state of (open-source) affairs is insulting.

Some detail points:

Hybrid cloud: generally not considered workable, except for “extremely dumb” workloads like web etc. Even for those, most people will be better served with a CDN-type setup.

Some (cloud vendor) sales people are actually running around selling a hybrid cloud that looks like this: you/they add a backup Active Directory controller at their datacenter.

This of course is not AT ALL hybrid cloud or “bursting”, but it poses a problem. Knowledgeable people saying “sorry, but a fully dynamic hybrid cloud is still tricky” will not be closing any sale. Instead the completely clueless sales drone will, since he promises that it will work. Since neither he nor the customer knows the original idea, this works out.

Why doesn’t it work so far?

API elasticity, including the buying of new VMs etc., was said to be rarely working, much less so if bare-metal bring-up is involved (adding hosts to a remote cloud etc.).

Shrinking down is apparently a pretty ugly topic, too.

(The focus in this question was OpenStack to OpenStack bursting mostly)

Misunderstandings and expectations:

Can my VM move to the cloud, from vCenter to an openstack at the ISP?

General answer: no

General expectation: why not?

I wonder: why not just provide a good P2V tool (e.g. PlateSpin) so this can be ticked off the list?

Sadly the relation between data lock-in (meaning safe revenues) and lack of workload portability did not come up as a topic.

This is a downward spiral – if you buy infrastructure, you can save some admin money. Yet that takes away the skill level you’d need to simply overstep those portability restrictions. Any seasoned admin could plan and handle an iSCSI-based online migration from cloud A to cloud B.

But running off an IaaS (or MSP) platform, you might not have that as an in-house skill any longer.

Also tools that handle cloud federation DO exist, but are completely unknown.

Examples are Panacea and Contrail (not the Contrail related to SDN).

It has been around for much longer and probably works, but nobody knows (of it).

Sad that so many millions were spent there, spent on the right thing, but ultimately nothing has come of it so far.

I think this would need unusual steps, i.e. for every 10m invested in OpenStack, 100k needs to be put into marketing rOCCI / OCCI.

A nice hack was using OVF (a sick format nonetheless) to avoid cloud-init-style VM contextualization.

On the infrastructure front, it was shocking to see the networking stack in more detail (we worked on a “smaller” multi-tenant example with a few hundred VLANs). The OpenStack people decided to keep their distance from any existing standard (like QinQ, S-VLAN, PBB-TE) and instead made a large pile of shit with a lot of temporary / interconnecting VLANs / vSwitches.

The greatest shit was to see what they did for MAC addresses:

When Xen came out, the XenSource Inc. guys went to the IEEE and got their own MAC prefix of 00:16:3e.

Someone figured the best thing to do was to use fa:16:3e. Of course they didn’t register that.

He probably thought he’s the cleverest guy in the universe, except he simply didn’t get it at all.

All cross-host connections are done using on-the-fly GRE tunnels and all hosts are apparently fully meshed. I suppose this whole setup + Open vSwitch is so inefficient it doesn’t matter any more?

There are other modes selectable, and it seems to me that flow-based switching will be less bullshit than what OpenStack does by default.

I hope they don’t find out about DMVPN.

Problems of datacenter extension and cross-site VLANs were treated as a non-concern.

Having a flat L2 seems to be oh so pretty. I am in fear.

What else did I dig into:

Rate limiting in this mess is a necessity, but it seems to be possible, even workable.

There are some hints at offloading intra-host switching when using Emulex CNAs or Mellanox. It seems not to be possible with Solarflare.

I’m pretty sure someone at Emulex knows how to do it. But it is not documented any place you could just find it.

Considering this would have a massive (positive) performance impact, it’s just sad.

Digging takeaway:

I would try to only use SR-IOV HBAs and ensure QoS is enforced at ingress (that means on the VM hosts, before customer traffic from a VM reaches the actual wires).
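With Open vSwitch in the picture, ingress policing on a VM’s vif is at least straightforward; a sketch (interface name and numbers are made up, rate is in kbit/s):

# cap what this VM can push into the host at roughly 100 Mbit/s
ovs-vsctl set interface vif3.0 ingress_policing_rate=100000
ovs-vsctl set interface vif3.0 ingress_policing_burst=10000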

Unanswered:

IP address assignments. One thing we didn’t get to was that creating the network required setting up IP ranges etc.

I’m not sold on the “IaaS needs to provide XXX” story at all.

In summary, I want to provide customers with a network of their own, optionally providing dhcp/local dns/load balancers/firewall/etc.

But by default it should be theirs – let me put it like this:

When you say “IaaS” I read infrastructure as a service. Not infrastructure services as a service. I’m sure a medium sized dev team can nicely integrate an enterprise’s IPAM and DNS services with “the cloud” but I doubt it will provide any benefit over using their existing management stack. Except for the medium sized dev team of course.

What I see is cloud stacks that are cluttered with features that bring high value to very small, startup-like environments (remember, the average OpenStack install is <100 cores). It’s cool to have them, but the thing is: if you’re expecting people to use them, you’re doing it wrong. They’re trivial, puny and useless (i.e. “yes we can assign ipv4”, “yes we can assign ipv6” – but what happens if you ask about dual stack? subnets?) and it’s becoming a bad joke to expect companies that do more than “something on the internet” to spend considerable time on de-integrating those startup convenience features.

Another interesting side note:

Softlayer is also Xen-based. That’s the cloud service that suddenly made IBM the number one in the market.

With Amazon, Rackspace, Linode, OrionVM and Softlayer using Xen, and a 9x% VMware share in the enterprise market (which is probably a lot bigger than cloud), I’m once again puzzled at the hubris of the KVM community thinking they are the center of the universe. People tell me about oVirt / RHEV while it has NO RELEVANCE at all.

The only really cool KVM based place I know is Joyent. And they don’t even use Linux.

Oh, and coming back to cloud, I’m still puzzled by the amount of attention Microsoft Azure gets in Germany. It seems the competitors (especially the higher-end ones like HP, IBM, ProfitBricks, etc. who actually offer SLAs worth the name) simply can’t get a shot at the Microsoft-addicted SMB and medium enterprise crowd.

That said (enough ranting) they are cool to have in a demo style setup like the one we played with.

IBM’s solution seems a nice middle ground – config adjustments are easily done, yet the deployment is highly automated and also highly reliable.

They’re going the right way, selling a rackful of servers with one usb stick to install the management server from. Wroooom[*].

Here’s your cloudy datacenter-in-a-box

P.S.: Wroooom took a little over an hour. Pretty different from what I’m used to with CFEngine now.

P.S.2: Links: Contrail http://contrail-project.eu/en_GB/federation and Panacea http://projects.laas.fr/panacea-cloud/node/31

Bacula version clash between 5 and 7


This is the second time I’ve run into the error “Authorization key rejected by Storage daemon.”

It makes backups and restores impossible. Most traces / explanations on the internet will point at FD hostname or SD hostname or key mismatch issues.

That is of course always possible, but if you had it working until a week ago when you updated – please don’t let them discourage you. This error will also occur for any version 7 client connecting to a version 5 server. I’ve had it on my Macbook after running “port upgrade outdated” and just now on my FreeBSD desktop during a migration restore.

The jobs will abort after the client is asked to send/receive files.

Debug output of the storage daemon shows that this is in fact a client error!

The red herring, a Bacula error message saying

Authorization key rejected by Storage daemon

is completely wrong.

They just abstracted / objectified their logging a little too much. The SD received the error “client didn’t want me” and has to pass it on. Not helpful. Sorry guys:)

As a warning / example, have a look at the log:

JobName: RestoreFiles
Bootstrap: /var/lib/bacula/mydir-dir.restore.1.bsr
Where:
Replace: always
FileSet: Full Set
Backup Client: Egal
Restore Client: Egal
Storage: PrimaryFileStorage-int
When: 2014-09-14 12:40:15
Catalog: MyCatalog
Priority: 10
Plugin Options: *None*
OK to run? (yes/mod/no): yes
Job queued. JobId=17300
*mess
14-Sep 12:40 waxu0604-dir JobId 17300: Start Restore Job RestoreFiles.
14-Sep 12:40 waxu0604-dir JobId 17300: Using Device "PrimaryFileDevice"
14-Sep 12:39 Egal JobId 17300: Fatal error: Authorization key rejected by Storage daemon.
Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_As[...]
*status client=Egal
Connecting to Client Egal at 192.168.xxx:9102

Egal Version: 5.2.12 (12 September 2012)  amd64-portbld-freebsd10.0
Daemon started 14-Sep-14 12:43. Jobs: run=0 running=0.
 Heap: heap=0 smbytes=21,539 max_bytes=21,686 bufs=50 max_bufs=51
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 
Running Jobs:
Director connected at: 14-Sep-14 12:43
No Jobs running.
====

As you saw, the restore aborts while a “status client” is doing just fine.
The same client is now running its restore without ANY issue after doing nothing more than downgrading the client to version 5.

*status client=Egal
Connecting to Client Egal at 192.168.xxx.xxx:9102

Egal Version: 5.2.12 (12 September 2012)  amd64-portbld-freebsd10.0
Daemon started 14-Sep-14 12:43. Jobs: run=0 running=0.
 Heap: heap=0 smbytes=167,811 max_bytes=167,958 bufs=96 max_bufs=97
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 
Running Jobs:
JobId 17301 Job RestoreFiles.2014-09-14_12.49.00_41 is running.
      Restore Job started: 14-Sep-14 12:48
    Files=2,199 Bytes=1,567,843,695 Bytes/sec=10,812,715 Errors=0
    Files Examined=2,199
    Processing file: /home/floh/Downloads/SLES_11_SP3_JeOS_Rudder_[...]

All fine, soon my data will be back in place.

(Don’t be shocked by the low restore speed, my “server” is running the SDs off a large MooseFS share built out of $100 NAS storages.
I used to have the SDs directly on NAS and got better speeds with that but I like distributed storage better than speed)