How to break LizardFS…


To start with:
MooseFS and LizardFS are the most forgiving, fault-tolerant filesystems I know.
I’ve been working with Unix and storage systems for many years, and I like running things stable.

What does running a stable system mean to you?

To me it means I’ve taken something to its breaking point and learned exactly how it behaves at that point. Suffice it to say, I’ll then not let it get to that point again.
That, put very bluntly means stable operation.
If we were dealing with a real science and real engineering there would be a sheet of paper indicating tolerances. But the IT world isn’t like that. So we need to find out ourselves.

I’ve done a lot of very mean tests, first against MooseFS and later against LizardFS.
My install is currently spread over 3 Zyxel NAS and 1 VM. Most of the data is on the Zyxel NAS (running Arch), one of which also has a local SSD using EnhanceIO to drive down latencies and CPU load. The VM is on Debian.
The mfsmaster is running on a single Cubietruck board that can only just handle the compute load.

The setup is sweating and has handled a few migrations between hardware and software setups.
And this is the point: it has been operating rock-solid for over a year.

How I finally got to the breaking point.
A few weeks back I migrated my Xen host to Open vSwitch. I’m using LACP over two GigE ports; they both serve a bunch of VLANs to the host. The reason for switching was to get sFlow exports, and also the cool feature of running 802.1q VLANs directly into virtual machines.

After the last OS upgrade (system had been crashing *booh*) I had some openvswitch bug for about a week or two.
Any network connection would initially not work: every ping command would drop the first packet and then work.

In terms of my shared filesystem, this affected only the Debian VM on the host, which only held 1TB of data.
I’ve got most of my data at goal: 3, meaning two of the copies were not on that VM.

Now see for yourself:


root@cubie2 /mfsmount # mfsgetgoal cluster/www/vhosts/zoe/.htaccess
cluster/www/vhosts/zoe/.htaccess: 3

root@cubie2 /mfsmount # mfscheckfile cluster/www/vhosts/zoe/.htaccess
cluster/www/vhosts/zoe/.htaccess:
chunks with 0 copies: 1

I don’t understand how this happened.

  • The bug affected one of four mfs storage nodes
  • the file(s) had a goal of 3
  • the file was never touched by the OS during that period.

Finally, don’t do an mfsfilerepair on a file with 0 copies left. I was very blonde there, but it also doesn’t matter :)
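To gauge how widespread such damage is, the check can be scripted over the whole mount. This is only a sketch under my assumptions: the client tools are in PATH, the mount is at /mfsmount/cluster, and it keys off the exact “chunks with 0 copies” wording shown in the session above:

```shell
# has_lost_chunks: read mfscheckfile output on stdin,
# succeed if any chunk has 0 copies left
has_lost_chunks() {
    grep -q 'chunks with 0 copies'
}

# walk the tree and flag endangered files
# (assumption: mount point and tool names as in my setup)
find /mfsmount/cluster -type f | while IFS= read -r f; do
    mfscheckfile "$f" | has_lost_chunks && echo "0 COPIES: $f"
done
```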

Amazon and us Germans?


we germans are actually at our most honest on saturday mornings around half past eleven.
that’s when we stand in line at the post office to pick up our amazon parcel.
we are drowsy and feel a deep-seated fear.
somewhere, one of those upright prussians could be lying in wait for us.
ready to call us to account for why we were still asleep until just now.

german angst.

Your network is probably owned – What to do next?


I’ll try to summarize my thoughts after the pretty shocking 31C3 talk.

The talk was this one: Reconstructing .Narratives.

This trip to 31C3 was meant to be a normal educational excursion, but it turned out just depressing. The holes the NSA & friends rip into the networks we are looking after are so deep it’s hard to describe.

Our democratic governments are using the gathered data for KILL LISTS of people, even assigning a “kill value”, as in how many people are legitimate to kill if it helps the matter. This is something I can’t yet fit into my head. The political and technical aspects are covered on Spiegel.de.

Note that the info there will be extended in 3 weeks since there will be another drop of info regarding malware aspects.

Personally, I’m not feeling well just over what I heard there and I’m grateful they didn’t come around to the malware list.

Now I’ll go ahead on the tech side and talk about what you should consider: we NEED to clean up our networks.

This is not a check list. It is a list to start from.

Your admin workstation:

  • Buy a new one. Install Qubes as per https://qubes-os.org/
  • If your box runs it nicely, submit it to their HCL.
  • I talked to Joanna before this shaking talk, and I’ll write about my “interview” at a later time.
  • Use the TOR VM or another box with Tails for your FW downloads
  • I wish coreboot was actually usable, if you can help on that end, please do it.

Point of Administration MATTERS

  • IPSEC VPN with preshared keys: Not safe
  • IPSEC VPN: Should be safe?
  • PPTP VPN: (Obviously) Not safe
  • SSH: VERY VERY questionable
  • ISDN Callback: Sorry, that was only safe before IP was the standard. And maybe not then

So basically, if your servers aren’t in the cloud but in your basement, THAT IS A GOOD THING.

Really sorry but it has to be said.

Re-keying:

  • wipe your ssh host keys, regenerate them
  • Don’t use less than 4k keys.
  • include the routers and other networking equipment.
  • Drop ALL your admin keys
  • Regenerate them monthly
  • Be prepared to re-key once we find out what SSH ECDSA-style option is actually safe

SSH adjustments are now described very well at the following github url:
stribika – Secure Secure Shell
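The host-key bullets above can be sketched like this. It is shown against a scratch directory; on a real box you would wipe /etc/ssh/ssh_host_*key*, write the new keys there and restart sshd:

```shell
# regenerate a host key at 4096 bits (scratch dir; adjust paths for /etc/ssh)
dir=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N '' -f "$dir/ssh_host_rsa_key"

# verify the size before rolling it out: the fingerprint line starts with 4096
ssh-keygen -l -f "$dir/ssh_host_rsa_key.pub"
```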

Passwords:

change passwords!

this sounds funny and old, but since any connection you have ever made might get decrypted at a later time, you should consider them all compromised.
I think it would also be a good thing[tm] to have different passwords on the first line of jump hosts than on the rest of the systems.

yes, keys seem safer. but i’ve been talking about passwords, which includes issues like keystroke-timing attacks on password-based logins to systems further down the line.
the same of course applies to public keys; i.e. don’t overly enjoy agent forwarding. I’d rather not allow my “jump host login” key on the inner ring of systems.

Password management:

It seems the tool from Bruce Schneier is rather safe; I’d move away from the “common” choices like KeePassX.

Info / Download: https://www.schneier.com/passsafe.html

Firmware:

Make BIOS reflashing a POLICY.

Random number generators:

Expect that you will need to switch them; personally I THINK you should immediately drop the comforts of haveged.

GnuPG

It was recommended more than one time.

Start using it more and more, putting more stuff into it than you would have done until today.
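A low-friction way to start is a symmetric encrypt/decrypt round trip. The sketch below assumes GnuPG 2.1 or newer (hence the loopback pinentry) and uses throwaway file and passphrase names:

```shell
# encrypt a note symmetrically, then decrypt it again
cd "$(mktemp -d)"
echo 'secret notes' > notes.txt
gpg --batch --yes --pinentry-mode loopback --passphrase 'correct horse' \
    --symmetric --cipher-algo AES256 -o notes.txt.gpg notes.txt
gpg --batch --quiet --pinentry-mode loopback --passphrase 'correct horse' \
    --decrypt notes.txt.gpg
```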

Switches and routers:

Your network is NOT your friend.

  • IP ACLs are really a good thing to consider, and they piss off intruders
  • A good tool to set ACLs globally on your hardware is Google’s capirca. Find it at https://code.google.com/p/capirca/. Shorewall etc. is more on the “nice for a host” level. We have come a long way with host-based firewalls, but…
  • Think harder about how to secure your whole network. And how to go about replacing parts of it.

We can’t be sure which of our active LAN components are safe; your WAN probably IS NOT.

Clients

We really need to make PFS (perfect forward secrecy) more widespread.

Talk it over with your clients: how much ongoing damage is acceptable to keep helping the helpless XP users?

Guest WIFI

Do NOT run a flat home network.

Additions welcome, comment if you know something to *advance* things.

OpenStacking


OpenStack

Notes from IBM OpenStack workshop I was at.

  • I haven’t seen a single thing that was exciting once you’ve seen and used multiple solutions.

Many sensible features (i.e. deployment hosts, like Oracle VM had) are just being added, and *lol* with the same naive approach. (Oh, let’s add a dedicated server for this. Oh, let’s run all deployment jobs in parallel there, so they thrash the disks and cripple the benefit it could have brought.)

  • I haven’t seen a single thing that is done much better than what’s in OpenNebula (and there it would be much much more efficient to work with)
  • There is a chance that OpenStack with its different components will be better at doing one thing and doing that one thing well: from what I’ve seen it has a lot fewer issues than OpenNebula “when something goes wrong”, but on the other hand everything is under a pile of APIs and services anyway.

So, from a bird’s-eye view: what you can do can hardly go wrong, but you also can’t really do a lot. Especially for people coming from VMware, the current state of (open-source) affairs is insulting.

Some detail points:

hybrid cloud: generally not considered workable, except for “extremely dumb” workloads like web etc. Even for those, most people will be better served with a CDN-type setup.

Some (cloud vendor) sales people are actually running around selling a hybrid cloud that looks like this: you/they add a backup Active Directory controller at their datacenter.

This of course is not AT ALL hybrid cloud or “bursting”, but it poses a problem: knowledgeable people saying “sorry, but a hybrid fully dynamic cloud is still tricky” will not be closing any sale. Instead the completely clueless sales drone will, since he promises that it will work. Since neither he nor the customer knows the original idea, this works out.

why doesn’t it work so far:

API elasticity, including the buying of new VMs etc., was said to be rarely working, much less so if bare-metal bring-up is involved (adding hosts to a remote cloud etc.).

Shrinking down is also apparently a pretty ugly topic.

(The focus in this question was OpenStack to OpenStack bursting mostly)

Misunderstandings and expectations:

Can my VM move to the cloud, from vCenter to an openstack at the ISP?

General answer: no

General expectation: why not?

I wonder: why not just provide a good P2V tool (i.e. platespin) so this is on the list?

Sadly, the relation between data lock-in (meaning safe revenues) and lack of workload portability did not come up as a topic.

This is a downward spiral: if you buy infrastructure, you can save some admin money. Yet that takes away the skill level you’d need to simply overstep those portability restrictions. Any seasoned admin could plan and handle an iSCSI-based online migration from cloud A to cloud B.

But running off an IaaS (or MSP) platform, you might not have that as an in-House skill any longer.

Also tools that handle cloud federation DO exist, but are completely unknown.

Examples are Panacea and Contrail (this isn’t the Contrail related to SDN).

They have been around for much longer and probably work, but nobody knows of them.

It’s sad that so many millions were spent there, on the right thing even, but ultimately nothing has come of it so far.

I think this would need unusual steps, i.e. for every 10M invested in OpenStack, 100k needs to be put into marketing rOCCI / OCCI.

A nice hack was using OVF (a sick format nonetheless) to avoid cloud-init-style VM contextualization.

On the infrastructure front, it was shocking to see the networking stack in more detail (we worked on a “smaller” multi-tenant example with a few hundred VLANs). The OpenStack people decided to keep their distance from any existing standard (like QinQ, S-VLAN, PBB-TE) and instead made a large pile of shit with a lot of temporary / interconnecting VLANs / vswitches.

The greatest shit was seeing what they did for MAC addresses:

When Xen came out, the XenSource Inc. guys went to the IEEE and got their own MAC prefix, 00:16:3e.

Someone figured the best way was to use fa:16:3e. Of course they didn’t register that.

He probably thought he’s the most clever guy in the universe, except he simply didn’t get it at all.

All cross-host connections are done using on-the-fly GRE tunnels, and all hosts are apparently fully meshed. I suppose this whole setup plus Open vSwitch is so inefficient that it doesn’t matter any more?
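For reference, a single leg of such a GRE mesh looks roughly like this in Open vSwitch terms; the bridge name and peer address here are illustrative placeholders, not what OpenStack actually generates:

```
# one GRE leg per pair of hosts (illustrative names, documentation-range IP)
ovs-vsctl add-br br-tun
ovs-vsctl add-port br-tun gre-peer1 -- set interface gre-peer1 \
    type=gre options:remote_ip=192.0.2.11
```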

There are other modes selectable, and it seems to me that flow-based switching will be less bullshit than what OpenStack does by default.

I hope they don’t find out about DMVPN.

Problems of datacenter extension and cross-site VLANs were a no-concern topic.

Having a flat L2 seems to be oh so pretty. I am in fear.

What else did I dig into:

Rate limiting in this mess is a necessity, but it seems to be possible. Workable.

There are some hints at offloading intra-host switching when using Emulex CNA or Mellanox. It seems not possible with Solarflare.

I’m pretty sure someone at Emulex knows how to do it. But it is not documented any place you could just find it.

Considering this would have a massive (positive) performance impact, it’s just sad.

Digging takeaway:

I would try to only use SR-IOV HBAs and ensure QoS is enforced at ingress (that means on the VM hosts, before customer traffic from a VM reaches the actual wires).

Unanswered:

IP address assignments. One thing we didn’t get to was that creating the network required setting up IP ranges etc.

I’m not sold on the “IaaS needs to provide XXX” story at all.

In summary, I want to provide customers with a network of their own, optionally providing dhcp/local dns/load balancers/firewall/etc.

But by default it should be theirs – let me put it like this:

When you say “IaaS” I read infrastructure as a service. Not infrastructure services as a service. I’m sure a medium sized dev team can nicely integrate an enterprise’s IPAM and DNS services with “the cloud” but I doubt it will provide any benefit over using their existing management stack. Except for the medium sized dev team of course.

What I see is cloud stacks that are cluttered with features that bring high value to very small, startup like environments (remember i.e. the average OpenStack install is <100 cores). It’s cool to have them, but the thing is: If you’re expecting people to use them, you’re doing it wrong. They’re trivial, puny and useless (i.e. “yes we can assign ipv4”, “yes we can assign ipv6” – but what happens if you ask about dual stack? subnets?) and it’s becoming a bad joke to expect companies that do more than “something on the internet” to spend considerable time on de-integration of those startup convenience features.

Another interesting side note:

Softlayer is also Xen based. That’s the cloud service that made IBM suddenly the number one of the market.

Among Amazon, Rackspace, Linode, OrionVM and Softlayer using Xen, and a 9x% VMware share in the enterprise market (which is probably a lot bigger than cloud), I’m once again puzzled at the hubris of the KVM community thinking they are the center of the universe. People tell me about oVirt / RHEV while it has NO RELEVANCE at all.

The only really cool KVM based place I know is Joyent. And they don’t even use Linux.

Oh, and, coming back to cloud, I’m still puzzled by the amount of attention Microsoft Azure gets in Germany. It seems the competitors (especially the higher-end ones like HP, IBM, ProfitBricks, etc. who actually offer SLAs worth the name) simply can’t get a shot at the Microsoft-addicted SMB and medium-enterprise crowd.

That said (enough ranting) they are cool to have in a demo style setup like the one we played with.

IBM’s solution seems a nice middle ground – config adjustments are easily done, yet the deployment is highly automated and also highly reliable.

They’re going the right way, selling a rackful of servers with one usb stick to install the management server from. Wroooom[*].

Here’s your cloudy datacenter-in-a-box

p.s.: Wrooom took a little over an hour. Pretty different from what I’m used to with CFEngine now.

Ps2: Links: Contrail: http://contrail-project.eu/en_GB/federation and Panacea: http://projects.laas.fr/panacea-cloud/node/31

Bacula version clash between 5 and 7


This is the second time I’ve run into the error “Authorization key rejected by Storage daemon.”

It makes backups and restores impossible. Most traces / explanations on the internet will point at FD hostname or SD hostname or key mismatch issues.

That is of course always possible, but if you had it working until a week ago when you updated, please don’t let them discourage you. This error will also occur for any version 7 client connecting to a version 5 server. I’ve had it on my MacBook after running “port upgrade outdated” and just now on my FreeBSD desktop during a migration restore.

The jobs will abort after the client is asked to send/receive files.

Debug output of the storage daemon shows that this is in fact a client error!

The red herring is a Bacula error message saying

Authorization key rejected by Storage daemon

which is completely wrong.

They just abstracted / objectified their logging a little too much. The SD received the error “client didn’t want me” and has to pass it on. Not helpful. Sorry guys :)
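If you want to see the real error yourself, you can raise the daemons’ debug levels from bconsole and rerun the job. setdebug is a standard console command; the resource names below are the ones from my setup, so substitute your own:

```
*setdebug level=100 storage=PrimaryFileStorage-int
*setdebug level=100 client=Egal
```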

As a warning / example, have a look at the log:

JobName: RestoreFiles
Bootstrap: /var/lib/bacula/mydir-dir.restore.1.bsr
Where:
Replace: always
FileSet: Full Set
Backup Client: Egal
Restore Client: Egal
Storage: PrimaryFileStorage-int
When: 2014-09-14 12:40:15
Catalog: MyCatalog
Priority: 10
Plugin Options: *None*
OK to run? (yes/mod/no): yes
Job queued. JobId=17300
*mess
14-Sep 12:40 waxu0604-dir JobId 17300: Start Restore Job RestoreFiles.
14-Sep 12:40 waxu0604-dir JobId 17300: Using Device "PrimaryFileDevice"
14-Sep 12:39 Egal JobId 17300: Fatal error: Authorization key rejected by Storage daemon.
Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_As[...]
*status client=Egal
Connecting to Client Egal at 192.168.xxx:9102

Egal Version: 5.2.12 (12 September 2012)  amd64-portbld-freebsd10.0
Daemon started 14-Sep-14 12:43. Jobs: run=0 running=0.
 Heap: heap=0 smbytes=21,539 max_bytes=21,686 bufs=50 max_bufs=51
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 
Running Jobs:
Director connected at: 14-Sep-14 12:43
No Jobs running.
====

As you saw, the restore aborts while a status client works just fine.
The same client is now running its restore without ANY issue after doing no more than downgrading the client to version 5.

*status client=Egal
Connecting to Client Egal at 192.168.xxx.xxx:9102

Egal Version: 5.2.12 (12 September 2012)  amd64-portbld-freebsd10.0
Daemon started 14-Sep-14 12:43. Jobs: run=0 running=0.
 Heap: heap=0 smbytes=167,811 max_bytes=167,958 bufs=96 max_bufs=97
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 
Running Jobs:
JobId 17301 Job RestoreFiles.2014-09-14_12.49.00_41 is running.
      Restore Job started: 14-Sep-14 12:48
    Files=2,199 Bytes=1,567,843,695 Bytes/sec=10,812,715 Errors=0
    Files Examined=2,199
    Processing file: /home/floh/Downloads/SLES_11_SP3_JeOS_Rudder_[...]

All fine, soon my data will be back in place.

(Don’t be shocked by the low restore speed; my “server” is running the SDs off a large MooseFS share built out of $100 NAS storages.
I used to have the SDs directly on NAS and got better speeds that way, but I like distributed storage better than speed.)

No-copy extracting Xen VM tarballs to LVM


SUSE Studio delivers Xen VM images, which is really nice. They contain a sparse image and a (mostly incomplete) VM config file. Since I’m updating them pretty often, I needed a hack that saves on any unneeded copies and needs no scratch space, either.

Goal: save copy times and improve life quality instead of copying and waiting…

First, lets have a look at the contents and then let’s check out how to directly extract them…

(Oh. Great. Shitbuntu won’t let me paste here)


Well, great.

In my case the disk image is called:

SLES_11_SP3_JeOS_Rudder_client.x86_64-0.0.6.raw

It’s located in a folder named:

SLES_11_SP3_JeOS_Rudder_client-0.0.6/


So, what we can do is this:

First, set up some variables so we can shrink the command later on…

version=0.0.6
appliance=SLES_11_SP3_JeOS_Rudder_client
url=https://susestudio.com/...6_64-${version}.xen.tar.gz
folder=${appliance}-${version}
vmimage=${appliance}.x86_64-${version}.raw
lv=/dev/vgssdraid5/lvrudderc1

Then, tie it together to store our VM data.

wget -O- $url | tar -O -xzf - ${folder}/${vmimage} | dd of=$lv bs=1024k

Storing to a file at the same time:

wget -O- $url | tee /dev/shm/myfile.tar.gz | tar -O -xzf - ${folder}/${vmimage} |\
dd of=$lv bs=1024k


Wget will fetch the file and write it to STDOUT; tar will read STDIN, extract only the image file, and write the extracted data to STDOUT, which is then buffered and written by dd.
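The pipeline can be dry-run against a scratch tarball before pointing it at a real LV. Everything below is made-up demo data, with a plain file standing in for $lv:

```shell
# self-contained dry run of the stream-extract pipeline (demo names only)
work=$(mktemp -d) && cd "$work"
mkdir demo-0.0.1
dd if=/dev/urandom of=demo-0.0.1/disk.raw bs=1024 count=64 2>/dev/null
tar -czf demo.tar.gz demo-0.0.1/disk.raw

# extract only the one member and stream it straight into the target via dd
tar -O -xzf demo.tar.gz demo-0.0.1/disk.raw | dd of=target.img bs=1024k 2>/dev/null
cmp -s demo-0.0.1/disk.raw target.img && echo identical
```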


If, like me, you’ll reuse the image for multiple VMs, you can also write it to /dev/shm and, if RAM allows, gunzip it there. The gzip extraction is actually limiting the performance, and even tar itself seems to be a little slow; I only get around 150MB/s on this.

I do remember it needs to flatten out the sparse image while storing to LVM, but I’m not sure if / how that influences the performance.


(Of course none of this would be necessary if the OSS community hadn’t tried to ignore / block / destroy standards like OVF as much as they could. Instead OVF is complex, useless and unsupported. Here we are.)

Blackhat 2014 talks you should really really look at


This is my watchlist compiled from the 2014 agenda; many of these talks are important if you want to be prepared for future and current issues.

It’s great to see there are also a few talks that fall more into the “defense” category.


# Talks concerning incredibly big and relevant issues. I filed those under “the world is gonna end”.

The first two are worthy of that and hopefully wake up people in the respective design bodies:

  • CELLULAR EXPLOITATION ON A GLOBAL SCALE: THE RISE AND FALL OF THE CONTROL PROTOCOL
  • ABUSING MICROSOFT KERBEROS: SORRY YOU GUYS DON’T GET IT

Also annoying to horrible threats

  • EXTREME PRIVILEGE ESCALATION ON WINDOWS 8/UEFI SYSTEMS
  • A PRACTICAL ATTACK AGAINST VDI SOLUTIONS
  • BADUSB – ON ACCESSORIES THAT TURN EVIL
  • A SURVEY OF REMOTE AUTOMOTIVE ATTACK SURFACES

Things that will actually help improve security practices and should be watched as food for thought

  • OPENSTACK CLOUD AT YAHOO SCALE: HOW TO AVOID DISASTER
  • CREATING A SPIDER GOAT: USING TRANSACTIONAL MEMORY SUPPORT FOR SECURITY
  • BUILDING SAFE SYSTEMS AT SCALE – LESSONS FROM SIX MONTHS AT YAHOO
  • BABAR-IANS AT THE GATE: DATA PROTECTION AT MASSIVE SCALE
  • FROM ATTACKS TO ACTION – BUILDING A USABLE THREAT MODEL TO DRIVE DEFENSIVE CHOICES
  • THE STATE OF INCIDENT RESPONSE

What could end our world five years from now:

  • EVASION OF HIGH-END IPS DEVICES IN THE AGE OF IPV6

note, memorize, listen to recommendations

  • HOW TO LEAK A 100-MILLION-NODE SOCIAL GRAPH IN JUST ONE WEEK? – A REFLECTION ON OAUTH AND API DESIGN IN ONLINE SOCIAL NETWORKS
  • ICSCORSAIR: HOW I WILL PWN YOUR ERP THROUGH 4-20 MA CURRENT LOOP
  • MINIATURIZATION

scada / modbus / satellites

  • THE NEW PAGE OF INJECTIONS BOOK: MEMCACHED INJECTIONS
  • SATCOM TERMINALS: HACKING BY AIR, SEA, AND LAND
  • SMART NEST THERMOSTAT: A SMART SPY IN YOUR HOME
  • SVG: EXPLOITING BROWSERS WITHOUT IMAGE PARSING BUGS
  • THE BEAST WINS AGAIN: WHY TLS KEEPS FAILING TO PROTECT HTTP

Don’t recall what those two were about

  • GRR: FIND ALL THE BADNESS, COLLECT ALL THE THINGS
  • LEVIATHAN: COMMAND AND CONTROL COMMUNICATIONS ON PLANET EARTH

Xen Powermanagement


Hi all,

this is a very hot week and the sun is coming down hard on my flat. Yet I’m not outside having fun: work has invaded this Sunday.

I ran into a problem: I need to run some more loaded VMs, but it’s going to be hotter than usual and I don’t wanna turn into a piece of barbecue. The only thing I could do was turn my Xen host’s powersaving features to the max.

Of course that made me write a new article on power management in the more current Xen versions… :)

Find it here: Xen Power management – for current Xen.

When I saved it I found I also have an older one (which I wasn’t aware of anymore) that covers the Xen 3.4 era.

Xen full powersaving mode – for Xen 3.x


Trivia:
Did you know those settings only take a mouse click in VMWare?

Check_MK support for Allnet 3481v2


A friend of mine has this thermometer and asked me to look into monitoring and setup.

I don’t think I ever put as much work into monitoring such a tiny device. Last evening, and almost all night, I stabbed at it some more and finally completed the setup and documentation. I literally went to bed at 5am because of this tiny sensor.

To save others from this (and to make sure I have reliable documentation for it…), I’ve made a wiki article out of the pretty tricky setup. Along the way I even found it still runs an old OpenSSL.

You can check it out here:

http://confluence.wartungsfenster.de/display/Adminspace/Monitoring+Allnet+3418v2

The bitbucket version isn’t yet committed, I hope I will do this in a moment… :p
One interesting hurdle was that I couldn’t build a Check_MK package (using mkp list / mkp pack) since I also needed to include things from local/lib and similar folders. When I visit the MK guys again I’ll nag about this.


They have really pretty meters in their UI by the way.

Would hope something like it makes it to the nagvis exchange some day.

edit note: I initially wrote it has an “affected OpenSSL”. It seems they had built it back in 2012 without heartbeat, which is a nice and caring thing to do.
It’s still goddamn outdated.

Friday special: screenrc for automatic IRSSI start


Just wanted to share a little snippet.

This is my SSH+Screen config for my IRC box:

  • If I connect from any private system, I’ll get my irc window.
  • If it rebooted or something, the screen command will automatically re-create an IRSSI session for me.
  • If I detach the screen, I’m automatically logged out.
~$ cat .ssh/authorized_keys
command="screen -d -RR -S irc -U" ssh-[ key removed] me@mypc

The authorized_keys settings enforce running only _this_ command, and the screen options set a title and force-detach, force-reattach and force-create a screen session by the name “irc”.

~$ cat .screenrc 
startup_message off
screen -t irssi 1 irssi

The screenrc does the next step by auto-running irssi in window 1 with the title set accordingly.
(And it turns off the moronic GPL notice)
Irssi in itself is configured to autoconnect to the right networks and channels, of course. (to be honest: Irssi config is something I don’t like to touch more than every 2-3 years.)

On the clients I also have an alias in /etc/hosts for it, so if I type “ssh irc”, I’ll be right back on irc. Every time and immediately.


This is the tiny little piece of perfect world I was able to create, so I thought I’d share it.