Xen Power Management


Hi all,

this is a very hot week and the sun is beating down hard on my flat. Yet, I’m not outside having fun: work has invaded this Sunday.

I ran into a problem: I need to run some more loaded VMs, but it’s going to be hotter than usual and I don’t wanna turn into a piece of barbecue. The only thing I could do was turn my Xen host’s power-saving features up to the max.

Of course that meant I had to write a new article on power management in the more current Xen versions… 🙂

Find it here: Xen Power management – for current Xen.

When I saved it I found I also have an older one (which I wasn’t aware of anymore) that covers the Xen 3.4 era.

Xen full powersaving mode – for Xen 3.x

Trivia:
Did you know those settings only take a mouse click in VMware?

Switched from CentOS to Oracle Enterprise Linux


Over the last few weeks I have switched all my CentOS 5/6 systems over to the public yum repos from Oracle. They have made it really clear that they want more community users and that they’ll consider a switched-over system viable for support, should you need/want support.

Compared to what I knew from other RHEL support offerings like HP, Fujitsu or Red Hat themselves, this is pretty end-user-friendly.

Of course they’ll be happy if you get a support contract and a ksplice license.

But most probably I’ll not be needing any support, so what else was there?

For me the reasons were:

  • quicker updates with a reliable release plan

I had already decided I *had* to get away from CentOS. Situations like last year, where for half a year no security updates (or any others) were released, just can’t happen. RHEL wasn’t really interesting either, since I don’t see myself running one system that subscribes to channels and then distributes the updates and whatnot. My time isn’t free either, and too many people miss that. If you assume a few $$$ for the subscription and then another 10 hrs/year for maintenance, that *is* costly.

  • more current supported kernel (if needed)

Where I need it I can switch to the OEL kernel, which is stripped down and more current. The German iX magazine had a good review of the UEK vs. the stock RH kernel vs. SLES. While Red Hat had put a lot more work into the RHEL kernel, most of it was backporting…

  • tmem support with less hassle

I can easily grab a kernel supporting TMEM to have more usable RAM.
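For the curious, a minimal sketch of what enabling tmem can look like on a grub-booted Xen host. The exact option names vary by Xen and kernel version, so treat this as an assumed example rather than a recipe:

# /boot/grub/grub.conf (sketch): enable transcendent memory in the hypervisor
kernel /xen.gz dom0_mem=1024M tmem
# PV guests additionally need a tmem-aware kernel (e.g. UEK), where tmem
# is switched on via a guest kernel boot parameter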

  • Supported platform for running virtualized Oracle DBs

This is just a side bonus, since I run a bastardized OVM for dom0. 🙂

So what were the experiences from switching over for a small-ish setup?

I run about 15 RHEL-clone-ish virtual machines and servers and switched all of them over time.

I did it in 3 groups, deciding mostly by usage and backup footprint.

To switch I used the CentOS2OEL script from Ksplice/Oracle.

You just run the script, let it replace the yum repos, and then it downloads the new packages and replaces the old ones. If it all works out, you reboot into the new kernel and you’re done.
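In case you want to try it, this is roughly the procedure. I’m quoting the script location from memory, so verify it on Oracle’s site before running anything:

# fetch the switch script from Oracle's public yum infrastructure
curl -O https://linux.oracle.com/switch/centos2ol.sh
# run it as root; it swaps the repo files and reinstalls the branded packages
sh centos2ol.sh
# afterwards, update and reboot into the Oracle-provided kernel
yum update -y && reboot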

I reported back all the errors I found and they even gave me instructions on how to complete the migration for each error. All these errors were subsequently fixed in their new version of the script, and they were also all limited to non-standard installs I run.

All in all, that was a pretty nice experience. Most of the issues I fixed within 3-5 minutes; for one I didn’t find a solution myself, so I used the one they suggested and it worked 🙂

What else:

Where I had stripped down dependencies, some stuff was added back by yum. That was to be expected.

Binary compatible: I saw they actually say Red Hat Enterprise Linux in /etc/redhat-release. I didn’t really like that, but I suppose this is a compatibility thing. Many times I’ve had to fiddle with that file to install certain software, but I think it is potentially worse to give a false name. Maybe I’ll ask them about it.

Mirror Speed: The mirrors are a tad slow, but their number is also increasing. It just seems demand is bigger than what they can feed.

Infiniband: There are OFED packages and kernels, but Oracle’s own (UEK) kernels don’t ALL come with the InfiniBand packages. Judging by the version history, I feel they actually forgot about them. The standard RHEL-compatible kernels are as good or bad as the original ones.

Yum updates: YES THEY ARE PROPERLY SIGNED:

yum updates plugin for Check_MK showing CRIT since it gets valid security data input

(sorry for the bad screenshot, WordPress is being a ***** again)

-bash-4.1# yum --security check-update
Loaded plugins: downloadonly, fastestmirror, priorities, security
Loading mirror speeds from cached hostfile
* epel: mirror.fraunhofer.de
Limiting package lists to security relevant ones
14 package(s) needed for security, out of 20 available

dhclient.x86_64 12:4.1.1-31.P1.0.1.el6_3.1 ol6_latest
dhcp-common.x86_64 12:4.1.1-31.P1.0.1.el6_3.1 ol6_latest
glibc.x86_64 2.12-1.80.el6_3.3 ol6_latest
glibc-common.x86_64 2.12-1.80.el6_3.3 ol6_latest
kernel.x86_64 2.6.32-279.2.1.el6 ol6_latest
kernel-firmware.noarch 2.6.32-279.2.1.el6 ol6_latest
krb5-libs.x86_64 1.9-33.el6_3.2 ol6_latest
nspr.x86_64 4.9.1-2.el6_3 ol6_latest
nss.x86_64 3.13.5-1.0.1.el6_3 ol6_latest
nss-sysinit.x86_64 3.13.5-1.0.1.el6_3 ol6_latest
nss-tools.x86_64 3.13.5-1.0.1.el6_3 ol6_latest
nss-util.x86_64 3.13.5-1.el6_3 ol6_latest
openldap.x86_64 2.4.23-26.el6_3.2 ol6_latest
sudo.x86_64 1.7.4p5-13.el6_3 ol6_latest
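
One note for anyone reproducing this: the --security switch comes from a plugin (the “security” entry in the Loaded plugins line above). On EL6 the package name should be yum-plugin-security:

# install the plugin providing --security, then filter by security relevance
yum install yum-plugin-security
yum --security check-update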

Cobbler:
I still haven’t switched my Cobbler servers over. That means I sometimes install CentOS just to migrate it a moment later.

Veritas:
I had installed Veritas Storage Foundation Basic and ran into some kernel module issues; the vxportal module could not load correctly.
I’m wondering if that came down to a hardware issue in my server, because later on I found the onboard NICs caused all kinds of issues. I haven’t had a need to re-test this yet, but it probably couldn’t hurt to take a few more stabs at testing DKMS and friends on OEL.

That’s all I can think of right now.

It’s been a breeze, I got feedback from nice people(*), and now my systems have a lot of added stability by getting updates when I need them.

(If you consider Oracle evil, then, well, I just talked to a small part of it named Ksplice.
Actually, the same goes for the Xen devs and the Oracle VM project people.
You can’t say that about some Linux distros.)

Local suid in Linux ;)


There’s an ugly local suid root hole in Linux that has only been patched since, well, very recently. What’s worse, there’s also a number of exploits out already, because this has apparently interested many people.

Further reading:

http://blog.zx2c4.com/749

Otherwise… well. If you can, find your suid binaries (find / -xdev -perm -4000) and disable all the ones you don’t need (e.g. if you start X as root via a login manager, you don’t need a suid startx). It’ll be over at some point lol.
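Concretely, the cleanup looks something like this; the victim binary below is just a hypothetical example, pick your own candidates:

# list setuid binaries on the root filesystem
find / -xdev -perm -4000 -type f -ls
# strip the suid bit from one you can live without (hypothetical example target)
chmod u-s /usr/bin/at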

The FreeBSD user in me has a strong urge to let out some laughter…

kldunload -f linux.ko 🙂

Can you find the DDoS?


This kept me a little busy on Friday night: a long-running DDoS hammering my server, specifically the VPS subnet, not caring whether the IPs were even allocated.

I reported it to my ISP pretty much immediately, but haven’t gotten an answer so far.

At some point I figured this (I guess a few hundred kpps) was just beyond what I could fix on my own, and that this, after all, had not been my weekend plan.

I throttled all traffic down to somewhere around 2 KB/s and went off to buy Batman: Arkham City instead.

This is a weekly RRD graph that averages the numbers down, but makes for a better comparison. The small spikes are daily backups, a few GB give or take. On the long green one you’ll see how traffic went down after throttling, and you can see it took a full day until the attack finally wore out.

When I looked there was about 5 MB/s of incoming SYNs with all kinds of funny options, and around 5 MB/s of useless ICMP replies from my box. Gotta love comparing this to FreeBSD boxes, which simply auto-throttle such an attack right away…

Lessons learned:

  • Syncookies are not optional, you WANT them enabled (see the sketch after this list).
  • Your kernel will reply to anything it feels responsible for; that’s why I had to deal with the many MB/s of ICMP replies for the unallocated IPs under attack.
  • Nullrouting unused IPs was the most helpful thing I did.
  • Throttling was the second most helpful, just next time it needs to be a lot more specific.
  • iptables & tc syntax is a complete nightmare compared to any router OS. I wonder what they took before designing their options. Every single thing they can do is twisted until it’s definitely non-straightforward.
  • Methodically working on shapers and drop rules was the wrong thing to do! Either have them prepared and ready to enable, or skip it and look at more powerful means right away. If someone is throwing nukes at you, then don’t spend the last minute setting up your air defences. 🙂
  • Enabling the kernel ARP filter might be the right thing to suppress unwanted responses – or it might break VM networking.
  • The check_mk/multisite idea of running quite a few distributed monitoring systems is great. Even if I lost livestatus connectivity to the system, it still DID do the monitoring, so once I had reasonable bandwidth again all the recorded data was there to look at.
  • IMO this is much more crucial with IDS logs. It’s very rare, but there are cases where a big nasty DDoS is just used to hide the real attack.
  • It feels like a smart move to plan for real routers on the network. Of course, that has certain disadvantages on the “OPEX” side of things. I got the routers, but rack units are not free :/
  • If you see a sudden traffic spike and spend hours trying to find a software bug or a hacked system, you might be looking at a DDoS probe. Look at this, recorded roughly two weeks earlier:
 I noticed this because I had quite well-tuned traffic monitoring already, using the ISP’s standard tools. Even then, my gut had been telling me this was someone probing the target’s performance etc. prior to a real attack.
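
Since several of the bullets above boil down to a handful of commands, here’s the gist as a sketch from memory; the IP is a documentation-range placeholder, and interface names and values need adapting to your setup:

# SYN cookies: you want these on before an attack, not during one
sysctl -w net.ipv4.tcp_syncookies=1
# ARP filter to suppress unwanted responses (careful: can break bridged VM setups)
sysctl -w net.ipv4.conf.all.arp_filter=1
# null-route an unallocated IP that is drawing fire
ip route add blackhole 192.0.2.123/32
# crude egress throttle via tc (tbf); next time it needs to be far more specific
tc qdisc add dev eth0 root tbf rate 16kbit burst 4kb latency 400ms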

And, finally: I guess I’ve now lost more sleep to playing Batman than to the attack; I even forgot I wanted to go to a party on Saturday. Those damn sidequests 🙂

Squeezing the RAID Controller


This is a good day for benchmarking galore!


I’m trying to collect performance data for my controllers so that I can fine-tune the monitoring based on real measurements, not educated guesses. I want to know the actual IOPS and MB/s limits and set the alert levels accordingly.

Today’s victim is an

“Intel(R) RAID Controller SROMBSAS18E”

as found in the SR1550 servers on the active SAS backplane.

It is also very well known as the Dell PERC5…

With Intel servers you need add-ons for the 512 MB RAM and the BBU. These came included with my server.

Right now we’re only doing read-only tests here. For one thing, the BBU is utterly discharged.

Test setup:

3x 73 GB 15K SAS drives in a RAID0 config (IO Policy WB, Cached)

4x 60 GB OCZ Vertex2 in a RAID10 config (IO Policy Direct)

OCZ Vertex2 SSDs

Linux settings: cfq scheduler, readahead set to 1 MB for both LUNs.
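(Setting that readahead is a one-liner; blockdev counts in 512-byte sectors, so 1 MB is 2048 of them. The device names are the ones used in the test below:)

# 1 MB readahead on both LUNs (2048 * 512-byte sectors)
blockdev --setra 2048 /dev/sdb
blockdev --setra 2048 /dev/sdc
# verify
blockdev --getra /dev/sdb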

Test scenario: pull data off the disks as fast as we can, then write down the numbers from sar afterwards.

[root@localhost ~]# dd if=/dev/sdc of=/dev/null bs=1024k count=10240 &  
dd if=/dev/sdb of=/dev/null bs=1024k count=10240
[1] 4198
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 37.0208 seconds, 290 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 37.0224 seconds, 290 MB/s

Average:          sdb   1677.58 553577.58      0.00    329.99      1.32      0.79      0.46     77.04
Average:          sdc   1867.27 578368.77      0.00    309.74      1.44      0.77      0.45     83.96
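
For reference, since the sar -d header didn’t survive the paste: the columns should be DEV, tps, rd_sec/s, wr_sec/s, avgrq-sz, avgqu-sz, await, svctm and %util, with rd_sec/s in 512-byte sectors. That lets you sanity-check the dd numbers:

# sdb: 553577.58 sectors/s * 512 B ≈ 283 MB/s
# sdc: 578368.77 sectors/s * 512 B ≈ 296 MB/s
# combined ≈ 580 MB/s off the controller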

Other fun things to test now…

  • Switch to SATA SSD RAID0 instead of RAID10
  • Look at IO Overhead in Xen domU*
  • See how much faster the SR1625 will perform 🙂
  • Update the outdated firmware 🙂
  • Switch to deadline scheduler
* Already tried that one, still trying to really understand the results. Most important: enable readahead in dom0, it helps heaps; if I remember correctly it bumped me from 300 to 400 MB/s.

Practical results, too:

If the controller peaks out at 580 MB/s, I can now plan the number of 10k/15k/SSD drives…

Linux LVM mirroring comes at a price


You can find a nice article about LVM2 mirroring here: http://www.joshbryan.com/blog/2008/01/02/lvm2-mirrors-vs-md-raid-1

A reader had already tried to warn people, but I think it went unheard:

LVM is not safe in a power failure, it does not respect write barriers and pass those down to the lower drives.

hence, it is often faster than MD by default, but to be safe you would have to turn off your drive’s write caches, which ends up making it slower than if you used write barriers.

First of all, he’s right. More on that below. Also, I find it kinda funny how he goes into turning off write caches. I was under the impression that NO ONE is crazy enough to have write caches enabled in their servers, unless they’re battery-backed or the local disk is only used for swap anyway. I mean, that was the one guy who at least knew about the barrier issue, and he thinks it’s safe to run with the cache turned on.
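If you want to know where your own drives stand, checking and disabling the volatile write cache is quick; hdparm covers ATA/SATA, SAS drives need sdparm instead (the device name is just an example):

# show the drive's current write-cache setting
hdparm -W /dev/sda
# disable write caching
hdparm -W0 /dev/sda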

All the pretty little Linux penguins look soooo much faster – as long as we just disable all those safeguards that people built into Unix over the last 20 years 🙂

Anyway, back to LVM mirrors!

We just learned: all device-mapper-based IO layers in Linux can/will lose barriers.

Furthermore, LVM2 has its own set of issues, and it’s important to choose wisely – I think these are the most notable items that can give you lots of trouble in a mirror scenario:

  • no sophisticated mirror write consistency (and worse, people who are using --corelog; see the sketch after this list)
  • only trivial mirror policies
  • no good per LE-PE sync status handling
  • (no PV keys either? – PV keys are used to hash LE-PE mappings independent of PVID)
  • limited number of mirrors (this can turn into a problem if you wanna move data with added redundancy during the migration)
  • no safe physical volume status handling
  • too many userspace components that will work fine as long as everything is ok but can die on you if something is broken
  • no reliable behaviour on quorum loss (the VG should not activate; optionally the server should panic upon quorum loss, but at LEAST vgchange -a y should be able to re-establish the disks once they’re back). I sometimes wonder if LVM2 even knows a quorum?!!
  • On standard distros nothing hooks into the lvm2 udev event handlers, so there are no reliable monitors for your status. Besides, the lvm2 monitors still seem to be in a proof-of-concept state…
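
To make the --corelog bullet concrete, a sketch with hypothetical VG/LV names; the on-disk mirror log is the safer default, while the core log trades crash recovery for one less disk region:

# 2-way mirror with an on-disk mirror log (the default)
lvcreate -m 1 -L 10G -n mirrorlv vg0
# same mirror with the log kept only in memory: after a crash or
# deactivation the whole mirror must be resynced from scratch
lvcreate -m 1 --mirrorlog core -L 10G -n mirrorlv vg0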

Since barriers are simply dropped in the device mapper (not in LVM, btw), you should choose wisely whether to use lvm2 mirrors for critical data mirroring.

Summary:

  • LVM mirror may look faster, but it comes at a price
  • Things tend to be slower if they do something the proper way.

Of course, if you’re using LVM on top of MD you *also* lose barriers.

Usually we can all live pretty well with either of those settings, but we should be aware there are problems and that we opted for manageability/performance over integrity.

Personally, I see the management advantages of LVM as high enough to accept the risk of FS corruption. I think the chance of losing data is much higher when I manually mess around with fdisk or parted and MD on every occasion I add a disk etc.

If it were very critical data, you could either replicate in the storage array (without LVM and multipath??????) or scratch up the money for a Veritas FS/Volume Manager license (unless you’re a Xen user like me… 😦 )

Either way…:

SET UP THE MONITORING.

 

A little update here:

According to the LVM article on Wikipedia, kernels from 2.6.31 onwards do handle barriers correctly even with LVM. On the downside, that article only covers Linux LVM and IMHO has a lot of factual errors, so I’m not sure I’ll just go and become a believer now.