Apache VirtualHost config gone after Wheezy to Jessie upgrade

Here’s a fun one that had me running in circles for a while today:

I’ve been running deluge and the deluge-webui on Debian Wheezy for a while now. Pretty solid. I needed to download a torrent using a magnet URI today and deluge-webui on Wheezy won’t do it. This feature was added to the webui in 1.3.4 though so the version in Jessie should work.

I did the typical dist-upgrade song and dance but after the upgrade Apache was all hosed up. It was just showing the default example page. All of the access logs that would normally go to my configured virtual host were landing in /var/log/apache2/other_vhosts_access.log, which is all wrong. I started out thinking it was the hostname of the system that got messed up but that was a dead end.

I started making progress when I found the command

apache2ctl -S

This dumps out a bunch of data about your configuration and it basically said that my VirtualHost configuration was empty:

VirtualHost configuration:

Yeah it was basically an empty string. This seemed wrong but I wasn’t sure what to expect really. After banging around a bit longer and getting nowhere I finally decided to just disable and re-enable my site configuration. This was desperation because my site config was already linked into /etc/apache2/sites-enabled so it must have been enabled … right?

a2dissite mysite

But disabling it failed! It gave me some sort of “no such file” error. Whaaaaaa? So I ran the command through strace and it turns out that the new apache2 package on Jessie expects the site config file to have the suffix .conf. Changing the name of my site config fragment fixed this and I was then able to enable the config as expected.
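For anyone hitting the same thing, the fix looks something like this (substitute your own site config name for mysite):

cd /etc/apache2/sites-available
mv mysite mysite.conf
rm /etc/apache2/sites-enabled/mysite    # clean up the now-dangling symlink
a2ensite mysite.conf
service apache2 reload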

That was unbelievably annoying. Hopefully this will save someone else a few minutes.

First OpenXT build

UPDATE: 2014-06-20 added section about using bash instead of dash
UPDATE: 2014-06-19 fix git clone URI
UPDATE: 2014-06-18 add genisoimage package to list of required packages.
UPDATE: 2014-06-18 remove the section on mangling the manifest file now that I’ve upstreamed a patch.
UPDATE: 2014-06-18 to reflect default mirror being available now.
UPDATE: 2014-06-18 to clarify location of STEPS variable and the setupoe step.

With the transition of XT to OpenXT I’m realizing that the mark of the most successful open source projects is good tools and great documentation. Right now we’re a bit short on both. The tools to build the core of OpenXT aren’t our own; they’re maintained by the upstream OpenEmbedded and Yocto communities. The documentation to build OpenXT, on the other hand, is our responsibility. This is a quick recap of my first build of the code that’s up on github with step-by-step instructions so that you can follow along at home.

The Existing Docs

The closest thing to build docs that we have on github is a README left over from a previous attempt to open source this code. That work was done under the name “XenClient Initiative”, or XCI for short. It was before my time on the project but my understanding is that it wasn’t particularly successful.

I guess you can call OpenXT our second go at the open source thing. The instructions in this file are way out of date. They need to be replaced and hopefully this write-up will be the first step in fixing this.

Build Machine

There’s a lot of technical debt that’s built up over the years in the OpenXT code base. The first and most obvious bit of technical debt is in our build system. We require that the build be done on a 32 bit Debian Squeeze system. The 64 bit architecture may work but it’s untested AFAIK.

We require Squeeze for a number of reasons, the most obvious of which is our dependency on the GHC 6.12 compiler. Wheezy ships with version 7 and our toolstack hasn’t been updated to work with the new compiler yet. To the Haskell hackers out there: your help would be much appreciated. No doubt there are other dependencies and issues that would need to be worked around in an upgrade but we know this one to be a specific and prominent issue.

The requirement for a 32 bit build host is likely just down to only 32 bit hosts having been tested. Anyone out there who tries a 64 bit build or a build on Wheezy, please report your results so we can get documentation together for the tested and supported build hosts.

Required Packages

The initial list of packages required to build OpenXT can be obtained from the OE wiki. The requirements are:

sed wget cvs subversion git-core coreutils unzip
texi2html texinfo docbook-utils gawk python-pysqlite2
diffstat help2man make gcc build-essential g++
desktop-file-utils chrpath

The list is pretty short as far as build requirements go because OE builds nearly all of the required tools as part of the build. This is a big part of what makes OE so great.

Additionally we require a few extra packages:

ghc guilt iasl quilt bin86 bcc libsdl1.2-dev liburi-perl genisoimage

Packages like guilt and quilt are used in our bitbake patch queue class in the expected way. ghc is the Haskell compiler which is required to … build the Haskell compiler (much like OE requires gcc to build gcc). genisoimage is used by our build scripts to put the final installer ISO together.

The remaining dependencies: iasl, bin86, bcc, libsdl1.2-dev, and liburi-perl are another instance of technical debt. These packages should be built in OE as dependencies of other packages. Instead our recipes take a short cut and require they be installed on the build host. This seems like a smart shortcut but it’s a shortcut that ends in cross-compile misery. This may be what causes issues between 32 and 64 bit build hosts.
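Until that gets sorted they all have to be present on the build host. Pulling both lists together, getting everything installed on Squeeze looks something like this:

sudo apt-get install sed wget cvs subversion git-core coreutils unzip \
    texi2html texinfo docbook-utils gawk python-pysqlite2 diffstat help2man \
    make gcc build-essential g++ desktop-file-utils chrpath \
    ghc guilt iasl quilt bin86 bcc libsdl1.2-dev liburi-perl genisoimage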

A good example of how to fix these issues already exists. If you’ve been following upstream Yocto development, the Xen recipe contributed there gets this right by depending on the dev86-native and iasl-native packages. OpenXT would benefit from pulling in the meta-virtualization layer and using this recipe (thanks Chris!)

Bash vs Bourne

Bitbake recipes contain a lot of shell functions and fragments. These are executed by the host system’s /bin/sh. Unfortunately lots of build metadata (including the OpenXT build metadata) is rife with ‘bashisms’. Because of this, builds will fail on Linux distros that don’t link /bin/sh to /bin/bash.

The way to resolve this is laid out in the Ubuntu section of the “OE and Your Distro” docs, as Debian (and thus Ubuntu) uses the dash shell instead of bash by default. Switching from dash to bash is thankfully pretty easy:

sudo dpkg-reconfigure dash

Executing the command above will result in a dialog screen asking you whether or not you want to use dash as your default shell. Select ‘No’ and your system will use bash instead.
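You can double check the result afterward; /bin/sh should now point at bash:

readlink -f /bin/sh    # should print /bin/bash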

I’ve gone ahead and filed a ticket to get ‘bashisms’ out of the first place I ran into them in OpenXT: https://github.com/OpenXT/xenclient-oe/issues/1. If you’ve got some time to kill it would be helpful if someone could track down more of our dirty laundry like this, or better yet, send in a pull request to sort some of this out.

Clone and Configure

If you’re following along you should now have a 32 bit Debian Squeeze build host with some additional packages installed. The next step is to clone the base OpenXT git repo:

git clone git://github.com/OpenXT/openxt.git

This will give you a directory named openxt that contains our build scripts. Change into this directory and we’ll take a look at the important files.

Firstly, the file you’ll most often customize in here is .config, but you don’t have one yet. Copy the example-config file to .config so we can modify it for our environment:

cp example-config .config

The .config file is read by the script do_build.sh that … does the build. There are a handful of variables in .config that are interesting for our first build. For now we’ll ignore the rest.

STEPS

We’ll start with STEPS. This one isn’t defined in the example-config but it’s initialized in the do_build.sh script. Since this script imports all variables from the config we can add it to the config manually to get the desired effect.

This variable defines the build steps that will be carried out when do_build.sh is run with no options. The default steps don’t all work yet so we’ll set this to the minimum that we can get away with for now:

STEPS="initramfs,stubinitramfs,dom0,uivm,ndvm,syncvm,installer,installer2,ship,copy"

I’ve left out the necessary setupoe step because I typically run this one manually and check that my variables get populated in the OE local.conf successfully. You don’t need to do it this way but it may help you get a better understanding of how the configuration works. After I go through the necessary variables we’ll go back to setupoe.

OPENXT_MIRROR

Due to some of the software used in OpenXT being a bit old, the upstream mirrors of a few source tarballs are no longer available. Further, we require a number of Intel binary ACM modules for TXT to function properly. Intel hides these zips behind a lawyer wall, requiring users to accept license terms before they can be downloaded. That’s great CYA from their legal department but it prevents any automated build from downloading them, so we have to mirror them ourselves (which is permitted by their license).

When I did my first build the other day the URL from the example configuration file wouldn’t resolve for me. So I set up my own mirror:

OPENXT_MIRROR="http://www.twobit.us/mirrors/openxt"

This should be fixed so check the default mirror first. If it doesn’t work feel free to clone my mirror in your local environment but do me a favor and go easy on the bandwidth please.

UPDATE: The default mirror is fixed. You should use the default value (http://openxt.xci-test.com/mirror/) and not my mirror … well really you should set up your own mirror because there’s no way to tell how long the default mirror will stay up and it’s nice to keep redundant traffic to a minimum.
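If you do stand up your own mirror, a recursive fetch of the default one is probably the easiest way to seed it (just a sketch, and again, go easy on the bandwidth):

wget --mirror --no-parent http://openxt.xci-test.com/mirror/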

Signing Certs

The topic of signing releases is a huge one so for now we’ll just stick to the minimum work required to get our build signed once it’s done. There are several relevant config variables for this:

REPO_PROD_CACERT="/path/to/prod-cacert.pem"
REPO_DEV_CACERT="/path/to/dev-cacert.pem"
REPO_DEV_SIGNING_CERT="/path/to/dev-cacert.pem"
REPO_DEV_SIGNING_KEY="/path/to/dev-cakey.pem"

We require that each builder create their own keys for development and release builds. In this example I’m using self signed certs so it’s as simple as possible. Use the following commands to create your keys and your self signed certs:

openssl genrsa -out cakey.pem 2048
openssl req -new -x509 -key cakey.pem -out cacert.pem -days 1095

You’ll need two key/cert pairs, one for the automated signing of the build (a ‘dev’ key) and the certificate for a production signing key. All protection of the production signing key is the responsibility of whoever is signing the release. I’ll cover this in another post at another time. For now just make the keys and the certs and put the variables in the .config file.
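For a fully self-signed setup that just means running the commands above twice with output names that match the variables. Something like this (file names here are only examples, use whatever matches your .config):

openssl genrsa -out dev-cakey.pem 2048
openssl req -new -x509 -key dev-cakey.pem -out dev-cacert.pem -days 1095
openssl genrsa -out prod-cakey.pem 2048
openssl req -new -x509 -key prod-cakey.pem -out prod-cacert.pem -days 1095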

OE_BUILD_CACHE_DL

For those of you who have used OE in a previous life, you know how huge an OE build directory can get. Part of this is caused by the OE download cache, which is the directory where OE caches downloaded source code. In the OE local.conf file this is specified by the DL_DIR variable. We use OE_BUILD_CACHE_DL instead.

Personally my build system has a RAID SSD set up to keep my builds as fast as possible but I can’t afford enough SSDs to support having my DL_DIR on my SSD storage. Typically I’ll use larger but slower storage (an NFS RAID array) for my download mirror and share it between projects. Often I’ll just link that slower storage mount directly into my build tree to keep everything in one place. Do whatever works best for you and remember this is completely optional. You can leave this out and the build will just use a directory in your build tree, but that will make your build much larger:

OE_BUILD_CACHE_DL="/path/to/oe-download"

Even with the download cache on a separate volume an OpenXT build takes up a lot of disk space. This minimal build of just the core service VMs and no in-guest tools weighs in at 74G. My download cache is shared between all of my OE projects so I can’t say exactly how large the cache for a fresh OpenXT build will be. My combined cache is ~20G but I’d expect the files for OpenXT are a small portion of that.

Start Your Engines

That’s about all it should take. Next I run the do_build.sh script explicitly executing the setupoe step:

./do_build.sh -s setupoe

This step clones all of the git repos you need to build OpenXT (mostly OE meta-layers) and then processes your .config file to generate the OE local.conf. You can take a look at this file to see the variables and make sure they’re set to the right thing. This can be very helpful when debugging or if you want to manually change them for some reason.

After OE has been setup you can just run the do_build.sh script and then go drink coffee for the next 6 hours:

./do_build.sh | tee build.log

If all goes well at the end of this you’ll have an ISO with the OpenXT installer on it ready to go here:

./build-output/openxt-dev--master/iso/installer.iso

There’s no magic here. The comma separated list of steps we put into the STEPS variable can all be invoked one at a time the same way we ran setupoe. So if you want to build just the dom0 initramfs you can do so like this:

./do_build.sh -s initramfs | tee initramfs.log

Piping the build output to a file is always useful when things break (and they will break). My recommendation would be to build a few steps individually before you go and do a full build. I’ll probably do another post about the individual build steps and how all that works next.

Now go forth, build, install, and report bugs.

XT is Open

The last few years of my professional life have been, well, interesting … as in the apocryphal Chinese curse. Going into the details of the business end of this is beyond the scope of what belongs on this website. The bits related to the technology of XenClient XT however have been written about here in the past (see my tags for XenClient and my work on measured launch). The technology is always the most interesting and important stuff anyways and this time it’s no different.

This post marks an important milestone: the core technology that makes up XenClient XT is now open source as OpenXT. In my opinion this moment is long overdue. There have been several attempts to move this project to an open source development model in the past. Most of these efforts were before my time so I can’t claim any fame there. Those working on this project before me who realized that this technology would be best served by an open development model were indeed before their time and well ahead of the “decision makers” who held them back.

My only hope now is that we aren’t too late. That there is still some value in this code and that the world hasn’t passed us by. That the small community of people who care about the intersection of virtualization and security will rally and contribute to the code base to help us pay off the technical debt that has built up over the years and push forward new features and architectural advancements.

The new OpenXT tag will track work on this project. I filed the first bug against our bitbake metadata just now so hopefully it’s the first of many filed and fixed in the future. Happy hacking!

building HVM Xen guests

On my Xen systems I’ve run pretty much 99% of my Linux guests paravirtualized (PV). Mostly this was because I’m lazy. Setting up a PV guest is super simple. No need for partitions, boot loaders or any of that complicated stuff. Setting up a PV Linux guest is generally as simple as setting up a chroot. You don’t even need to install a kernel.

There’s been a lot of work over the past 5+ years to add stuff to processors and Xen to make the PV extensions to Linux unnecessary. After checking out a presentation by Stefano Stabellini a few weeks back I decided I’m long overdue for some HVM learning. Since performance of HVM guests is now better than PV for most cases it’s well worth the effort.

This post will serve as my documentation for setting up HVM Linux guests. My goal was to get an HVM Linux guest installed using typical Linux tools and methods like LVM and chroots. I was explicitly trying to avoid using RDP or anything that isn’t a command-line utility. I wasn’t completely successful at this but hopefully I’ll figure it out in the next few days and post an update.

Disks and Partitions

Like every good Linux user, LVM is my friend. I’d love a more flexible disk backend (something that could be sparsely populated) but blktap2 is pretty much unmaintained these days. I’ll stop before I fall down that rabbit hole but long story short, I’m using LVM volumes to back my guests.

There’s a million ways to partition a disk. Generally my VMs are single-purpose and simple so a simple partitioning scheme is all I need. I haven’t bothered with extended partitions as I only need 3. The layout I’m using is best described by the output of sfdisk:

# partition table of /dev/mapper/myvg-hvmdisk
unit: sectors
 
/dev/mapper/myvg-hvmdisk1 : start=     2048, size=  2097152, Id=83
/dev/mapper/myvg-hvmdisk2 : start=  2099200, size=  2097152, Id=82
/dev/mapper/myvg-hvmdisk3 : start=  4196352, size= 16775168, Id=83
/dev/mapper/myvg-hvmdisk4 : start=        0, size=        0, Id= 0
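A dump in this format can be fed right back into sfdisk to write the partition table (the file name is just an example):

sfdisk /dev/mapper/myvg-hvmdisk < hvmdisk.layout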

That’s 3 partitions, the first for /boot, the second for swap and the third for the rootfs. Pretty simple. Once the partition table is written to the LVM volume we need to get the kernel to read the new partition table to create devices for these partitions. This can be done with either the partprobe command or kpartx. I went with kpartx:

$ kpartx -a /dev/mapper/myvg-hvmdisk

After this you’ll have the necessary device nodes for all of your partitions. If you use kpartx as I have these device files will have a digit appended to them like the output of sfdisk above. If you use partprobe they’ll have the letter ‘p’ and a digit for the partition number. Other than that I don’t know that there’s a difference between the two methods.

Then get the kernel to refresh the links in /dev/disk/by-uuid (we’ll use these later):

$ udevadm trigger

Now we can set up the filesystems we need:

$ mkfs.ext2 /dev/mapper/myvg-hvmdisk1
$ mkswap /dev/mapper/myvg-hvmdisk2
$ mkfs.ext4 /dev/mapper/myvg-hvmdisk3

Install Linux

Installing Linux on these partitions is just like setting up any other chroot. First step is mounting everything; the following script fragment does just that:

# mount VM disks (partitions in new LV)
if [ ! -d /media/hdd0 ]; then mkdir /media/hdd0; fi
mount /dev/mapper/myvg-hvmdisk3 /media/hdd0
if [ ! -d /media/hdd0/boot ]; then mkdir /media/hdd0/boot; fi
mount /dev/mapper/myvg-hvmdisk1 /media/hdd0/boot
 
# bind dev/proc/sys/tmpfs file systems from the host
if [ ! -d /media/hdd0/proc ]; then mkdir /media/hdd0/proc; fi
mount --bind /proc /media/hdd0/proc
if [ ! -d /media/hdd0/sys ]; then mkdir /media/hdd0/sys; fi
mount --bind /sys /media/hdd0/sys
if [ ! -d /media/hdd0/dev ]; then mkdir /media/hdd0/dev; fi
mount --bind /dev /media/hdd0/dev
if [ ! -d /media/hdd0/run ]; then mkdir /media/hdd0/run; fi
mount --bind /run /media/hdd0/run
if [ ! -d /media/hdd0/run/lock ]; then mkdir /media/hdd0/run/lock; fi
mount --bind /run/lock /media/hdd0/run/lock
if [ ! -d /media/hdd0/dev/pts ]; then mkdir /media/hdd0/dev/pts; fi
mount --bind /dev/pts /media/hdd0/dev/pts

Now that all of the mounts are in place we can debootstrap an install into the chroot:

$ sudo debootstrap wheezy /media/hdd0/ http://http.debian.net/debian/

We can then chroot to the mountpoint for our new VMs rootfs and put on the finishing touches:

$ chroot /media/hdd0

Bootloader

Unlike a PV guest, you’ll need a bootloader to get your HVM up and running. A first step in getting the bootloader installed is figuring out which disk will be mounted and where. This requires setting up your fstab file.

At this point we start to run into some awkward differences between our chroot and what our guest VM will look like once it’s booted. Our chroot reflects the device layout of the host on which we’re building the VM. This means that the device names for these disks will be different once the VM boots. On our host they’re all under the LVM /dev/mapper/myvg-hvmdisk and once the VM boots they’ll be something like /dev/xvda.

The easiest way to deal with this is to set our fstab up using UUIDs. This would look something like this:

# / was on /dev/xvda3 during installation
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /               ext4    errors=remount-ro 0       1
# /boot was on /dev/xvda1 during installation
UUID=yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot           ext2    defaults        0       2
# swap was on /dev/xvda2 during installation
UUID=zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz none            swap    sw              0       0

By using UUIDs we can make our fstab accurate even in our chroot.

After this we need to set up the /etc/mtab file needed by lots of Linux utilities. I found that when installing Grub2 I needed this file in place and accurate.

Some data I’ve found on the web says to just copy or link the mtab file from the host into the chroot but this is wrong. If a utility consults this file to find the device file that’s mounted as the rootfs it will find the device holding the rootfs for the host, not the device that contains the rootfs for our chroot.

The way I made this file was to copy it off of the host where I’m building the guest VM and then modify it for the guest. Again I’m using UUIDs to identify the disks / partitions for the rootfs and /boot to keep from having data specific to the host platform leak into the guest. My final /etc/mtab looks like this:

rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=253371,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=203892k,mode=755 0 0
/dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=617480k 0 0
/dev/disk/by-uuid/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot ext2 rw,relatime,errors=continue,user_xattr,acl 0 0

Finally we need to install both a kernel and the grub2 bootloader:

$ apt-get install linux-image-amd64 grub2

Installing Grub2 is a pain. All of the additional disks kicking around in my host confused the hell out of the grub installer scripts. I was given the option to install grub on a number of these disks and none were the one I wanted to install it on.

In the end I had to select the option to not install grub on any disk and fall back to installing it by hand:

$ grub-install --force --no-floppy --boot-directory=/boot /dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

And then generate the grub config file:

$ update-grub

If all goes well the grub boot loader should now be installed on your disk and you should have a grub config file in your chroot /boot directory.

Final Fixups

Finally you’ll need to log into the VM. If you’re confident it will boot without you having to do any debugging then you can just configure the ssh server to start up and throw a public key in the root homedir. If you’re like me something will go wrong and you’ll need some boot logs to help you debug. I like enabling the serial emulation provided by qemu for this purpose. It’ll also allow you to login over serial which is convenient.
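If you go the ssh route it’s just a couple of commands inside the chroot (the key path here is an example):

apt-get install openssh-server
mkdir -p /root/.ssh && chmod 700 /root/.ssh
cat /path/to/mykey.pub >> /root/.ssh/authorized_keys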

This is pretty standard stuff. No paravirtual console through the xen console driver. The qemu emulated serial console will show up at ttyS0 like any physical serial hardware. You can enable serial interaction with grub by adding the following fragment to /etc/default/grub:

GRUB_TERMINAL_INPUT=serial
GRUB_TERMINAL_OUTPUT=serial
GRUB_SERIAL_COMMAND="serial --speed=38400 --unit=0 --word=8 --parity=no --stop=1"

To get your kernel to log to the serial console as well set the GRUB_CMDLINE_LINUX variable thusly:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,38400n8"

Finally to get init to start a getty with a login prompt on the console add the following to your /etc/inittab:

T0:23:respawn:/sbin/getty -L ttyS0 38400 vt100

Stefano Stabellini has done another good write-up on the details of using both the PV and the emulated serial console here: http://xenbits.xen.org/docs/4.2-testing/misc/console.txt. Give it a read for the gory details.

Once this is all done you need to exit the chroot, unmount all of those bind mounts and then unmount your boot and rootfs from the chroot directory. Once we have a VM config file created this VM should be bootable.
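The teardown mirrors the mount script from earlier; roughly:

# undo the bind mounts first, then the guest filesystems, then the partition mappings
umount /media/hdd0/dev/pts /media/hdd0/run/lock /media/hdd0/run
umount /media/hdd0/dev /media/hdd0/sys /media/hdd0/proc
umount /media/hdd0/boot /media/hdd0
kpartx -d /dev/mapper/myvg-hvmdisk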

VM config

Then we need a configuration file for our VM. This is what my generic HVM template looks like. I’ve disabled all graphical stuff: sdl=0, stdvga=0, and vnc=0, enabled the emulated serial console: serial='pty' and set xen_platform_pci=1 so that my VM can use PV drivers.

The other stuff is standard for HVM guests, plus things like memory, name, and uuid that should be customized for your specific installation. Things like uuid and the mac address for your virtual NIC should be unique. There are websites out there that will generate these values. Xen has its own prefix for MAC addresses so use a generator to make a proper one.

builder = "hvm"
memory = "2048"
name = "myvm"
uuid = "uuuuuuuu-uuuu-uuuu-uuuu-uuuuuuuuuuuu"
vcpus = 1
cpus = '0-7'
pae=1
acpi=1
apic=1
boot='c'
xen_platform_pci=1
sdl=0
vnc=0
vnclisten='0.0.0.0'
stdvga=0
serial='pty'
 
disk = [
    '/dev/ssdraid1/wwwhome,raw,xvda,rw'
]
vif = [
    'mac=XX:XX:XX:XX:XX:XX,model=e1000',
]
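One note on the MAC placeholder in the vif line: Xen’s assigned OUI is 00:16:3e, so any address under that prefix will do. A quick way to roll one (this leans on bash’s $RANDOM):

printf '00:16:3e:%02x:%02x:%02x\n' $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))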

Boot

Booting this VM is just like booting any PV guest:

xl create -c /etc/xen/vms/myvm.cfg

I’ve included the -c option to attach to the VMs serial console and ideally we’d be able to see grub and the kernel dump a bunch of data as the system boots.

TODO

I’ve tested these instructions twice now on a Debian Wheezy system with Xen 4.3.1 installed from source. Both times Grub installs successfully but fails to boot. After enabling VNC for the VM and connecting with a viewer it’s apparent that the VM hangs when SeaBIOS tries to kick off grub.

As a work-around, both times I’ve booted the VM from a Debian rescue ISO, set up a chroot much like in these instructions (the disk is now /dev/xvda though) and re-installed Grub. This does the trick and rebooting the VM from the disk then works. So I can only conclude that either something in my instructions w/r/t installing Grub is wrong (which I think is unlikely, as they match numerous other “install grub in a chroot” write-ups on the web) or something subtler is going on.

The source of the problem is speculation at this point. Part of me wants to dump the first 2M of my disk both after installing it using these instructions and again after fixing it with the rescue CD, and compare the two. Now that I think about it, the Grub installed in my chroot is probably a different version than the one on the rescue CD so that could have something to do with it.

Really though, I’ll probably just install syslinux and see if that works first. My experiences with Grub have generally been bad any time I try to do something out of the ordinary. It’s incredibly complicated and generally I just want something simple like syslinux to kick off a very simple VM.

I’ll post an update once I’ve got to the bottom of this mystery. Stay tuned.

tboot 1.8.0 and UEFI

Version 1.8.0 of tboot was released a while back. This is a pretty big deal as the EFI support has been a long time coming. Anyone wanting to use tboot on a modern piece of hardware using EFI has been out of luck till now.

For the past week or so I’ve been slowly figuring out how to build an OE image with grub-efi, building the new version of tboot and then debugging an upgrade in meta-measured. My idea of a good time for sure.

As always the debugging was the hardest part; building the software was easy. For the most part tboot EFI “just worked” … after I figured out all the problems with kernel version and grub configuration. The hard parts were:

  • realizing the Linux kernel image had to be the latest 3.14 version
  • debugging new kernel version
  • configuring grub
  • which modules needed to be built into grub

If you want the details you can see the full history on the meta-measured github. The highlights are pretty simple:

multiboot2 in oe-core grub-efi

The grub-efi recipe in oe-core is a bit rigid. I’ve pushed a patch upstream that allows another layer (like meta-measured) to modify which grub modules are built into the grub EFI executable. It’s a tiny change but it makes all of the difference:

http://lists.openembedded.org/pipermail/openembedded-core/2014-April/091768.html

This lets us add modules to the grub EFI executable. I also had to cobble together a working grub multiboot2 configuration.

linux-yocto v3.14

Pairing this with the older 3.10 Yocto Linux kernel image will allow you to get through grub and tboot but the kernel will panic very early in the boot process. The newer 3.14 doesn’t suffer from this limitation.

The measured reference image in meta-measured used aufs to keep from having to mount the rootfs read/write. This is to keep the rootfs hash from changing across boots. I wrote the whole thing up a while back: http://twobit.us/blog/2013/01/meta-measured/. Anyways aufs doesn’t work in 3.14 so I took the extra few minutes to migrate the image to use the read-only-rootfs IMAGE_FEATURE. This is a good thing regardless, aufs was being used as a shortcut. I hadn’t had the drive to fix this till it broke. Problem solved.

rough edges

I still haven’t figured out all of the details of grub and its configuration. The current configuration in meta-measured is sufficient to boot but something gets screwed up in setting up VGA output for tboot and the early kernel output. Currently grub displays an error message indicating that tboot won’t get a console and no VGA output will be shown till the kernel loads the DRM driver. Output is still available on the serial console so if you’ve got a reasonable test setup you can get all the data you need for debugging.

No lies, I’m a bit afraid of grub; guess I’ll have to get over it. The measured-image-bootimg has a menuentry for tboot and a normal Linux boot. Booting the kernel using the linux and initrd grub commands provides normal VGA output but the multiboot2 config required by tboot does not. I take this to mean that grub is capable of doing all of the necessary VGA stuff but that it can’t pass this data through to tboot via multiboot2. More to come on this soon hopefully.

Till then, if you build this stuff and have feedback leave it here.

OE image package part 2

I spent the weekend in bed with a cold … and with a laptop hacking on ways to get OE rootfs images packaged and installed in other OE rootfs images. My solution isn’t great but it’s functional and it doesn’t require that every image be built in series. But before I get too far let’s go over what I set out to achieve and get a rough set of requirements:

  • I’m building a rootfs that itself contains other rootfs’. The inner rootfs’ will be VMs. The outer rootfs will be the host with the hypervisor (Xen dom0).
  • Build speeds are important so I’d like to share as much of the build infrastructure between VMs as possible.
  • Running builds in parallel is a good thing. Running all builds as serial operations is a non-starter. Bonus points for being able to distribute them across multiple hosts.
  • Having to implement a pile of shell script outside of bitbake to make this work means you’re doing it wrong. The script that automates this build should be doing little more than calling bitbake.

First things first: my solution isn’t perfect. It does work pretty well though and achieves much of the above. Below is a quick visual of what I intend for the end product to support:

rootfs image relationships

On the left is the simple case I’m working to support currently. The boxes represent the root file systems (rootfs) that bitbake is churning out. The lines pointing from one rootfs to another represent one rootfs being packaged in another. dom0 here would be a live image and it would boot the NDVM automatically. Naturally the NDVM rootfs must be contained within dom0 for this to work. The right hand side is an eventual goal.

To support what most people think of as a ‘distro’ you need an installer to lay things down on a physical disk, and if users expect to be able to run arbitrary workloads / VMs then they’ll want the whole disk available. In this scenario the installer image rootfs will have the image packages for the VMs installed in it (including dom0!). The installer will then do its thing laying dom0 down in a partition, but it can also drop the supporting VM images into another partition. After installation, dom0 is booted from physical media and it will be able to boot these supporting VMs.

Notice the two level hierarchical relationship between the rootfs images in the diagram. The rootfs’ on the lower part of the diagram are completely independent and thus can be built in parallel. This will make them easily distributed across multiple build systems. Now on to some of the methods I tried to realize this goal and eventually one that worked!

Changing DISTRO in a build tree

The first thing I played around with was rewriting my local.conf between image builds to change the DISTRO. I use different DISTRO configs to make the package customizations that differentiate my images. This would allow me to add a simple IMAGE_POSTPROCESS_COMMAND to copy service VM rootfs images into the outer image (again, dom0 or an installer).

I had hoped I’d be able to pull this off and just have bitbake magically pick up the differences so I could build multiple images in the same build tree. This would make my image builds serial but possibly very efficient. This caused several failures in my tests however so I decided it would be best to keep separate builds for my individual images. I should probably find the right mailing lists to help track down the root cause of this but I expect this is well outside of the ‘supported’ bitbake / OE use cases.

Copying data across build trees

As a fall-back I came up with a hack in which I copy the needed build artifacts (rootfs & kernel image) to a common location as a post processing step in the image recipe. I’ve encapsulated this in a bbclass in anticipation of using the same pattern for other VM images. I’ve called this class integral-image-export.bbclass:

inherit core-image
 
do_export() {
    manifest_install() {
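        # $1: source file path, $2: name to record in the manifest, $3: manifest file, $4: destination directory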
        if [ ! -z "$1" ]; then
            install -m 0644 "$1" "$4"
            printf "%s *%s\n" "$(sha256sum --binary $1 | awk '{ print $1 }')" "$2" >> $3
        fi
    }
 
    # only do export if export dir is defined
    if [ ! -z "${INTEGRAL_EXPORT_DIR}" ]; then
        ROOT="${INTEGRAL_EXPORT_DIR}/${PN}-$(date --utc +%Y-%m-%dT%H:%M:%S.%NZ)"
        FS_FILE="${IMAGE_BASENAME}-${MACHINE}.ext3"
        KERN_FILE="${KERNEL_IMAGETYPE}-${MACHINE}.bin"
        KERN_PATH="${DEPLOY_DIR_IMAGE}/${KERN_FILE}"
        MANIFEST="${ROOT}/manifest"
        mkdir -p ${ROOT}
        manifest_install "${KERN_PATH}" "${KERN_FILE}" "${MANIFEST}" "${ROOT}"
        manifest_install "${ROOTFS}" "${FS_FILE}" "${MANIFEST}" "${ROOT}"
    fi
}
 
addtask export before do_build after do_rootfs

It lives here: https://github.com/flihp/meta-integral/blob/master/classes/integral-image-export.bbclass. So by having my NDVM image inherit this class, and properly defining INTEGRAL_EXPORT_DIR in my build’s local.conf, the NDVM image recipe will copy these build artifacts out of the build tree.
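The local.conf side of this is a single variable pointing at whatever shared storage you want the artifacts to land in; mine looks something like this (path is an example):

INTEGRAL_EXPORT_DIR = "/mnt/integral"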

Notice that the destination directory has an overly precise time stamp as part of its name. This is an attempt to create unique identifiers for images without having to track incrementing build numbers. Also worth noting is the manifest_install function. Basically this generates a file in the same format as the sha*sum utilities with the intent of those programs being able to verify the manifest.

Eventually I think it would be useful for a manifest to contain data about the meta layers that went into building the image and the hashes of the git commit checked out at the time of the build. This later bit will be useful if a build ever has to be recreated. Not something that’s necessary yet however.

Consuming exported images

After exporting these build artifacts we have to cope with other images that want to consume them. My main complaint about using a script outside of my build tree to place images within one another is that I’d have to re-size existing file systems. Bitbake already builds file systems so resizing them from an external script seemed very ugly. Further, changes to the images built by bitbake (ext3/iso/hddimg etc) would have to be coordinated with said external script. Very ugly indeed.

The most logical solution was to create a new recipe as a way to package the existing build artifacts into a package that can be consumed by an image. By ‘package’ I mean your typical ipk or rpm. This allows bitbake to continue to do all of the heavy lifting in image building for us. Assuming the relationships between images shown above, it allows the outer image to include the image package using the standard IMAGE_INSTALL mechanism. That feels borderline elegant compared to rewriting the generated file systems IMHO.

So from the last section we have builds that are pumping out build artifacts and for the case of our example we’ll say they land in /mnt/integral/image-$stamp where $stamp is a unique time stamp. On the other hand we need a recipe that consumes the artifacts in these directories (I’ll call it an ‘image package recipe’ from here on out). Typically in a bitbake recipe you’ll provide a URI to your source code in the SRC_URI variable and define the files that go into the package using FILES_${PN}. These are generally defined statically in the recipe. Our case is weird in that we want the image package recipe to grab the latest image exported by some other build. So we must generate these variables dynamically.

I’ve never seen these variables generated dynamically before (aside from using the PN and PV variables in URIs) but it’s surprisingly easy. bitbake supports anonymous python functions that get run when the recipe is parsed. This happens before any tasks are executed so setting SRC_URI and PV in this function works quite well. The method for determining the latest image that our build has exported is a simple directory listing and sorting operation:

python() {
    import glob, os, subprocess
 
    # check for valid export dir
    INTEGRAL_EXPORT_DIR = d.getVar ('INTEGRAL_EXPORT_DIR', True)
    if INTEGRAL_EXPORT_DIR is None:
        bb.note ('INTEGRAL_EXPORT_DIR is empty')
        return 0
    if not os.path.isdir (INTEGRAL_EXPORT_DIR):
        bb.fatal ('INTEGRAL_EXPORT_DIR is set, but not a directory: {0}'.format (INTEGRAL_EXPORT_DIR))
        return 1
 
    PN = d.getVar ('PN', True)
    LIBDIR = d.getVar ('libdir', True)
    XENDIR = d.getVar ('XENDIR', True)
    VMNAME = d.getVar ('VMNAME', True)
 
    # find latest ndvm and verify hashes
    IMG_NAME = PN[:PN.rfind ('-')]
    DIR_LIST = glob.glob ('{0}/{1}*'.format (INTEGRAL_EXPORT_DIR, IMG_NAME))
    DIR_LIST.sort (reverse=True)
    DIR_SAVE = os.getcwd ()
    os.chdir (DIR_LIST [0])
    try:
        DEV_NULL = open ('/dev/null', 'w')
        subprocess.check_call ('sha256sum -c manifest', stdout=DEV_NULL, shell=True)
    except subprocess.CalledProcessError:
        return 1
    finally:
        DEV_NULL.close ()
        os.chdir (DIR_SAVE)
 
    # build up SRC_URI and FILES_${PN} from latest NDVM image
    d.appendVar ('SRC_URI', 'file://{0}/*'. format (DIR_LIST [0]))
    d.appendVar ('FILES_{0}'.format (PN), ' {0}/{1}/{2}/*'.format (LIBDIR, XENDIR, VMNAME))
 
    # set up ${S}
    WORKDIR = d.getVar ('WORKDIR', True)
    d.setVar ('S', '{0}/{1}'.format (WORKDIR, DIR_LIST [0]))
 
    return 0
}

If you’re interested in the full recipe for this image package you can find it here: https://github.com/flihp/meta-integral/blob/master/recipes-integral/images/integral-image-ndvm-pkg.bb

The ‘manifest’ described above is also verified and processed. Using the file format of the sha256sum utility is a cheap approximation of the OE SRC_URI[sha256sum] metadata. This is a pretty naive approach to finding the “right” image to package as it doesn’t give the outer image much say over which inner image to pull in: It just grabs the latest. Some mechanism for the consuming image to specify which image it consumes would be useful.
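If you ever want to check one of these exported directories by hand the manifest verifies directly with the sha256sum utility (directory name below is illustrative):

cd /mnt/integral/image-$stamp
sha256sum -c manifest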

So that’s about it. I’m pretty pleased with the outcome but time will tell how useful this approach is. Hopefully I’ll get a chance to see if it scales well in the future. Throw something in the comments if you get a chance to play around with this or have thoughts on the topic.

OE image package

Here’s a fun problem that I don’t yet have a solution to: I want to build a single image with OE. This image will be my dom0. I want to include other images in this image. That is to say I want to package service VMs as part of / in my dom0.

All of the research I’ve done up till now (all 30 minutes of it) points to this having never been done before. I could be using the wrong keywords but the ones I tried turned up nothing on the respective OE and Yocto mailing lists. There seem to be a huge number of pitfalls here, including things like changing the DISTRO_FEATURES in effect for the images as well as selecting image specific files for packages. On a few occasions I’ve used the distro name as a way to select specific configuration files like an fstab or interfaces.

What I want is to run bitbake once for the dom0 image and have it build all the other images and install them as packages in dom0. So I’d need to have recipes that actually package the images so they can be installed in another image. I think that will be the easy part.

The hard part will be making packages specific to each image with different files specific to the image. The only thing I can come up with for this is to play ugly tricks like building each VM image with a different MACHINE type but I’m not even sure if that will work. I guess all I can do for now is to experiment a bit and get on the mailing list to make sure I’m not duplicating work that’s already been done. This could get ugly.

OpenEmbedded Xen Network Driver VM

I wrote about a similar topic what feels like ages ago and I guess it was (8 months is a long time in this business). Since then I’ve been throwing some spare time at this same problem and I’ve actually made measurable progress. There have been a number of serendipitous events that came together to make this possible, the most important of which is the massive update to the Xen recipe in meta-virtualization. With this it’s super easy to crank out a Xen pvops kernel so combining this with an image that has the right plumbing in place it’s not as hard as you might think to build an NDVM.

So armed with the new Xen stuff from meta-virtualization I set out to build a reference NDVM. This isn’t intended to replace the NDVM in a system like XenClient-XT which is far more sophisticated. It’s just intended for experimentation and I don’t intend to build anything more sophisticated than a dumb Ethernet bridge into it.

To host this I’ve started a layer I call ‘meta-integral’. I know, all the good names were taken. Anyways this is intended to be a sort of distro layer where I can experiment with Xen stuff. Currently I’ve got a distro config for dom0 and an NDVM. The dom0 work is still very much a work in progress but the NDVM (much simpler) will actually boot as a PV guest.

To build this just clone my git repo with the build scripts and it’ll do all of the hard work for you:

git clone https://github.com/flihp/oe-build-scripts.git
cd oe-build-scripts
git checkout ndvm
./build.sh | tee build.log

This will crank out an image suitable to run on an Intel SandyBridge (SNB) system. I’ve only tested PV guests so you’ll have to set up a config like the following:

kernel = "/usr/lib/xen-common/bzImage-sugarbay.bin"
extra = "root=/dev/xvda console=hvc0"
iommu = "soft"
memory = "512"
name = "ndvm"
uuid = "a9ae8853-f1e9-41ca-9904-0e906efeb4af"
vcpus = "1"
 
disk = ['phy:/dev/loop0,xvda,w']
pci = ['0000:04:00.0']

Notice the kernel image and the rootfs image must be copied over to the Xen dom0 that you want to test the NDVM on. The image is listed in the kernel line and this can be found at tmp-eglibc/deploy/images/sugarbay/bzImage-sugarbay.bin relative to your build root. The image will be in the same directory and called something like integral-image-ndvm-sugarbay.ext3. Notice that the disk config is pointing at a loopback. You’ll have to set this up with losetup just like any other loopback device. The part that differentiates this from any other PV guest is that we’re passing a PCI network device through to it and it’ll offer up a bridge to other guest VMs. The definitive documentation on how to do this with Xen is here: http://wiki.xen.org/wiki/Xen_PCI_Passthrough
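Setting up that loopback device is a one-liner; point it at wherever you copied the rootfs image on your dom0 (path here is an example):

losetup /dev/loop0 /path/to/integral-image-ndvm-sugarbay.ext3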

The bit that I had to wrangle to get the bridge set up properly with OE was the integration between a network interfaces file and the bridge. I’ve been spoiled by Debian and its seamless integration between the two. OE has no such niceties. In this situation I had to choose between hacking a script together manually or finding the scripts that integrate the interfaces configuration with the bridge and baking them into the bridge-utils package from meta-oe. I figured getting bridges integrated with interfaces would be useful to others so I went through the Debian source package, extracted the scripts and baked them into OE directly. Likely this should go ‘upstream’ but for now this specialization is just sitting in my meta-integral layer.

So after fixing up the bridge-utils package so it plays nice with the interfaces file, the interfaces in our NDVM looks like so:

# /etc/network/interfaces -- configuration file for ifup(8), ifdown(8)
 
# The loopback interface
auto lo
iface lo inet loopback
 
# real interface
auto eth0
iface eth0 inet manual
 
# xen bridge
auto xenbr0
iface xenbr0 inet manual
        bridge_ports eth0
        bridge_stp off
        bridge_waitport 0
        bridge_fw 0

So that’s it. Boot up this NDVM and it’ll have a physical network device and a bridge ready for consumption by other guests. I’ve not yet gone through and tested adding additional guests to the bridge so I’m assuming there’s still a bit of work lurking there. I’ll give this last bit a go and hopefully have positive results to post sooner than later. I’ve also not tested this on XenClient-XT as the most recent stable release is getting a bit old and likely there’s going to be incompatibilities between netfront / back stuff. This approach however is likely a great starting point if you’re building a service VM you want to run on our next release of XT though so feel free to fork and experiment.

UPDATE: Gave my NDVM a test just by giving the dom0 that was hosting it a vif. You can do this like so:

# xl network-attach Domain-0 backend=ndvm

The above assumes your NDVM has been named ‘ndvm’ in its VM config, naturally. Anyways this will pop up a vif in dom0 backed by the NDVM. Pretty slick IMHO. Now to wrap this whole thing up so dom0 and the NDVM can be built as a single image with OE … Sounds easy enough :)

Talk at Xen Developer Summit 2013

UPDATE: Here’s the link: http://www.youtube.com/watch?v=6Q8mlTBn-ZI. I still haven’t been able to bring myself to actually watch it but I’m sure it’s great :\

Just got back from the 2013 Xen Developer Summit where I gave a talk on a few interesting (to me at least) things. If you’re interested you can find my abstract here. My focus was naturally on SELinux / XSM stuff. Mostly my talk focused on the sVirt implementation in XenClient XT and another fun application of the architecture to our management stuff.

Had a good chat with a guy from Amazon afterward about all of the other evil stuff someone could do if they compromised QEMU. So while sVirt prevents the specific scenario presented, I’ve no doubt there are other hazards. He was specifically concerned about the Xen privcmd driver & the hypercalls it could make. Hard to disagree, as QEMU with root permissions in dom0 can execute any hypercall it wants. The only way to address this (other than stubdoms) is to deprivilege QEMU to prevent it from making hypercalls. That would probably require some code changes in QEMU so it’s no small task.

I also touched briefly on the design for an inter-VM communication (IVC) mechanism that was floated to xen-devel this summer. In XT we have an IVC called ‘V4V’ that isn’t acceptable to upstream. When it came to our XSM policy however V4V had some favorable properties in that we created a new object in the hypervisor that was a ‘first-class’ object in the policy.

The proposal uses the same model as the front/back drivers so there would be no new object specific to the IVC. This means there wouldn’t be a way to differentiate the IVC from any other front/back driver. The purpose of the talk was to point this out and hopefully solicit some discussion. Got an even better conversation going on this point so hopefully I’ll have some fun stuff to report on this front soon.

Calculating the MLE hash

My work to calculate PCR[18] from the last post was missing one big piece. I took a short cut and parsed the MLE hash out of the SINIT to MLE data table. This was a stop gap. The MLE wasn’t being measured directly. We were still extracting the measurement as taken by the SINIT which is a binary blob from Intel. We don’t have a choice in trusting this blob from Intel but we can verify the measurements it takes. With this in mind I’ve gone back and added a tool to the pcr-calc module to calculate the MLE hash directly from the MLE.

The MLE Hash

Calculating the MLE hash is a bit more complicated than just hashing the ELF binary that contains it. There’s already a utility that does this in the tboot project though it’s pretty limited as it only dumps out the hash in a hex string. My end goal is to integrate this work into a bitbake class so having a python class to emit a hash object containing the measurement of the MLE is a lot more convenient.

In the pcr-calc project I’ve added a few things to make this happen. First is a class called mleHeader that parses the MLE header. This is just more of the mundane data parsing that I’ve been doing since this whole thing started. Finding the MLE header is just a matter of searching for the magic MLE UUID: 5aac8290-6f47-a774-0f5c-55a2cb51b642. Having the header isn’t enough though. The MLE must be extracted from the ELF and this is particularly hard because I know nothing about the structure of ELF files.

To do the extracting I basically ported the mlehash utility from tboot to python. The MLE is actually stored in the ELF file program header. This requires parsing and extracting the PT_LOAD segments. Writing a generic ELF parser is way beyond the scope of what I’m qualified to do but thankfully Eli Bendersky already has a handle on this. Check out pyelftools on his github page. You can download the package for pyelftools through the python package system like so:

$ pip install pyelftools

I’ve not yet integrated a check for this package into the pcr-calc autotools stuff but I'll get around to it.

So in pcr-calc, the MLEUtil class does a few things. First it unzips the ELF file if necessary. Second, the ELFFile class from pyelftools is used to extract the PT_LOAD segments from the ELF. These are copied to a temporary file and the excess space is zero-filled. Once the ELF is extracted we locate the MLE header by searching for the UUID above. This header is represented and parsed by the mleHeader object.

The end goal is to calculate the SHA1 hash of the MLE. The fields in the header we need to do this are mle_start_off and mle_end_off. These are the offset to the start and end of the MLE respectively. Both offsets are relative to the beginning of the extracted ELF. The hash is then simply calculated over the data in this range.

Housekeeping

With the objects necessary to calculate the MLE hash done I went back and updated the pcr18 utility. Now instead of parsing the hash out of the TXT heap it now hashes the MLE directly. The mlehash program is constructed in a similar way but it is limited to calculating the MLE hash only.

Conclusion

A significant amount of the work in calculating the MLE hash was just code reading, firstly to understand how to extract and measure the MLE, second to understand how use the pyelftools package. Using pyelftools means that pcr-calc has a new dependency but it's a lot better than implementing it myself. Working with pyelftools has been beneficial not only in that it saves me effort but it's also an excellent example to work from. pcr-calc is my first attempt at implementing anything in python and it shows. Having poked around in pyelftools a little bit I've realized that even though my code "works" it's pretty horrible. Future efforts to "clean up" pcr-calc will model significant portions of it after the code in pyelftools.

Having completed calculating the MLE hash we've taken a big step forward in our effort to construct future PCR values by measuring the individual components. It's the last step in removing dependence on the extracted heap. We can now calculate PCR[18] and PCR[19] without any knowledge of or access to the deployed platform hardware and that's pretty great. PCR[17] by contrast contains a whole bunch of stuff like the STM hash that's independent from the Linux OS being run. For now I'm happy to assume PCR[17] is static for a system and doesn't need to be calculated in the build system.

Eventually I'd like to extend pcr-calc to include mechanisms for ingesting an LCP and calculating PCR[17] but that's a long way off. Instead, my next steps will be to clean up the pcr-calc code and integrating it into the meta-measured OE layer. The end goal here is to produce a manifest that a 3rd party (an installer or a remote system) can use to either seal secrets to a future platform state or for appraising an attestation exchange. More on this front next.