Attaching a TPM on the LPC

Just for fun I recently revived some older work with my PCEngines alix3d2 where I built an OE meta layer with a simple machine and kernel config to build images: meta-alix.

[Photos: TPMs for all the boards!]

I’ve got a soft spot for the older PCEngines WRAP board since it was the first platform I experimented on while building a home router / access point years ago. So meta-alix was fun work but nothing too crazy. While playing around with this I noticed that the alix3d2 has an exposed 20 pin header labeled ‘LPC’. Now that is interesting because I’ve got a few Asus branded Infineon TPMs laying about and they’ve got LPC connectors on them. A home wireless router with a TPM on it? Now that could be interesting.

Attaching a TPM designed for a 20 pin LPC header should be pretty easy, right? That’s what I thought too. But here we are 2 weeks later, I’m just now getting around to writing this up, and I can’t say this work was 100% successful. Before I go too deep into the trials and tribulations though, let’s start with a bill of materials.

Bill of materials

To start out you’ll need a TPM, specifically one designed to attach to your system on the LPC bus. TPMs are a PITA to buy really. There are 3 main companies that manufacture them but you can’t buy them direct. Thankfully there are some motherboard manufacturers out there that support the TPM via a “daughter-card” and from my experience it’s mostly the high end manufacturers like Asus and Supermicro. I had 2 Asus TPMs laying around so this seemed like a good opportunity to put them to use. On Amazon these TPMs go for about $15 but when I bought mine almost a year ago they were less than half that.

The system I started out trying to attach this thing to is an alix3d2. I also picked up one of the newer PCEngines APUs but *spoiler alert* only after I had serious problems getting the alix to work.

You’ll also need a soldering iron and the usual soldering / prototyping gear on hand (lights, wire, solder, magnifying glass etc). That’s right, I said soldering. It’s been a while for me too. Like 10 years. Don’t worry, there isn’t much to this and it was really fun.

Prototyping

As you’ve likely guessed by now, just because a system has an LPC connector doesn’t mean this thing is plug and play. The Asus TPM daughter card has pin 4 blocked / keyed and the ALIX doesn’t so that’s our first hint. The real data is in the respective pin diagrams. Finding these isn’t as easy as I’d hoped so I had to do some digging.

The docs for the ALIX systems are all on the PCEngines website so that part’s easy. The Asus TPM doesn’t seem to have any docs though. If you take the time to dig into the motherboards that support it however, you’ll find their manuals have the pin assignment documented. I pulled down the manual for the P9D-WS and used this as a reference. Page 2-29 has what we’re looking for.

Pin Layouts

With the pin layouts in hand we can see clearly that plugging the TPM daughter card directly into the board isn’t gonna happen. I’ll reproduce the layouts here so we can view them side by side:

pin   Asus TPM signal   PCEngines LPC signal
 1    PCICLK            PCICLK
 2    GND               GND
 3    FRAME             LAD0
 4    BLOCKED           GND
 5    PCIRST#           LAD1
 6    NC                GND
 7    LAD3              LAD2
 8    LAD2              GND
 9    +3V               LAD3
10    LAD1              GND
11    LAD0              LFRAME#
12    GND               GND
13    NC                PCIRST#
14    NC                CLK48A
15    +3VSB             ISP
16    SERIRQ            Vcc (+5V)
17    GND               GND
18    CLKRUN            V3
19    PWRDWN            SERIRQ
20    NC                LDRQ#

There’s basically no overlap in the pin layouts here except for a few ground connections. This blew my mind at first but after searching through the Intel Low Pin Count Interface Specification it turns out that this bus was intended for use on-board only and so there’s no pin layout specified for external connectors. First mystery solved. Now let’s figure out how we’re gonna wire this thing up.

To the breadboard!

This isn’t going to be as easy as “plug and play” but it’s not far off. We just need to connect the right pins. With the pin map above and a little help from the spec (to get the minimum required connections) we can pull out our breadboard and prototype this thing.

If you’re like me you’ll have to go out and buy materials as you need them. Luckily I live minutes away from HSC Electronic Supply which is an amazing surplus electronic shop. After an hour or 3 poking around the piles of old electronic gear I managed to scrounge up a 20 pin ribbon cable with a connector that looked like it might fit on my breadboard. With a 20 pin DIP ribbon cable connector I had what I needed to connect the alix to the breadboard.

Next was to get the TPM daughter card wired up to the breadboard. This was harder than I expected. I couldn’t easily find a connector that would suit this purpose that didn’t require waiting for shipping. So I soldered some wires up to breakaway headers and rigged up a horrible TPM-to-breadboard connector. Then we just hook up the two using the following mapping:

TPM pin   ALIX pin   Signal
1         1          PCICLK / LCLK: 33MHz clock
3         11         LFRAME#: Transaction signal
5         13         LRESET#: Bus reset, aka PCIRST#
7         9          LAD3: Data lane
8         7          LAD2: Data lane
9 & 15    18         3 volts DC
10        5          LAD1: Data lane
11        3          LAD0: Data lane
16        19         SERIRQ: Serialized interrupt signal

After some fiddling (kicking, screaming and burning myself with a soldering iron) this is what it looked like:

[Photos of the breadboard prototype]
Now it SHOULD have worked. These are the right connections. But on the alix3d2 I got no love. I didn’t actually get this set-up to work till my apu1d showed up in the mail 3 days later. For whatever reason the external LPC on the alix3d2 just doesn’t work as advertised. Without an oscilloscope I can’t debug much beyond whether the voltage and ground pins are OK (and they are) so for now that will remain a mystery. So the alix3d2 is out and the apu1d is in.

Anyways we can do better than this bootleg breadboard setup. Let’s see about cleaning it up.

[More photos of the bootleg breadboard setup]

Clean it up

The wiring above was super flaky and that shouldn’t be a surprise. I didn’t get the length of each wire exact and the pins slipped around a bit in the plastic. I ordered some IDC breakout helpers from Adafruit but they were garbage. They plug into the breadboard fine but the pins aren’t long enough and they just pop back out immediately.

So again I hacked up another connector out of DIP/DIL headers and some breakaway headers spaced to span the gap in the breadboard. This is generally a bad idea since the solder is what’s holding the whole thing together but it worked out pretty OK:

[Photos of the soldered-up connector on the breadboard]

Packaging for the APU enclosure

After convincing ourselves that the wiring above is right on the breadboard we need to clean this up so that it fits in the enclosure with the APU. There’s not a lot of space in the PCEngines recommended case1d2 but there’s enough if we’re sufficiently inventive. And by “inventive” I mean “you’ve got a hacksaw”.

Start out by removing the female header from the TPM and trim back the connector pins. If we flip this header on its side we can use it to mount the TPM once we reconnect it. This would require either unblocking pin 4 on the connector or cutting pin 4 off of the APU board. Since pin 4 on the APU is ground anyways this shouldn’t be a problem.

I used a 20 pin DIP to ribbon cable connector for my setup. I sanded down the daughter board to expose the copper on the base, which happens to be ground, and connected this to the even pins on the connector up through 12. This proved to be a pretty solid base as it holds the daughter board nice and tight to the connector.

Then we just cut wires and solder pins per the table above. The wire I had on hand was 28 gauge which was a bit too big and the soldering job is straight up ugly in spots but it’s the first bit of soldering I’ve done in 10 years so that’s good enough for me. I’ve got another TPM on hand so I’ll have another go now that I’ve had some practice.

Testing

I used both a Debian install with the tpm-tools package to test this as well as the core-image-tpm from meta-measured. I’d recommend sticking with Debian unless you feel like falling down the rabbit hole of an OE build. The important thing to keep in mind is that the APU BIOS doesn’t support the TPM so it won’t do the necessary setup for us.

The BIOS is supposed to do a number of things to set things up so that the OS can use the TPM. This includes running the TPM self test, enabling it and setting up ACPI entries to make it easy for the OS to talk to it. With the stock BIOS on the APU we won’t get any of this. Thankfully the number of platforms that have implemented TPM support wrong in the BIOS over the years is quite high so the Linux TPM TIS driver can do all of this for us if we give it the right parameters:

root@apu:~# modprobe tpm_tis force=1
[   74.027383] tpm_tis tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
[   74.063388] tpm_tis tpm_tis: Issuing TPM_STARTUP
[   74.260392] tpm_tis tpm_tis: TPM is disabled/deactivated (0x7)
[   74.308465] genirq: Flags mismatch irq 4. 00000080 (tpm0) vs. 00000000 (serial)
[   74.315956] tpm_tis tpm_tis: Unable to request irq: 4 for probe
[   74.436459] genirq: Flags mismatch irq 8. 00000080 (tpm0) vs. 00000000 (rtc0)
[   74.443753] tpm_tis tpm_tis: Unable to request irq: 8 for probe

The modinfo command will tell you all of the gory details about what these parameters do if you’re interested. The short version is that force=1 causes the driver to ignore ACPI and probe for the TPM device. You can also add the interrupts=0 argument to disable interrupts, which will get rid of all of the genirq errors.
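Putting the two together, the invocation looks like this (a sketch; both parameters are listed by modinfo tpm_tis):

root@apu:~# modprobe tpm_tis force=1 interrupts=0

After this you should see /dev/tpm0 appear magically. You can then start tcsd and get some version info out of the TPM: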

root@apu:~# tpm_version 
  TPM 1.2 Version Info:
  Chip Version:        1.2.3.19
  Spec Level:          2
  Errata Revision:     2
  TPM Vendor ID:       IFX
  Vendor Specific data: 0313000b 00
  TPM Version:         01010000
  Manufacturer Info:   49465800

You won’t be able to do much more than this though since the BIOS hasn’t enabled the TPM for us. We’ll get past this in my next post.

TXT and tboot on the IVB NUC

I wrote a few days back about getting serial output from tboot on my new-ish Ivy Bridge vPro NUC. That was a means to an end though, and this is where we’ll cover actually using the serial hardware to do something meaningful.

tboot log from the DC53427HYE NUC

Testing TXT and tboot on a new system is often painful. If there’s something wrong and tboot can’t execute SENTER successfully the system just reboots, and it will keep rebooting continually. This confuses the hell out of most people and necessitates either serial hardware to capture log output for troubleshooting, or a patch to bypass the reboot logic.

The bit about the patch is interesting and I’ve hacked one out here. I put it together without much thought about the security implications, strictly as a work-around, so I don’t recommend it for production use. It was only intended as a way to bring up a system with a borked TXT implementation so data could be collected with txt-stat.

Regardless of how you go about getting your tboot output it’s your first step in debugging. Here’s the tboot log captured from my DC53427HYE NUC. You can see the automated reboot after SENTER fails.

Debugging my tboot failure

I’ve jacked up the logging level of tboot so there’s a lot of data to dig through in the log. Generally though, the data we need is in the TXT.ERRORCODE register. Don’t forget that on the first boot this value will be 0x0 since it’s only set once the failure occurs. The interested reader can, well, read all about this register in the MLE developers guide, section B.1.3.

So after the failed boot the TXT.ERRORCODE gets set and we can grab it from the log. The relevant line is:

TBOOT: TXT.ERRORCODE: 0xc0021041

Not a particularly helpful error message but then again, there isn’t much space for a helpful textual description of the error in a 32-bit register. So the next step is to decode this thing.

Decoding the TXT.ERRORCODE

The MLE developers guide describes the general structure of the data in this register but the error code itself is specific to the ACM used by the platform. Again this data is in the tboot log file:

TBOOT: checking if module /acm_hsw.bin is an SINIT for this platform...
TBOOT: chipset production fused: 1
TBOOT: chipset ids: vendor: 0x8086, device: 0xb001, revision: 0x1
TBOOT: processor family/model/stepping: 0x306a9
TBOOT: platform id: 0x10000000000000
TBOOT:   1 ACM chipset id entries:
TBOOT:       vendor: 0x8086, device: 0xb002, flags: 0x1, revision: 0x1, extended: 0x0
TBOOT:   chipset id mismatch
TBOOT: checking if module /acm_ivb.bin is an SINIT for this platform...
TBOOT:   1 ACM chipset id entries:
TBOOT:       vendor: 0x8086, device: 0xb001, flags: 0x1, revision: 0x1, extended: 0x0
TBOOT:   4 ACM processor id entries:
TBOOT:       fms: 0x206a0, fms_mask: 0xfff3ff0, platform_id: 0x10000000000000, platform_mask: 0x1c000000000000
TBOOT:       fms: 0x206a0, fms_mask: 0xfff3ff0, platform_id: 0x4000000000000, platform_mask: 0x1c000000000000
TBOOT:       fms: 0x306a0, fms_mask: 0xfff3ff0, platform_id: 0x10000000000000, platform_mask: 0x1c000000000000
TBOOT: SINIT matches platform

You can see on line 1 that tboot is trying the HSW / Haswell ACM, which doesn’t match the platform. Then on line 9 it gives the IVB / Ivy Bridge ACM a try and that one matches. So our error code is specific to the IVB ACM and we’ll have to dig through the docs from that tarball. If you’ve built meta-measured the appropriate PDF will be located at:

${TOPDIR}/tmp-glibc/work/corei7-64-oe-linux/3rd-gen-i5-i7-sinit/67-r0/3rd_gen_i5_i7-SINIT_67/SINIT_Errors.pdf.

But before we can make use of this we’ve gotta parse out the error code into its component parts. From this doc they’re defined as (from MSB to LSB):

  • bit 31 – Valid
  • bit 30 – External
  • bits 29:25 – Reserved
  • bits 24:16 – Minor Error Code
  • bit 15 – SW Source
  • bits 14:10 – Major Error Code
  • bits 9:4 – Class Code
  • bits 3:0 – Module Type

So we need to divide the error code 0xc0021041 on these boundaries and then go back to the docs to figure out what each field means (a quick script to do the bit twiddling follows this breakdown):

Valid: 0x1 - The error code is valid.
External: 0x1 - Error state induced by external software.
Reserved: 0x0 - No significance. 
Minor Error Code: 0x2 - Fatal and TPM specific.
SW Source: 0x0 - Generated by the ACM.
Major Error Code: 0x4 - TPM NV is unlocked.
Class Code: 0x4 - TPM Access
Module Type: 0x1 - SINIT Module
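If you’d rather not do the masking and shifting by hand, here’s a minimal shell sketch of the decode using the field boundaries listed above (just bash arithmetic, nothing from the tboot tools):

err=0xc0021041
printf 'Valid:            0x%x\n' $(( (err >> 31) & 0x1 ))
printf 'External:         0x%x\n' $(( (err >> 30) & 0x1 ))
printf 'Reserved:         0x%x\n' $(( (err >> 25) & 0x1f ))
printf 'Minor Error Code: 0x%x\n' $(( (err >> 16) & 0x1ff ))
printf 'SW Source:        0x%x\n' $(( (err >> 15) & 0x1 ))
printf 'Major Error Code: 0x%x\n' $(( (err >> 10) & 0x1f ))
printf 'Class Code:       0x%x\n' $(( (err >> 4) & 0x3f ))
printf 'Module Type:      0x%x\n' $(( err & 0xf ))

Run under bash this prints the same field values listed above.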

Going through each of these takes a while so we’ll focus on the important bit: the Major Error Code. Actually the error text says it all: the TPM NV RAM is unlocked and it shouldn’t be. With the TPM in this state tboot also complains in the boot log, see line 3 below from the log linked above:

TBOOT: TPM: TPM Family 0x0
TBOOT: TPM is ready
TBOOT: TPM nv_locked: FALSE
TBOOT: TPM timeout values: A: 750, B: 750, C: 750, D: 750

So there’s the problem. Now what’s the solution?

Unlocked TPM NVRAM

The TPM NVRAM is described in the relevant TCG TPM 1.2 spec, section 19: “NV Storage Structures”. The part relevant to us is 19.1.1 where the required TPM_NV_INDEX values are described. These are the NV indexes that “must be found on each TPM regardless of platform”.

The first index listed in this section is TPM_NV_INDEX_LOCK and given the error code we’re getting that looks relevant. A bit of reading and you’ll see why having this index defined on a TPM is so important and why shipping a platform with it undefined is considered a security issue. It turns out that without this index defined the TPM doesn’t enforce authorization protections. In this state an attacker can write to the NVRAM repeatedly, wear it out (NVRAM can only be written a finite number of times) and effectively DoS the TPM completely by making the NVRAM unusable. Very surprising that Intel is shipping the NUC in this state. Anyways, nothing we can’t fix …

At this point I went back to the NUC and used tpm_nvinfo to dump the NVRAM indexes defined on my platform:

NVRAM index   : 0x10000001 (268435457)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00001002 (WRITEALL|OWNERWRITE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 20 (0x14)
 
NVRAM index   : 0x1000f000 (268496896)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00020002 (OWNERREAD|OWNERWRITE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 1129 (0x469)
 
NVRAM index   : 0x50010000 (1342242816)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00000001 (PPWRITE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 10 (0xa)
 
NVRAM index   : 0x50000003 (1342177283)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : 0x18
Permissions   : 0x00000000 ()
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 64 (0x40)
 
NVRAM index   : 0x50000001 (1342177281)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00002000 (WRITEDEFINE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 54 (0x36)

I was hoping that TPM_NV_INDEX_LOCK (defined as index 0xffffffff) would be missing and that defining it would solve my problem. From the output above you can see that it’s not listed. The only relevant data I could find on the web about defining this index was a post on the tboot-devel list with someone trying to use the tpmj utility. Digging into all of that Java seemed like way too much work so I gave the tpm_nvdefine utility a go:

root@intel-core-i7-64:~# tpm_nvdefine --index=0xffffffff --size=0
Successfully created NVRAM area at index 0xffffffff (4294967295).

Success! I had hoped that executing tpm_nvinfo again would show this new index … but it doesn’t. So other than the “success” message from tpm_nvdefine we have no way of knowing whether or not the new index was actually defined. The easiest way to test is to boot with tboot again and hope that the error goes away. And it does:

TBOOT: TPM: TPM Family 0x0
TBOOT: TPM is ready
TBOOT: TPM nv_locked: TRUE
TBOOT: TPM timeout values: A: 750, B: 750, C: 750, D: 750

Conclusion

So that’s how you define the TPM_NV_INDEX_LOCK index on your IVB NUC. This effectively locks the TPM NVRAM on a platform that ships with the TPM NVRAM unlocked. Until now I had only seen this on Lenovo systems (lots of them) but I guess Intel is shipping platforms like this too. Having some automated way to detect and fix platforms in this state would be really nice …

Getting serial output on my Ivy Bridge NUC

I’d been using a rather old Sandy Bridge system (Intel DQ67EP + i7 2600S) to test my work on meta-measured for a long time. Very nice, very stable system. But with Intel getting out of the motherboard business I started eyeing their new venture: the NUC.

The DC53427HYE vPro IVB NUC

Everything is getting smaller and thankfully Intel has finally caught on. Better yet, they’re supporting TXT on some of these systems, so when the Haswell NUC was released over the summer the price on the vPro Ivy Bridge NUC (DC53427HYE) finally dropped enough to put it in my price range. Intel opted to skip a vPro NUC for Haswell anyways so it was my only option.

Let the fun of testing TXT on a new system begin! Like any new system we hope it works “out of the box”. But with TXT, odds are it won’t. My SNB system was great but this NUC … not so much, yet. The kicker though is that as systems get smaller something’s got to give. Space ain’t free and … well who needs a serial port anyways right?

NUC IVB guts

Where’s my serial?

So without serial hardware, debugging TXT / tboot is pretty much a lost cause. Sure you can slow down the VGA output with the vga_delay command line option. But if you want to actually analyze the output you need to be able to capture the text somehow and setting vga_delay to a large value and then copying the output by hand doesn’t scale (and it’s a stupid idea to boot). So the search for serial output continues.

To get TXT we must … ::cough:: … endure the presence of the Management Engine (ME) and it’s supposed to have a serial console built in. The docs for the system even say you can get BIOS output from the ME serial console. But for whatever reason, I spent an afternoon messing about with it and made no progress.

I’ve no way to know where the problem with this lies. There are tools for accessing the ME serial console for Linux but I couldn’t get early boot output. Setting up a serial console login for a bare metal Linux system worked but no early boot stuff (BIOS, grub or tboot). Judging by the AMT docs for Linux: you can’t count on the ME serial interface for much. The docs state that if you use Xen then the ME will get the DHCP address all messed up and that setting a static address in the ME interface just doesn’t work. So long story short, the ME serial interface is limited at best and these limitations preclude getting early boot messages like those from tboot.
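For reference, one such SOL client on Linux is amtterm. A session looks something like the invocation below (a sketch only: the address is a placeholder for wherever your ME answers, and the ME admin password can be passed via the AMT_PASSWORD environment variable):

AMT_PASSWORD='secret' amtterm 192.168.0.10

In my case this sort of thing produced a console from a running OS but, as described above, nothing from the BIOS, grub or tboot.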

Now that the ME bashing is done we must fall back on real serial hardware. Thankfully this thing has both a half height and a full height mini-PCIe slot and a market for these arcane serial things still exists. StarTech fills this need with their 2s1p mini PCIe card. This is a great little piece of hardware but the I/O ports aren’t the defaults (likely to prevent conflicts with on-board serial hardware) so we’ve got some work to do before tboot will use it for output messages.

StarTech mini-PCIe serial card

NUC IVB with serial card

We have serial! Now what?

With some real serial hardware we’re half way there. Now we need to get tboot to talk to it. Unfortunately just adding serial to the logging= parameter in the boot config isn’t sufficient. The default base address for the serial I/O port used by tboot is 0x3F8 (see the README). This address corresponds to the “default serial port” aka COM1. So our shiny new mini-PCIe serial hardware must be using a different port.

tboot will log to an alternative port but we need to find the right I/O port address for the add on card. If you’re like me you keep a bootable Linux image on a USB drive handy for times like these. So we boot up the NUC and break out lspci to dump some data about our new serial card:

02:00.0 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
02:00.1 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])

Not a bad start. This card has two serial ports and it shows up as two distinct serial devices. To get the I/O port base address we need to throw some -vvv at lspci. I’ll trim off the irrelevant bits:

02:00.0 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at e030 [size=8]
        Region 1: Memory at f7d05000 (32-bit, non-prefetchable) [size=4K]
        Region 5: Memory at f7d04000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
.
.
.
02:00.1 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 18
        Region 0: I/O ports at e020 [size=8]
        Region 1: Memory at f7d03000 (32-bit, non-prefetchable) [size=4K]
        Region 5: Memory at f7d02000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
.
.
.

The lines we care about here are:

Region 0: I/O ports at e030 [size=8]
Region 0: I/O ports at e020 [size=8]

So the I/O port address for 02:00.0 is 0xe030 and for 02:00.1 it’s 0xe020. The 9 pin headers on the board are labeled S1 and S2 so you can probably guess which is which. With the NUC booted off my Linux USB key we could dump more data about the hardware to know for sure, but with a serial cable hooked up to S1 I just threw some text at the device to see if something would come out the other end:

echo "test" > /dev/ttyS0

Sure enough I got "test" out. So I know my cable is hooked up to ttyS0. Now to associate /dev/ttyS0 with one of the PCI devices so we can get the I/O port. Poking around in sysfs is the thing to do here:

ls /sys/bus/pci/devices/0000:02:00.0/tty/
ttyS0

With all of this we know we want tboot to log data to I/O port 0xe030 so we need the following options on the command line: logging=serial serial=115200,8n1,0xe030.
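For reference, wired into a grub2 menu entry the whole thing ends up looking something like the fragment below (a sketch only: the file names, root device and kernel arguments are generic placeholders, not my exact config):

menuentry 'tboot w/ serial logging' {
    multiboot /boot/tboot.gz logging=serial serial=115200,8n1,0xe030
    module /boot/vmlinuz root=/dev/sda1 ro console=ttyS0,115200n8
    module /boot/initrd.img
}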

Next time

Now that I’ve got some real serial hardware and a way to get tboot to dump data out to it I can finally debug TXT / tboot. We’ll save that for next time.

twobit auto builder part 2

In my last post on automating my project builds with buildbot I covered the relevant buildbot configs. The issue that I left unresolved was the triggering of the builds. Buildbot has tons of infrastructure to do this and its simple polling classes would be more than sufficient for handling my OE projects, which typically have 3-5 git repos. But with OpenXT there are enough git repos to make polling with buildbot impractical. This post covers the last bit necessary to get builds triggering with a client-side hook model.

twobit-git-poller

Now that we’ve worked out the steps necessary to build the software we need to work out how to trigger builds when the source code is updated. Buildbot has extensive infrastructure for triggering builds ranging from classes that poll various SCM systems in a “pull” approach (pulled by buildbot) to scripts that plug into existing SCM repos and run as hook scripts in a “push” approach (pushed by the SCM).

Both approaches have their limitations. The easiest way to trigger events is to use the built-in buildbot polling classes. I did use this method for a while with my OE projects but with OpenXT it’s a lost cause. This is because the xenclient-oe.git repo clones the other OpenXT repos at the head of their respective master branches. This means that, unlike other OE layers, the software being built may change without a subsequent change being made to the meta layer. To use the built-in buildbot pollers you’d have to configure one for each OXT git repo and there are about 60 of them. That’s a lot of git pollers and would require a really long and redundant master config. The alternative is to set up a push mechanism like the one provided by buildbot in the form of the git_buildbot.py script.

This sounds easy enough but given the distributed nature of OpenXT it’s a bit harder than I expected, though not in a technical sense. Invoking a hook script from a git server is easy: just drop the script into the git repo, generally as a ‘post-receive’ hook, and every time someone pushes code to it the script will fire. If you control the git server where everyone pushes code this works fine. Github provides a similar mechanism but that would still make for a tight coupling of the build master to the OpenXT Github org, and herein lies the problem.

Tightly coupling the build master to the OpenXT Github infrastructure is the last thing I want to do. If there’s just one builder for a project this approach would work but we’d have to come up with a way to accommodate others wanting to trigger their build masters as well. This could produce a system where there’s an “official” builder and everyone else is left hanging. Knowingly building something that leaves us in this situation would be a very bad thing: it would solve a technical problem but produce a “people problem”, and those are exponentially harder. The ideal solution here is one that provides equal access and tooling such that anyone can stand up a build master.

Client-side hooks

The only alternative I could come up with here is to mirror every OpenXT repo (all 60 of them) on my build system and then invent a client side hook / build triggering mechanism. I had to “invent” the client side hooks because Git has almost no triggers for events that happen on the client side (including mirrors). My solution for this is in the twobit-git-poller.

It’s really nothing special. Just a simple daemon that takes a config specifying a pile of git repos to mirror. I took a cue from Dickon and his work here by using a small bit of the Github API to walk through all of the OpenXT repos so that my config file doesn’t become huge.

The thing that makes this unique is some magic I do to fake a hook mechanism as much like post-receive as possible. This allows us to use a script like git_buildbot.py unmodified. I also expanded the config file to allow specifying an arbitrary script so others can build on this idea, but whatever script is specified is expected to take input identical to that produced by post-receive and expected by git_buildbot.py.

This is implemented with a naive algorithm: every time the git poller daemon runs we just iterate over the available branches, grab the HEAD of each before and after the fetch, and then dump this data into the hook script. I don’t even try to detect whether there was a change, I just dump data into the hook script. This allows us to use the git_buildbot.py script unmodified, and buildbot is smart enough to ignore a hook event where the start and end hashes are the same. There are some details around passing authentication parameters and connection strings but those are just details and they’re in the code for the interested reader.
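To make that concrete, here’s a minimal shell rendition of the fake post-receive logic (assumptions: a bare mirror clone in $REPO and a git_buildbot.py-compatible hook script in $HOOK_SCRIPT; the real poller is python but the algorithm is the same):

null=0000000000000000000000000000000000000000
cd "$REPO" || exit 1
declare -A old
for ref in $(git for-each-ref --format='%(refname)' refs/heads); do
    old[$ref]=$(git rev-parse "$ref")   # branch tip before the fetch
done
git fetch --quiet                       # the mirror refspec updates all branches
for ref in $(git for-each-ref --format='%(refname)' refs/heads); do
    new=$(git rev-parse "$ref")
    # emit post-receive style lines whether anything changed or not;
    # buildbot ignores events where the old and new hashes match
    echo "${old[$ref]:-$null} $new $ref"
done | "$HOOK_SCRIPT"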

With this client-side hooking mechanism we no longer need the poller objects in buildbot. We can just hook up a PBChangeSource that listens for data from the twobit-git-poller or any other change source. AFAIK this mechanism can live side by side with standard git hooks, so if you’ve got an existing buildbot that’s triggered by git repos your team pushes to, using this poller shouldn’t interfere. If it does let me know about it so we can sort it out … or you could send me a pull request :)

Wrap Up

When I started this work I expected the git poller to be something that others might pick up and use and that maybe it would be adopted as part of the OpenXT build stuff. Now that I think about it though, I expect the whole twobit-git-poller to be mostly disposable in the long run. We’ve already made huge progress in aligning the project with upstream Open Embedded and I expect that we’ll have a proper OE meta layer sooner rather than later. If this comes to pass there won’t be enough git repos to warrant such a heavyweight approach. The simple polling mechanisms in buildbot should be sufficient eventually.

Still, I plan to maintain this work for however long it’s useful. I’ll also be maintaining the relevant buildbot config, which I hope to generalize a bit. It may even become as general as the Yocto autobuilder but time will tell.

twobit auto builder part 1

I’ve been a bit out of commission for the past month: starting a new job and moved a long way to be close to it. Before the move I had a bit of a backlog of work to write about and things have finally settled down enough for me to start working that off. This first post is about the work I’ve done to learn a bit about buildbot and how to use it to build some stuff I’ve been working on.

I’ve always been a bit of a “build junkie” and I’m not ashamed to admit that I really like to write Makefiles by hand. Learning buildbot was only partially motivated by masochism though. The other half was motivated by my need to build a few different projects including the meta-measured OE layer, the meta-selinux OE layer and OpenXT.

I’ve put this work up on github and it still needs some work before it’s portable to anything beyond my build setup. I hope to make this reproducible by others but before I do that I want to:

  1. introduce these repos
  2. cover some of the buildbot internals but only the bits required to make the build work

In doing this work I definitely fell down the rabbit hole a bit and started reading a pile of buildbot code but a lot of it was above and beyond what’s necessary to get this stuff building. I’m often guilty of wanting to understand everything about a piece of code before I start using it but that’s not always the right approach. I’ll do you all a solid and leave this stuff out to save you the hassle. This first post will cover the repo containing the buildbot config I’m using. My next post will cover some support code I’m using to trigger builds. I’ll focus on the bits necessary to build OpenXT initially and will include a quick write-up of my other side projects soon after.

twobit-buildbot

There are two repos that make up this work. The first is the actual buildbot config for both the build master and slave(s). It’s up on github here: twobit-buildbot. The second repo is a twisted python daemon that supports the build master and slave. I’m calling this last repo twobit-git-poller. This post will discuss the former; my next post will cover the git poller.

Build Master

The buildbot configs for both the master and the slave(s) I’m using are available on Github here: twobit-buildbot. Buildbot configs are a bit hard to call “configs” with any seriousness since they’re proper python code. Still, they’re “config-like” in that there’s really no business logic here. It’s just a pile of classes that you fill with data and pass to the buildbot guts for execution.

If I go line by line through the config then this will quickly digress into a buildbot tutorial. There are plenty of those out there and if you’re interested in getting a deeper understanding then you should start with the buildbot docs. The steps that buildbot goes through to build a project are all wrapped up in a container called a BuildFactory. In my repo the BuildFactory for OpenXT is defined here. This class wraps an array of BuildStep objects that define a series of steps required to build the software.

The BuildStep objects are processed in the order they’re passed to the BuildFactory. I’ll run through these in order just to give you a feeling for the build progression:

  1. The first thing in building OXT is to clone the build scripts. Buildbot knows where on the build slave to check this stuff out so all we do is provide some metadata to tell the buildbot git fetcher how we want the checkout to happen. OpenXT doesn’t yet support incremental builds so for now each new build requires a clean checkout.
  2. Next we execute a shell command on the slave to set up the build config. To do this we use the ShellCommand class which is exactly what you’d expect: it executes shell commands. But with buildbot it’s always a question of where: on the master or on the slave? AFAIK for most buildbot stuff you should assume that the action will happen on the slave unless otherwise noted. I’ve rigged the build master configs such that the script name isn’t used explicitly. Instead it’s a shell variable that the slave is expected to set here. I’ll describe this a bit more in the next section on the build slave.
  3. The next build steps are a series of shell commands that execute the individual steps making up the OXT build. These are hard coded as the minimum necessary to create an installer ISO, and they’re all invocations of the do_build.sh script. We could just as well execute do_build.sh once and have it get the build steps from the .config file, but running them one by one means the buildbot UI shows them as distinct steps. This also makes it easier to go through the output logs when things go wrong.
  4. Once the main build steps are done we need to get the build output to a place where people can download it. So the next step runs a MasterShellCommand to create a directory on the build master. In the case of my build master this is on an NFS mount that gets served up by my webserver. That should tell you how we serve up the build output once it’s on the build master.
  5. Now that we’ve got a place to stash the installer iso we need to move it from the build slave to the build master. OXT has some script functions in the build script to rsync build artifacts to various places. Buildbot has a build step to handle this for us so I’ve chosen to use that instead. Hopefully we’ll retire the bulk of the custom OXT scripts and use the tools available like the FileUpload class in buildbot.
  6. Unfortunately I ran into a nasty buildbot limitation after getting the build upload to work. The version of buildbot packaged for Debian Wheezy runs the buildbot daemon with a very paranoid umask (022). This was resolved 3 years ago but Wheezy doesn’t have the fix. I opted to hack around this with an extra build step instead of requiring people to install buildbot from source or apply a patch to the Debian packages. This hack is ugly but it just runs a script on the build master to fix up the permissions on the newly uploaded file and the directory housing it.

The BuildFactory describes how to build something but we still need something to do the building. This is the job of the build slave. The BuilderConfig is an object that associates a build slave with a BuildFactory. When the build master determines that a build needs to be run, the buildbot code will go through the available BuilderConfig objects, find the right config, and then walk the build slave through the right series of BuildSteps.

Build Slave

Naturally we assume that the build slave executing these steps is capable of building the project which in this case is OpenXT. That is to say you’ve followed the steps on the OpenXT wiki to set up a build machine and that you’ve tested that it works. Further, I’ve only tested this with the buildbot package from Debian Wheezy. The build master can be a simple Wheezy system, no need to jump through the hoops to make it an OXT build system since it won’t be building any code.

The config for a build slave is vastly more simple than the master. It only needs to be able to connect to the master to receive commands. The only really interesting bit of the build slave config is in a few of the environment variables I set.

Each of the OE images my slave builds requires a sort of generic “setup” step where the slave sets some values in a config file. This is simple stuff like the location of a shared OE download directory or something specific to OpenXT like the location of the release signing keys. In each case there’s a script in a bin directory. For OpenXT that’s here.

The path to these scripts doesn’t need to be known by the build master. Instead the build master expects the slave to have a specific environment variable pointing to this script. The build master then tells the slave to execute whatever this variable points to. The build slave only has to set this variable properly in its config like this.

Wrap Up

That’s just a quick rundown of the buildbot bits I’ve strung together to build my pet projects and OpenXT. There’s one glaring gap in this: how buildbot knows when it needs to run builds. This is usually a pretty simple topic but with OpenXT there were a few things I had to work around. I’ll fill this in next time.

Till then I’ll be keeping the configs for my build master and slave on github and as up to date as possible. Feel free to reproduce this work in your own environment if you’re interested. I may or may not keep the build master exposed publicly in the future. For now it’s just intended to be an example and a convenient tool.

OpenXT: Contingencies Abandoned

This post is long overdue. I’ve been experimenting with using OE as a means to build measured systems for a while now. Back before OpenXT became a reality I was hedging bets and working on some overlapping tech in parallel. Now that OpenXT is available as OSS I’m going to abandon some of this work and shift focus to OpenXT. I do however feel like there should be some record of the work that was done and some explanation as to why I did it and how it relates to OpenXT.

Here goes …

Building systems with security properties

All of this nonsense began with some experimentation in using OE as a means to build measured systems. For some reason I think that a sound integrity measurement architecture is necessary for the construction of software systems with meaningful security properties. All of the necessary component parts were available as open source but there were few examples showing how they could be integrated into a functional whole. Those that did were generally research prototypes and weren’t maintained actively (need references). My work on meta-measured was the first step in my attempt to make the construction of an example measured system public and easily buildable.

Building a measured system with the Xen hypervisor as a primary component was another part of this work. I wasn’t using virtualization for the sake of virtualization though. Xen was a means to an end: its architecture allows for system partitioning, in part through the Isolated Driver Domain model like the example I describe here. The NDVM is just the “low hanging fruit” here but it serves as a good example of how OE can be used to build very small Linux VMs that can serve as example device domains. There are ways to build smaller IDDs but IMHO a Linux image < 100MB is probably the point of diminishing returns currently. Hopefully in the future this will no longer be the case and we'll have IDDs based on unikernels or even smaller things.

Small, single purpose systems are important in that they allow us to extend the integrity measurement architecture into more fine-grained system components. Ideally these IDDs can be restarted so that the integrity state of the system can be refreshed on a periodic basis. By disaggregating the Xen dom0 we increase the granularity of our measurements from 1 (dom0) to 1 + the number of disaggregated components. By restarting and remeasuring these components we provide better "freshness" properties for systems that remain on-line for long periods of time.

This of course is all built on the initial root of trust established by hardware and the software components in meta-measured. Disaggregation on the scale of previously published academic work is the end goal though with the function of dom0 reduced to domain construction.

The final piece of this work is to use the available mandatory access control mechanisms to restrict the interactions between disaggregated components. We get this by using the XSM and the reference policy from Xen. Further, there will always be cases where it’s either impossible or impractical to decompose some functions into separate VMs. In these cases the use of the SELinux MAC policy within Linux guests is necessary.

The Plan

So my plan went something like this: Construct OE layers for individual components. Provide reference images for independent test. One of these layers will be a “distro” where the other components can be integrated to become the final product. This ended up taking the form of the following meta layers:

  • meta-measured: boot time measurements using the D-RTM method and TPM utilities
  • meta-virtualization: recipes to build the Xen hypervisor and XSM policy
  • meta-selinux: recipes to build SELinux toolstack and MAC policy
  • meta-integral: distro layer to build platform and service VM images

Some of these meta layers provide a lot more functionality than the description given but I only list the bits that are relevant here.

Where I left off

I had made significant progress on this front but never really finished and didn’t write about the work as a whole. It’s been up on Github in a layer called ‘meta-integral’ (I know, all the good names were taken) for a while now and the last time I built it (~5 months ago) it produced a Xen dom0 and an NDVM that boots and runs guests. The hardest work was still ahead: I hadn’t yet integrated SELinux into dom0 and the NDVM, XSM was buildable but again, not yet integrated, and the bulk of disaggregating dom0 hadn’t even begun.

This work was a contingency though. When I began working on this there had been no progress made or even discussion of getting OpenXT released as OSS. This side project was an outlet for work that I believe needs to be done in the open so that the few of us who think this is important could some day collaborate in a meaningful way. Now that OpenXT is a real thing I believe that this collaboration should happen there.

Enter OpenXT: aka the Future

Now that OpenXT is a reality the need for a distro layer to tie all of this together has largely gone away. The need for ‘meta-integral’ is no more and I’ll probably pull it down off of Github in the near future. The components and OE meta layers that I’ve enumerated above are all still necessary though. As far as my designs are concerned OpenXT will take over only as the distro and this means eliminating a lot of duplication.

In a world where OpenXT hadn’t made it out as OSS I would have had the luxury of building the distro from scratch and keeping it very much in line with the upstream components. But that’s not how it happened (a good thing) so things are a bit different. The bulk of the work that needs to be done for the project to gain momentum now is disentangling these components so that they can be developed in parallel with limited dependencies.

Specifically we duplicate recipes that are upstream in meta-virtualization, meta-selinux and meta-measured. To be fair, OpenXT actually had a lot of these recipes first but there was never any focus on upstreaming them. Eventually someone else duplicated this work in the open source and now we must pay off this technical debt and bring ourselves in-line with the upstream that has formed despite us.

What’s next?

So my work on meta-integral is over before it really started. No tragedy there but I am a bit sad that it never really got off the ground. OpenXT is the future of this work however so goal number one is getting that off the ground.

More to come on that front soon …

Apache VirtualHost config gone after Wheezy to Jessie upgrade

Here’s a fun one that had me running in circles for a while today:

I’ve been running deluge and the deluge-webui on Debian Wheezy for a while now. Pretty solid. I needed to download a torrent using a magnet URI today and deluge-webui on Wheezy won’t do it. This feature was added to the webui in 1.3.4 though so the version in Jessie should work.

I did the typical dist-upgrade song and dance per the usual, but after the upgrade Apache was all hosed up. It was just showing the default example page. All of the access logs that would normally go to my configured virtual host were landing in /var/log/apache2/other_vhosts_access.log, which is all wrong. I started out thinking it was the hostname of the system that got messed up but that was a dead end.

I started making progress when I found the command

apache2ctl -S

This dumps out a bunch of data about your configuration and it basically said that my VirtualHost configuration was empty:

VirtualHost configuration:

Yeah, it was basically an empty string. This seemed wrong but I wasn’t sure what to expect really. After banging around a bit longer and getting nowhere I finally decided to just disable and re-enable my site configuration. This was desperation: my site config was already linked into /etc/apache2/sites-enabled so it must have been enabled … right?

a2dissite mysite

But disabling it failed! It gave me some sort of “no such file” error. Whaaaaaa? So I ran the command through strace and it turns out that the new apache2 package on Jessie expects the site config file to have the suffix .conf. Changing the name of my site config fragment fixed this and I was then able to enable the config as expected.
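For anyone hitting the same thing, the whole fix amounts to something like this (‘mysite’ being a placeholder for your own config fragment):

rm /etc/apache2/sites-enabled/mysite
mv /etc/apache2/sites-available/mysite /etc/apache2/sites-available/mysite.conf
a2ensite mysite
apache2ctl graceful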

That was unbelievably annoying. Hopefully this will save someone else a few minutes.

First OpenXT build

UPDATE: 2014-11-26 I’ve disabled comments on this post. All discussion about building OpenXT should be on the mailing list: https://groups.google.com/forum/#!forum/openxt
UPDATE: 2014-07-25 This page has been superseded by the documentation available on the OpenXT github wiki: https://github.com/OpenXT/openxt/wiki
UPDATE: 2014-06-20 added section about using bash instead of dash
UPDATE: 2014-06-19 fix git clone URI
UPDATE: 2014-06-18 add genisoimage package to list of required packages.
UPDATE: 2014-06-18 remove the section on mangling the manifest file now that I’ve upstreamed a patch.
UPDATE: 2014-06-18 to reflect default mirror being available now.
UPDATE: 2014-06-18 to clarify location of STEPS variable and the setupoe step.

With the transition of XT to OpenXT I’m realizing that the mark of most successful open source projects is good tools and great documentation. Right now we’re a bit short on both. The tools to build the core of OpenXT aren’t our own, they’re maintained by the upstream OpenEmbedded and Yocto communities. The documentation to build OpenXT on the other hand is our responsibility. This is a quick recap of my first build of the code that’s up on github with step-by-step instructions so that you can follow along at home.

The Existing Docs

The closest thing to build docs that we have on github is a README left over from a previous attempt to open source this code. That work was done under the name “XenClient Initiative”, or XCI for short. The project predates my involvement but my understanding is that it wasn’t particularly successful.

I guess you can call OpenXT our second go at the open source thing. The instructions in this file are way out of date. They need to be replaced and hopefully this write-up will be the first step in fixing this.

Build Machine

There’s a lot of technical debt that’s built up over the years in the OpenXT code base. The first and most obvious bit of technical debt is in our build system. We require that the build be done on a 32 bit Debian Squeeze system. The 64 bit architecture may work but it’s untested AFAIK.

We require Squeeze for a number of reasons the most obvious of which is our dependency on the GHC 6.12 compiler. Wheezy ships with version 7 and our toolstack hasn’t been updated to work with the new compiler yet. To the Haskell hackers out there: your help would be much appreciated. No doubt there are other dependencies and issues that would need to be worked around in an upgrade but we know this one to be a specific and prominent issue.

The requirement for a 32 bit build host is likely just that only 32 bit build hosts have been tested. Anyone out there who tries a 64 bit build or a build on Wheezy: please report your results so we can get documentation together for the tested and supported build hosts.

Required Packages

The initial list of packages required to build OpenXT can be obtained from the OE wiki. The requirements are:

sed wget cvs subversion git-core coreutils unzip
texi2html texinfo docbook-utils gawk python-pysqlite2
diffstat help2man make gcc build-essential g++
desktop-file-utils chrpath

The list is pretty short as far as build requirements go because OE builds nearly all of the required tools as part of the build. This is a big part of what makes OE so great.

Additionally we require a few extra packages:

ghc guilt iasl quilt bin86 bcc libsdl1.2-dev liburi-perl genisoimage

Packages like guilt and quilt are used in our bitbake patch queue class in the expected way. ghc is the Haskell compiler, which is required to … build the Haskell compiler (much like OE requires gcc to build gcc). genisoimage is used by our build scripts to put the final installer ISO together.

The remaining dependencies: iasl, bin86, bcc, libsdl1.2-dev, and liburi-perl are another instance of technical debt. These packages should be built in OE as dependencies of other packages. Instead our recipes take a short cut and require they be installed on the build host. This seems like a smart shortcut but it’s a shortcut that ends in cross-compile misery. This may be what causes issues between 32 and 64 bit build hosts.

A good example of how to fix these issues already exists. If you’ve been following upstream Yocto development the Xen recipe contributed there gets this right depending on the dev86-native and iasl-native packages. OpenXT would benefit from pulling in the meta-virtualization layer and using this recipe (thanks Chris!)
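In recipe terms the fix is tiny. The upstream Xen recipe expresses the dependency with something along these lines (an illustrative bitbake fragment, not a verbatim copy of that recipe):

DEPENDS += "dev86-native iasl-native"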

Bash vs Bourne

Bitbake recipes contain a lot of shell functions and fragments, and these will be executed by the host system’s /bin/sh. Unfortunately lots of build metadata (including the OpenXT build metadata) is rife with ‘bashisms’. Because of this, Linux distros that don’t link /bin/sh to /bin/bash will cause builds to fail.

The way to resolve this is laid out in the Ubuntu section of the “OE and Your Distro” docs as Debian and thus Ubuntu use the dash shell instead of bash by default. Switching dash for bash is pretty easy thankfully:

sudo dpkg-reconfigure dash

Executing the command above will result in a dialog screen asking you whether or not you want to use dash as your default shell. Select ‘No’ and your system will use bash instead.
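You can sanity check the result afterwards:

ls -l /bin/sh

The symlink should now point at bash instead of dash.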

I’ve gone ahead and filed a ticket to get ‘bashisms’ out of the first place I ran into them in OpenXT: https://github.com/OpenXT/xenclient-oe/issues/1. If you’ve got some time to kill it would be helpful if someone could track down more of our dirty laundry like this, or better yet, send in a pull request to sort some of this out.

Clone and Configure

If you’re following along you should now have a 32 bit Debian Squeeze build host with some additional packages installed. The next step is to clone the base OpenXT git repo:

git clone git://github.com/OpenXT/openxt.git

This will give you a directory named openxt that contains our build scripts. Change into this directory and we’ll take a look at the important files.

Firstly the file you’ll most often customize in here is the .config but you don’t have one yet. Copy the example-config file to .config so we can modify it for our environment:

cp example-config .config

The .config file is read by the script do_build.sh that … does the build. There are a handful of variables in .config that are interesting for our first build. For now we’ll ignore the rest.

STEPS

We’ll start with STEPS. This one isn’t defined in the example-config but it’s initialized in the do_build.sh script. Since this script imports all variables from the config we can add it to the config manually to get the desired effect.

This variable defines the build steps that will be carried out when do_build.sh is run with no options. The default steps don’t all work yet so we’ll set this to the minimum that we can get away with for now:

STEPS="initramfs,stubinitramfs,dom0,uivm,ndvm,syncvm,installer,installer2,ship,copy"

I’ve left out the necessary setupoe step because I typically run this one manually and check that my variables get populated in the OE local.conf successfully. You don’t need to do it this way but it may help you get a better understanding of how the configuration works. After I go through the necessary variables we’ll go back to setupoe.

OPENXT_MIRROR

Due to some of the software used in OpenXT being a bit old the upstream mirrors of a few source tarballs are no longer available. Further we require a number of Intel binary ACM modules for TXT to function properly. Intel hides these zips behind a lawyer wall requiring users to accept license terms before they can be downloaded. That’s great CYA from their legal department but it prevents any automated build from downloading them in the build so we have to mirror them ourselves (which is permitted by their license).

When I did my first build the other day the URL from the example configuration file wouldn’t resolve for me. So I set up my own mirror:

OPENXT_MIRROR="http://www.twobit.us/mirrors/openxt"

This should be fixed so check the default mirror first. If it doesn’t work feel free to clone my mirror in your local environment but do me a favor and go easy on the bandwidth please.

UPDATE: The default mirror is fixed. You should use the default value (http://openxt.xci-test.com/mirror/) and not my mirror … well really you should set up your own mirror because there’s no way to tell how long the default mirror will stay up and it’s nice to keep redundant traffic to a minimum.
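Setting up your own mirror is just a recursive fetch plus any web server; a sketch (the paths and hostname here are hypothetical):

wget --mirror --no-parent --no-host-directories --directory-prefix=/srv/www/mirrors/openxt http://openxt.xci-test.com/mirror/
# then point the build at it in .config:
# OPENXT_MIRROR="http://build-nas.local/mirrors/openxt"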

Signing Certs

The topic of signing releases is a huge one so for now we’ll just stick to the minimum work required to get our build signed once it’s done. There are several relevant config variables for this:

REPO_PROD_CACERT="/path/to/prod-cacert.pem"
REPO_DEV_CACERT="/path/to/dev-cacert.pem"
REPO_DEV_SIGNING_CERT="/path/to/dev-cacert.pem"
REPO_DEV_SIGNING_KEY="/path/to/dev-cakey.pem"

We require that each builder create their own keys for development and release builds. In this example I’m using self-signed certs to keep things as simple as possible. Use the following commands to create your keys and your self-signed certs:

openssl genrsa -out cakey.pem 2048
openssl req -new -x509 -key cakey.pem -out cacert.pem -days 1095

You’ll need two key/cert pairs: a ‘dev’ pair used for automated signing of the build, and a production pair of which only the certificate is needed at build time. Protecting the production signing key is entirely the responsibility of whoever signs the release; I’ll cover this in another post at another time. For now just make the keys and the certs and put the variables in the .config file.
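Putting that together, a minimal sketch that generates both pairs (the file names match the example .config entries above; nothing about them is required by the build scripts):

# dev pair: the build signs with this automatically
openssl genrsa -out dev-cakey.pem 2048
openssl req -new -x509 -key dev-cakey.pem -out dev-cacert.pem -days 1095

# prod pair: only the cert is referenced at build time; guard the key
openssl genrsa -out prod-cakey.pem 2048
openssl req -new -x509 -key prod-cakey.pem -out prod-cacert.pem -days 1095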

OE_BUILD_CACHE_DL

For those of you who have used OE in a previous life, you know how huge an OE build directory can get. Part of this is caused by the OE download cache, the directory where OE caches downloaded source code. In the OE local.conf file this is specified by the DL_DIR variable. We use OE_BUILD_CACHE_DL instead.

Personally my build system has an SSD RAID set up to keep my builds as fast as possible, but I can’t afford enough SSDs to put my DL_DIR on that storage too. Typically I’ll use larger but slower storage (an NFS RAID array) for my download mirror and share it between projects. Often I’ll just link that slower storage directly into my build tree to keep everything in one place. Do whatever works best for you, and remember this is completely optional: you can leave it out and the build will just use a directory in your build tree, though that will make your build much larger:

OE_BUILD_CACHE_DL="/path/to/oe-download"
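As a concrete example of the symlink approach (the NFS mount point and home directory are hypothetical):

mkdir -p /mnt/nas/oe-download
ln -s /mnt/nas/oe-download /home/build/oe-download
# and in .config:
# OE_BUILD_CACHE_DL="/home/build/oe-download"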

Even with the download cache on a separate volume an OpenXT build takes up a lot of disk space. This minimal build of just the core service VMs and no in-guest tools weighs in at 74G. My download cache is shared between all of my OE projects so I can’t say exactly how large the cache for a fresh OpenXT build will be. My combined cache is ~20G but I’d expect the files for OpenXT are a small portion of that.

Start Your Engines

That’s about all it should take. Next I run the do_build.sh script explicitly executing the setupoe step:

./do_build.sh -s setupoe

This step clones all of the git repos you need to build OpenXT (mostly OE meta-layers) and then processes your .config file to generate the OE local.conf. You can take a look at this file to see the variables and make sure they’re set to the right thing. This can be very helpful when debugging or if you want to manually change them for some reason.
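A quick sanity check after setupoe might look like this; the exact location of local.conf depends on where setupoe puts the OE build directory in your tree, so I cheat with find:

grep DL_DIR $(find . -name local.conf)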

After OE has been set up you can just run the do_build.sh script and then go drink coffee for the next 6 hours:

./do_build.sh | tee build.log

If all goes well at the end of this you’ll have an ISO with the OpenXT installer on it ready to go here:

./build-output/openxt-dev--master/iso/installer.iso

There’s no magic here. The comma separated list of steps we put into the STEPS variable can all be invoked one at a time the same way we ran setupoe. So if you want to build just the dom0 initramfs you can do so like this:

./do_build.sh -s initramfs | tee initramfs.log

Piping the build output to a file is always useful when things break (and they will break). My recommendation would be to build a few steps individually before you go and do a full build. I’ll probably do another post about the individual build steps and how all that works next.

Now go forth, build, install, and report bugs.

XT is Open

The last few years of my professional life have been, well, interesting … as in the apocryphal Chinese curse. Going into the details of the business end of this is beyond the scope of what belongs on this website. The bits related to the technology of XenClient XT, however, have been written about here in the past (see my tags for XenClient and my work on measured launch). The technology is always the most interesting and important stuff anyway, and this time is no different.

This post marks an important milestone: the core technology that makes up XenClient XT is now open source as OpenXT. In my opinion this moment is long overdue. There have been several attempts to move this project to an open source development model in the past. Most of these efforts were before my time so I can’t claim any fame there. Those working on this project before me who realized that this technology would be best served by an open development model were indeed ahead of their time, and well ahead of the “decision makers” who held them back.

My only hope now is that we aren’t too late. That there is still some value in this code and that the world hasn’t passed us by. That the small community of people who care about the intersection of virtualization and security will rally and contribute to the code base to help us pay off the technical debt that has built up over the years and push forward new features and architectural advancements.

The new OpenXT tag will track work on this project. I filed the first bug against our bitbake metadata just now so hopefully it’s the first of many filed and fixed in the future. Happy hacking!

building HVM Xen guests

On my Xen systems I’ve run pretty much 99% of my Linux guests paravirtualized (PV). Mostly this was because I’m lazy. Setting up a PV guest is super simple. No need for partitions, boot loaders or any of that complicated stuff. Setting up a PV Linux guest is generally as simple as setting up a chroot. You don’t even need to install a kernel.

There’s been a lot of work over the past 5+ years to add features to processors and Xen that make the PV extensions to Linux unnecessary. After checking out a presentation by Stefano Stabellini a few weeks back I decided I was long overdue for some HVM learning. Since HVM guest performance is now better than PV in most cases it’s well worth the effort.

This post will serve as my documentation for setting up HVM Linux guests. My goal was to get an HVM Linux installed using typical Linux tools and methods like LVM and chroots. I explicitly was trying to avoid using RDP or anything that isn’t a command-line utility. I wasn’t completely successful at this but hopefully I’ll figure it out in the next few days and post an update.

Disks and Partitions

Like every good Linux user I consider LVM my friend. I’d love a more flexible disk backend (something that could be sparsely populated) but blktap2 is pretty much unmaintained these days. I’ll stop before I fall down that rabbit hole; long story short, I’m using LVM volumes to back my guests.

There’s a million ways to partition a disk. Generally my VMs are single-purpose and simple so a simple partitioning scheme is all I need. I haven’t bothered with extended partitions as I only need 3. The layout I’m using is best described by the output of sfdisk:

# partition table of /dev/mapper/myvg-hvmdisk
unit: sectors
 
/dev/mapper/myvg-hvmdisk1 : start=     2048, size=  2097152, Id=83
/dev/mapper/myvg-hvmdisk2 : start=  2099200, size=  2097152, Id=82
/dev/mapper/myvg-hvmdisk3 : start=  4196352, size= 16775168, Id=83
/dev/mapper/myvg-hvmdisk4 : start=        0, size=        0, Id= 0
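If you want to reproduce this layout rather than drive sfdisk interactively, the dump format above can be fed straight back in; a minimal sketch, assuming a volume group named myvg with ~10G free (the size matches the sector counts above):

# carve out a 10G LV to hold the whole VM disk
lvcreate -L 10G -n hvmdisk myvg

# write the partition table from a dump in the format above
sfdisk /dev/mapper/myvg-hvmdisk <<EOF
unit: sectors

/dev/mapper/myvg-hvmdisk1 : start=2048, size=2097152, Id=83
/dev/mapper/myvg-hvmdisk2 : start=2099200, size=2097152, Id=82
/dev/mapper/myvg-hvmdisk3 : start=4196352, size=16775168, Id=83
EOF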

That’s 3 partitions: the first for /boot, the second for swap, and the third for the rootfs. Pretty simple. Once the table is written to the LV we need the kernel to re-read it and create devices for these partitions. This can be done with either the partprobe command or kpartx. I went with kpartx:

$ kpartx -a /dev/mapper/myvg-hvmdisk

After this you’ll have the necessary device nodes for all of your partitions. If you use kpartx as I have these device files will have a digit appended to them like the output of sfdisk above. If you use partprobe they’ll have the letter ‘p’ and a digit for the partition number. Other than that I don’t know that there’s a difference between the two methods.

Then get udev to refresh the links in /dev/disk/by-uuid (we’ll use these later):

$ udevadm trigger

Now we can set up the filesystems we need:

$ mkfs.ext2 /dev/mapper/myvg-hvmdisk1
$ mkswap /dev/mapper/myvg-hvmdisk2
$ mkfs.ext4 /dev/mapper/myvg-hvmdisk3

Install Linux

Installing Linux on these partitions is just like setting up any other chroot. The first step is mounting everything. The following script fragment mounts the new rootfs and /boot, then bind mounts the host’s special filesystems into the chroot:

# mount VM disks (partitions in new LV)
if [ ! -d /media/hdd0 ]; then mkdir /media/hdd0; fi
mount /dev/mapper/myvg-hvmdisk3 /media/hdd0
if [ ! -d /media/hdd0/boot ]; then mkdir /media/hdd0/boot; fi
mount /dev/mapper/myvg-hvmdisk1 /media/hdd0/boot
 
# bind dev/proc/sys/tmpfs file systems from the host
if [ ! -d /media/hdd0/proc ]; then mkdir /media/hdd0/proc; fi
mount --bind /proc /media/hdd0/proc
if [ ! -d /media/hdd0/sys ]; then mkdir /media/hdd0/sys; fi
mount --bind /sys /media/hdd0/sys
if [ ! -d /media/hdd0/dev ]; then mkdir /media/hdd0/dev; fi
mount --bind /dev /media/hdd0/dev
if [ ! -d /media/hdd0/run ]; then mkdir /media/hdd0/run; fi
mount --bind /run /media/hdd0/run
if [ ! -d /media/hdd0/run/lock ]; then mkdir /media/hdd0/run/lock; fi
mount --bind /run/lock /media/hdd0/run/lock
if [ ! -d /media/hdd0/dev/pts ]; then mkdir /media/hdd0/dev/pts; fi
mount --bind /dev/pts /media/hdd0/dev/pts

Now that all of the mounts are in place we can debootstrap an install into the chroot:

$ sudo debootstrap wheezy /media/hdd0/ http://http.debian.net/debian/

We can then chroot to the mountpoint for our new VM’s rootfs and put on the finishing touches:

$ chroot /media/hdd0
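What counts as finishing touches will vary; at minimum I’d set a root password, a hostname, and basic networking so the guest comes up reachable. A sketch, run inside the chroot (hostname and interface name are examples):

passwd root
echo myvm > /etc/hostname
cat >> /etc/network/interfaces <<EOF

auto eth0
iface eth0 inet dhcp
EOF
apt-get install -y openssh-server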

Bootloader

Unlike a PV guest, you’ll need a bootloader to get your HVM up and running. A first step in getting the bootloader installed is figuring out which disk will be mounted and where. This requires setting up your fstab file.

At this point we start to run into some awkward differences between our chroot and what our guest VM will look like once it’s booted. Our chroot reflects the device layout of the host on which we’re building the VM. This means that the device names for these disks will be different once the VM boots. On our host they’re all under the LVM /dev/mapper/myvg-hvmdisk and once the VM boots they’ll be something like /dev/xvda.

The easiest way to deal with this is to set our fstab up using UUIDs. It would look something like this:

# / was on /dev/xvda3 during installation
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /               ext4    errors=remount-ro 0       1
# /boot was on /dev/xvda1 during installation
UUID=yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot           ext2    defaults        0       2
# swap was on /dev/xvda2 during installation
UUID=zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz none            swap    sw              0       0

By using UUIDs we can make our fstab accurate even in our chroot.
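The UUIDs come from the filesystems we created earlier; blkid will print them, or you can read the symlinks udev made for us:

blkid /dev/mapper/myvg-hvmdisk1 /dev/mapper/myvg-hvmdisk2 /dev/mapper/myvg-hvmdisk3
# or
ls -l /dev/disk/by-uuid/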

After this we need to set up the /etc/mtab file needed by lots of Linux utilities. I found that when installing Grub2 I needed this file in place and accurate.

Some advice I’ve found on the web says to just copy or link the mtab file from the host into the chroot, but this is wrong. If a utility consults this file to find the device mounted as the rootfs, it will find the device holding the rootfs of the host, not the device that contains the rootfs of our chroot.

The way I made this file was to copy it off of the host where I’m building the guest VM and then modify it for the guest. Again I’m using UUIDs to identify the disks / partitions for the rootfs and /boot to keep from having data specific to the host platform leak into the guest. My final /etc/mtab looks like this:

rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=253371,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=203892k,mode=755 0 0
/dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=617480k 0 0
/dev/disk/by-uuid/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot ext2 rw,relatime,errors=continue,user_xattr,acl 0 0

Finally we need to install both a kernel and the grub2 bootloader:

$ apt-get install linux-image-amd64 grub2

Installing Grub2 is a pain. All of the additional disks kicking around in my host confused the hell out of the grub installer scripts. I was given the option to install grub on a number of these disks, and none of them was the one I wanted.

In the end I had to select the option to not install grub on any disk and fall back to installing it by hand:

$ grub-install --force --no-floppy --boot-directory=/boot /dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

And then generate the grub config file:

$ update-grub

If all goes well the grub boot loader should now be installed on your disk and you should have a grub config file in your chroot /boot directory.

Final Fixups

Finally you’ll need to log into the VM. If you’re confident it will boot without you having to do any debugging then you can just configure the ssh server to start up and throw a public key in the root homedir. If you’re like me something will go wrong and you’ll need some boot logs to help you debug. I like enabling the serial emulation provided by qemu for this purpose. It’ll also allow you to login over serial which is convenient.

This is pretty standard stuff. No paravirtual console through the xen console driver. The qemu emulated serial console will show up at ttyS0 like any physical serial hardware. You can enable serial interaction with grub by adding the following fragment to /etc/default/grub:

GRUB_TERMINAL_INPUT=serial
GRUB_TERMINAL_OUTPUT=serial
GRUB_SERIAL_COMMAND="serial --speed=38400 --unit=0 --word=8 --parity=no --stop=1"

To get your kernel to log to the serial console as well set the GRUB_CMDLINE_LINUX variable thusly:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,38400n8"

Finally to get init to start a getty with a login prompt on the console add the following to your /etc/inittab:

T0:23:respawn:/sbin/getty -L ttyS0 38400 vt100

Stefano Stabellini has done another good write-up on the details of using both the PV and the emulated serial console here: http://xenbits.xen.org/docs/4.2-testing/misc/console.txt. Give it a read for the gory details.

Once this is all done you need to exit the chroot, unmount all of those bind mounts (innermost first; see the sketch below) and then unmount your boot and rootfs from the chroot directory. Once we have a VM config file created, this VM should be bootable.
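The teardown is just the mount script in reverse; a sketch (kpartx -d drops the partition mappings if you’re done with them):

umount /media/hdd0/dev/pts
umount /media/hdd0/run/lock
umount /media/hdd0/run
umount /media/hdd0/dev
umount /media/hdd0/sys
umount /media/hdd0/proc
umount /media/hdd0/boot
umount /media/hdd0
kpartx -d /dev/mapper/myvg-hvmdisk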

VM config

Then we need a configuration file for our VM. This is what my generic HVM template looks like. I’ve disabled all graphical stuff: sdl=0, stdvga=0, and vnc=0, enabled the emulated serial console: serial='pty' and set xen_platform_pci=1 so that my VM can use PV drivers.

The other stuff is standard for HVM guests, along with things like memory, name, and uuid that should be customized for your specific installation. The uuid and the MAC address for your virtual NIC must be unique. There are websites out there that will generate these values, but Xen has its own prefix for MAC addresses so use a generator that makes a proper one, or the one-liner below.
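Xen’s assigned OUI is 00:16:3e, so generating a valid MAC locally is a one-liner (any three random bytes after the prefix will do), and uuidgen handles the uuid field:

printf 'mac=00:16:3e:%02x:%02x:%02x\n' $((RANDOM%256)) $((RANDOM%256)) $((RANDOM%256))
uuidgen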

builder = "hvm"
memory = "2048"
name = "myvm"
uuid = "uuuuuuuu-uuuu-uuuu-uuuu-uuuuuuuuuuuu"
vcpus = 1
cpus = '0-7'
pae=1
acpi=1
apic=1
boot='c'
xen_platform_pci=1
sdl=0
vnc=0
vnclisten='0.0.0.0'
stdvga=0
serial='pty'
 
disk = [
    '/dev/ssdraid1/wwwhome,raw,xvda,rw'
]
vif = [
    'mac=XX:XX:XX:XX:XX:XX,model=e1000',
]

Boot

Booting this VM is just like booting any PV guest:

xl create -c /etc/xen/vms/myvm.cfg

I’ve included the -c option to attach to the VM’s serial console, and ideally we’d be able to see grub and the kernel dump a bunch of data as the system boots.

TODO

I’ve tested these instructions twice now on a Debian Wheezy system with Xen 4.3.1 installed from source. Both times Grub installs successfully but the VM fails to boot. After enabling VNC for the VM and connecting with a viewer it’s apparent that the VM hangs when SeaBIOS tries to kick off grub.

As a work-around, both times I’ve booted the VM from a Debian rescue ISO, set up a chroot much like in these instructions (the disk is now /dev/xvda though) and re-installed Grub. This does the trick and rebooting the VM from the disk then works. So I can only conclude that something in my instructions w/r/t installing Grub is wrong, but I think that’s unlikely as they’re confirmed by numerous other “install grub in a chroot” walkthroughs on the web.

The source of the problem is pure speculation at this point. Part of me wants to dump the first 2M of the disk, once after installing with these instructions and again after fixing it with the rescue CD, and compare them. Now that I think about it, the version of Grub installed in my chroot is probably different from the one on the rescue CD, so that could have something to do with it.
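If I do go digging, the comparison would look something like this (dumping the boot area before and after the rescue-CD fix):

dd if=/dev/mapper/myvg-hvmdisk of=boot-broken.img bs=1M count=2
# ... re-install grub from the rescue CD, then:
dd if=/dev/mapper/myvg-hvmdisk of=boot-fixed.img bs=1M count=2
cmp -l boot-broken.img boot-fixed.img | head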

Really though, I’ll probably just install syslinux and see if that works first. My experiences with Grub have generally been bad any time I try to do something out of the ordinary. It’s incredibly complicated and generally I just want something simple like syslinux to kick off a very simple VM.

I’ll post an update once I’ve got to the bottom of this mystery. Stay tuned.