Cleaning up my TPM hack

A few weeks back I posted some stuff about hacking up an Asus TPM daughter card to hook it up to a PCEngines APU system. That was the first soldering I’d done in pretty much forever. It was ugly. Seriously ugly. But it worked. Still, we can do better.

Desoldering

In my first pass at this I left the pins in the board after cutting off the connector. To hook up the new connector I just soldered the wires directly to the existing pins. The right way to do this is to desolder the existing pins, pull them out, and run the new wires into the holes left by the old pins.

After discovering what a solder pump is this isn’t the impossible task I thought it was. There were two pins that wouldn’t budge though. These two were hooked up to the ground plate on the card so when I applied heat the whole card absorbed it. Since I didn’t need these pins I just left them alone. This is what the card looks like with the pins desoldered:

IMG_20150303_152024

IMG_20150303_152006

New Wiring

Now we just add some new wires following the mapping from the last post. Business as usual. I did ground the same way, running it directly to the plate on the daughter card. The sanding job I did this time was much better since I used real sandpaper. Actually I picked up a few of those nail files at the grocery store and they worked out great.

IMG_20150308_184619

Last time I ran all the wires over the top and they piled up something awful. This time I ran the problematic ones under the card to keep things clean. I’m pretty happy with the end result.

IMG_20150309_195408

IMG_20150309_195358

Programming the DS1077L Oscillator

My last post documented my first bit of hardware hacking in probably 10 years. I had hoped to have a full post for a related bit of work ready for posting by now but, to stick with our theme, this stuff is hard so I’ve had to break it down into more than one post. Hopefully this won’t be too much of a spoiler but my current project requires a 33MHz oscillator. Here’s a bit of data about the programmable one I picked up along with a few tools I wrote to program it from Linux.

DS1077L Oscillator

Fixed frequency crystal oscillators are pretty easy to buy, but if you’re prototyping on a breadboard these are a bit of a PITA to hook up. As an alternative, there are surface mount programmable oscillators available and the DS1077L from Maxim seems to be the most popular in the “maker” community. Sparkfun sells these pre-packaged on breakout boards so if you just want to buy something and start using it this is the route to go.

I ordered the parts individually so I could practice my soldering but the end result is the same. Take a few DS1077L+66s from Mouser Electronics along with a few 8-pin SparkFun SOIC to DIP Adapters, solder them together and then plug them into your breadboard. The only difference is that the breakout board from Sparkfun comes with a filter capacitor so we’ll have to supply our own. This oscillator is programmable via the I2C bus, so to set it to the frequency we need we have to wire it up to the I2C bus on a Linux system. In the end the circuit looks something like this.

IMG_20150307_115107

Top quality CAD drawing for you there. If you’re interested in the details of how this chip works the spec sheet is your friend. The values for the pull up resistors on the I2C lines and the bypass capacitor are specifically defined there. My purposes are simple in that I just need a 33MHz clock so the CTRL pins can be ignored, as can the reference output OUT0. OUT1 will output the clock and SCL/SDA are the two I2C clock and data pins. Initially the DS1077L+66 is programmed to output a 66MHz clock so we’ll need to program the chip to cut this in half.

Programming the DS1077L

The first step in this process is wiring the circuit and attaching it to the I2C bus on a Linux system. I soldered an I2C header on to my PCEngines ALIX3d2 and hooked it up like so:

IMG_20150225_200211

This chip is pretty sophisticated so I won’t cover all possible configurations. The data sheet is readily available here so if you want the details have a read (you should anyways if you’re using this chip). All I’m going to cover here is scaling down the clock frequency and the tool I’ve written to do so.

The primary output (OUT1) has two mechanisms for controlling its output. First is a prescaler circuit that can divide the native frequency (66MHz) by 1, 2, 4 or 8. The second is a divider circuit that can divide the frequency by integers between 2 and 1025. The divider circuit can be bypassed completely if we wish so we can get a 33MHz clock in two ways: 1) set the prescaler to 2 and bypass the divider or 2) set the prescaler to 1 and the divider to 2. The DS1077L defines a low power state where much of its internal circuitry is disabled including the divider so I opted for this low power state and the first option above.
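
The arithmetic behind those two options is simple enough to sketch in a few lines of Python. This is only a model of the prescaler / divider math described above; it doesn't talk to the chip and it isn't part of the tools described below. The function name and the constant names are made up for illustration, and the master frequency is the nominal 66MHz used in the text.

```python
# Hypothetical helper modelling the OUT1 prescaler / divider arithmetic.
# It does not touch the I2C bus; it only reproduces the math in the text.

VALID_PRESCALERS = (1, 2, 4, 8)   # the prescaler can divide by 1, 2, 4 or 8
DIVIDER_RANGE = range(2, 1026)    # the divider can divide by integers 2..1025


def out1_frequency(master_hz, prescaler, divider=None):
    """Return the OUT1 frequency in Hz; divider=None means the divider is bypassed."""
    if prescaler not in VALID_PRESCALERS:
        raise ValueError("prescaler must be 1, 2, 4 or 8")
    if divider is not None and divider not in DIVIDER_RANGE:
        raise ValueError("divider must be between 2 and 1025")
    freq = master_hz / prescaler
    if divider is not None:
        freq /= divider
    return freq


if __name__ == "__main__":
    master = 66_000_000  # nominal 66MHz master clock, as in the text
    print(out1_frequency(master, prescaler=2))             # option 1: divider bypassed
    print(out1_frequency(master, prescaler=1, divider=2))  # option 2: divider set to 2
```

Both calls come out at 33MHz, which is why either configuration would do and the choice comes down to the low power state.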

To aid in programming this thing I’ve put together a small set of tools I’m calling ds1077l-ctrl. These tools expose a simple command interface for modifying the state of the control registers in the DS1077L. As an example we would program this chip to output the 33MHz clock that I need like so:

$ ./ds1077l-mux --bus-dev=/dev/i2c-0 --set --pdn0=0 --sel0=0 --en0=0 --div=1 --p1=2

That’s it. The --help / --usage messages for the tools combined with data from the spec sheet should be all that you need to set the DS1077L to any possible state. In the command above we’ve set (--set) the DS1077L device on I2C bus 0 (--bus-dev=/dev/i2c-0) such that the divider is disabled (--div=1) and the prescaler is set to 2 (--p1=2). The other options (--pdn0=0 --sel0=0 --en0=0) set the oscillator to a low power state disabling the circuitry we’re not using.

To get a sort of before and after I hooked up this chip to an oscilloscope to capture the default 66MHz waveform and the 33MHz one we get after executing the command above:

IMG_20150225_200236
IMG_20150225_193909


Bugs

This software is not without bugs (as always). I couldn’t get the WRITEE2 command to work properly but it’s only necessary if you change the default behavior of the chip and set the WC bit to 1. This causes the oscillator to write register state to EEPROM only when the WRITEE2 command is invoked. By default it will save the register state every time it changes so just leave the WC bit alone and everything should work as expected. Alternatively feel free to fix this bug and send me a pull request.

Attaching a TPM on the LPC

Just for the funs I recently revived some older work with my PCEngines alix3d2 where I built an OE meta layer with a simple machine and kernel config to build images: meta-alix.

IMG_20150213_151432

IMG_20150216_183119

IMG_20150216_183320

TPMs for all the boards!

I’ve got a soft spot for the older PCEngines WRAP board since it was the first platform I experimented on while building a home router / access point years ago. So meta-alix was fun work but nothing too crazy. While playing around with this I noticed that the alix3d2 has an exposed 20 pin header labeled ‘LPC’. Now that is interesting because I’ve got a few Asus branded Infineon TPMs laying about and they’ve got LPC connectors on them. A home wireless router with a TPM on it? Now that could be interesting.

Attaching a TPM designed to attach to a board on a 20 pin LPC connector should be pretty easy right? That’s what I thought too. But here we are 2 weeks later and I’m just now getting to write this up and I can’t say this work was 100% successful. But before I go too deep into the trials and tribulations let’s start with a bill of materials.

Bill of materials

To start out you’ll need a TPM and one designed to attach to your system on the LPC bus. TPMs are a PITA to buy really. There are 3 main companies that manufacture them but you can’t buy them direct. Thankfully there are some motherboard manufacturers out there that support the TPM via a “daughter-card” and from my experience this is mostly the high end manufacturers like Asus and Supermicro. I had 2 Asus TPMs laying around so this seemed like a good opportunity to put them to use. On Amazon these TPMs go for about $15 but when I bought mine almost a year ago they were less than half that.

The system that I started out trying to attach this thing to is an alix3d2. I also picked up one of the newer PCEngines APU but *spoiler alert* only after I had serious problems getting the alix to work.

You’ll also need a soldering iron and the usual soldering / prototyping gear on hand (lights, wire, solder, magnifying glass etc). That’s right I said soldering. It’s been a while for me too. Like 10 years. Don’t worry there isn’t much to this and it was really fun.

Prototyping

As you’ve likely guessed by now, just because a system has an LPC connector doesn’t mean this thing is plug and play. The Asus TPM daughter card has pin 4 blocked / keyed and the ALIX doesn’t so that’s our first hint. The real data is in the respective pin diagrams. Finding these isn’t as easy as I’d hoped so I had to do some digging.

The docs for the ALIX systems are all on the PCEngines website so that part’s easy. The Asus TPM doesn’t seem to have any docs though. If you take the time to dig into the boards that support them though you’ll find the manuals for these boards have the pin assignment documented. I pulled down the manual for the P9D-WS and used this as a reference. Page 2-29 has what we’re looking for.

Pin Layouts

With the pin layouts in hand we can see clearly that plugging the TPM daughter card directly into the board isn’t gonna happen. I’ll reproduce the layouts here so we can view them side by side:

pin   Asus TPM signal   PCEngines LPC signal
1     PCICLK            PCICLK
2     GND               GND
3     FRAME             LAD0
4     BLOCKED           GND
5     PCIRST#           LAD1
6     NC                GND
7     LAD3              LAD2
8     LAD2              GND
9     +3V               LAD3
10    LAD1              GND
11    LAD0              LFRAME#
12    GND               GND
13    NC                PCIRST#
14    NC                CLK48A
15    +3VSB             ISP
16    SERIRQ            Vcc (+5V)
17    GND               GND
18    CLKRUN            V3
19    PWRDWN            SERIRQ
20    NC                LDRQ#

There’s basically no overlap in the pin layouts here except for the clock on pin 1 and a few ground connections. This blew my mind at first but after searching through the Intel Low Pin Count Interface Specification it turns out that this bus was intended for use on-board only and so there’s no pin layout specified for external connectors. First mystery solved. Now let’s figure out how we’re gonna wire this thing up.

To the breadboard!

This isn’t going to be as easy as “plug and play” but it’s not far off. We just need to connect the right pins. With the pin map above and a little help from the spec (to get the minimum required connections) we can pull out our breadboard and prototype this thing.

If you’re like me you’ll have to go out and buy materials as you need them. Luckily I live minutes away from HSC Electronic Supply which is an amazing surplus electronic shop. After an hour or 3 poking around the piles of old electronic gear I managed to scrounge up a 20 pin ribbon cable with a connector that looked like it might fit on my breadboard. With a 20 pin DIP ribbon cable connector I had what I needed to connect the alix to the breadboard.

Next was to get the TPM daughter card wired up to the breadboard. This was harder than I expected. I couldn’t easily find a connector that would suit this purpose that didn’t require waiting for shipping. So I soldered some wires up to breakaway headers and rigged up a horrible TPM-to-breadboard connector. Then we just hook up the two using the following mapping:

TPM pin   ALIX pin   Signal
1         1          PCICLK / LCLK: 33MHz clock
3         11         LFRAME#: Transaction signal
5         13         LRESET#: Bus reset. AKA PCIRST#
7         9          LAD3: Data lane
8         7          LAD2: Data lane
9 & 15    18         3 Volts DC
10        5          LAD1: Data lane
11        3          LAD0: Data lane
16        19         SERIRQ: Serialized interrupt signal
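
If you'd rather not trust eyeballs when cross-referencing two pin diagrams, the mapping can be derived mechanically. The sketch below is illustrative only: it encodes just the non-ground, connected pins from the two layouts above and matches them by signal name. The dictionary and function names are made up, and the alias table (FRAME is LFRAME#, both 3 volt pins on the TPM go to V3 on the ALIX) is an assumption that simply mirrors the mapping table.

```python
# Signal pins from the two layouts above (grounds and unused pins left out).
ASUS_TPM = {
    1: "PCICLK", 3: "FRAME", 5: "PCIRST#", 7: "LAD3", 8: "LAD2",
    9: "+3V", 10: "LAD1", 11: "LAD0", 15: "+3VSB", 16: "SERIRQ",
}
ALIX_LPC = {
    1: "PCICLK", 3: "LAD0", 5: "LAD1", 7: "LAD2", 9: "LAD3",
    11: "LFRAME#", 13: "PCIRST#", 18: "V3", 19: "SERIRQ",
}

# Assumed name equivalences between the two manuals.
ALIASES = {"FRAME": "LFRAME#", "+3V": "V3", "+3VSB": "V3"}


def derive_wiring(tpm, host, aliases):
    """Map each TPM pin to the host pin that carries the same signal."""
    host_pin_for = {signal: pin for pin, signal in host.items()}
    wiring = {}
    for tpm_pin, signal in sorted(tpm.items()):
        wiring[tpm_pin] = host_pin_for[aliases.get(signal, signal)]
    return wiring


if __name__ == "__main__":
    for tpm_pin, alix_pin in derive_wiring(ASUS_TPM, ALIX_LPC, ALIASES).items():
        print(f"TPM pin {tpm_pin:2d} -> ALIX pin {alix_pin:2d} ({ASUS_TPM[tpm_pin]})")
```

The result agrees with the table: TPM pins 9 and 15 both land on ALIX pin 18, and every other signal gets exactly one wire.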

After some fiddling (kicking, screaming and burning myself with a soldering iron) this is what it looked like:

IMG_20150213_093921

IMG_20150212_175548
Now it SHOULD have worked. These are the right connections. But on the alix3d2 I got no love. I didn’t actually get this set-up to work till my apu1d showed up in the mail 3 days later. For whatever reason the external LPC on the alix3d2 just doesn’t work as advertised. Without an oscilloscope I can’t debug much beyond whether the voltage and ground pins are OK (and they are) so for now that will remain a mystery. So the alix3d2 is out and the apu1d is in.

Anyways we can do better than this bootleg breadboard setup. Let’s see about cleaning it up.

IMG_20150213_093453

IMG_20150213_093638

Clean it up

The wiring above was super flaky and that shouldn’t be a surprise. I didn’t get the length of each wire exact and the pins slipped around a bit in the plastic. I ordered some IDC breakout helpers from Adafruit but they were garbage. They plug into the breadboard fine but the pins aren’t long enough and they just pop back out immediately.

So again I hacked up another connector out of DIP/DIL headers and some breakaway headers spaced to span the gap in the breadboard. This is generally a bad idea since the solder is what’s holding the whole thing together but it worked out pretty OK:

IMG_20150213_151417

IMG_20150216_183250

IMG_20150214_134235

Packaging for the APU enclosure

After convincing ourselves that the wiring above is right on the breadboard we need to clean this up so that it fits in the enclosure with the APU. There’s not a lot of space in the PCEngines recommended case1d2 but there’s enough if we’re sufficiently inventive. And by “inventive” I mean “you’ve got a hacksaw”.

Start out by removing the female header from the TPM and trim back the connector pins. If we flip this header on its side we can use it to mount the TPM once we reconnect it. This would require either unblocking pin 4 on the connector or cutting pin 4 off of the APU board. Since pin 4 on the APU is ground anyways this shouldn’t be a problem.

I used a 20 pin DIP to ribbon cable connector for my setup. I sanded down the daughter board to expose the copper on the base which happens to be ground and connected this with the even pins on the connector up through 12. This proved to be a pretty solid base as it holds the daughter board nice and tight to the connector.

Then we just cut wires and solder pins per the table above. The wire I had on hand was 28 gauge which was a bit too big and the soldering job is straight up ugly in spots but it’s the first bit of soldering I’ve done in 10 years so that’s good enough for me. I’ve got another TPM on hand so I’ll have another go now that I’ve had some practice.

Testing

I used both a Debian install with the tpm-tools package to test this as well as the core-image-tpm from meta-measured. I’d recommend sticking with Debian unless you feel like falling down the rabbit hole of an OE build. The important thing to keep in mind is that the APU BIOS doesn’t support the TPM so it won’t do the necessary setup for us.

The BIOS is supposed to do a number of things to set things up so that the OS can use the TPM. This includes running the TPM self test, enabling it and setting up ACPI entries to make it easy for the OS to talk to it. With the stock BIOS on the APU we won’t get any of this. Thankfully the number of platforms that have implemented TPM support wrong in the BIOS over the years is quite high so the Linux TPM TIS driver can do all of this for us if we give it the right parameters:

root@apu:~# modprobe tpm_tis force=1
[   74.027383] tpm_tis tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
[   74.063388] tpm_tis tpm_tis: Issuing TPM_STARTUP
[   74.260392] tpm_tis tpm_tis: TPM is disabled/deactivated (0x7)
[   74.308465] genirq: Flags mismatch irq 4. 00000080 (tpm0) vs. 00000000 (serial)
[   74.315956] tpm_tis tpm_tis: Unable to request irq: 4 for probe
[   74.436459] genirq: Flags mismatch irq 8. 00000080 (tpm0) vs. 00000000 (rtc0)
[   74.443753] tpm_tis tpm_tis: Unable to request irq: 8 for probe

The modinfo command will tell you all of the gory details about what these parameters do if you’re interested. The short version is that force=1 causes the driver to ignore ACPI and probe for the TPM device. You can also add the interrupts=0 argument to disable interrupts which will get rid of all of the genirq errors. After this you should see /dev/tpm0 appear magically. You can then start tcsd and get some version info out of the TPM:

root@apu:~# tpm_version 
  TPM 1.2 Version Info:
  Chip Version:        1.2.3.19
  Spec Level:          2
  Errata Revision:     2
  TPM Vendor ID:       IFX
  Vendor Specific data: 0313000b 00
  TPM Version:         01010000
  Manufacturer Info:   49465800

You won’t be able to do much more than this though since the BIOS hasn’t enabled the TPM for us. We’ll get past this in my next post.

TXT and tboot on the IVB NUC

I wrote a few days back about getting serial output from tboot on my new-ish Ivy Bridge vPro NUC. This was a means to an end and so this is where we’ll cover actually using this serial hardware to do something meaningful.

tboot log from the DC53427HYE NUC

Testing TXT and tboot on a new system is often painful. If there’s something wrong and tboot can’t execute SENTER successfully the system just reboots, and it will keep doing so continually. This confuses the hell out of most people and necessitates either serial hardware to capture log output for troubleshooting, or a patch to bypass this reboot logic.

The bit about the patch is interesting and I’ve hacked one out here. I hacked this together without much thought about the security implications as a work-around so I don’t recommend it for production use. It was only intended as a way to bring up a system with a borked TXT implementation so data could be collected with txt-stat.

Regardless of how you go about getting your tboot output it’s your first step in debugging. Here’s the tboot log captured from my DC53427HYE NUC. You can see the automated reboot after SENTER fails.

Debugging my tboot failure

I’ve jacked up the logging level of tboot so there’s a lot of data to dig through in the log. Generally though the data we need is in the TXT.ERRORCODE. Don’t forget though that on the first boot this value will be 0x0 since it’s only set once the failure occurs. The interested reader can, well, read all about this register in the MLE developers guide section B.1.3.

So after the failed boot the TXT.ERRORCODE gets set and we can grab it from the log. The relevant line is:

TBOOT: TXT.ERRORCODE: 0xc0021041

Not a particularly helpful error message but then again, there isn’t much space for a helpful textual description of the error in a 32-bit register. So the next step is to decode this thing.

Decoding the TXT.ERRORCODE

The MLE developers guide describes the general structure of the data in this register but the error code itself is specific to the ACM used by the platform. Again this data is in the tboot log file:

 1  TBOOT: checking if module /acm_hsw.bin is an SINIT for this platform...
 2  TBOOT: chipset production fused: 1
 3  TBOOT: chipset ids: vendor: 0x8086, device: 0xb001, revision: 0x1
 4  TBOOT: processor family/model/stepping: 0x306a9
 5  TBOOT: platform id: 0x10000000000000
 6  TBOOT:   1 ACM chipset id entries:
 7  TBOOT:       vendor: 0x8086, device: 0xb002, flags: 0x1, revision: 0x1, extended: 0x0
 8  TBOOT:   chipset id mismatch
 9  TBOOT: checking if module /acm_ivb.bin is an SINIT for this platform...
10  TBOOT:   1 ACM chipset id entries:
11  TBOOT:       vendor: 0x8086, device: 0xb001, flags: 0x1, revision: 0x1, extended: 0x0
12  TBOOT:   4 ACM processor id entries:
13  TBOOT:       fms: 0x206a0, fms_mask: 0xfff3ff0, platform_id: 0x10000000000000, platform_mask: 0x1c000000000000
14  TBOOT:       fms: 0x206a0, fms_mask: 0xfff3ff0, platform_id: 0x4000000000000, platform_mask: 0x1c000000000000
15  TBOOT:       fms: 0x306a0, fms_mask: 0xfff3ff0, platform_id: 0x10000000000000, platform_mask: 0x1c000000000000
16  TBOOT: SINIT matches platform

You can see on line 1 that tboot is trying the HSW / Haswell ACM which doesn’t match the platform. Then on line 9 it gives the IVB / Ivy Bridge ACM a try and that one matches the platform. Our error code is therefore specific to the IVB ACM, so we’ll have to dig through the docs from that tarball. If you’ve built meta-measured the appropriate PDF will be located at:

${TOPDIR}/tmp-glibc/work/corei7-64-oe-linux/3rd-gen-i5-i7-sinit/67-r0/3rd_gen_i5_i7-SINIT_67/SINIT_Errors.pdf.

But before we can make use of this we’ve gotta parse out the error code into its component parts. From this doc they’re defined as (from MSB to LSB):

  • bit 31 – Valid
  • bit 30 – External
  • bits 29:25 – Reserved
  • bits 24:16 – Minor Error Code
  • bit 15 – SW Source
  • bits 14:10 – Major Error Code
  • bits 9:4 – Class Code
  • bits 3:0 – Module Type

So we need to divide the error code 0xc0021041 on these boundaries and then go back to the docs to figure out what each field means:

Valid: 0x1 - The error code is valid.
External: 0x1 - Error state induced by external software.
Reserved: 0x0 - No significance. 
Minor Error Code: 0x2 - Fatal and TPM specific.
SW Source: 0x0 - Generated by the ACM.
Major Error Code: 0x4 - TPM NV is unlocked.
Class Code: 0x4 - TPM Access
Module Type: 0x1 - SINIT Module
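
Splitting a register on those boundaries by hand is error prone, so here is a minimal Python sketch of the same decode. The field layout is copied straight from the bit list above; the function and variable names are made up for illustration, and the code only extracts the numeric fields. Looking up what each value means still requires the SINIT error doc.

```python
# Field layout taken from the bit list above: (name, high bit, low bit).
FIELDS = [
    ("valid", 31, 31),
    ("external", 30, 30),
    ("reserved", 29, 25),
    ("minor_error_code", 24, 16),
    ("sw_source", 15, 15),
    ("major_error_code", 14, 10),
    ("class_code", 9, 4),
    ("module_type", 3, 0),
]


def decode_errorcode(value):
    """Split a 32-bit TXT.ERRORCODE value into its named bit fields."""
    fields = {}
    for name, high, low in FIELDS:
        width = high - low + 1
        # Shift the field down to bit 0, then mask off everything above it.
        fields[name] = (value >> low) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    for name, field in decode_errorcode(0xC0021041).items():
        print(f"{name}: {field:#x}")
```

Run against 0xc0021041 this yields the same values as the breakdown above: minor error code 0x2, major error code 0x4, class code 0x4 and module type 0x1.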

Going through each of these takes a while so we’ll focus on the important stuff: The Major error code. Actually the error text says it all: The TPM NV RAM is unlocked and it shouldn’t be. With the TPM in this state tboot also complains in the boot log, see line 3 below from the log linked above:

1  TBOOT: TPM: TPM Family 0x0
2  TBOOT: TPM is ready
3  TBOOT: TPM nv_locked: FALSE
4  TBOOT: TPM timeout values: A: 750, B: 750, C: 750, D: 750

So there’s the problem. Now what’s the solution?

Unlocked TPM NVRAM

The TPM NVRAM is described in the relevant TCG TPM 1.2 spec, section 19: “NV Storage Structures”. The part relevant to us is 19.1.1 where the required TPM_NV_INDEX values are described. These are the NV indexes that “must be found on each TPM regardless of platform”.

The first index listed in this section is TPM_NV_INDEX_LOCK and given the error code we’re getting that looks relevant. A bit of reading and you’ll see why having this index defined on a TPM is so important and why shipping a platform with it undefined is considered a security issue. Turns out that without this index defined the TPM doesn’t enforce authorization protections. In this state an attacker can write to the NVRAM repeatedly, wear it out (since NVRAM can be written to a finite number of times) and effectively DoS the TPM completely by making the NVRAM unusable. Very surprising that Intel is shipping the NUC in this state. Anyways, nothing we can’t fix …

At this point I went back in to the NUC and used tpm_nvinfo to dump the NVRAM indexes defined on my platform:

NVRAM index   : 0x10000001 (268435457)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00001002 (WRITEALL|OWNERWRITE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 20 (0x14)
 
NVRAM index   : 0x1000f000 (268496896)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00020002 (OWNERREAD|OWNERWRITE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 1129 (0x469)
 
NVRAM index   : 0x50010000 (1342242816)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00000001 (PPWRITE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 10 (0xa)
 
NVRAM index   : 0x50000003 (1342177283)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : 0x18
Permissions   : 0x00000000 ()
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 64 (0x40)
 
NVRAM index   : 0x50000001 (1342177281)
PCR read  selection:
 Localities   : ALL
PCR write selection:
 Localities   : ALL
Permissions   : 0x00002000 (WRITEDEFINE)
bReadSTClear  : FALSE
bWriteSTClear : FALSE
bWriteDefine  : FALSE
Size          : 54 (0x36)

I was hoping that the TPM_NV_INDEX_LOCK (defined at index 0xffffffff) would be missing and that defining it would solve my problem. From the output above you can see that it’s not listed. The only relevant data I could find on the web about defining this index was a post on the tboot devel list with someone trying to use the tpmj utility. Digging into all of that Java seemed like way too much work so I gave the tpm_nvdefine utility a go:

root@intel-core-i7-64:~# tpm_nvdefine --index=0xffffffff --size=0
Successfully created NVRAM area at index 0xffffffff (4294967295).

Success! I had hoped then that executing tpm_nvinfo again would show this new index … but it doesn’t. So other than the “success” message from tpm_nvdefine we have no way of knowing whether or not the new index was actually defined. The easiest way to test this is to try booting with tboot again and hope that the error goes away. And it does:

TBOOT: TPM: TPM Family 0x0
TBOOT: TPM is ready
TBOOT: TPM nv_locked: TRUE
TBOOT: TPM timeout values: A: 750, B: 750, C: 750, D: 750

Conclusion

So that’s how you define the TPM_NV_INDEX_LOCK TPM index on your IVB NUC. This effectively locks the TPM NVRAM on a platform that ships with the TPM NVRAM unlocked. Until now I had only seen this on Lenovo systems (lots of them) but I guess Intel is shipping platforms like this too. Having some automated way to detect and fix platforms in this state would be really nice …
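
As a very rough starting point for the detection half of that wish, the nv_locked line that tboot prints is easy to scan for. This is a minimal sketch, assuming you already have the tboot log as text (from a serial capture or otherwise). The function name is hypothetical, the pattern is taken from the log lines shown earlier, and it does nothing about fixing the platform.

```python
import re

# Matches the line tboot prints in the logs shown above, e.g.
#   TBOOT: TPM nv_locked: FALSE
NV_LOCKED_RE = re.compile(r"TBOOT: TPM nv_locked: (TRUE|FALSE)")


def nv_locked_from_log(log_text):
    """Return True/False from the last nv_locked line in a tboot log, or None if absent."""
    matches = NV_LOCKED_RE.findall(log_text)
    if not matches:
        return None
    return matches[-1] == "TRUE"


if __name__ == "__main__":
    sample = (
        "TBOOT: TPM: TPM Family 0x0\n"
        "TBOOT: TPM is ready\n"
        "TBOOT: TPM nv_locked: FALSE\n"
        "TBOOT: TPM timeout values: A: 750, B: 750, C: 750, D: 750\n"
    )
    if nv_locked_from_log(sample) is False:
        print("TPM NVRAM is unlocked: TPM_NV_INDEX_LOCK probably needs defining")
```

Using the last match rather than the first means a log that spans several boots reports the most recent state.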

Getting serial output on my Ivy Bridge NUC

I’d been using a rather old Sandy Bridge system (Intel DQ67EP + i7 2600S) to test my work on meta-measured for a long time. Very nice, very stable system. But with Intel getting out of the motherboard business I started eyeing their new venture: the NUC.

The DC53427HYE vPro IVB NUC

Everything is getting smaller and thankfully Intel has finally caught on. Better yet they’re supporting TXT on some of these systems and so when the Haswell NUC was released over the summer the price on the vPro Ivy Bridge NUC (DC53427HYE) finally dropped enough to put it in my price range. Intel opted to skip the vPro NUC for Haswell anyways so it was my only option.

Let the fun of testing TXT on a new system begin! Like any new system we hope it works “out of the box”. But with TXT, odds are it won’t. My SNB system was great but this NUC … not so much, yet. The kicker though is that as systems get smaller something’s got to give. Space ain’t free and … well who needs a serial port anyways right?

NUC IVB guts

Where’s my serial?

So without serial hardware, debugging TXT / tboot is pretty much a lost cause. Sure you can slow down the VGA output with the vga_delay command line option. But if you want to actually analyze the output you need to be able to capture the text somehow and setting vga_delay to a large value and then copying the output by hand doesn’t scale (and it’s a stupid idea to boot). So the search for serial output continues.

To get TXT we must … ::cough:: … endure the presence of the Management Engine (ME) and it’s supposed to have a serial console built in. The docs for the system even say you can get BIOS output from the ME serial console. But for whatever reason, I spent an afternoon messing about with it and made no progress.

I’ve no way to know where the problem with this lies. There are tools for accessing the ME serial console for Linux but I couldn’t get early boot output. Setting up a serial console login for a bare metal Linux system worked but no early boot stuff (BIOS, grub or tboot). Judging by the AMT docs for Linux: you can’t count on the ME serial interface for much. The docs state that if you use Xen then the ME will get the DHCP address all messed up and that setting a static address in the ME interface just doesn’t work. So long story short, the ME serial interface is limited at best and these limitations preclude getting early boot messages like those from tboot.

Now that the ME bashing is done we must fall back on real serial hardware. Thankfully this thing has both a half height and a full height mini-PCIe slot and a market for these arcane serial things still exists. StarTech fills this need with the 2s1p mini PCIe card. This is a great little piece of hardware but the I/O ports aren’t the default (likely to prevent conflict with on-board serial hardware) so we’ve gotta do some work before tboot will use it for output messages.

StarTech mini-PCIe serial card

NUC IVB with serial card

We have serial! Now what?

With some real serial hardware we’re half way there. Now we need to get tboot to talk to it. Unfortunately just adding serial to the logging= parameter in the boot config isn’t sufficient. The default base address for the serial I/O port used by tboot is 0x3F8 (see the README). This address corresponds to the “default serial port” aka COM1. So our shiny new mini-PCIe serial hardware must be using a different port.

tboot will log to an alternative port but we need to find the right I/O port address for the add on card. If you’re like me you keep a bootable Linux image on a USB drive handy for times like these. So we boot up the NUC and break out lspci to dump some data about our new serial card:

02:00.0 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
02:00.1 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])

Not a bad start. This card has two serial ports and it shows up as two distinct serial devices. To get the I/O port base address we need to throw some -vvv at lspci. I’ll trim off the irrelevant bits:

02:00.0 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at e030 [size=8]
        Region 1: Memory at f7d05000 (32-bit, non-prefetchable) [size=4K]
        Region 5: Memory at f7d04000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
.
.
.
02:00.1 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 18
        Region 0: I/O ports at e020 [size=8]
        Region 1: Memory at f7d03000 (32-bit, non-prefetchable) [size=4K]
        Region 5: Memory at f7d02000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
.
.
.

The lines we care about here are:

Region 0: I/O ports at e030 [size=8]
Region 0: I/O ports at e020 [size=8]

So the I/O port address for 02:00.0 is 0xe030 and 02:00.1 is 0xe020. The 9 pin headers on the board are labeled S1 and S2 so you can probably guess which is which. With the NUC booted off my Linux USB key we can dump more data about the hardware so we know for sure but with a serial cable hooked up to S1 I just threw some text at the device to see if something would come out the other end:

echo "test" > /dev/ttyS0

Sure enough I got "test" out. So I know my cable is hooked up to ttyS0. Now to associate /dev/ttyS0 with one of the PCI devices so we can get the I/O port. Poking around in sysfs is the thing to do here:

ls /sys/bus/pci/devices/02:00.0/tty/
ttyS0

With all of this we know we want tboot to log data to I/O port 0xe030 so we need the following options on the command line: logging=serial serial=115200,8n1,0xe030.
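
If you end up doing this on more than one box, pulling the I/O port out of the lspci output and building the option string is easy to script. The sketch below is illustrative only: it parses lspci -vvv text shaped like the output above (bus address at the start of the line, indented Region lines below it), the function names are made up, and the baud rate and 8n1 framing are simply the values used in the options above.

```python
import re

# A device header line starts in column 0 with bus:device.function.
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) ")
# The line we care about looks like "Region 0: I/O ports at e030 [size=8]".
IO_RE = re.compile(r"Region \d+: I/O ports at ([0-9a-f]+)")


def io_ports_by_device(lspci_text):
    """Return {pci address: first I/O port base} parsed from lspci -vvv output."""
    ports = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        io = IO_RE.search(line)
        if io and current is not None and current not in ports:
            ports[current] = int(io.group(1), 16)
    return ports


def tboot_serial_options(io_port, baud=115200, framing="8n1"):
    """Format the tboot logging options for a serial port at the given I/O base."""
    return f"logging=serial serial={baud},{framing},{io_port:#x}"


if __name__ == "__main__":
    sample = (
        "02:00.0 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller\n"
        "        Region 0: I/O ports at e030 [size=8]\n"
        "02:00.1 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller\n"
        "        Region 0: I/O ports at e020 [size=8]\n"
    )
    ports = io_ports_by_device(sample)
    print(tboot_serial_options(ports["02:00.0"]))
```

You still have to work out which PCI device your cable is actually plugged into, which is what the sysfs poking above is for.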

Next time

Now that I’ve got some real serial hardware and a way to get tboot to dump data out to it I can finally debug TXT / tboot. We’ll save that for next time.

twobit auto builder part 2

In my last post on automating my project builds with buildbot I covered the relevant buildbot configs. The issue that I left unresolved was the triggering of the builds. Buildbot has tons of infrastructure to do this and their simple pollers would be more than sufficient for handling my OE projects that typically have 3-5 git repos. But with OpenXT there are enough git repos to make polling with buildbot impractical. This post covers the last bit necessary to get builds triggering with a client side hook model.

twobit-git-poller

Now that we’ve worked out the steps necessary to build the software we need to work out how to trigger builds when the source code is updated. Buildbot has extensive infrastructure for triggering builds ranging from classes to poll various SCM systems in a “pull” approach (pulled by buildbot) to scripts that plug into existing SCM repos to run as hook scripts in a “push” approach (pushed by the SCM).

Both approaches have their limitations. The easiest way to trigger events is to use the built in buildbot polling classes. I did use this method for a while with my OE projects but with OpenXT it’s a lost cause. This is because the xenclient-oe.git repo clones the other OpenXT repos at the head of their respective master branches. This means that unlike other OE layers the software being built may change without a subsequent change being made to the meta layer. To use the built in buildbot pollers you’d have to configure one for each OXT git repo and there’s about 60 of them. That’s a lot of git pollers and would require a really long and redundant master config. The alternative here is to set up a push mechanism like the one provided by the buildbot in the form of the git_buildbot.py script.

This sounds easy enough but given the distributed nature of OpenXT it’s a bit harder than I expected, though not in a technical sense. Invoking a hook script from a git server is easy enough: just drop the script into the git repo, generally as a ‘post-receive’ hook, and every time someone pushes code to it the script will fire. If you control the git server where everyone pushes code this is easy enough. Github provides a similar mechanism but that would still make for a tight coupling of the build master to the OpenXT Github org and herein lies the problem.

Tightly coupling the build master to the OpenXT Github infrastructure is the last thing I want to do. If there’s just one builder for a project this approach would work but we’d have to come up with a way to accommodate others wanting to trigger their build master as well. This could produce a system where there’s an “official” builder and then everyone else would be left hanging. Building something that leaves us in this situation knowingly would be a very bad thing as it would solve a technical problem but produce a “people problem” and those are exponentially harder. The ideal solution here is one that provides equal access / tooling such that anyone can stand up a build master with equal access.

Client-side hooks

The only alternative I could come up with here is to mirror every OpenXT repo (all 60 of them) on my build system and then invent a client side hook / build triggering mechanism. I had to “invent” the client side hooks because Git has almost no triggers for events that happen on the client side (including mirrors). My solution for this is in the twobit-git-poller.

It’s really nothing special. Just a simple daemon that takes a config specifying a pile of git repos to mirror. I took a cue from Dickon and his work here by using a small bit of the Github API to walk through all of the OpenXT repos so that my config file doesn’t become huge.
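
To give a feel for the Github part, here’s a minimal sketch of turning a "list organization repositories" response (the JSON you get back from https://api.github.com/orgs/OpenXT/repos) into a table of repos to mirror. It isn’t the code from the poller itself. The function name is made up, and it assumes you’ve already fetched the response text and dealt with pagination, since the API hands results back a page at a time.

```python
import json

def mirror_urls(api_response_text):
    """Extract {repo name: clone URL} from a GitHub org repo listing."""
    repos = json.loads(api_response_text)
    # Each entry in the listing carries the repo's name and its clone URL.
    return {repo["name"]: repo["clone_url"] for repo in repos}
```

The result can be merged with whatever repos are listed by hand in the config file.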

The thing that makes this unique is some magic I do to fake a hook mechanism as much like post-receive as possible. This allows us to use a script like the git_buildbot.py unmodified. So I expanded the config file to specify an arbitrary script so others can expand on this idea but this script is expected to take input identical to that produced by the post-receive and expected by the git_buildbot.py script.

This is implemented with a naive algorithm: every time the git poller daemon is run we just iterate over the available branches, grab the HEAD of each before and after the fetch and then dump this data into the hook script. I don’t do anything to detect whether or not there was a change even, I just dump data into the hook script. This allows us to use the git_buildbot.py script unmodified and buildbot is smart enough to ignore a hook event where the start and end hashes are the same. There are some details around passing authentication parameters and connection strings but those are just details and they’re in the code for the interested reader.
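
The naive algorithm boils down to something like the sketch below. This is an illustration and not the actual poller code: the function names are made up, it assumes a mirror clone where a plain git fetch updates refs/heads directly, and it ignores deleted branches. The input format is the one git gives post-receive hooks on stdin: one "old-sha new-sha refname" line per ref.

```python
import subprocess

def branch_heads(repo_dir):
    """Return {refname: sha} for every branch under refs/heads."""
    out = subprocess.check_output(
        ["git", "for-each-ref", "--format=%(objectname) %(refname)", "refs/heads"],
        cwd=repo_dir)
    heads = {}
    for line in out.decode().splitlines():
        sha, ref = line.split(" ", 1)
        heads[ref] = sha
    return heads

def hook_input(before, after):
    """Build post-receive style stdin: 'old new refname' per branch."""
    zero = "0" * 40  # git's convention for "this ref did not exist before"
    lines = []
    for ref in sorted(after):
        lines.append("{} {} {}\n".format(before.get(ref, zero), after[ref], ref))
    return "".join(lines)

def poll(repo_dir, hook_cmd):
    """Fetch, then feed before/after heads to the hook, changed or not."""
    before = branch_heads(repo_dir)
    subprocess.check_call(["git", "fetch"], cwd=repo_dir)
    after = branch_heads(repo_dir)
    proc = subprocess.Popen(hook_cmd, stdin=subprocess.PIPE, cwd=repo_dir)
    proc.communicate(hook_input(before, after).encode())
    return proc.returncode
```

Note that hook_input emits a line even when the old and new hashes match, which is exactly the "don’t bother detecting changes" behaviour described above.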

With this client side hooking mechanism we no longer need the poller objects in buildbot. We can just hook up a PBChangeSource that listens for data from the twobit-git-poller or any other change source. AFAIK this mechanism can live side by side with standard git hooks so if you’ve got an existing buildbot that’s triggered by some git repos that your team is pushing to using this poller shouldn’t interfere. If it does let me know about it so we can sort it out … or you could send me a pull request :)

Wrap Up

When I started this work I expected the git poller to be something that others may pick up and use and that maybe it would be adopted as part of the OpenXT build stuff. Now that I think about it though I expect the whole twobit-git-poller to be mostly disposable in the long run. We’ve already made huge progress in aligning the project with upstream Open Embedded and I expect that we’ll have a proper OE meta layer sooner than later. If this actually comes to pass there won’t be enough git repos to warrant such a heavy weight approach. The simple polling mechanisms in buildbot should be sufficient eventually.

Still I plan to maintain this work for however long it’s useful. I’ll also be maintaining the relevant buildbot config which I hope to generalize a bit. It may even become as general as the Yocto autobuilder but time will tell.

twobit auto builder part 1

I’ve been a bit out of commission for the past month: I started a new job and moved a long way to be close to it. Before the move I had a bit of a backlog of work to write about and things have finally settled down enough for me to start working that off. This first post is about the work I’ve done to learn a bit about buildbot and how to use it to build some stuff I’ve been working on.

I’ve always been a bit of a “build junkie” and I’m not ashamed to admit that I really like to write Makefiles by hand. Learning buildbot was only partially motivated by masochism though. The other half was motivated by my need to build a few different projects including the meta-measured OE layer, the meta-selinux OE layer and OpenXT.

I’ve put this work up on github and it still needs some work before it’s portable to anything beyond my build setup. I hope to make this reproducible by others but before I do that I want to:

  1. introduce these repos
  2. cover some of the buildbot internals but only the bits required to make the build work

In doing this work I definitely fell down the rabbit hole a bit and started reading a pile of buildbot code but a lot of it was above and beyond what’s necessary to get this stuff building. I’m often guilty of wanting to understand everything about a piece of code before I start using it but that’s not always the right approach. I’ll do you all a solid and leave this stuff out to save you the hassle. This first post will cover the repo containing the buildbot config I’m using. My next post will cover some support code I’m using to trigger builds. I’ll focus on the bits necessary to build OpenXT initially and will include a quick write-up of my other side projects soon after.

twobit-buildbot

There are two repos that make up this work. The first is the actual buildbot config for both the build master and slave(s). It’s up on github here: twobit-buildbot. The second repo is a twisted python daemon that supports the build master and slave. I’m calling this last repo twobit-git-poller. This post will discuss the former. My next post will cover the git poller.

Build Master

The buildbot config for both the master and the slave(s) I’m using is available on Github here: twobit-buildbot. Buildbot configs are a bit hard to call “configs” with any seriousness since they’re proper python code. Still, it’s “config-like” in that there’s really no business logic here. It’s just a pile of classes that you fill with data and pass to the buildbot guts for execution.

If I go line by line through the config then this will quickly digress into a buildbot tutorial. There are plenty of those out there and if you’re interested in getting a deeper understanding then you should start with the buildbot docs. The steps that buildbot goes through to build a project are all wrapped up in a container called a BuildFactory. In my repo the BuildFactory for OpenXT is defined here. This class wraps an array of BuildStep objects that define a series of steps required to build the software.

The BuildStep objects are processed in the order they’re passed to the BuildFactory. I’ll run through these in order just to give you a feeling for the build progression:

  1. The first thing in building OXT is to clone the build scripts. Buildbot knows where on the build slave to check this stuff out so all we do is provide some metadata to tell the buildbot git fetcher how we want the checkout to happen. OpenXT doesn’t yet support incremental builds so for now each new build requires a clean checkout.
  2. Next we execute a shell command on the slave to set up the build config. To do this we use the ShellCommand class which is exactly what you’d expect: it executes shell commands. But with buildbot it’s always a question of where: on the master or on the slave? AFAIK for most buildbot stuff you should assume that the action will happen on the slave unless otherwise noted. I’ve rigged the build master configs such that the script name isn’t used explicitly. Instead it’s a shell variable that the slave is expected to set here. I’ll describe this a bit more in the next section describing the build slave.
  3. The next build steps are a series of shell commands that make up the OXT build itself. These are hard coded as the minimum necessary to create an installer iso. These are all invocations of the do_build.sh script. We could just as well execute the do_build.sh script once and have it get the build steps from the .config file but it’s nice to execute them one by one so that the buildbot UI shows them as unique steps. This also makes it easier to go through the output logs when things go wrong.
  4. Once the main build steps are done we need to get the build output to a place where people can download it. So the next step runs a MasterShellCommand to create a directory on the build master. In the case of my build master this is on an NFS mount that gets served up by my webserver. That should tell you how we serve up the build output once it’s on the build master.
  5. Now that we’ve got a place to stash the installer iso we need to move it from the build slave to the build master. OXT has some script functions in the build script to rsync build artifacts to various places. Buildbot has a build step to handle this for us so I’ve chosen to use that instead. Hopefully we’ll retire the bulk of the custom OXT scripts and use the tools available like the FileUpload class in buildbot.
  6. Unfortunately I ran into a nasty buildbot limitation after getting the build upload to work. The version of buildbot packaged for Debian Wheezy runs the buildbot daemon with a very paranoid umask (022). This was resolved 3 years ago but Wheezy doesn’t have the fix. I opted to hack around this with an extra build step instead of requiring people to install buildbot from source or apply a patch to the Debian packages. This hack is ugly but it just runs a script on the build master that fixes up the permissions on the newly uploaded file and the directory housing it.
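
The permission fix-up in the last step doesn’t need to be anything fancy. Here’s a minimal sketch of the idea in python. It isn’t the script from my repo, the function name is made up, and the modes (0644 for the file, 0755 for the directory) are just an assumption about what a web server needs to serve the upload.

```python
import os

def open_up(uploaded_file):
    """Make an uploaded file and its parent directory world-readable."""
    parent = os.path.dirname(os.path.abspath(uploaded_file))
    # rw-r--r-- so the web server can read the file
    os.chmod(uploaded_file, 0o644)
    # rwxr-xr-x so the web server can traverse and list the directory
    os.chmod(parent, 0o755)
```

A script like this would be run from a MasterShellCommand right after the upload step, with the path of the uploaded file as its argument.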

The BuildFactory describes how to build something but we still need something to do the building. This is the job of the build slave. The BuilderConfig is an object that associates a build slave with a BuildFactory. When the build master determines that a build needs to be run the buildbot code will go through the available BuilderConfig objects, find the right config, and then walk the build slave through the right series of BuildSteps.

Build Slave

Naturally we assume that the build slave executing these steps is capable of building the project which in this case is OpenXT. That is to say you’ve followed the steps on the OpenXT wiki to set up a build machine and that you’ve tested that it works. Further, I’ve only tested this with the buildbot package from Debian Wheezy. The build master can be a simple Wheezy system, no need to jump through the hoops to make it an OXT build system since it won’t be building any code.

The config for a build slave is vastly more simple than the master. It only needs to be able to connect to the master to receive commands. The only really interesting bit of the build slave config is in a few of the environment variables I set.

Each of the OE images my slave builds requires a sort of generic “setup” step where the slave sets some values in a config file. This is simple stuff like the location of a shared OE download directory or something specific to OpenXT like the location of the release signing keys. In each case there’s a script in a bin directory. For OpenXT that’s here.

The path to these scripts doesn’t need to be known by the build master. Instead the build master expects the slave to have a specific environment variable pointing to this script. The build master then tells the slave to execute whatever this variable points to. The build slave only has to set this variable properly in its config like this.
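
The indirection is easier to see in code. This is a sketch of what effectively happens on the slave side, not code from my configs: the master only ever names the variable, and the slave’s environment decides which script that means. The variable name OXT_SETUP_SCRIPT used in the usage note is hypothetical.

```python
import os

def setup_command(var_name, environ=None):
    """Resolve the setup script a slave should run from its environment."""
    environ = os.environ if environ is None else environ
    script = environ.get(var_name)
    if not script:
        # The master can't know the path, so an unset variable is fatal.
        raise KeyError("{} is not set on this slave".format(var_name))
    return [script]
```

For example, setup_command("OXT_SETUP_SCRIPT") returns the one-element command list to execute, or fails loudly if the slave’s config never set the variable.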

Wrap Up

That’s just a quick rundown of the buildbot bits I’ve strung together to build my pet projects and OpenXT. There’s one glaring gap in this: how buildbot knows when it needs to run builds. This is usually a pretty simple topic but with OpenXT there were a few things I had to work around. I’ll fill this in next time.

Till then I’ll be keeping the configs for my build master and slave on github and as up to date as possible. Feel free to reproduce this work in your own environment if you’re interested. I may or may not keep the build master exposed publicly in the future. For now it’s just intended to be an example and a convenient tool.

OpenXT: Contingencies Abandoned

This post is long overdue. I’ve been experimenting with using OE as a means to building measured systems for a while now. Back before OpenXT became a reality I was hedging bets and working on some overlapping tech in parallel. Now that OpenXT is available as OSS I’m going to abandon some of this work and shift focus to OpenXT. I do however feel like there should be some record of the work that was done and some explanation as to why I did it and how it relates to OpenXT.

Here goes …

Building systems with security properties

All of this nonsense began with some experimentation in using OE as a means to build measured systems. For some reason I think that a sound integrity measurement architecture is necessary for the construction of software systems with meaningful security properties. All of the necessary component parts were available as open source but there were few examples showing how they could be integrated into a functional whole. Those that did were generally research prototypes and weren’t maintained actively (need references). My work on meta-measured was the first step in my attempt to make the construction of an example measured system public and easily buildable.

Building a measured system with the Xen hypervisor as a primary component was another part of this work. I wasn’t using virtualization for the sake of virtualization though. Xen was a means to an end: its architecture allows for system partitioning in part through the Isolated Driver Domain model like the example I describe here. The NDVM is just the “low hanging fruit” here but it serves as a good example of how OE can be used to build very small Linux VMs that can serve as example device domains. There are ways to build smaller IDDs but IMHO a Linux image < 100MB is probably the point of diminishing returns currently. Hopefully in the future this will no longer be the case and we'll have IDDs based on unikernels or even smaller things.

Small, single purpose systems are important in that they allow us to extend the integrity measurement architecture into more fine-grained system components. Ideally these IDDs can be restarted so that the integrity state of the system can be refreshed on a periodic basis. By disaggregating the Xen dom0 we increase the granularity of our measurements from 1 (dom0) to 1 + the number of disaggregated components. By restarting and remeasuring these components we provide better "freshness" properties for systems that remain on-line for long periods of time.

This of course is all built on the initial root of trust established by hardware and the software components in meta-measured. Disaggregation on the scale of previously published academic work is the end goal though with the function of dom0 reduced to domain construction.

The final piece of this work is to use the available mandatory access control mechanisms to restrict the interactions between disaggregated components. We get this by using the XSM and the reference policy from Xen. Further, there will always be cases where it’s either impossible or impractical to decompose some functions into separate VMs. In these cases the use of the SELinux MAC policy within Linux guests is necessary.

The Plan

So my plan went something like this: Construct OE layers for individual components. Provide reference images for independent test. One of these layers will be a “distro” where the other components can be integrated to become the final product. This ended up taking the form of the following meta layers:

  • meta-measured: boot time measurements using the D-RTM method and TPM utilities
  • meta-virtualization: recipes to build the Xen hypervisor and XSM policy
  • meta-selinux: recipes to build SELinux toolstack and MAC policy
  • meta-integral: distro layer to build platform and service VM images

Some of these meta layers provide a lot more functionality than the description given but I only list the bits that are relevant here.

Where I left off

I had made significant progress on this front but never really finished and didn’t write about the work as a whole. It’s been up on Github in a layer called ‘meta-integral‘ (I know, all the good names were taken) for a while now and the last time I built it (~5 months ago) it produced a Xen dom0 and an NDVM that boots and runs guests. The hardest work was still ahead: I hadn’t yet integrated SELinux into dom0 and the NDVM, XSM was buildable but again not yet integrated, and the bulk of disaggregating dom0 hadn’t even begun.

This work was a contingency though. When I began working on this there had been no progress made or even discussion of getting OpenXT released as OSS. This side project was an outlet for work that I believe needs to be done in the open so that the few of us who think this is important could some day collaborate in a meaningful way. Now that OpenXT is a real thing I believe that this collaboration should happen there.

Enter OpenXT: aka the Future

Now that OpenXT is a reality the need for a distro layer to tie all of this together has largely gone away. The need for ‘meta-integral’ is no more and I’ll probably pull it down off of Github in the near future. The components and OE meta layers that I’ve enumerated above are all still necessary though. As far as my designs are concerned OpenXT will take over only as the distro and this means eliminating a lot of duplication.

In a world where OpenXT hadn’t made it out as OSS I would have had the luxury of building the distro from scratch and keeping it very much in line with the upstream components. But that’s not how it happened (a good thing) so things are a bit different. The bulk of the work that needs to be done for the project to gain momentum now is disentangling these components so that they can be developed in parallel with limited dependencies.

Specifically we duplicate recipes that are upstream in meta-virtualization, meta-selinux and meta-measured. To be fair, OpenXT actually had a lot of these recipes first but there was never any focus on upstreaming them. Eventually someone else duplicated this work in the open source and now we must pay off this technical debt and bring ourselves in-line with the upstream that has formed despite us.

What’s next?

So my work on meta-integral is over before it really started. No tragedy there but I am a bit sad that it never really got off the ground. OpenXT is the future of this work however so goal number one is getting that off the ground.

More to come on that front soon …

Apache VirtualHost config gone after Wheezy to Jessie upgrade

Here’s a fun one that had me running in circles for a while today:

I’ve been running deluge and the deluge-webui on Debian Wheezy for a while now. Pretty solid. I needed to download a torrent using a magnet URI today and deluge-webui on Wheezy won’t do it. This feature was added to the webui in 1.3.4 though so the version in Jessie should work.

I did the typical dist-upgrade song and dance per the usual but after the upgrade Apache was all hosed up. It was just showing the default example page. All of the access logs that would normally go to my configured virtual host were landing in /var/log/apache2/other_vhosts_access.log which is all wrong. I started out thinking it was the hostname of the system that got messed up but that was a dead end.

I started making progress when I found the command

apache2ctl -S

This dumps out a bunch of data about your configuration and it basically said that my VirtualHost configuration was empty:

VirtualHost configuration:

Yeah it was basically an empty string. This seemed wrong but I wasn’t sure what to expect really. After banging around a bit longer and getting nowhere I finally decided to just disable and re-enable my site configuration. This was desperation because my site config was already linked into /etc/apache2/sites-enabled so it must have been enabled … right?

a2dissite mysite

But disabling it failed! It gave me some sort of “no such file” error. Whaaaaaa? So I ran the command through strace and it turns out that the new apache2 package on Jessie expects the site config file to have the suffix .conf. Changing the name of my site config fragment fixed this and I was then able to enable the config as expected.

That was unbelievably annoying. Hopefully this will save someone else a few minutes.
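
If you’d rather find out before Apache does, a few lines of python will list the site configs that lack the suffix. This is just a sketch: the function name is made up and you’d point it at your own sites-available directory.

```python
import os

def sites_missing_conf_suffix(sites_dir):
    """List site config files in sites_dir whose names don't end in .conf."""
    return sorted(
        name for name in os.listdir(sites_dir)
        if not name.endswith(".conf")
        and os.path.isfile(os.path.join(sites_dir, name)))
```

Calling sites_missing_conf_suffix("/etc/apache2/sites-available") on my system before the rename would have returned the one config that went missing.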

First OpenXT build

UPDATE: 2014-11-26 I’ve disabled comments on this post. All discussion about building OpenXT should be on the mailing list: https://groups.google.com/forum/#!forum/openxt
UPDATE: 2014-07-25 This page has been superseded by the documentation available on the OpenXT github wiki: https://github.com/OpenXT/openxt/wiki
UPDATE: 2014-06-20 added section about using bash instead of dash
UPDATE: 2014-06-19 fix git clone URI
UPDATE: 2014-06-18 add genisoimage package to list of required packages.
UPDATE: 2014-06-18 remove the section on mangling the manifest file now that I’ve upstreamed a patch.
UPDATE: 2014-06-18 to reflect default mirror being available now.
UPDATE: 2014-06-18 to clarify location of STEPS variable and the setupoe step.

With the transition of XT to OpenXT I’m realizing that the mark of most successful open source projects is good tools and great documentation. Right now we’re a bit short on both. The tools to build the core of OpenXT aren’t our own, they’re maintained by the upstream OpenEmbedded and Yocto communities. The documentation to build OpenXT on the other hand is our responsibility. This is a quick recap of my first build of the code that’s up on github with step-by-step instructions so that you can follow along at home.

The Existing Docs

The closest thing to build docs that we have on github is a README that was left over from a previous attempt to open source this code. That work was done under the name “XenClient Initiative” or XCI for short. This was before my time on the project but my understanding is that it wasn’t particularly successful.

I guess you can call OpenXT our second go at the open source thing. The instructions in this file are way out of date. They need to be replaced and hopefully this write-up will be the first step in fixing this.

Build Machine

There’s a lot of technical debt that’s built up over the years in the OpenXT code base. The first and most obvious bit of technical debt is in our build system. We require that the build be done on a 32 bit Debian Squeeze system. The 64 bit architecture may work but it’s untested AFAIK.

We require Squeeze for a number of reasons the most obvious of which is our dependency on the GHC 6.12 compiler. Wheezy ships with version 7 and our toolstack hasn’t been updated to work with the new compiler yet. To the Haskell hackers out there: your help would be much appreciated. No doubt there are other dependencies and issues that would need to be worked around in an upgrade but we know this one to be a specific and prominent issue.

The requirement for a 32 bit build host likely exists only because 32 bit build hosts are the only ones that have been tested. Anyone out there who tries a 64 bit build or a build on Wheezy please report your results so we can get documentation together for the tested and supported build hosts.

Required Packages

The initial list of packages required to build OpenXT can be obtained from the OE wiki. The requirements are:

sed wget cvs subversion git-core coreutils unzip
texi2html texinfo docbook-utils gawk python-pysqlite2
diffstat help2man make gcc build-essential g++
desktop-file-utils chrpath

The list is pretty short as far as build requirements go because OE builds nearly all of the required tools as part of the build. This is a big part of what makes OE so great.

Additionally we require a few extra packages:

ghc guilt iasl quilt bin86 bcc libsdl1.2-dev liburi-perl genisoimage

Packages like guilt and quilt are used in our bitbake patch queue class in the expected way. ghc is the Haskell compiler which is required to … build the Haskell compiler (much like OE requires gcc to build gcc). genisoimage is used by our build scripts to put the final installer ISO together.

The remaining dependencies: iasl, bin86, bcc, libsdl1.2-dev, and liburi-perl are another instance of technical debt. These packages should be built in OE as dependencies of other packages. Instead our recipes take a short cut and require they be installed on the build host. This seems like a smart shortcut but it’s a shortcut that ends in cross-compile misery. This may be what causes issues between 32 and 64 bit build hosts.

A good example of how to fix these issues already exists. If you’ve been following upstream Yocto development the Xen recipe contributed there gets this right by depending on the dev86-native and iasl-native packages. OpenXT would benefit from pulling in the meta-virtualization layer and using this recipe (thanks Chris!)

Bash vs Bourne

Bitbake recipes contain a lot of shell functions and fragments. Per the OE docs these will be executed by the host system’s /bin/sh. Unfortunately lots of build metadata (including the OpenXT build metadata) is rife with ‘bashisms’. Because of this, Linux distros that don’t link /bin/sh to /bin/bash will cause builds to fail.

The way to resolve this is laid out in the Ubuntu section of the “OE and Your Distro” docs as Debian and thus Ubuntu use the dash shell instead of bash by default. Switching dash for bash is pretty easy thankfully:

sudo dpkg-reconfigure dash

Executing the command above will result in a dialog screen asking you whether or not you want to use dash as your default shell. Select ‘No’ and your system will use bash instead.
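
If you want to check the result without eyeballing symlinks, here’s a small sketch that resolves /bin/sh and reports whether it ends up at bash. The function name is made up, and it assumes /bin/sh is a symlink (as it is on Debian) and not a copy of some shell binary.

```python
import os

def sh_is_bash(sh_path="/bin/sh"):
    """Return True if sh_path resolves to a binary named 'bash'."""
    # realpath follows the symlink chain, e.g. /bin/sh -> dash or bash.
    return os.path.basename(os.path.realpath(sh_path)) == "bash"
```

On a stock Debian install this returns False until you’ve gone through the dpkg-reconfigure step above.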

I’ve gone ahead and filed a ticket to get ‘bashisms’ out of the first place I ran into them in OpenXT: https://github.com/OpenXT/xenclient-oe/issues/1. If you’ve got some time to kill it would be helpful if someone could track down more of our dirty laundry like this, or better yet, send in a pull request to sort some of this out.

Clone and Configure

If you’re following along you should now have a 32 bit Debian Squeeze build host with some additional packages installed. The next step is to clone the base OpenXT git repo:

git clone git://github.com/OpenXT/openxt.git

This will give you a directory named openxt that contains our build scripts. Change into this directory and we’ll take a look at the important files.

Firstly the file you’ll most often customize in here is the .config but you don’t have one yet. Copy the example-config file to .config so we can modify it for our environment:

cp example-config .config

The .config file is read by the script do_build.sh that … does the build. There are a handful of variables in .config that are interesting for our first build. For now we’ll ignore the rest.

STEPS

We’ll start with STEPS. This one isn’t defined in the example-config but it’s initialized in the do_build.sh script. Since this script imports all variables from the config we can add it to the config manually to get the desired effect.

This variable defines the build steps that will be carried out when do_build.sh is run with no options. The default steps don’t all work yet so we’ll set this to the minimum that we can get away with for now:

STEPS="initramfs,stubinitramfs,dom0,uivm,ndvm,syncvm,installer,installer2,ship,copy"

I’ve left out the necessary setupoe step because I typically run this one manually and check that my variables get populated in the OE local.conf successfully. You don’t need to do it this way but it may help you get a better understanding of how the configuration works. After I go through the necessary variables we’ll go back to setupoe.
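
Since STEPS is just a comma-separated list it’s easy to break apart if you’d rather run (or log) each step on its own. Here’s a minimal sketch. The function name is made up, and the -s flag for selecting a single step is an assumption on my part, so check the usage output of do_build.sh before relying on it.

```python
def step_commands(steps, script="./do_build.sh", step_flag="-s"):
    """Turn a comma-separated STEPS string into one command per step."""
    # step_flag is an assumed option for "run only this step".
    return [[script, step_flag, step] for step in steps.split(",") if step]
```

Given the STEPS value above this produces ten commands, one for each step from initramfs through copy, in the same order.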

OPENXT_MIRROR

Due to some of the software used in OpenXT being a bit old the upstream mirrors of a few source tarballs are no longer available. Further we require a number of Intel binary ACM modules for TXT to function properly. Intel hides these zips behind a lawyer wall requiring users to accept license terms before they can be downloaded. That’s great CYA from their legal department but it prevents any automated build from downloading them in the build so we have to mirror them ourselves (which is permitted by their license).

When I did my first build the other day the URL from the example configuration file wouldn’t resolve for me. So I set up my own mirror:

OPENXT_MIRROR="http://www.twobit.us/mirrors/openxt"

This should be fixed so check the default mirror first. If it doesn’t work feel free to clone my mirror in your local environment but do me a favor and go easy on the bandwidth please.

UPDATE: The default mirror is fixed. You should use the default value (http://openxt.xci-test.com/mirror/) and not my mirror … well really you should set up your own mirror because there’s no way to tell how long the default mirror will stay up and it’s nice to keep redundant traffic to a minimum.

Signing Certs

The topic of signing releases is a huge one so for now we’ll just stick to the minimum work required to get our build signed once it’s done. There are several relevant config variables for this:

REPO_PROD_CACERT="/path/to/prod-cacert.pem"
REPO_DEV_CACERT="/path/to/dev-cacert.pem"
REPO_DEV_SIGNING_CERT="/path/to/dev-cacert.pem"
REPO_DEV_SIGNING_KEY="/path/to/dev-cakey.pem"

We require that each builder create their own keys for development and release builds. In this example I’m using self signed certs so it’s as simple as possible. Use the following commands to create your keys and your self signed certs:

openssl genrsa -out cakey.pem 2048
openssl req -new -x509 -key cakey.pem -out cacert.pem -days 1095

You’ll need two key/cert pairs, one for the automated signing of the build (a ‘dev’ key) and the certificate for a production signing key. All protection of the production signing key is the responsibility of whoever is signing the release. I’ll cover this in another post at another time. For now just make the keys and the certs and put the variables in the .config file.
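Here’s a rough sketch of making both pairs in one go. The certs directory and the CN values are placeholders I made up for illustration (nothing in OpenXT requires them), and -subj is only there so openssl doesn’t stop to ask questions. The file names line up with the example variables above.

```shell
#!/bin/sh
# Sketch: create one self-signed key/cert pair per role ("dev" and "prod").
# The ./certs directory and the CN strings are placeholders, not OpenXT requirements.
set -e

make_pair() {
    name="$1"   # file prefix, e.g. "dev" or "prod"
    dir="$2"    # output directory
    mkdir -p "$dir"
    # 2048-bit RSA private key, readable only by the owner.
    openssl genrsa -out "$dir/$name-cakey.pem" 2048
    chmod 600 "$dir/$name-cakey.pem"
    # Self-signed cert valid for three years; -subj skips the interactive prompts.
    openssl req -new -x509 -key "$dir/$name-cakey.pem" \
        -out "$dir/$name-cacert.pem" -days 1095 \
        -subj "/CN=openxt-$name-example"
}

make_pair dev  ./certs
make_pair prod ./certs
```

Only prod-cacert.pem goes in the .config file. Where the production key ends up living is your call, just don’t leave it sitting on the build box.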

OE_BUILD_CACHE_DL

For those of you who have used OE in a previous life you know how huge an OE build directory can get. Part of this is caused by the OE download cache, which is the directory where OE caches downloaded source code. In the OE local.conf file this is specified by the DL_DIR variable. We use OE_BUILD_CACHE_DL instead.

Personally my build system has a RAID SSD set up to keep my builds as fast as possible but I can’t afford enough SSDs to support having my DL_DIR on my SSD storage. Typically I’ll use larger but slower storage (NFS RAID array) for my download mirror that I share between projects. Oftentimes I’ll just link that slower storage mount directly into my build tree to keep everything in one place. Do whatever works best for you and remember this is completely optional. You can leave this out and the build will just use a directory in your build tree but remember this will make your build much larger:

OE_BUILD_CACHE_DL="/path/to/oe-download"
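If you want to try the linking trick, something like this does it. Both paths are made up for the example (the NFS mount point and the oe-download link name are my assumptions), so substitute your own.

```shell
# Hypothetical locations: point these at your own storage and build tree.
SHARED_DL="${SHARED_DL:-/mnt/nfs/oe-download}"
BUILD_TREE="${BUILD_TREE:-$PWD}"

# -s symbolic link, -f replace an existing link, -n don't follow an existing link.
# If oe-download already exists as a real directory, move it out of the way first.
ln -sfn "$SHARED_DL" "$BUILD_TREE/oe-download"

# The line you would then put in .config:
echo "OE_BUILD_CACHE_DL=\"$BUILD_TREE/oe-download\""
```

The build tree then looks self-contained while the bulky downloads stay on the slower volume.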

Even with the download cache on a separate volume an OpenXT build takes up a lot of disk space. This minimal build of just the core service VMs and no in-guest tools weighs in at 74G. My download cache is shared between all of my OE projects so I can’t say exactly how large the cache for a fresh OpenXT build will be. My combined cache is ~20G but I’d expect the files for OpenXT are a small portion of that.

Start Your Engines

That’s about all it should take. Next I run the do_build.sh script explicitly executing the setupoe step:

./do_build.sh -s setupoe

This step clones all of the git repos you need to build OpenXT (mostly OE meta-layers) and then processes your .config file to generate the OE local.conf. You can take a look at this file to see the variables and make sure they’re set to the right thing. This can be very helpful when debugging or if you want to manually change them for some reason.
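A quick way to do that check without opening an editor is to grep for the variables you care about. This is just a sketch: check_conf is a helper I’m making up here, and you’ll need to hand it the real path to the generated local.conf since that depends on your tree. DL_DIR is the one variable named above, add whatever others you want to verify.

```shell
# Sketch: print the assignment for each named variable in an OE local.conf,
# and complain about any that are missing. check_conf is a hypothetical helper.
check_conf() {
    conf="$1"; shift
    status=0
    for var in "$@"; do
        # Matches "VAR = ...", "VAR ?= ...", "VAR := ..." and "VAR += ...".
        if ! grep -E "^[[:space:]]*${var}[[:space:]]*[?:+]?=" "$conf"; then
            echo "missing: $var" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (the path is a placeholder):
#   check_conf /path/to/local.conf DL_DIR
```

It returns non-zero if anything is missing so you can bail out before kicking off a multi-hour build.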

After OE has been set up you can just run the do_build.sh script and then go drink coffee for the next 6 hours:

./do_build.sh | tee build.log

If all goes well at the end of this you’ll have an ISO with the OpenXT installer on it ready to go here:

./build-output/openxt-dev--master/iso/installer.iso

There’s no magic here. The comma separated list of steps we put into the STEPS variable can all be invoked one at a time the same way we ran setupoe. So if you want to build just the dom0 initramfs you can do so like this:

./do_build.sh -s initramfs | tee initramfs.log

Piping the build output to a file is always useful when things break (and they will break). My recommendation would be to build a few steps individually before you go and do a full build. I’ll probably do another post about the individual build steps and how all that works next.
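If you’d rather not babysit each step, a small wrapper can walk the list for you and stop at the first failure. This is only a sketch: run_steps is a name I made up, and it assumes do_build.sh exits non-zero when a step fails, which I haven’t verified for every step. It redirects to a log file instead of piping through tee because in plain sh a pipeline reports tee’s exit status, not the build’s.

```shell
# Sketch: run a comma separated list of build steps one at a time,
# one log file per step, stopping at the first failure.
# run_steps is a hypothetical helper; the second argument defaults to ./do_build.sh.
run_steps() {
    steps="$1"
    build_cmd="${2:-./do_build.sh}"
    # Split the comma separated list into positional parameters.
    old_ifs="$IFS"; IFS=','
    set -- $steps
    IFS="$old_ifs"
    for step in "$@"; do
        echo "=== $step ==="
        if ! "$build_cmd" -s "$step" > "$step.log" 2>&1; then
            echo "step '$step' failed, last lines of $step.log:" >&2
            tail -n 20 "$step.log" >&2
            return 1
        fi
    done
}

# Example:
#   run_steps "initramfs,stubinitramfs,dom0"
```

You lose the live output this way, so tail -f the log in another terminal if you like to watch.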

Now go forth, build, install, and report bugs.