Got to put something about XSM in somewhere.
The problem is that, while it seems interesting and there's obviously some interest in getting it running (since it's been merged), there's no documentation. I'm not inherently interested enough in security to wade through the snippets I can find and figure out what capabilities XSM buys me, and how to enable them.
Nonetheless, got to be done. Probably in the increasingly mythical "tips" chapter.
"As of recent Xen versions, the dom0 administrator can use the ionice command to set i/o priorities for domUs."
Got to figure out where to put that.
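For the record, here's roughly how I'd expect that to look from dom0. This is a sketch, not something lifted from the Xen docs: it assumes each domU's block backend runs as a kernel thread named blkback.<domid>.<device>, and that dom0 uses an I/O scheduler that honors ionice classes (CFQ does). The helper name domu_blkback_pids is made up for illustration.

```shell
# Sketch: pick out the block backend threads that belong to one domU.
# Assumption: the threads appear in ps as blkback.<domid>.<device>.
domu_blkback_pids() {
    # $1 = numeric domain ID; reads "PID COMM" lines on stdin
    awk -v id="$1" '$2 ~ ("^blkback\\." id "\\.") { print $1 }'
}

# Usage (as root in dom0): drop domain 5 to the lowest best-effort priority.
# ps -eo pid=,comm= | domu_blkback_pids 5 | while read -r pid; do
#     ionice -c2 -n7 -p "$pid"
# done
```

The trailing dot in the pattern matters: without it, asking for domain 5 would also catch domain 51.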
Productive day, in a sense. Not much editing got done, but we spent a lot of time testing stuff in the provisioning and profiling chapters. I'm adding a section on using pypxeboot directly, which I'm. . . kind of enthusiastic about. Worked out some bugs in our explanation of multiple-domain profiling.
I don't think I ever feel as if I'm faking it more intensely than when I write about profiling. I mean, this is Xen for sysadmins. I'm a sysadmin. If I need to use the profiler something has almost certainly gone wrong.
But we did hit a _perfect_ example, which I'm rewriting the section around. (Mirrored LVM spends way too much time in IOwait.) I've got to compose that email to the xen-devel list, see if there's something related to the pit_read_counter function that's causing this to happen. That's what my oprofile runs suggest. Now we just need some confirmation, maybe a happy ending to this story.
Troubleshooting.
Of course, the troubleshooting chapter isn't supposed to be an exhaustive compendium of errors. That's what the Internet is for. We've listed only error messages that lend themselves to easy solutions, or to illustrating some troubleshooting technique.
That means that messages like the following, which is a flat-out Xen / Linux bug (fixed in Red Hat's .14, not sure about other distros), simply don't appear:
Bad pte = e5707067, process = ???, vm_flags = 100073, vaddr = 252000
[<c0453809>] vm_normal_page+0xb7/0xd3
[<c045454c>] unmap_vmas+0x3d1/0x761
[<c0458f3c>] exit_mmap+0x6d/0xe4
[<c041abd4>] mmput+0x25/0x69
[<c047084f>] flush_old_exec+0x62c/0x8b2
[<c046fcd7>] kernel_read+0x32/0x43
[<c048d0f1>] load_elf_binary+0x494/0x15e4
[<c0467043>] do_sync_read+0xb6/0xf1
[<c044d4af>] __alloc_pages+0x57/0x282
[<c04dcc45>] copy_from_user+0x31/0x5d
[<c04dcc45>] copy_from_user+0x31/0x5d
[<c046fa8a>] search_binary_handler+0x99/0x219
[<c04713bf>] do_execve+0x158/0x1f5
[<c040337d>] sys_execve+0x2a/0x4a
[<c040534f>] syscall_call+0x7/0xb
======================
It seems like it's dishonest to describe only problems that we've solved. But what we don't know -- well, that would fill volumes and be dispiriting.
Okay, so storage wasn't quite ready to go.
We had been a bit too trusting and not properly vetted certain whispered rumors of the dark and horrible consequences of letting your LVM snapshots fill up. In our defense, the way that people generally use LVM snapshots makes it very unlikely that they'll fill, and apparently it's not a commonly-seen failure, disk being cheap. . .
Anyway. No excuse. Experimental verification is the cornerstone of science, so we tested it.
# lvcreate -n origin -L 1G LogVol01
# lvcreate -n snap -L 100M --snapshot LogVol01/origin
Now, once we've made 100M of changes to origin, snap should fill up. If you've been reading the LVM snapshot warnings, the earth will then erupt in fire, pitch will rain from the sky, the crust will split and a cavernous maw with teeth the size of the Tokyo Tower will emerge to consume humanity. The lucky portion of it, anyway.
This did not happen. We filled it by making a filesystem and copying stuff in from /usr. Turns out that the machine keeps running merrily. There are still errors, of course:
device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception
Followed by a bunch of errors of the form:
Buffered I/O error on device dm-3, logical block 585
We unmounted the snapshot, tried to remount it, no dice. It was, as McCoy would say, dead. The original LV was fine, however.
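Since a full snapshot just gets invalidated rather than taking the origin with it, the practical move is to notice before it fills. A minimal sketch, assuming your lvs reports the snap_percent field (the Snap% column); the function name snap_nearly_full and the 80 percent threshold are arbitrary choices of mine:

```shell
# Sketch: list snapshots whose copy-on-write space is nearly used up.
snap_nearly_full() {
    # $1 = threshold in percent; reads "lv_name snap_percent" lines on stdin.
    # LVs that aren't snapshots have an empty percentage, so NF == 2 skips them.
    awk -v limit="$1" 'NF == 2 && $2 + 0 >= limit { print $1 }'
}

# Usage (as root), for the volume group from the test above:
# lvs --noheadings -o lv_name,snap_percent LogVol01 | snap_nearly_full 80
```

Anything it prints is a candidate for lvextend, or for throwing away before the kernel does it for you.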
Today I worked on the chapter that purports to tell people how to create domU images from scratch. It's got some minor technical issues -- I don't know what version of cobbler I used to test the stuff I wrote, but it's like nothing that I can find any reference for today. Wrote maybe 400 words, tested some cobbler and pypxeboot related stuff. I should also toss in some stuff about making a distro mirror. Maybe.
We also need to test pypxeboot again. I swear I've seen it work, but my memory has been. . . less than reliable lately. Luke claims that it doesn't work and never has. It'll all end in tears, I know it.
Apart from that, the other stuff looks ready to go -- tar, using the distro package manager, installing via qemu, even systemimager. It's mostly just cobbler that I'm worried about. Damn redhatisms. Oh, look, my spell check claims "redhatism" isn't even a word. TAKE THAT, REDHAT. BELIEF IN YOU IS TANTAMOUNT TO ERROR.