[WIP] Make it possible to reboot into the final bootstrapped system #328
This improves partition alignment, and therefore performance on SSDs and AF (4K-sectored) HDDs. Also, it leaves some space before the partition to allow installing GRUB.
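A small sketch of the alignment arithmetic behind this. It assumes 512-byte logical sectors and 4096-byte physical sectors (an AF drive), and it uses the 1 MiB start from the "Start Linux rootfs partition at 1M, not sector 1" fix mentioned later in this thread. The helper names are illustrative only.

```python
# Partition-alignment arithmetic (illustrative sketch; sector sizes are assumptions).
SECTOR = 512          # logical sector size in bytes
PHYS_SECTOR = 4096    # physical sector size of an Advanced Format (4K) drive
MIB = 1 << 20

def is_aligned(start_lba: int, boundary: int = PHYS_SECTOR) -> bool:
    """True if a partition starting at start_lba begins on a multiple of `boundary` bytes."""
    return (start_lba * SECTOR) % boundary == 0

def gap_before(start_lba: int) -> int:
    """Bytes between the MBR (LBA 0) and the partition start, usable for boot code."""
    return (start_lba - 1) * SECTOR

for start in (1, 63, 2048):
    print(f"start LBA {start:>4}: 4K-aligned={is_aligned(start)}, "
          f"1MiB-aligned={is_aligned(start, MIB)}, free gap={gap_before(start)} bytes")
```

A partition starting at LBA 1 leaves no gap at all, and 4 KiB filesystem blocks inside it would each straddle two physical sectors. Starting at LBA 2048 (1 MiB) keeps the partition aligned and leaves just under 1 MiB free for GRUB.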
The kexec-fiwix part is a band-aid, since it hardcodes an additional 4 GiB of RAM right at the start of the PAE address space. The proper fix would be to pass on the real e820 memory map from the BIOS.
I have a few questions about this PR.
("It is still a WIP" is a valid answer to all of these!) I like the idea of making the final image bootable, as this would allow one to come back to their live-bootstrap environment once it finishes. If you don't mind, I might cherry-pick a few of these earlier fixes, namely "Speed up sysa.img generation", "Change openssl source URL because the old one redirects to HTTPS", and "Start Linux rootfs partition at 1M, not sector 1", separately from this PR, as they aren't really related to it but are still useful fixes. I have a fairly large PR coming in the next few days that will unfortunately conflict a fair bit with this PR, but should also allow this PR to be a bit cleaner. Apologies in advance; I will help refactor once that is in.
While we don't technically need to bring all of sysa into sysc, making the final image bootable requires bringing over at least the Linux kernel. I brought over the rest mainly so I could inspect the binaries created during the earlier stages, i.e. I wanted a "nothing gets left behind" build. Of course, all of this should eventually be put behind a flag, but for this draft I haven't bothered to do that yet. (In fact, I'm planning to securely erase srcfs from both RAM and disk once it's no longer needed, as an additional anti-Trusting-Trust measure.)

The additional RAM and swap are only tangentially related to getting a bootable image: I was testing on a 16-thread CPU, and several bootstrap attempts failed due to OOM in sysc. Eventually, I want to make swap (whether to create it, and if so, how much), RAM, and the size of the created image all configurable. (RAM is already configurable when kernel bootstrap is disabled, but hardcoded when it's enabled.)

"PAE passthrough" is really a fix for what I consider to be a bug in the Fiwix kernel. Fiwix doesn't support PAE (not that it would have much use for it with its 1 GB RAM hard limit), but Linux does, and PAE is required for the 32-bit Linux kernel to manage more than 4 GB of RAM. PAE is already enabled in the kernel config, but Fiwix's memory map parsing code assumes that nothing downstream will support PAE, and deletes memory blocks above 4 GiB from the memory map. The PAE passthrough patches fix this: Fiwix now keeps track of memory blocks in the PAE address space, ignoring them internally but passing them on to any kexec target that can use them. (The code for this is here.)
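To make the passthrough idea concrete, here is a simplified model. It is not the actual Fiwix patch. It assumes e820 entries are (base, length, type) records, with type 1 meaning usable RAM. The non-PAE kernel keeps only the part of the map below 4 GiB for its own use, and the original map, high blocks included, goes to the kexec target. The function names are hypothetical.

```python
from typing import List, NamedTuple

E820_RAM = 1          # e820 type 1 = usable RAM
FOUR_GIB = 1 << 32    # first physical address that needs PAE to reach

class E820Entry(NamedTuple):
    base: int
    length: int
    type: int

def usable_without_pae(memmap: List[E820Entry]) -> List[E820Entry]:
    """The part of the map a non-PAE kernel like Fiwix can use itself (below 4 GiB)."""
    clipped = []
    for e in memmap:
        if e.base >= FOUR_GIB:
            continue  # ignored internally, but NOT deleted from the original map
        end = min(e.base + e.length, FOUR_GIB)  # trim entries straddling 4 GiB
        clipped.append(E820Entry(e.base, end - e.base, e.type))
    return clipped

def map_for_kexec(memmap: List[E820Entry]) -> List[E820Entry]:
    """What gets handed to the kexec target: the full map, high blocks included."""
    return list(memmap)

# Example: about 3 GiB of RAM below the 4 GiB line plus 4 GiB above it.
bios_map = [
    E820Entry(0x0, 0x9FC00, E820_RAM),
    E820Entry(0x100000, 0xBFF00000, E820_RAM),
    E820Entry(0x100000000, 0x100000000, E820_RAM),
]
print(len(usable_without_pae(bios_map)), "entries for Fiwix,",
      len(map_for_kexec(bios_map)), "entries for Linux")
```

In this model, the kexec-fiwix band-aid corresponds to always appending a fixed high entry like the third one, rather than taking it from the BIOS.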
OK, that makes sense.
Could you elaborate on this? I'm not certain how this helps with Trusting Trust (I feel I may be misinterpreting this).
Ahh, alright, understood. That's actually very useful, thank you!
The anti-Trusting-Trust plan involves modifying builder-hex0 stage 1 (the boot sector) to print out the hex0 code of stage 2 as it gets loaded, and then having stage 2 likewise print out the contents of the files in srcfs (excluding gz/bz2 archives, which can be printed instead as they are decompressed). This way, everything in srcfs, and everything before it, gets printed out (and can be securely recorded on a separate trusted medium, where it can be read back for auditing) before it has a chance to execute and therefore to go back and modify its own source. The only exception is the stage 1 boot sector itself (limited to 512 bytes).

Then, as we jump into sysc, we erase srcfs before we start downloading any other source code. At the end of sysc, we redownload the sources needed for sysa and build another sysa.img for bare-metal usage. If the initial bootstrap was on bare metal, this should be bit-for-bit identical to the one we originally built on our host system. Since we erased the original srcfs, it must really have been built from source, not merely stashed away and then revealed by (potentially malicious) new code downloaded within sysc.

Of course, we can't quite trust any tool, either on the original host system or in the bootstrapped environment, to compare the 2 sysa images. Instead, we compare them by writing the 2nd sysa.img to another drive and bootstrapping again from it, ideally on new hardware never used for bootstrapping before, to exclude the possibility of a backdoor from a prior compromised bootstrap getting stashed within the BIOS or other hidden storage. We record the output securely as before. This 2nd bootstrap can then be stopped once we can verify that sysc has been reached. If we now compare the records from both bootstraps and the printed sources match, revealing no backdoor, then we can be certain that the sysa.img we built is truly clean, even if our original host system was compromised.
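The "print before it runs" step and the final comparison of records can be modelled on a host in a few lines. This is only an illustrative sketch. It assumes srcfs is a mapping of file names to bytes and that archives are recognised by a .gz or .bz2 suffix. The real stage 2 is hex0/machine code, and it would decompress archives as a stream rather than in one go. `dump_srcfs` and `first_divergence` are hypothetical names.

```python
import bz2
import gzip
from typing import Dict, List, Optional

def hex_lines(data: bytes, width: int = 16) -> List[str]:
    """Render bytes as lines of space-separated hex pairs."""
    return [" ".join(f"{b:02x}" for b in data[i:i + width])
            for i in range(0, len(data), width)]

def dump_srcfs(files: Dict[str, bytes]) -> List[str]:
    """Transcript to print (and record) before anything in srcfs runs.

    Archives are printed in decompressed form. Files are visited in sorted
    order here only so that two transcripts are comparable.
    """
    out: List[str] = []
    for name in sorted(files):
        data = files[name]
        if name.endswith(".gz"):
            data = gzip.decompress(data)
        elif name.endswith(".bz2"):
            data = bz2.decompress(data)
        out.append(f"# {name}")
        out.extend(hex_lines(data))
    return out

def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Index of the first line where two recorded transcripts differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))

srcfs = {"hello.c": b"int main(){}\n", "readme.gz": gzip.compress(b"archive body")}
record_1 = dump_srcfs(srcfs)           # first bootstrap
record_2 = dump_srcfs(dict(srcfs))     # second bootstrap from the rebuilt sysa.img
print(first_divergence(record_1, record_2))
```

If the two records match, `first_divergence` returns None. Otherwise, the returned line index shows where to start auditing.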
As an additional security measure, the storage device we boot the 2 bootstrap machines from is built in a special way that allows its capacity to be limited to 512 bytes in hardware. For example, if the storage has an address bus, we put switches on the address lines needed to access storage beyond 512 bytes, so that with the switches open, reading beyond the 1st sector wraps around. Now, we further modify stage0 so that it first checks whether it sees itself, hex0 code, or something else in the next sector:
We program the secure storage device with the switches closed, so the entire area is accessible. Then we open the switches and boot from it, watching the printouts for the boot sector printing itself in hex, over and over again. If we see anything else: STOP, compromised! Assuming we got the expected repeating print behavior, we close the switches. The moment we close the switch on the lowest relevant address line, the 2nd sector becomes accessible, and we see the boot sector move on to compiling our hex2. If anything else happens, or the timing is off: STOP, compromised! If we didn't have to stop, we can now be certain that no more than 512 bytes were executed without being printed first (excluding the BIOS, microcode, etc., which we unfortunately cannot fully control). That means any self-propagating backdoor would need to fit itself, a compressed copy of the clean boot sector, and the decompression code into 512 bytes. This is really hard, and we can make it even harder, hopefully to the point of impossibility, by replacing the padding 0s in the boot sector with random, incompressible data. Just make sure it's actually random, and not deterministically generated from a seed smaller than 512 bytes.
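The switch procedure can be simulated to check the logic. This is a hypothetical model, not a hardware design. It assumes an open switch holds its address line low, so that with all switches open only A0..A8 reach the chip and every read wraps into the first 512 bytes. It also assumes the boot sector recognises hex0 with a rough "hex digits and whitespace outside # or ; comments" test. The padding helper uses Python's `secrets` (the OS CSPRNG). Whether that meets the "actually random" bar, or whether a hardware entropy source is preferable, is a judgment call.

```python
import secrets
import zlib

SECTOR = 512
HEX0_CHARS = set(b"0123456789abcdefABCDEF \t\r")

class GatedStorage:
    """Hypothetical boot medium whose address lines A9 and up go through switches.

    Assumption: an open switch holds its line low, so reads wrap around
    into the first 512 bytes.
    """
    def __init__(self, image: bytes):
        self.image = image
        self.switches_closed = True

    def read_sector(self, lba: int) -> bytes:
        mask = -1 if self.switches_closed else SECTOR - 1  # only A0..A8 when open
        return bytes(self.image[(lba * SECTOR + i) & mask] for i in range(SECTOR))

def looks_like_hex0(data: bytes) -> bool:
    """Rough check: only hex digits and whitespace outside '#'/';' comments."""
    text = data.rstrip(b"\x00")
    for line in text.split(b"\n"):
        code = line.split(b"#")[0].split(b";")[0]
        if any(c not in HEX0_CHARS for c in code):
            return False
    return bool(text)

def classify_next_sector(boot: bytes, nxt: bytes) -> str:
    """Decision the modified boot sector makes after reading the 2nd sector."""
    if nxt == boot:
        return "self"   # switches open: print own hex, over and over
    if looks_like_hex0(nxt):
        return "hex0"   # switches closed: go on to compile the next stage
    return "other"      # anything else: STOP, compromised

def pad_boot_sector(code: bytes) -> bytes:
    """Fill unused space before the 0x55AA signature with random bytes, not zeros."""
    if len(code) > SECTOR - 2:
        raise ValueError("boot code does not fit in one sector")
    return code + secrets.token_bytes(SECTOR - 2 - len(code)) + b"\x55\xaa"

boot = pad_boot_sector(b"\xeb\xfe")                        # placeholder code: jmp $
stage2 = b"# stage 2\nEB FE ; jmp $\n".ljust(SECTOR, b"\x00")
dev = GatedStorage(boot + stage2)
dev.switches_closed = False
print(classify_next_sector(dev.read_sector(0), dev.read_sector(1)))
dev.switches_closed = True
print(classify_next_sector(dev.read_sector(0), dev.read_sector(1)))
# Random padding barely compresses; zero padding compresses to a few bytes.
print(len(zlib.compress(boot, 9)), len(zlib.compress(b"\xeb\xfe" + bytes(508) + b"\x55\xaa", 9)))
```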
A heads-up: commit 5bd1bad works fine in QEMU, but I've seen it fail on bare metal with "I don't like these options". It seems that when the medium is an actual ATA hard drive that can be queried for geometry, sfdisk rejects -S32 -H64 unless --force is also specified.
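For context, the -S32 -H64 geometry presumably serves the 1 MiB alignment mentioned earlier in the thread. With 32 sectors per track and 64 heads, one cylinder is exactly 2048 sectors, so the first cylinder boundary falls at 1 MiB. A real ATA disk reports its own geometry, which is what the override collides with. A quick check of the arithmetic, using the standard CHS-to-LBA formula (helper names are illustrative):

```python
SECTOR = 512

def cylinder_bytes(heads: int, sectors_per_track: int) -> int:
    """Size of one CHS cylinder in bytes."""
    return heads * sectors_per_track * SECTOR

def chs_to_lba(c: int, h: int, s: int, heads: int, spt: int) -> int:
    """Standard CHS -> LBA conversion (sector numbers are 1-based)."""
    return (c * heads + h) * spt + (s - 1)

print(cylinder_bytes(64, 32))        # one cylinder with -H64 -S32
print(chs_to_lba(1, 0, 1, 64, 32))   # LBA of the first cylinder boundary
print(cylinder_bytes(255, 63))       # common legacy geometry, for comparison
```

The legacy 255/63 geometry gives 8,225,280-byte cylinders, which are not a multiple of 4096. That is why cylinder-aligned partitions under it end up misaligned on AF drives.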
All of this was merged in other PRs in an updated form.
GRUB build still needs to be converted into a true build step, with hash checking and proper package management.
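One possible shape for the hash-checking part, as a hedged sketch. `verify_source` is a hypothetical helper, not live-bootstrap's actual mechanism. The file name and digest in the usage comment are placeholders, not real GRUB values.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large tarballs need little RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_source(path: Path, expected_sha256: str) -> None:
    """Refuse to build from a source file whose checksum does not match."""
    actual = sha256_file(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"{path.name}: SHA-256 mismatch: expected {expected_sha256}, got {actual}")

# Hypothetical usage; the file name and digest are placeholders:
# verify_source(Path("grub-X.Y.tar.gz"), "<expected sha256>")
```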