[WIP] Make it possible to reboot into the final bootstrapped system #328

Closed
wants to merge 12 commits into from

Collaborator

@Googulator Googulator commented Nov 13, 2023

GRUB build still needs to be converted into a true build step, with hash checking and proper package management.

@fosslinux
Owner

I have a few questions with this PR.

  1. Why does this PR increase the RAM requirement to 8G, increase the kexec target size to 256MB, and add 8G of swap in sysc? These are not changes that are desirable to make arbitrarily.
  2. What is "PAE passthrough support"?
  3. Why does the entirety of sysa need to be copied into sysc?

("It is still a WIP" is a valid answer to all of these!)

I like the idea of making the final image bootable, as this would allow one to come back to their live-bootstrap environment once it finishes.

If you don't mind, I might cherry-pick a few of these earlier fixes separately from this PR, namely:

  • Speed up sysa.img generation
  • Change openssl source URL because the old one redirects to HTTPS
  • Start Linux rootfs partition at 1M, not sector 1

They aren't really related to this PR but are still useful fixes.

I have a fairly large PR coming in the next few days that unfortunately will conflict a fair bit with this PR but should also allow this PR to be a bit cleaner. Apologies in advance; I will help refactor once that is in.

@Googulator
Collaborator Author

While we don't technically need all of sysa to be brought into sysc, to make the final image bootable, it's necessary to bring over at least the Linux kernel. Bringing over the rest was something I did mainly so I can inspect the binaries created during the earlier stages, i.e. I wanted a "nothing gets left behind" build. Of course, all of this should be eventually put behind a flag, but for this draft, I haven't bothered to do that yet. (In fact, I'm planning to securely erase srcfs from both RAM and disk once it's no longer needed, as part of an additional anti-Trusting-Trust measure.)

The additional RAM and swap are only tangentially related to getting a bootable image - I was testing on a 16-thread CPU, and I had several bootstrap attempts fail due to OOM in sysc. Eventually, I want to make both swap (whether or not to create it, and if so, how much) and RAM configurable, as well as the size of the image created. (RAM is already configurable when kernel bootstrap is disabled, but hardcoded when it's enabled.)

"PAE passthrough" is really a fix for what I consider to be a bug in the Fiwix kernel. Fiwix doesn't support PAE (not that it would have much use for it with its 1GB RAM hard limit), but Linux does, and PAE is a requirement for the 32-bit Linux kernel to manage more than 4GB of RAM. PAE is already enabled in the kernel config, but Fiwix's memory map parsing code assumes that nothing downstream will support PAE, and deletes memory blocks >4GiB from the memory map. The PAE passthrough patches fix this: Fiwix now keeps track of memory blocks in PAE address space, internally ignoring them, but passing them on to any kexec target that can use them. (The code for this is here.)

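As a rough illustration of the passthrough idea described above (not the actual Fiwix patch, which is written in C), the memory map can be split into the part a non-PAE kernel uses itself and the part it only forwards to a kexec target. `Region`, `split_memory_map` and `map_for_kexec` are hypothetical names chosen for this sketch:

```python
from dataclasses import dataclass

FOUR_GIB = 1 << 32  # first physical address that needs more than 32 bits


@dataclass(frozen=True)
class Region:
    base: int     # physical start address
    length: int   # size in bytes
    usable: bool  # True for RAM the OS may use


def split_memory_map(regions):
    """Return (local, passthrough).

    local:       regions a non-PAE kernel may use itself (clipped below 4 GiB)
    passthrough: regions at or above 4 GiB, kept only for a kexec target
    """
    local, passthrough = [], []
    for r in regions:
        end = r.base + r.length
        if end <= FOUR_GIB:
            local.append(r)
        elif r.base >= FOUR_GIB:
            passthrough.append(r)
        else:
            # Region straddles the 4 GiB line: split it in two.
            local.append(Region(r.base, FOUR_GIB - r.base, r.usable))
            passthrough.append(Region(FOUR_GIB, end - FOUR_GIB, r.usable))
    return local, passthrough


def map_for_kexec(regions):
    """The map handed to the next kernel keeps every region."""
    local, passthrough = split_memory_map(regions)
    return local + passthrough
```

In these terms, the behavior before the fix corresponds to dropping `passthrough` entirely, so a PAE-capable Linux kernel started via kexec never learned about RAM above 4GiB.
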
@fosslinux
Owner

> While we don't technically need all of sysa to be brought into sysc, to make the final image bootable, it's necessary to bring over at least the Linux kernel. Bringing over the rest was something I did mainly so I can inspect the binaries created during the earlier stages, i.e. I wanted a "nothing gets left behind" build

OK, that makes sense.

> In fact, I'm planning to securely erase srcfs from both RAM and disk once it's no longer needed, as part of an additional anti-Trusting-Trust measure

Could you elaborate on this? I'm not certain how this helps with trusting trust (I feel I may be misinterpreting this).

> "PAE passthrough" is really a fix for what I consider to be a bug in the Fiwix kernel. Fiwix doesn't support PAE (not that it would have much use for it with its 1GB RAM hard limit), but Linux does, and PAE is a requirement for the 32-bit Linux kernel to manage more than 4GB of RAM. PAE is already enabled in the kernel config, but Fiwix's memory map parsing code assumes that nothing downstream will support PAE, and deletes memory blocks >4GiB from the memory map. The PAE passthrough patches fix this: Fiwix now keeps track of memory blocks in PAE address space, internally ignoring them, but passing them on to any kexec target that can use them.

Ahh, alright, understood. That's actually very useful, thank you!

@Googulator
Collaborator Author

Googulator commented Nov 14, 2023

The anti-Trusting-Trust plan involves modifying builder-hex0 stage 1 (the boot sector) to print out the hex0 code of stage 2 as it gets loaded, and then having stage 2 likewise print out the contents of files in the srcfs (excluding gz/bz2 archives - those can get printed instead as they are decompressed). This way, everything in srcfs and before gets printed out (& can be securely recorded on a separate trusted medium where it can be read back for auditing) before it has a chance to execute and therefore go back to modify its own source; with the exception of the stage 1 boot sector (limited to 512 bytes). Then, as we jump into sysc, we erase srcfs, before we start downloading any other source code.
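
A minimal sketch of the "print it before it can execute" idea for hex0 input, assuming the usual hex0 conventions (pairs of hex digits form bytes, `#` and `;` start a comment that runs to the end of the line, everything else is ignored). The function name and the `emit` callback are made up for illustration; the real stage 1 would be hand-written machine code in the boot sector, not Python:

```python
import sys

HEX_DIGITS = "0123456789abcdefABCDEF"


def compile_hex0(source, emit=sys.stdout.write):
    """Pass each character of hex0 source to `emit` as it is consumed,
    and return the bytes the source encodes."""
    out = bytearray()
    high = None         # pending high nibble, if any
    in_comment = False
    for ch in source:
        emit(ch)        # record the character before it contributes any code
        if in_comment:
            in_comment = ch != "\n"   # a comment ends at the newline
        elif ch in "#;":
            in_comment = True
        elif ch in HEX_DIGITS:
            nibble = int(ch, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
        # any other character (whitespace etc.) is ignored
    return bytes(out)
```

Because the echo happens character by character, the recorded output is the complete source, comments included, and nothing in it has run at the time it is recorded.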

At the end of sysc, we redownload the sources needed for sysa, and build another sysa.img for bare metal usage - if the initial bootstrap was bare metal, this should be bit-for-bit identical to the one we originally built on our host system, and since we erased the original srcfs, it must have been really built from source, and not merely stashed away & then revealed by (potentially malicious) new code downloaded within sysc.

Of course, since we can't quite trust any tool either on the original host system, or in the bootstrapped environment, with comparing the 2 sysa images, we instead compare by writing the 2nd sysa.img to another drive, and bootstrapping again from it (ideally on new hardware never used for bootstrapping before, to exclude the possibility of a backdoor from a prior compromised bootstrap getting stashed within BIOS or other hidden storage). Record the output securely as before. This 2nd bootstrap can then be stopped once we can verify that sysc has been reached.

If we now compare the records from both bootstraps, and the sources printed out match, revealing no backdoor, then we can be certain the sysa.img we built is truly clean - even if our original host system was compromised.
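
Comparing the two recorded printouts could look roughly like this, assuming each record has been transcribed into a list of text lines covering the run up to the point where sysc is reached. The helper names are hypothetical, and the sketch presumes the comparison runs somewhere that is trusted independently of both bootstraps:

```python
import hashlib
from itertools import zip_longest


def first_mismatch(record_a, record_b):
    """Return (line_number, a_line, b_line) for the first differing line,
    or None if the records are identical. A missing line shows up as None."""
    pairs = zip_longest(record_a, record_b)
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return n, a, b
    return None


def digest(record):
    """SHA-256 over the record, handy for noting down a short fingerprint."""
    h = hashlib.sha256()
    for line in record:
        h.update(line.encode("utf-8") + b"\n")
    return h.hexdigest()
```

A `None` result from `first_mismatch` only says the two printouts agree; the printed sources still have to be audited by a human for the conclusion above to hold.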

As an additional security measure, the storage device we boot the 2 bootstrap machines from is built in a special way, to allow its capacity to be limited to 512 bytes in hardware - e.g. if the storage has an address bus, we put switches on address lines necessary to access storage beyond 512 bytes, such that with the switches open, reading beyond the 1st sector wraps around. Now, we further modify stage0 such that it first checks if it can see itself, hex0 code, or something else in the next sector:

  • If it sees itself, it prints the entire sector encoded in hex, and then tries again reading the same sector (this means the switches are still open).
  • If it sees hex0 code, it starts compiling it, printing out the hex0 source as it goes, and then executing the compiled binary (happens when the switches are closed).
  • If it sees anything else, it stops with a big error message (as this indicates compromise or corruption of the secure environment).
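
The three cases above can be sketched as a small decision function. This is only a model: `read_sector` imitates the address-line switches, and `looks_like_hex0` is a hypothetical stand-in for however the boot sector would actually recognize hex0 text, which the description above does not specify:

```python
from enum import Enum

SECTOR_SIZE = 512


class Action(Enum):
    PRINT_SELF_AND_RETRY = "switches open: the next sector aliases the boot sector"
    COMPILE_HEX0 = "switches closed: stage 2 source is visible"
    HALT = "unexpected data: possible compromise or corruption"


def read_sector(image, index, switches_closed):
    """Model the switches: while open, every read wraps back to sector 0."""
    if not switches_closed:
        index = 0
    start = index * SECTOR_SIZE
    return image[start:start + SECTOR_SIZE]


def looks_like_hex0(sector):
    """Assumed heuristic: printable ASCII and whitespace, optional NUL padding."""
    text = sector.rstrip(b"\x00")
    return len(text) > 0 and all(
        b in b"\t\n\r" or 0x20 <= b < 0x7F for b in text
    )


def classify(boot_sector, next_sector):
    if next_sector == boot_sector:
        return Action.PRINT_SELF_AND_RETRY
    if looks_like_hex0(next_sector):
        return Action.COMPILE_HEX0
    return Action.HALT
```

With a two-sector image, `classify(boot, read_sector(image, 1, False))` gives `PRINT_SELF_AND_RETRY`, and the same call with the switches closed gives `COMPILE_HEX0` when the second sector holds hex0 text.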

We program the secure storage device with the switches closed, so the entire area is accessible. Then, we open the switches, and boot from it. We watch the printouts to see the boot sector printing itself in hex, over and over again. If we see anything else - STOP, compromised!

Assuming we got the expected repeating print behavior, we close the switches. The moment we close the switch on the lowest relevant address line, the 2nd sector becomes accessible, and we see the boot sector move on to compiling our hex0. If anything else happens, or the timing is off - STOP, compromised!

Assuming we didn't have to stop, we can now be certain that no more than 512 bytes were executed (excluding BIOS, microcode, etc. which we unfortunately cannot fully control) without being printed first. That means, any self-propagating backdoor would need to fit itself, a compressed copy of the clean boot sector, and the decompression code, in 512 bytes. This is really hard, and we can make it even harder, hopefully to the point of impossibility, by replacing the padding 0s in the boot sector with random, incompressible data - just make sure it's actually random, and not deterministically generated from a seed smaller than 512 bytes.
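
The effect of the padding can be checked roughly as follows. This is a sketch under assumptions: `zlib` stands in for whatever compressor an attacker might use, the 200-byte `code` in the usage note is a placeholder rather than a real boot sector, and the function names are hypothetical:

```python
import os
import zlib

SECTOR_SIZE = 512


def pad_sector(code, random_padding):
    """Pad `code` to one sector with zeros or with OS-provided random bytes."""
    free = SECTOR_SIZE - len(code)
    if free < 0:
        raise ValueError("code does not fit in one sector")
    # os.urandom draws from the OS entropy source rather than a small seed.
    filler = os.urandom(free) if random_padding else bytes(free)
    return code + filler


def compressed_size(sector):
    """Bytes needed to store the sector after zlib compression at level 9."""
    return len(zlib.compress(sector, 9))
```

For example, with `code = bytes(range(200))`, the zero-padded sector compresses to well under 512 bytes, leaving room for extra code, while the randomly padded one does not shrink in any useful way.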

@Googulator
Collaborator Author

A heads up: 5bd1bad works fine in qemu, but I've seen it fail on bare metal with "I don't like these options". It seems when the medium is an actual ATA hard drive that can be queried for geometry, sfdisk will reject -S32 -H64 unless --force is also specified.

@Googulator
Collaborator Author

All of this was merged in other PRs in an updated form.

@Googulator Googulator closed this Feb 8, 2024