Skip to content

Commit

Permalink
Update docs with new changes
Browse files Browse the repository at this point in the history
  • Loading branch information
fosslinux committed Dec 3, 2023
1 parent 2d372b4 commit 785dfa7
Show file tree
Hide file tree
Showing 3 changed files with 211 additions and 218 deletions.
60 changes: 29 additions & 31 deletions DEVEL.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,31 +14,40 @@ and that a full build completes.

## Structure

Each system corresponds to a reboot of the live environment. There is only one
appropriate structure as shown below (eg for sysa):

```
sysa
├── any-global-files.sh
seed
├── seed.kaem
├── script-generator.c
├── ...
└── stage0-posix
steps
├── manifest
├── any-global-files
├── jump
│   └── linux.sh
├── improve
│   └── x.sh
├── somepackage-version
│   ├── somepackage-version.kaem (or .sh)
│   ├── pass1.kaem
│   ├── pass2.sh
│   ├── files
│   ├── simple-patches
│   ├── mk
│   └── patches
└── tmp
```

Global scripts that drive the entire system go directly under `sysx`. `tmp`
contains the temporary system used for QEMU or a chroot.

Then, each package is in its own specific directory, named `package-version`.
It then diverges based upon which driver is being used:
The `seed` directory contains everything required for `script-generator` to be
run.

- `kaem`: A file named `package-version.kaem` is called by the master script.
- `bash`: The `build` function from helper.sh is called from the master script.
There are default functions run which can be overridden by an optional script
`package-version.sh` within the package-specific directory.
In the `steps` directory, the bootstrap process is defined in `manifest`.
Each package to be built is named `package-version`.
Each subsequent build of a package is the nth pass. Scripts are named
accordingly; eg, the first build would be called `pass1.sh`, the second would be
`pass2.sh`, etc.
Scripts run in kaem era should be denoted as such in their filename;
`pass1.kaem`, for example. Pass numbers do not reset after kaem, ie, you cannot
have both `pass1.kaem` and `pass1.sh`.

In this folder, there are other folders/files. `*.checksums` are
required for early packages that are build with kaem, others are optional.
Expand All @@ -51,21 +60,16 @@ Permissible folders/files:
- `simple-patches`: patches for the source that use the before/after convention of simple-patch.c
- `*.checksums`: files containing the checksums for the resulting binaries and
libraries that are compiled and installed.
- Up to and including `coreutils-6.10`, `sha256sum` from `stage0-posix`
is used for the checksumming. Later we switch to GNU version.
- To extract checksums of the binaries, use of qemu mode is recommended
(i.e. `./rootfs.py -q -qk $kernel --update-checksums`).
- compilation script

The directory m2-functions is used for M2-Planet functions (currently).
- Otherwise, the package's checksum is in SHA256SUMS.pkgs.
- compilation script(s)

## Conventions

- **Patches:**
- all patches are `-p0`
- all patches begin with a patch header
- **README:**
- all stages are explained in README
- **parts.rst:**
- all packages are explained in `parts.rst`
- **General:**
- Where possible, all blocks of text should be limited to a length of 80
characters.
Expand All @@ -79,9 +83,3 @@ The directory m2-functions is used for M2-Planet functions (currently).
- Patches are licensed under the license of the project which they are
patching.
- All files (excluding files within submodules) must comply with REUSE v3.0.

## git

All changes must be submitted as PRs. Pushing to master is disallowed, even if
push access is granted to a user. Only pushes to master should be merging of
patches into master.
222 changes: 90 additions & 132 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,94 +12,90 @@ An attempt to provide a reproducible, automatic, complete end-to-end
bootstrap from a minimal number of binary seeds to a supported fully
functioning operating system.

Get me started!
---------------
How do I use this?
------------------

Quick start:

See ``./rootfs.py --help`` and follow the instructions given there.
This uses a variety of userland tools to prepare the bootstrap.

(*Currently, there is no way to perform the bootstrap without external
preparations! This is a currently unsolved problem.*)

Without using Python:

1. ``git clone https://github.com/fosslinux/live-bootstrap``
2. ``git submodule update --init --recursive``
3. Provide a kernel (vmlinuz file) as the name ``kernel`` in the root of the
repository. **This must be a 32-bit kernel.**
4. ``./rootfs.py --qemu`` - ensure your account has kvm privileges and qemu
installed.

a. Alternatively, run ``./rootfs.py --chroot`` to run it in a chroot.
b. Alternatively, run ``./rootfs.py --bwrap`` to run it in a bubblewrap
sandbox. When user namespaces are supported, this mode is rootless.
c. Alternatively, run ``./rootfs.py`` but don’t run the actual
virtualization and instead copy sysa/tmp/initramfs to a USB or
some other device and boot from bare metal. NOTE: we now require
a hard drive. This is currently hardcoded as sda. You also need
to put ``sysc/tmp/disk.img`` onto your sda on the bootstrapping
machine.
d. Alternatively, do not use python at all, see "Python-less build"
below.

5. Wait.
6. If you can, observe the many binaries in ``/usr/bin``! When the
bootstrap is completed ``bash`` is launched providing a shell to
explore the system.

3. Consider whether you are going to run this in a chroot, in QEMU, or on bare
metal. (All of this *can* be automated, but not in a trustable way. See
further below.)
a. **chroot:** Create a directory where the chroot will reside, run
``./download-distfiles.sh``, and copy:
* The entire contents of ``seed/stage0-posix`` into that directory.
* All other files in ``seed`` into that directory.
* ``steps/`` and ``distfiles/`` into that directory.
* At least all files listed in ``steps/pre-network-sources`` must be
copied in. All other files will be obtained from the network.
* Run ``/bootstrap-seeds/POSIX/x86/kaem-optional-seed`` in the chroot.
(Eg, ``chroot rootfs /bootstrap-seeds/POSIX/x86/kaem-optional-seed``).
b. **QEMU:** Create two blank disk images.
* On the first image, write
``seed/stage0-posix/bootstrap-seeds/NATIVE/x86/builder-hex0-x86-stage1.img``
to it, followed by ``kernel-bootstrap/builder-hex0-x86-stage2.hex0``,
followed by zeros padding the disk to the next sector.
* distfiles can be obtained using ``./download-distfiles.sh``.
* See the list in part a. For every file within that list, write a line to
the disk ``src <size-of-file> <path-to-file>``, followed by the contents
of the file.
* *Only* copy distfiles listed in ``steps/pre-network-sources`` into
this disk.
* Optionally (if you don't do this, distfiles will be network downloaded):
* On the second image, create an MSDOS partition table and one ext3
partition.
* Copy ``distfiles/`` into this disk.
* Run QEMU, with 4+G RAM, optionally SMP (multicore), both drives (in the
order introduced above), a NIC with model E1000 (``-nic
user,model=e1000``), and ``-machine kernel-irqchip=split``.
c. **Bare metal:** Follow the same steps as QEMU, but the disks need to be
two different *physical* disks, and boot from the first disk.

Background
----------

This project is a part of the bootstrappable project, a project that
aims to be able to build complete computing platforms through the use of
source code. When you build a compiler like GCC, you need another C
compiler to compile the compiler - turtles all the way down. Even the
first GCC compiler was written in C. There has to be a way to break the
chain…

There has been significant work on this over the last 5 years, from
Jeremiah Orians’ stage0, hex2 and M2-Planet to janneke’s Mes. We have a
currently, fully-functioning chain of bootstrapping from the 357-byte
hex0 seed to a complete GCC compiler and hence a full Linux operating
system. From there, it is trivial to move to other UNIXes. However,
there is only currently one vector through which this can be
automatically done, GNU Guix.

While the primary author of this project does not believe Guix is a bad
project, the great reliance on Guile, the complexity of many of the
scripts and the rather steep learning curve to install and run Guix make
it a very non plug-and-play solution. Furthermore, there is currently
(Jan 2021) no possible way to run the bootstrap from outside of a
pre-existing Linux environment. Additionally, Guix uses many scripts and
distributed files that cannot be considered source code.

(NOTE: Guix is working on a Full Source Bootstrap, but I’m not
completely sure what that entails).

Furthermore, having an alternative bootstrap automation tool allows
people to have greater trust in the bootstrap procedure.

Comparison between GNU Guix and live-bootstrap
----------------------------------------------

+----------------------+----------------------+----------------------+
| Item | Guix | live-bootstrap |
+======================+======================+======================+
| Total size of seeds | ~30MB (Reduced | ~1KB |
| [1] | Source Bootstrap) | |
| | [2] | |
+----------------------+----------------------+----------------------+
| Use of kernel | Linux-Libre Kernel | Any Linux Kernel |
| | | (2.6+) [3] |
+----------------------+----------------------+----------------------+
| Implementation | Yes | No (in development) |
| complete | | |
+----------------------+----------------------+----------------------+
| Automation | Almost fully | Optional user |
| | automatic | customization |
+----------------------+----------------------+----------------------+

[1]: Both projects only use software licensed under a FSF-approved
free software license. Kernel is excluded from seed.
[2]: Reiterating that Guix is working on a full source bootstrap,
although that still uses guile (~12 MB). [3]: Work is ongoing to use
other, smaller POSIX kernels.

Why would I want bootstrapping?
-------------------------------
Problem statement
=================

live-bootstrap's overarching problem statement is;

> How can a usable Linux system be created with only human-auditable, and
wherever possible, human-written, source code?

Clarifications:

* "usable" means a modern toolchain, with appropriate utilities, that can be
used to expand the amount of software on the system, interactively, or
non-interactively.
* "human-auditable" is discretionary, but is usually fairly strict. See
"Specific things to be bootstrapped" below.

Why is this difficult?
======================

The core of a modern Linux system is primarily written in C and C++. C and C++
are **self-hosting**, ie, nearly every single C compiler is written in C.

Every single version of GCC was written in C. To avoid using an existing
toolchain, we need some way to be able to compile a GCC version without C. We
can use a less well-featured compiler, TCC, to do this. And so forth, until we
get to a fairly primitive C compiler written in assembly, ``cc_x86``.

Going up through this process requires a bunch of other utilities as well; the
autotools suite, guile and autogen, etc. These also have to be matched
appropriately to the toolchain available.

Why should I care?
------------------

That is outside of the scope of this README. Here’s a few things you can
look at:
Expand All @@ -117,7 +113,7 @@ bootstrapping. However, there are a number of non-auditable files used
in many of their packages. Here is a list of file types that we deem
unsuitable for bootstrapping.

1. Binaries (apart from seed hex0, kaem, kernel).
1. Binaries (apart from seed hex0, kaem, builder-hex0).
2. Any pre-generated configure scripts, or Makefile.in’s from autotools.
3. Pre-generated bison/flex parsers (identifiable through a ``.y``
file).
Expand All @@ -131,56 +127,18 @@ How does this work?

**For a more in-depth discussion, see parts.rst.**

sysa
~~~~

sysa is the first ‘system’ used in live-bootstrap. We move to a new
system after a reboot, which often occurs after the movement to a new
kernel. It is run by the seed Linux kernel provided by the user. It
compiles everything we need to be able to compile our own Linux kernel.
It runs fully in an initramfs and does not rely on disk support in the
seed Linux kernel.

sysb
~~~~

sysb is the second 'system' of live-bootstrap. This uses the Linux 4.9.10
kernel compiled within sysa. As we do not rely on disk support in sysa, we
need this intermediate system to be able to add the missing binaries to sysc
before moving into it. This is executed through kexec from sysa. At this point,
a SATA disk IS required.

sysc
~~~~

sysc is the (current) last 'system' of live-bootstrap. This is a continuation
from sysb, executed through util-linux's ``switch_root`` command which moves
the entire rootfs without a reboot. Every package from here on out is compiled
under this system, taking binaries from sysa. Chroot and bubblewrap modes skip
sysb, as it is obviously irrelevant to them.

Python-less build
-----------------

Python is no longer a requirement to set up the build system. The
repository is almost completely in a form where it can be used as the
source of a build.

1. Download required tarballs into ``sysa/distfiles`` and ``sysc/distfiles``.
You can use the ``download-distfiles.sh`` script.
2. Copy sysa/stage0-posix/src/* to the root of the repository.
3. Copy sysa/stage0-posix/src/bootstrap-seeds/POSIX/x86/kaem-optional-seed
to init in the root of the repository.
4. Copy sysa/after.kaem to after.kaem
5. Create a CPIO archive (eg, ``cpio --format newc --create --directory . > ../initramfs``).
6. Boot your initramfs and kernel.

chroot builds
~~~~~~~~~~~~~
Firstly, ``builder-hex0`` is launched. ``builder-hex0`` is a minimal kernel that is
written in ``hex0``, existing in 3 self-bootstrapping stages.

For chroot based bootstraps you can skip creation of initramfs and instead start bootstrap with
This is capable of executing the entirety of ``stage0-posix``, (see
``seed/stage0-posix``), which produces a variety of useful utilities and a basic
C language, ``M2-Planet``.

``sudo chroot . bootstrap-seeds/POSIX/x86/kaem-optional-seed``
``stage0-posix`` runs a file called ``after.kaem``. This is a shell script that
builds and runs a small program called ``script-generator``. This program reads
``steps/manifest`` and converts it into a series of shell scripts that can be
executed in sequence to complete the bootstrap.

It is also recommended to copy everything to a new directory as bootstrapping messes up with files
in git repository and cannot be re-run again.
From this point forward, ``steps/manifest`` is effectively self documenting.
Each package built exists in ``steps/<pkg>``, and the build scripts can be seen
there.
Loading

0 comments on commit 785dfa7

Please sign in to comment.