efrecon · Mar 27, 2024
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -11,40 +11,43 @@ This document contains notes about the internals of the implementation.
 
 When environment isolation is turned on, i.e. when the variable
 `ORCHESTRATOR_ISOLATION` is turned on, the processes will communicate through a
-common (temporary) directory created in the orchestrator and stored in the
-variable `ORCHESTRATOR_ENVIRONMENT`. That directory is mounted into the microVM
-at `/_environment`.
+common (temporary) directory created by the orchestrator and stored in the
+variable `ORCHESTRATOR_ENVIRONMENT`. Each runner loop will be associated to a
+separate sub-directory (the `RUNNER_ENVIRONMENT` variable) and that directory is
-separate sub-directory (the `RUNNER_ENVIRONMENT` variable) and that directory is
+Each runner loop will be associated with a separate sub-directory (the `RUNNER_ENVIRONMENT` variable) and that directory is
-separate sub-directory (the `RUNNER_ENVIRONMENT` variable) and that directory is
+Each runner loop will be associated with a separate sub-directory (the `RUNNER_ENVIRONMENT` variable) and that directory is
+mounted into the microVM at `/_environment`. This provides isolation between the
+different running loops.
 
 Runners are identified using a loop iteration, e.g. `1`, `2`, etc. followed by a
 random string (and separated by a `-` (dash sign))
 
 The orchestrator will wait for a file with the `.tkn` extension and named after
 the loop iteration, i.e. independently of the random string. That token file is
-set by the `runner.sh` script running inside the microVM. This file is created
-by the microVM once the runner has been registered, but not started, at GitHub.
-It contains the result of the `token.sh` script, i.e. the runner registration
-token.
+set by the `entrypoint.sh` script running inside the microVM. This file is
+created by the microVM once the runner has been registered, but not started, at
+GitHub. It contains the result of the `token.sh` script, i.e. the runner
+registration token.
 
 Each runner loop implemented in the `runner.sh` script is allocated a "secret"
-(a random string). When a termination signal is caught inside the `runner.sh`
-script inside the microVM, a file with the same name (and location) as the token
-file, but the extension `.brk` (break) is created with the content of the
-secret. Once a microVM has ended, the `runner.sh` loop script will detect if the
-`.brk` file exists and contains the secret. If it does, it will abort the loop
--- instead of creating yet another runner. Using a random secret is for security
-and to avoid that workflows are able to actually force end the runner loop.
-Since the value of the secret is passed through the `.env` file that is
-automatically removed as soon as the microVM has booted is running the
-`runner.sh` script, workflows are not able to break the external loop: they are
-able to create files in the `/_environment` directory, but they cannot know the
-value of the secret to put into the file to force the exiting handshake.
+(a random string). When a termination signal is caught inside the
+`entrypoint.sh` script inside the microVM, a file with the same name (and
+location) as the token file, but the extension `.brk` -- for "break" -- is
+created with the content of the secret. Once a microVM has ended, the
+`runner.sh` loop script will detect if the `.brk` file exists and contains the
+secret. If it does, it will abort the loop -- instead of creating yet another
+runner. The secret is random for security: it prevents workflows from forcing
+the runner loop to end. Since the value of the secret is passed through the
+`.env` file, which is automatically removed as soon as the microVM has booted
+and is running the `entrypoint.sh` script, workflows cannot break the external
+loop: they can create files in the `/_environment` directory, but they cannot
+know the value of the secret to put into the file to force the exiting
+handshake.
 
 The same type of handshaking happens when the main runner loop is terminating,
 for example after the life-time period provided with the command-line option
-`-k`. In that case, a file containing the secret and ending with the `.trm`
-extension is created in what the VM sees as the `/_environment` directory. When
-such a file is present, the main `runner.sh` script inside the VM will kill the
-GitHub runner process and unregister it.
+`-k`. In that case, a file containing the secret and ending with the `.trm` --
+for "terminate" -- extension is created in what the VM sees as the
+`/_environment` directory. When such a file is present, the main `entrypoint.sh`
+script inside the VM will kill the GitHub runner process and unregister it.
 
 ## Changes to the Installation Scripts
 
@@ -62,4 +65,5 @@ Note that when changing the logic of the "entrypoints", i.e. the scripts run at
 microVM initialisation, you do not need to wait for the image to be created.
 Instead, pass `-D /local` to the [`runner.sh`](./runner.sh) script. This will
 mount the [`runner`](./runner/) directory into the microVM at `/local` and run
-the scripts that it contains from there instead.
+the scripts that it contains from there instead. Which "entrypoint" to use is
+driven by the `RUNNER_ENTRYPOINT` variable in [`runner.sh`](./runner.sh).
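
The `.brk` handshake described in the CONTRIBUTING.md hunk above can be illustrated with a minimal sketch. It is not taken from the project: the helper name `break_requested`, the example secret and the runner identifier are assumptions made for illustration; only the `.brk` extension and the idea of comparing the file content with the loop's secret come from the text.

```shell
#!/bin/sh
# Sketch only: break_requested is a hypothetical helper, not project code.

# Return 0 when the break file exists and holds exactly the expected secret.
break_requested() {
  # $1: path to the .brk file, $2: secret allocated to this runner loop
  [ -f "$1" ] && [ "$(cat "$1")" = "$2" ]
}

# Throwaway directory standing in for the shared environment directory.
env_dir=$(mktemp -d)
secret="s3cr3t-example"   # assumed value; the real secret is a random string
id="1-abcdef"             # assumed identifier: loop iteration, dash, random string

# A workflow can create the file, but a wrong guess must not stop the loop.
printf '%s\n' "guess" > "${env_dir}/${id}.brk"
if break_requested "${env_dir}/${id}.brk" "$secret"; then forged=yes; else forged=no; fi

# The entrypoint knows the secret, so its break request is honoured.
printf '%s\n' "$secret" > "${env_dir}/${id}.brk"
if break_requested "${env_dir}/${id}.brk" "$secret"; then genuine=yes; else genuine=no; fi

rm -rf "$env_dir"
echo "forged=$forged genuine=$genuine"
```

The same comparison would presumably apply to the `.trm` file in the opposite direction, with the script inside the VM checking the content written by the host.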
diff --git a/README.md b/README.md
@@ -39,15 +39,14 @@ the base repository, e.g. `ubuntu` and `krunvm`. The GitHub runner
 implementation will automatically add other labels in addition to those.
 
 In the example above, the double-dash `--` separates options given to the
-user-facing [orchestrator] from options to the loop implementation
-[runner](./runner.sh) script. All options appearing after the `--` will be
-blindly passed to the [runner] loop and script. All scripts within the project
-accepts short options only and can either be controlled through options or
-environment variables -- but CLI options have precedence. Running scripts with
-the `-h` option will provide help and a list of those variables. Variables
-starting with `ORCHESTRATOR_` will affect the behaviour of the [orchestrator],
-while variables starting with `RUNNER_` will affect the behaviour of each
-[runner] (loop).
+user-facing [orchestrator] from options to the loop implementation [runner]
+script. All options appearing after the `--` will be blindly passed to the
+[runner] loop and script. All scripts within the project accept short options
+only and can either be controlled through options or environment variables --
+but CLI options have precedence. Running scripts with the `-h` option will
+provide help and a list of those variables. Variables starting with
+`ORCHESTRATOR_` will affect the behaviour of the [orchestrator], while variables
+starting with `RUNNER_` will affect the behaviour of each [runner] (loop).
 
   [orchestrator]: ./orchestrator.sh
   [runner]: ./runner.sh
@@ -68,9 +67,12 @@ while variables starting with `RUNNER_` will affect the behaviour of each
 + Ability to mount local directories to cache local runner-based requirements or
   critical software tools.
 + Good compatibility with the regular GitHub [runners]: same user ID, member of
-  the `docker` group, etc.
-+ In theory, the main [image] should be able to be used in more traditional
-  container-based solutions -- perhaps [sysbox]? Reports/changes are welcome.
+  the `docker` group, password-less `sudo`, etc.
++ In theory, the main [ubuntu] and [fedora] images should be able to be used in
+  more traditional container-based solutions -- perhaps [sysbox]? Reports and/or
+  changes are welcome.
++ Relaying of the container daemon logs to provide for improved debugging of
+  complex workflows.
 
   [sysbox]: https://github.com/nestybox/sysbox
 
@@ -90,6 +92,8 @@ installed on the host. Installation is easiest on Fedora
 + `buildah`
 + `krunvm` (and its [requirements])
 
+Note: You do not need `podman`.
+
   [built]: ./.github/workflows/ci.yml
   [requirements]: https://github.com/containers/krunvm#installation
 
@@ -122,13 +126,12 @@ permissions.
 
 ## Architecture and Design
 
-The [orchestrator](./orchestrator.sh) creates as many loops of ephemeral runners
-as requested. These loops are implemented as part of the
-[runner.sh](./runner.sh) script: the script will create a microVM based on the
-default image (see below), memory and vCPU requirement. It will then start that
-microVM using `krunvm` and that will start an (ephemeral) [runner][self]. As
-soon as a job has been executed on that runner, the microVM will end and a new
-will be created.
+The [orchestrator] creates as many loops of ephemeral runners as requested.
+These loops are implemented as part of the [runner.sh][runner] script: the
+script will create a microVM based on the default image (see below), memory and
+vCPU requirement. It will then start that microVM using `krunvm` and that will
+start an (ephemeral) GitHub [runner][self]. As soon as a job has been executed
+on that runner, the microVM will end and a new one will be created.
 
 The OCI image is built in two parts:
 
@@ -150,15 +153,15 @@ containers with the `--network host` option. This is made transparent through a
 docker CLI [wrapper](./base/docker.sh) that will automatically add this option
 to all (relevant) commands.
 
-When the microVM starts, the [runner.sh](./runner/runner.sh) script will be
-started. This script will pick its options using an `.env` file, shared from the
-host. The file will be sourced and removed at once. This ensures that secrets
-are not leaked to the workflows through the process table or a file. Upon start,
-the script will [request](./runner/token.sh) a runner token, configure the
-runner and then start the actions runner .NET implementation, under the `runner`
-user. The `runner` user shares the same id as the one at GitHub and is also a
-member of the `docker` group. Similarily to GitHub runners, the user is capable
-of `sudo` without a password.
+When the microVM starts, the [entrypoint.sh](./runner/entrypoint.sh) script will
+be started. This script will pick its options using an `.env` file, shared from
+the host. The file will be sourced and removed at once. This ensures that
+secrets are not leaked to the workflows through the process table or a file.
+Upon start, the script will [request](./runner/token.sh) a runner token,
+configure the runner and then start the actions runner .NET implementation,
+under the `runner` user. The `runner` user shares the same id as the one at
+GitHub and is also a member of the `docker` group. Similarly to GitHub runners,
+the user is capable of `sudo` without a password.
 
 Runner tokens are written to the directory that is shared with the host. This is
 used during initial synchronisation, to avoid starting up several runners at the
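
The "sourced and removed at once" behaviour that the README hunk above describes for the `.env` file can be sketched as follows. This is an illustration under assumptions, not the content of `entrypoint.sh`: the helper name `load_env` and the variable `EXAMPLE_SECRET` are hypothetical.

```shell
#!/bin/sh
# Sketch only: load_env and EXAMPLE_SECRET are hypothetical names.

# Source the environment file into the current shell, then delete it so that
# later processes (e.g. workflow steps) cannot read the secrets from disk.
load_env() {
  # $1: absolute path to the environment file
  if [ -f "$1" ]; then
    # shellcheck disable=SC1090
    . "$1"
    rm -f "$1"
  fi
}

# Demo: a temporary file stands in for the file shared from the host.
env_file=$(mktemp)
printf 'EXAMPLE_SECRET=%s\n' "abc123" > "$env_file"

load_env "$env_file"

if [ -e "$env_file" ]; then still_there=yes; else still_there=no; fi
echo "secret=$EXAMPLE_SECRET file_present=$still_there"
```

Because the values live only in shell variables after this point, they appear neither on a command line in the process table nor in a file that a workflow could read.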

diff --git a/lib/common.sh b/lib/common.sh
@@ -203,7 +203,7 @@ error() { _log ERR "$@" && exit 1; }
 sublog() {
   # Eagerly wait for the log file to exist
   while ! [ -f "${1-0}" ]; do sleep 0.1; done
-  verbose "$1 now present on disk"
+  debug "$1 now present on disk"
 
   # Then reroute its content through our logging printf style
   tail -n +0 -f "$1" 2>/dev/null | while IFS= read -r line; do

diff --git a/orchestrator.sh b/orchestrator.sh
@@ -132,7 +132,6 @@ trap cleanup EXIT
 # Pass essential variables, verbosity and log configuration to main runner
 # script.
 RUNNER_PREFIX=$ORCHESTRATOR_PREFIX
-RUNNER_ENVIRONMENT="${ORCHESTRATOR_ENVIRONMENT:-}"
 RUNNER_VERBOSE=$ORCHESTRATOR_VERBOSE
 RUNNER_LOG=$ORCHESTRATOR_LOG
 export RUNNER_PREFIX RUNNER_ENVIRONMENT RUNNER_VERBOSE RUNNER_LOG
@@ -141,6 +140,18 @@ export RUNNER_PREFIX RUNNER_ENVIRONMENT RUNNER_VERBOSE RUNNER_LOG
 # indefinitely create ephemeral runners. Looping is implemented in runner.sh,
 # in the same directory as this script.
 for i in $(seq 1 "$ORCHESTRATOR_RUNNERS"); do
+  # Create a separate environment for each runner loop, to further isolate
+  # runners from one another.
+  if [ -n "$ORCHESTRATOR_ENVIRONMENT" ]; then
+    RUNNER_ENVIRONMENT="$ORCHESTRATOR_ENVIRONMENT/${ORCHESTRATOR_PREFIX}-$(printf %.3d\\n "${i}")"
+    if ! [ -d "$RUNNER_ENVIRONMENT" ]; then
+      mkdir -p "$RUNNER_ENVIRONMENT"
+    fi
+  else
+    RUNNER_ENVIRONMENT=""
+  fi
+  export RUNNER_ENVIRONMENT
+
   # Launch a runner loop in the background and collect its PID in the
   # ORCHESTRATOR_PIDS variable.
   verbose "Creating runner loop $i"
@@ -156,9 +167,9 @@ for i in $(seq 1 "$ORCHESTRATOR_RUNNERS"); do
   if [ "$i" -lt "$ORCHESTRATOR_RUNNERS" ]; then
     # Wait for the runner token to be ready before starting the next runner,
     # or, at least, sleep for some time.
-    if [ -n "${ORCHESTRATOR_ENVIRONMENT:-}" ]; then
-      wait_path -f "${ORCHESTRATOR_ENVIRONMENT}/${i}-*.tkn" -1 5
-      token=$(find_pattern "${ORCHESTRATOR_ENVIRONMENT}/${i}-*.tkn")
+    if [ -n "${RUNNER_ENVIRONMENT:-}" ]; then
+      wait_path -f "${RUNNER_ENVIRONMENT}/${i}-*.tkn" -1 5
+      token=$(find_pattern "${RUNNER_ENVIRONMENT}/${i}-*.tkn")
       rm -f "$token"
       verbose "Removed token file $token"
     elif [ -n "$ORCHESTRATOR_SLEEP" ] && [ "$ORCHESTRATOR_SLEEP" -gt 0 ]; then
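
The per-loop directory naming introduced in the orchestrator.sh hunk above (prefix, dash, zero-padded loop index) can be tried in isolation with the sketch below. The helper name `runner_environment` and the `GH-runner` prefix are assumptions for illustration; the `%.3d` padding mirrors the `printf` call in the hunk.

```shell
#!/bin/sh
# Sketch only: runner_environment is a hypothetical helper, not project code.

# Print the sub-directory for a runner loop, or nothing when isolation is off
# (i.e. when the root directory is empty).
runner_environment() {
  # $1: root environment directory, $2: prefix, $3: loop index
  if [ -n "$1" ]; then
    printf '%s/%s-%.3d\n' "$1" "$2" "$3"
  fi
}

root=$(mktemp -d)
for i in 1 2 3; do
  dir=$(runner_environment "$root" "GH-runner" "$i")
  mkdir -p "$dir"
done

first=$(basename "$(runner_environment "$root" "GH-runner" 1)")
count=$(ls "$root" | wc -l | tr -d ' ')
disabled=$(runner_environment "" "GH-runner" 1)

rm -rf "$root"
echo "first=$first count=$count disabled='$disabled'"
```

Each loop then only sees its own sub-directory mounted at `/_environment`, so a token or break file written by one runner is not visible to the others.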

diff --git a/runner.sh b/runner.sh
@@ -195,10 +195,10 @@ check_positive_number "$RUNNER_MEMORY" "Memory (in MB)"
 # Decide which runner.sh implementation (this is the "entrypoint" of the
 # microVM) to use: the one from the mount point, or the built-in one.
 if [ -z "$RUNNER_DIR" ]; then
-  RUNNER_ENTRYPOINT=/opt/gh-runner-krunvm/bin/runner.sh
+  RUNNER_ENTRYPOINT=/opt/gh-runner-krunvm/bin/entrypoint.sh
 else
-  check_command "${RUNNER_ROOTDIR}/runner/runner.sh"
-  RUNNER_ENTRYPOINT=${RUNNER_DIR%/}/runner/runner.sh
+  check_command "${RUNNER_ROOTDIR}/runner/entrypoint.sh"
+  RUNNER_ENTRYPOINT=${RUNNER_DIR%/}/runner/entrypoint.sh
 fi
 
 # Create the VM used for orchestration. Add --volume options for all necessary

diff --git a/runner/runner.sh → runner/entrypoint.sh b/runner/runner.sh → runner/entrypoint.sh