diff --git a/.gitignore b/.gitignore index 3fe07d6fe..b29b99182 100644 --- a/.gitignore +++ b/.gitignore @@ -66,3 +66,5 @@ data_gisclub/ # png images for the brms tutorial /content/tutorials/r_brms/brms_eng/*.png /content/tutorials/r_brms/brms_nl/*.png + +/.quarto/ diff --git a/content/tutorials/development_containers1/index.bak b/content/tutorials/development_containers1/index.bak new file mode 100644 index 000000000..9ce704dd2 --- /dev/null +++ b/content/tutorials/development_containers1/index.bak @@ -0,0 +1,874 @@ +--- +title: Containers with Docker and Podman +description: Introduction to containerization and the practical use of Docker-like tools. +date: "2025-02-20" +authors: [falkmielke] +categories: ["development", "open science"] +tags: ["development", "open science", "docker", "containers"] +number-sections: false +params: + math: true +format: + html: + toc: true + html-math-method: katex + hugo-md: + toc: true + preserve_yaml: true + html-math-method: katex +output: + hugo-md: + preserve_yaml: true + variant: gfm+footnotes + html: + variant: gfm+footnotes +--- + + + +You might have heard about "containerization" with [**Docker**](https://docs.docker.com). +Docker has been labeled "the *Holy Grail* of reproducibility" in [The Open Science Manual by Claudio Zandonella Callegher and Davide Massidda (2023)](https://arca-dpss.github.io/manual-open-science/docker-chapter.html). +Although containerization is an immensely useful Open Science tool worth striving for, the *Holy Grail* is an inaccurate metaphor, because +(i) Unlike The Grail, Docker is easy to find and accessible. +(ii) Docker alone does not make a reproducible workflow; some of its capability is occasionally confused with package version management. +(iii) Docker has issues, some of them mitigated by configuration adjustment or switching to "Podman". + +In this tutorial, I demonstrate step-by-step how to set up and deploy a **custom container** with Docker or Podman. 
+This is intended to be a rather general test case, serving as a basis for later configuration of more specific container solutions.
For example, you will learn how to spin up an existing `rocker/rstudio` container, and even modify it with additional system components and libraries.
I follow other tutorials available online and try to capture their essence for an INBO context.
Hence, this is mostly an assembly of other tutorials, with references - no original ideas to be found below, but nevertheless some guidance.

On Windows, installation, configuration, and management of containers run via the `docker desktop` app.
However, this tutorial also covers (and in fact focuses on) the terminal-centered steps to be executed on a Linux computer or within a WSL.

I also present **Podman** as a full replacement for Docker, and recommend giving it a try.

Generally, if you are an INBO user, it is recommended to contact and involve your ICT department for support with the setup.

**References:**

- <https://docs.docker.com>
- <https://podman.io/docs>, <https://github.com/containers/podman/blob/main/docs/tutorials/podman-for-windows.md>
- <https://wiki.archlinux.org/title/Podman>
- <https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html>
- <https://medium.com/@geeekfa/docker-compose-setup-for-a-python-flask-api-with-nginx-reverse-proxy-b9be09d9db9b>
- <https://testdriven.io/blog/dockerizing-flask-with-postgres-gunicorn-and-nginx>
- <https://arca-dpss.github.io/manual-open-science/docker-chapter.html>
- <https://do4ds.com/chapters/sec1/1-6-docker.html>
- <https://colinfay.me/docker-r-reproducibility>
- <https://solutions.posit.co/envs-pkgs/environments/docker>


# Installation

The installation procedure [is documented here](https://docs.docker.com/install).

Docker comes with the *Docker Desktop* app.
That app by itself is trivial and hardly worth a tutorial.

## Microsoft Windows

Navigate to [the download site for Docker on Windows](https://docs.docker.com/desktop/setup/install/windows-install).
Download the "App" (newspeak for: a graphical user interface to a software tool).
Install it.

*Note for INBO users:* you might choose to select Hyper-V instead of WSL, against Docker's recommendation (WSL is not working in our enterprise environment; however, we are trying to improve the situation, and ICT might help).
You probably do not have admin rights, which is good.
To re-iterate: **ask our friendly ICT helpdesk for support right away.**

<figure>
<img src="../../images/tutorials/development_docker/docker_desktop1.jpg" alt="desktop app" />
<figcaption aria-hidden="true">The Desktop App.</figcaption>
</figure>

Using a convenient app is possible with "Docker Desktop".
On Windows, you can download and install it with administrator rights.
On Linux, that same `docker-desktop` [is available for installation](https://docs.docker.com/desktop/setup/install/linux).
Yet while automating some aspects, the app is not entirely transparent about telemetry and advertising; some anti-features are included (e.g. a required login).
This is unfortunate, because it makes the app less attractive for more privacy-concerned users.

The terminal aspect of Docker is entirely free and open source, and universally accessible.
This is why the rest of this tutorial will focus on terminal access.

## Terminal

On the Windows terminal or Linux shell, you can install `docker` as a terminal tool.

{{% callout note %}}
On Windows, this comes bundled with the App; the steps below are not necessary.
There might be ways to get around the Desktop App and facilitate installation, either via WSL2 or using [a Windows package manager called Chocolatey](https://en.wikipedia.org/wiki/Chocolatey).

Either way, note that you need to run the docker app or docker in a terminal *as administrator*.

{{% /callout %}}

More info about the installation on Debian-based or Ubuntu Linux systems [can be found here](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository).
The procedure requires you to add an extra repository, so [some caution is warranted](https://wiki.debian.org/DontBreakDebian).

``` sh
sudo apt update && sudo apt install docker-ce docker-buildx-plugin # debian-based
# sudo pacman -Sy docker docker-buildx # Arch Linux
```

As you will notice, this installs a "CE" version of Docker, `docker-ce`.
CE stands for "community edition", as opposed to the "enterprise edition" ([cf. here](https://www.geeksforgeeks.org/docker-community-edition-vs-enterprise-edition)).
Many features which you would take for granted in this kind of software (security, consistency, scalability) are handled differently in the two editions, thus it is worth knowing the difference and considering the alternatives.

For users to be able to use Docker, they must be in the "docker" group.
(Insert your username at `<your-username>`.)

``` sh
sudo usermod -a -G docker <your-username>
```

For this change to take effect, log off and log in again, and restart the Docker service if it was running.

Containers are managed by system tasks (a "service" and a "socket") which need to be started.
Most likely, your Linux uses `systemd`.
Your system can start and stop these services automatically, by using `systemctl enable <...>`.
However, due to [diverse](https://docs.docker.com/engine/security) [security](https://github.com/moby/moby/issues/9976) [pitfalls](https://snyk.io/blog/top-ten-most-popular-docker-images-each-contain-at-least-30-vulnerabilities), it is good practice to **not keep it enabled** permanently on your system (unless, of course, you use it all the time).

On a `systemd` system, you can start and stop Docker on demand via the following commands (those will ask you for `sudo` authentication if necessary).
+ +``` sh +systemctl start docker + +systemctl status docker # check status + +systemctl stop docker.socket +systemctl stop docker.service +``` + +For aficionados: docker actually runs multiple services: the docker service, the docker socket, and the [container daemon](https://www.docker.com/blog/containerd-vs-docker) `containerd`. + +You can check the Docker installation by confirming the version at which the service is running. + +``` sh +docker --version +``` + +Congratulations: now the fun starts! + +# Existing Containers: `run` + +## Rationale + +Docker is about assembling and working in containers. +"Living" in containers. +Or, rather, you can think of this as living in a ["tiny home", or "mobile home"](https://parametric-architecture.com/tiny-house-movement). +Let's call it a fancy caravan. +The good thing is that at least you get to pick a general design and to choose all details of the interior. + +<figure> +<img src="../../images/tutorials/development_docker/docker_metaphor_tiny_space.jpg" alt="Black/white image of a tiny home as a metaphor for software containerization." /> +<figcaption aria-hidden="true">A tiny home close to "Gare Maritime", Brussels, February 2025.</figcaption> +</figure> + + + +The best thing: if you feel like you do not have the cash, time, or talent to build your own home, you can *of course* use someone else's. +There are a gazillion **Docker images available for you** on [Docker Hub](https://hub.docker.com). + +## Example + +For example[^2], there are Docker images with [rstudio server](https://posit.co/download/rstudio-server) pre-installed: + +- <https://hub.docker.com/r/rocker/rstudio> + +{{% callout note %}} +If you control containers via the desktop app, simply search, pull, and run it. 
+{{% /callout %}} + +<figure> +<img src="../../images/tutorials/development_docker/docker_desktop2.jpg" alt="desktop app: run" /> +<figcaption aria-hidden="true">Desktop App: run.</figcaption> +</figure> + +Otherwise, execute the following script (*Windows*: use an administrator terminal). +If it does not find the resources locally, Docker will download and extract the image from Docker Hub[^3]. + +``` sh +docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD rocker/rstudio +``` + +- The `--rm` flag makes the Docker container non-permanent, i.e. disk space will be freed after you close the container (<a href="#sec-permanence" class="quarto-xref">Section 2.5</a>). +- The port specified at `-p` is the one you use to access this local container server (the `-p` actually maps host- and container ports). You have to specify it explicitly, otherwise the host system will not let you pass (`:gandalf-meme:`). +- The `-e` flag allows you to specify environment variables, in this case used to set a password for the RStudio server. But if you do not specify one, a random password will be generated and displayed upon startup (read the terminal output). + +<figure> +<img src="../../images/tutorials/development_docker/docker_run.jpg" alt="run" /> +<figcaption aria-hidden="true">Docker run, on the terminal.</figcaption> +</figure> + +You are now running (`run`) a `rocker/rstudio` server instance on your `localhost`, i.e. your computer. +You can access it via a browser, going to <localhost:8787>, with the username `rstudio` and your chosen password. + +You can shut down the container with the keyboard shortcut `[ctrl]+[C]` (probably `[ctrl]+[Z] [Return]` on Windows). + + +<a id="sec-mounting"></a> +## File Access + +The downside of this is that your container is isolated (well... at least to a certain degree). + +Images can take up considerable storage space. +Storing files locally, i.e. 
on the host machine, without storing an unnecessarily filled container, might be a good strategy.
This can be achieved by mapping a virtual path in the container to a local drive on your computer.
(Linux users will be familiar with the concept of "mounting" and "linking" storage locations.)
Note that the technique is equally relevant when running the container locally, hence not exclusive to remote hosts.

Docker `run` provides the `-v` flag for mounting volumes.
Suppose you have an R project you would like to work on, stored, for example, in this path:

- `/data/git/coding-club`

Then you can link this to your container's home folder via the following command.

``` sh
# Windows syntax, mapping on `D:\data`
docker run --rm -p 8787:8787 -v //d/data/git/coding-club:/home/rstudio/coding-club rocker/rstudio

# Linux syntax
docker run --rm -p 8787:8787 -v /data/git/coding-club:/home/rstudio/coding-club rocker/rstudio
```

Again, navigate to <localhost:8787>, *et voilà*, you can access your project and store files back in your regular folders.

## Limitations

This is a simple and quick way to run R and RStudio in a container.

However, there are limitations:

{{% callout note %}}
- You have to live with the R packages provided in the container, or otherwise install them each time you access it...
- ... unless you make your container permanent by omitting the `--rm` option. Note that this will cost considerable disk space, will not transfer to other computers (the original purpose of Docker), and will demand occasional updates (<a href="#sec-permanence" class="quarto-xref">Section 2.5</a>).
- You could alternatively add `--pull always` to `docker run`, which will check for and pull new versions.
- Speaking of updates: it is good practice to keep software up to date. Occasionally update or simply re-install your Docker image and R packages to get the latest versions.
- You should make sure that the containers are configured correctly and securely.
This is especially important with server components which expose your machine to the internet.
- Because most containers contain a Linux system, user permissions are taken seriously, and the consequences might be confusing. There are guides online ([e.g. here](https://labex.io/tutorials/docker-how-to-handle-permissions-in-docker-415866)); there are example repositories (like the author's own struggle [here](https://github.com/inbo/containbo?tab=readme-ov-file#understanding-volumes) and [here](https://github.com/inbo/containbo/tree/main/emacs)); base images are well set up and one can normally get by with default users.
- There is a performance penalty from using containers: in inaccurate layman's terms, they emulate (parts of a) "computer" inside your computer.
{{% /callout %}}

On the performance issue: I attempted to quantify this on my local laptop with matrix multiplication.

``` r
# https://cran.r-project.org/web/packages/rbenchmark/rbenchmark.pdf
# install.packages("rbenchmark")
library(rbenchmark)

test <- function() {
  # test from https://prdm0.github.io/ropenblas/#installation
  m <- 1e4; n <- 1e3; k <- 3e2
  X <- matrix(rnorm(m * k), nrow = m)
  Y <- matrix(rnorm(n * k), ncol = n)
  X %*% Y
}

benchmark(test())
```

In the terminal:

    test replications elapsed relative user.self sys.self user.child sys.child
    1 test() 100 22.391 1 83.961 65.291 0 0

In the container:

    test replications elapsed relative user.self sys.self user.child sys.child
    1 test() 100 26.076 1 102.494 153.89 0 0

Now, the *good news* is that the difference is not by orders of magnitude.
This indicates that the chosen rocker image integrated the more performant `blas` variant which is [recommended](https://pbs-assess.github.io/sdmTMB/index.html#installation) [elsewhere](https://prdm0.github.io/ropenblas/#installation) (`blas-openblas`).

The *bad news* is that we still take a performance hit of about 20%, which is considerable.
This is just a single snapshot on a laptop, and putatively confounded by the `blas` variant.
Feel free to systematically and scientifically repeat the tests on your own machine.


<a id="sec-permanence"></a>
## Container Permanence: The `--rm` Option

As briefly touched upon above, `docker run` comes with the `--rm` option.
This basically enables two separate workflows, i.e. usage paradigms.

The first option, which is the default, is that your container is stored on the system permanently.
This holds for the upstream images, which are downloaded upon first invocation of a container.
But also, changes you apply while working in the container are persistently stored until you log in again, using hard drive space of the host.
Images may still be removed by manually running `docker rmi [...]` (<a href="#sec-commands" class="quarto-xref">Section 5</a>).

In contrast, with the second option, `docker run --rm [...]`, ad-hoc changes in the container are removed when the container is finished.
Unless, of course, you mount a local volume with `docker run --rm -v [...]` (<a href="#sec-mounting" class="quarto-xref">Section 2.3</a>).
However, contrary to a rather common intuition, starting a container with `--rm` will not require downloading the image a second time.

You might want to test this for yourself.
Consider the following series of commands to create a test file in the Docker home directory (the trailing `bash` drops you into a shell; this image would start R by default):

``` sh
docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
echo "testing permanence." > ~/test.txt
cat ~/test.txt
exit
```

Re-connecting is instantaneous.
However,

``` sh
docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
cat ~/test.txt
```

will return:

> cat: /root/test.txt: No such file or directory

This behavior is desired (in the second workflow above): if you start up a fresh environment each time you work in Docker, you **ensure that your work pipeline is independent of prior changes on the system**.
Whether this makes sense as a workflow has to be evaluated with respect to hard drive space requirements, updates, the option to build upon a customized Dockerfile, and reproducibility potential.

You can "link in" folders for working files (note how you have to specify the full path to `new_home`, and that this container uses the root user by default):

``` sh
mkdir new_home
docker run --name testing_permanence -v /data/containers/new_home:/root --rm -it docker.io/rocker/r-base bash
echo "testing permanence." > ~/test.txt
```

Using `--rm` might not be desirable in every case.
However, it is a valuable option for testing, good to have when disk space is scarce, or as a final check before publishing.
Generally, I would consider it good practice to treat containers as volatile, thereby keeping them as host-machine-independent as possible.

# Custom Containers: `build`

(Here follows somewhat advanced stuff. Nevertheless, be brave and give it a read!)

## Rationale

One advantage of a Docker container is its mobility: you can "bring it with you" to other workstations, host it for colleagues or readers, or use cloud computing, mostly without having to worry about installation of the components.
This is a matter of good open science practice.
But it also pays off in complicated server setups and distributed computing.

A standardized container from [Docker Hub](https://hub.docker.com) is a good start.
However, you will probably require some personalization.
As a use case, imagine you would like to have an RStudio server which comes with relevant INBO packages pre-installed (e.g. [`inbodb`](https://inbo.github.io/inbodb), [`watina`](https://inbo.github.io/watina); *cf.* [the containbo repository](https://github.com/inbo/containbo)).

I will return to this use case below.
To explore the general workings of `docker build`, let us turn to more web-directed tasks for a change.
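Before we do, a minimal sketch of what `docker build` actually consumes: a plain-text recipe called a Dockerfile. The image and package names below are illustrative placeholders only, not a tested recipe; a full, worked Dockerfile follows in the use case section.

``` sh
#!/bin/sh
# Sketch: a build recipe is just a text file named "Dockerfile".
# Image and package names are placeholders for illustration.
cat > Dockerfile <<'EOF'
FROM rocker/rstudio:latest
RUN Rscript -e 'install.packages("git2rdata")'
EOF

# Once the Docker service is running, this would turn the recipe into an image:
# docker build -t my-rstudio .
```

The commented `docker build` line is left inactive on purpose: writing the recipe requires nothing but a text editor, while building it requires a running Docker service.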
+ +{{% callout note %}} +With Docker Desktop, you have the graphical interface for "builds". +This might fall under the extended functionality which requires a login. + +Yet even without a login, you *can* proceed via a terminal, as below. +Once you create a `Dockerfile` and build it, it will appear in the GUI. +{{% /callout %}} + +<figure> +<img src="../../images/tutorials/development_docker/docker_winbuild.jpg" alt="build on Windows" /> +<figcaption aria-hidden="true">Build on Windows.</figcaption> +</figure> + +## Init: a `flask` + +[Python `flask`](https://en.wikipedia.org/wiki/Flask_(web_framework)) is a library which allows you to execute Python scripts upon web access by users. +For example, you can use flask to gather information a user provides in an html form, then process and store it wherever you like. + +I started from the following examples and tutorials to spin up a flask container, but provide modifications and comments on the steps. + +- <https://docs.docker.com/build/concepts/dockerfile> +- <https://medium.com/@geeekfa/dockerizing-a-python-flask-app-a-step-by-step-guide-to-containerizing-your-web-application-d0f123159ba2> + +> **It all starts with a [Dockerfile](https://www.geeksforgeeks.org/what-is-dockerfile).**[^4] + +As you will see, the Docker file will give you all the design choices to create your own containers. +I think of the Docker file as a script which provides all the instructions to set up your container, starting with `FROM` (i.e. which prior container you build upon) to `RUN`ning any type of commands. +Not *any* type, really: we are working on (mysterious, powerful) Linux - don't fret, it is easier than you think! + +To our `python/flask` example. +A list of the official python containers is [available here](https://hub.docker.com/_/python). +Note that you build every container upon the skeleton of an operating system: I chose [Alpine Linux](https://en.wikipedia.org/wiki/Alpine_Linux). +(It's *en vogue*.) 

The Dockerfile resides in your working folder (yet it also defines a [`WORKDIR`](https://stackoverflow.com/a/51066379) from within which later commands are executed).

- Navigate to a folder in which you intend to store your container(s), e.g. `cd C:\data\docker` (Windows) or `cd /data/docker` (Linux).
- Create a file called `Dockerfile`: `touch Dockerfile`.
- Edit the file in your favorite text editor (`vim Dockerfile`; Windows users probably use "notepad").
- Paste and optionally modify the content below.

<!-- -->

    # Use the official Python image (Alpine Linux, Python 3)
    FROM python:3-alpine

    # install app dependencies
    RUN apk update && apk add --no-cache python3 py3-pip
    RUN pip install flask

    # install app
    COPY hello.py /

    # final configuration
    ENV FLASK_APP=hello
    EXPOSE 8000
    CMD ["flask", "run", "--host", "0.0.0.0", "--port", "8000"]

Note that the following `hello.py` file needs to be present in your working directory (you will be reminded by a friendly error message):

``` python
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello, INBO!"
```

With the `Dockerfile` and `hello.py` in place, you can build the container[^5].

``` sh
# on Windows, you are already in an administrator terminal
docker build --pull -t my-flask .
```

On Linux, you might need to use `sudo` if the user is not in the `docker` group, like so: `sudo docker build --pull -t my-flask .`.
Using `--pull` is good practice to ensure the download of the latest upstream containers; you could even use `--no-cache` to avoid previous downloads altogether.
The `-t` parameter [will "tag" the image at build time](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image), auto-generating extra metadata.
+Also, some variants can omit the final dot ("."), others require it; the dot is just a Linux shorthand reference to the current working directory (i.e. where your Dockerfile resides).


<figure>
<img src="../../images/tutorials/development_docker/docker_build.jpg" alt="build" />
<figcaption aria-hidden="true">Docker build.</figcaption>
</figure>

List your available container images via the `docker images` command.

You should now see a `python` image, which is the base Alpine image we built upon.
There is also a `my-flask`.
Try it!

``` sh
docker run my-flask
```

The terminal should give you an IP and port; because flask runs in a container, `localhost:8000` will **not work** out-of-the-box.
Instead, in my case, it was `http://172.17.0.2:8000`.
(Sadly, although I could build and run this container on Windows, I did not get through via the browser; try port mapping with `-p 8000:8000`.)

{{% callout note %}}
So far, so good.
We have used an existing image and added `flask` on top of it.
This works via writing a Dockerfile and building an image.
{{% /callout %}}

## Multiple Images: `compose` *versus* `build`

The above works fine for most cases.
However, if you want to assemble and combine multiple images, or build on base images from multiple sources, you need a level up.

In that case `docker compose` is [the way to go](https://docs.docker.com/compose/gettingstarted).
On Debian or Ubuntu, this extra functionality comes with the `docker-compose-plugin`.
I have not needed to try this out yet, but will return here if that changes.

## Relation to Version Control and Version Management

Back to the initial paradigm of reproducibility:
*What exactly is the Open Science aspect of containerization?*

This question might have led to some confusion, and I would like to throw in a paragraph of clarification.
A crucial distinction lies in the preparation of *Dockerfiles* (i.e.
build instructions) and the preservation of *images* (i.e. end products of a build process). + + +One purpose of a Dockerfile may be that you document the exact components of your system environment. +You start at a base image (e.g. a `rocker`) and add additional software via Dockerfile layers. +This is good practice, and encouraged: if you publish an analysis, provide a tested container recipe with it. + +However, this alone does not solve the problem of version conflicts and deprecation. +Documenting the versions of packages you used is an extra step, for which [other tools are available](https://doi.org/10.1038/d41586-023-01469-0): + +- It is good practice to report the exact versions of the software used upon publication ([see here, for example](https://arca-dpss.github.io/manual-open-science/requirements-chapter.html)). +- Version control such as `git` will track the changes within your own texts, scripts, even version snapshots and Dockerfiles. +- Finally, docker images can serve as a snapshot of a (virtual) machine on which your code would run. + +{{% callout emphasize %}} +The simple rule of thumb is: use all three methods, ideally all the time. + +Virtual environments. +Version control. +Snapshots. + +Get used to them. +They are easy. +They will save you time and trouble almost immediately. +{{% /callout %}} + + +But unless you use them already, you might require some starting points and directions: here we go. +The second point, **version control**, is a fantastic tool to enable open science, and avoid personal trouble. +You will [find starting points and help in other tutorials on this website](https://tutorials.inbo.be/tags/git). +It might have a steep learning curve, yet [there](https://rstudio.github.io/cheatsheets/git-github.pdf) [are](https://www.sourcetreeapp.com) [fantastic](https://magit.vc) [tools](https://www.sublimemerge.com) to get you started. 
The other point, version documentation, is trivially achieved by manually storing the currently installed versions via `sessionInfo()` in R, or `pip freeze > versions.txt` for Python.
A small step towards somewhat more professionalism is the use of **virtual environments**.
Those exist for R ([renv](https://rstudio.github.io/renv/articles/renv.html)) or Python ([venv](https://docs.python.org/3/library/venv.html)).
The `pak` library in R can [handle lock files conveniently](https://pak.r-lib.org/reference/lockfile_install.html) with `pak::lockfile_install()`.
Then there is the integration of R, Python, and system packages in `conda`-like tools ([e.g. micromamba](https://mamba.readthedocs.io/en/latest)).
There are even system-level tools, for example [`nix` and `rix`](https://docs.ropensci.org/rix).

The methods are not mutually exclusive:
all Dockerfiles, build recipes, and scripts to establish virtual environments should generally be subject to version control.


However, documenting the exact tools and versions used in a project does not guarantee that these versions will be accessible to future investigators (like oneself, trying to reproduce an analysis five years later).
This is where **Docker images** come in.
Docker images are the actual containers which you create from the Dockerfile blueprints by the process of building.
In the "tiny home" metaphor: your "image" is the physical (small, but real, DIY-achievement) home to live in, built from step-by-step instructions.
Think of a Docker image as a virtual copy of your computer which you store for later re-activation.
For example, a collection of images for specific analysis pipelines at INBO is preserved at [Docker Hub/inbobmk](https://hub.docker.com/u/inbobmk).
We consider these "stable" versions because they can be re-activated no matter what crazy future updates shatter the R community, which enables us to return to all details of previous analyses.
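The version-documentation habit mentioned above can be scripted so it runs alongside every analysis. A minimal, hedged sketch (the folder and file names are arbitrary choices; the R and Python lines are left commented out because they require the respective interpreters):

``` sh
#!/bin/sh
# Record "what was installed when" next to the analysis, and commit it to git.
outdir="versions"
mkdir -p "${outdir}"

# R session packages (run inside the container or your R session):
# Rscript -e 'writeLines(capture.output(sessionInfo()), "versions/R_sessionInfo.txt")'

# Python packages:
# pip freeze > versions/requirements.txt

# Date-stamp the snapshot so it can be matched to an image tag later:
date +%Y-%m-%d > "${outdir}/snapshot_date.txt"
```

The resulting plain-text files are tiny, diff-friendly, and exactly the kind of artifact that belongs under version control next to the Dockerfile.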

Some confusion might arise from the fact that managing these image snapshots is achieved with the same vocabulary as version control: for example, you would ["commit"](https://docs.docker.com/reference/cli/docker/container/commit) updated versions and ["push"](https://docs.docker.com/reference/cli/docker/image/push) them to a container repository.

Even more confusion might arise from the fact that you also find ready-made images online, e.g. on [Docker Hub](https://hub.docker.com), or [Quay](https://quay.io), or elsewhere.
These provide images of (recent) versions of working environments, supposed to stand in as starting points for derived containers.
Hence, be aware of the dual use case of images: (i) the dynamic, universal base image which improves efficiency, and (ii) the static, derived, bespoke image which you created for your analysis (shared with the world for reproducibility).


And, once more, those images are not a "holy grail" solution: they are not entirely system-independent (e.g. processor architecture), and they might occupy a considerable amount of hard disk space (Dockerfile optimization is warranted).
Ideally, to be a "full stack open science developer", you want to implement **a mixed strategy** consisting of virtual environments and containers, wrapped in version control and stored in a backup image.


<a id="sec-rootless"></a>
## "Because Roots Are Important"[^6]: Rootless Mode

One of the main criticisms of Docker is the necessity to run in a privileged user environment, which is indeed a security issue.
This may refer to the system process requiring elevated privileges, or to users in the `docker` system group [effectively having superuser privileges](https://github.com/moby/moby/issues/9976).
Because of the risk of privilege escalation in case of a container breakout, this situation would worsen existing vulnerabilities, [of which there are some](https://snyk.io/blog/top-5-docker-security-vulnerabilities) in [Docker containers](https://www.docker.com/blog/container-security-and-why-it-matters).

Historically, Docker could not run "rootless", i.e. without elevated privileges.
[This seems to have changed](https://docs.docker.com/engine/security/rootless), according to Docker.
Some caution is still warranted: the setup procedure requires downloading and running shell scripts (which must be checked); the daemon still builds on `systemd` (*usually* root level); some functionality is limited.

On the other hand, there is Podman (<a href="#sec-podman" class="quarto-xref">Section 6</a>).
It *used to* require almost the same extra steps as `docker-rootless` to work rootless, but we found that these requirements are now met by default.
It seems that, at the time of writing, Docker and Podman have identical capabilities in terms of rootless containerization.
The remaining difference is that Podman seems to have more sensible default settings.

It might therefore be worth trying both tools and exchanging one for the other.

# Use Case: RStudio With Packages

## Rationale

We should be able to apply the above to modify the `rocker/rstudio` server image for our purpose.

Build recipes for some of the INBO packages you might want to include are collected in this repository:

- <https://github.com/inbo/contaINBO>

Contributions are much appreciated!

## Dockerfile

This use case is, in fact, well documented:

- <https://rocker-project.org/use/extending.html>
- <https://rocker-project.org/images/versioned/rstudio.html>
- <https://davetang.org/muse/2021/04/24/running-rstudio-server-with-docker>

The Rocker crew rocks!
+They prepared quite [a lot of useful images](https://hub.docker.com/u/rocker), including for example the `tidyverse` or geospatial packages.
+
+Note the syntax in `FROM`: it is `rocker/<image>:<version>`.
+
+```
+FROM rocker/rstudio:latest
+# (Use the rocker rstudio image)
+
+# update the system packages
+RUN apt update \
+    && apt upgrade --yes
+
+# git2rdata requires git
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    git libgit2-dev \
+    && apt-get clean
+
+# update pre-installed R packages
+# RUN Rscript -e 'update.packages(ask=FALSE)'
+
+# copy a `.Rprofile` to the container
+# available here: https://tutorials.inbo.be/installation/administrator/admin_install_r/Rprofile.site
+COPY docker/.Rprofile $R_HOME/etc/Rprofile.site
+
+# install packages via an R command (`R -q -e` or `Rscript -e`)
+# (a) from pre-configured repositories
+RUN Rscript -e 'install.packages("git2rdata")'
+
+# (b) via r-universe
+RUN R -q -e 'install.packages("watina", repos = c(inbo = "https://inbo.r-universe.dev", CRAN = "https://cloud.r-project.org"))'
+
+# (c) from github
+RUN R -q -e 'install.packages("remotes")'
+RUN R -q -e 'remotes::install_github("inbo/INBOmd", dependencies = TRUE)'
+```
+
+It takes some puzzle work to get the dependencies right, e.g. with the `libgit2` dependency (try commenting out that line to get a feeling for build failures).
+However, there is hope: (i) the error output is quite instructive (at least for Linux users), (ii) building is incremental, so you can add packages step by step.
+It just takes patience.
+As a shortcut, consider using `pak` ([from r-lib](https://pak.r-lib.org)) or `r2u` ([apt repository](https://github.com/eddelbuettel/r2u)) to implicitly deal with the system dependencies.
+Generally, remember which system powers your container (Debian/Ubuntu), find help online, and document your progress.
+
+{{% callout note %}}
+Dockerfiles offer some room for optimization.
+For example, every `RUN` is a "layer"; you should put stable layers at the top and volatile layers later.
+In principle, it is recommended to combine layers as much as possible.
+
+More here: <https://docs.docker.com/build/building/best-practices>
+{{% /callout %}}
+
+Test the image:
+
+``` sh
+docker build -t test-rstudio .
+```
+
+Run it, as before:
+
+``` sh
+docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD test-rstudio
+```
+
+Another good practice is to extract modifications into scripts and bring them in modularly, to be executed upon installation ([see here](https://stackoverflow.com/q/69167940), [and here](https://rocker-project.org/use/extending.html#install2.r)), via `COPY`.
+This exposes them to a more refined version control on the host machine.
+As you know, [version control is key!](https://tutorials.inbo.be/tags/git)
+
+But, on that line, how about private repositories?
+More generally, how would we get (personal) data from our host machine to the container?
+
+## Data Exchange
+
+Arguably, among the trickier tasks when working with containers is file exchange.
+There are [several options available](https://forums.docker.com/t/best-practices-for-getting-code-into-a-container-git-clone-vs-copy-vs-data-container/4077):
+
+- `COPY` in the Dockerfile (or `ADD` [in appropriate cases](https://www.docker.com/blog/docker-best-practices-understanding-the-differences-between-add-and-copy-instructions-in-dockerfiles))
+- ["bind mounts"](https://docs.docker.com/engine/storage/bind-mounts)
+- [volumes](https://docs.docker.com/engine/storage/volumes)
+- R's own ways of installing from afar (e.g. `remotes::install_github()`)
+
+For the use case of [installing R packages from a private git repo](https://www.geeksforgeeks.org/how-to-clone-private-git-repo-with-dockerfile), there are several constraints:
+
+- It best happens at build time, to enable all the good stuff: `--rm`, sharing, ...
+- Better keep your credentials (e.g.
ssh keys, access tokens) off the container, both system side and [on the R side](https://usethis.r-lib.org/articles/git-credentials.html).
+- On the other hand, updates can often happen by re-building.
+
+In this (and only this) situation, the simple solution is to copy a clone of the repository to the container, and then install it.
+The `git clone` should reside within the Dockerfile folder.
+Then the Dockerfile section can look like the following:
+
+```
+# copy the repo
+COPY my_private_repo /opt/my_private_repo
+
+# manually install dependencies
+RUN R -q -e 'install.packages("remotes", dependencies = TRUE)'
+
+# install package from folder
+RUN R -q -e 'install.packages("/opt/my_private_repo", repos = NULL, type = "source", dependencies = TRUE)'
+```
+
+This way of handling private repositories [seems to be good practice](https://stackoverflow.com/questions/23391839/clone-private-git-repo-with-dockerfile/55761914#55761914), for being simple, secure, and generally most feasible.
+
+The next best alternative would be mounting the `~/.ssh` folder from the host to the container via `-v`.
+
+
+<a id="sec-commands"></a>
+# Useful Commands
+
+We have briefly seen `docker --version`, `docker build`, and `docker run`, and there are many more settings and tweaks on these commands to learn about.
+
+There are other Docker commands which might help you out of a temporary misery.
+
+- First and foremost, `docker --help` will list the available commands and options.
+- `docker run -it --entrypoint /bin/bash <image>` or `docker run -it <image> /bin/bash` brings you to the shell of a container; you can update, upgrade, or just mess around. Try `bash` or `/bin/sh` as alternatives.
+- `docker images` will list your images in convenient table format; the `-q` flag returns only IDs.
+- `docker inspect <image-name or image-id>` brings up all the configuration details about a specific image; you can, for example, find out its Docker version and network IP address.
+- `docker ps` ("process status") will list all running containers; `docker stop $(docker ps -a -q)` will stop them **all**.
+- Be aware that docker images occupy a considerable amount of hard disk space. `docker rmi <image-name or image-id>` will remove an image; `docker rmi $(docker images -q)` will remove **all** your images. The command `docker system prune` provides an interactive cleanup, `docker system prune --all` will clean up non-interactively. Of course, you get to keep the Dockerfiles.
+- `docker commit` and `docker diff` support the creation and maintenance of snapshots of processed images, which you can keep locally, or upload to an online storage such as Docker Hub.
+
+There are a gazillion more to choose and use.
+A more complete list can be found [here, for example](https://do4ds.com/chapters/append/cheatsheets.html#cheat-docker), and the [Docker docs](https://docs.docker.com/reference/cli/docker) are your go-to source.
+
+
+One more note on the `ENTRYPOINT`:
+It defines through which terminal or script the user will access the container.
+For example, `/bin/bash`, `/usr/bin/bash` or `/bin/sh` start a shell (the Linux terminal inside the container).
+Rocker images usually enter into an R console, or monitor an RStudio server, via an `/init` script.
+The flask container above runs a script which hosts your website and Python.
+Anything is possible.
+You can define an entrypoint in the Dockerfile (i.e. set a default), or overwrite it on each `run`.
+
+
+<a id="sec-podman"></a>
+# Podman
+
+## Purpose
+
+There are alternative approaches to containerization which mitigate some of the Docker limitations and disadvantages.
+
+The most prominent one (or rather the only one *I* looked at, sorry) might be `podman`.
+Vocabulary is marginally different: containers can be grouped into "pods", they run on a "machine", and this FOSS tool helps you to manage them.
+One major advantage of Podman is that it can be configured to run **"rootless"**, i.e.
without administrator rights [^7].
+A second advantage is that it is "all community", fully Free and Open Source: it does not promote an "enterprise edition".
+
+Podman is [well documented](https://podman.io/docs/installation).
+Another reliable source, as so often, is the [Arch Linux wiki on Podman](https://wiki.archlinux.org/title/Podman), no matter which Linux you are on.
+Windows users have succeeded in running Podman through a WSL.
+
+{{% callout note %}}
+For Windows, there is a convenient "Podman Desktop" GUI which guides you through the installation and setup, including WSL instantiation.
+It is intuitive, transparent (telemetry opt-out), backed by Red Hat.
+
+Unfortunately, it relies on Windows Subsystem for Linux (WSL), which is not available for INBO users at the moment.
+
+:(
+
+We are working on it.
+{{% /callout %}}
+
+## Setup
+
+The instructions below were tested on Arch Linux, but generalize easily.
+
+I follow the `podman` installation instructions for Arch Linux, to set up a **rootless container environment**.
+
+Installation:
+
+``` sh
+pacman -Sy podman podman-docker passt
+```
+
+The last one, `passt` (providing `pasta`, yum!), is required for rootless network access.
+Optionally, there is `podman-compose`.
+
+Originally, Podman was designed to run *only if you are root*, just like Docker.
+However, we experienced that it now comes in *rootless* configuration by default ([further instructions](https://man.archlinux.org/man/podman.1#Rootless_mode)).
+Just to be safe, I briefly list the major configuration steps.
+
+The first step is to confirm a required kernel setting: check that `kernel.unprivileged_userns_clone` is set to one.
+
+``` sh
+sysctl kernel.unprivileged_userns_clone
+```
+
+Then, configure "subordinate user IDs".
+There are minor differences between Linux distributions; with some luck, your username is already present in these lists:
+
+``` sh
+cat /etc/subuid
+cat /etc/subgid
+```
+
+If not, you can be admitted to the club of subordinates with the commands:
+
+``` sh
+usermod --add-subuids 100000-165535 --add-subgids 100000-165535 <username>
+podman system migrate
+```
+
+We note some useful commands on the way: `podman system ...` and `podman info`.
+You might immediately check "native rootless overlays" (the overlay filesystem support used for container storage without root):
+
+``` sh
+podman info | grep -i overlay
+```
+
+Then, networking: pods might need to communicate with each other and with the world.
+And, of course, container storage: make sure you know where your containers are stored.
+These and more settings are in `/etc/containers/containers.conf` and `/etc/containers/storage.conf`; make sure to scan and edit them to your liking.
+
+## Usage
+
+You can use images from `docker.io` with Podman.
+The only difference from Docker is the explicit mention of the source, `docker.io`.
+For example:
+
+``` sh
+podman search docker.io/alpine
+podman pull docker.io/alpine # download an image
+podman run -it docker.io/alpine # will connect to the container
+exit
+```
+
+## Limitations
+
+Note that at least some `docker.io` images will not work: I actually experienced issues with the "rootless Docker image":
+
+``` sh
+# podman run --rm -it docker.io/docker:25.0-dind-rootless
+```
+
+However, it is logical that this one does not work: it builds a (root-level, <a href="#sec-rootless" class="quarto-xref">Section 3.5</a>) Docker which is supposed to contain a rootless Docker.
+The outer Docker layer requires root, which Podman cannot provide.
+
+This is a logical case; if you understand it, congratulations: you have achieved a basic understanding of containers and user privileges :)
+There might be yet other images which do not work by default and require additional tinkering in Podman, due to its different design.
+Most use cases are covered, for example a containerized R environment.
+
+## Podman Rocker
+
+From here, **Podman is a full drop-in replacement for Docker**; just that you are not forced to grant host system root privileges to containers.
+
+Any Dockerfile should work, with the mentioned mini-adjustment to `FROM`.
+And you can use any Docker image; `docker.io/rocker/rstudio` [is available](https://rocker-project.org/use/rootless-podman.html) (don't forget to specify the port).
+You may even write `docker` in the terminal: it will map to `podman` (via the `podman-docker` package on Linux, or a shell alias).
+
+``` sh
+podman run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD -v /data/git/coding-club:/root/coding-club docker.io/rocker/rstudio
+```
+
+There is another subtle change: the default user to log in to `rstudio` is not `rstudio`, but `root`, because for some reason RStudio needs to have root rights on the container.
+You had those before anyway, but now they are confined to within the pod.
+There might be workarounds, which I will explore.
+
+{{% callout note %}}
+To summarize the Podman experience:
+
+- **Docker's Dockerfiles like the one above will build equally well on Podman, with only micro-adjustments.**
+- You can even stick to the `docker` commands thanks to the `podman-docker` package.
+- There is Podman Desktop, if you like clicking.
+- Podman is everything Docker is, just minimally different, more secure, and fully FOSS.
+{{% /callout %}}
+
+Kudos to the Podman devs!
+
+# Summary
+
+In this tutorial, I demonstrated the basics of containerization with Docker and Podman.
+There are convenient GUI apps and sophisticated terminal commands; the latter are much more powerful.
+
+Personally, I find the concept of containerization fascinating, and was surprised what a simple and useful trick it is.
+
+Containerization offers the advantages of modularity, configurability, transparency (open science: share your Dockerfile), shared use ...
+There are some manageable pitfalls with respect to admin rights and resource limitation.
+
+This was just a quick tour; I brushed over a lot of novel vocabulary with which you will have to familiarize yourself.
+Your head might be twisting in a swirl of containers by now.
+I hope you find this overview useful, nevertheless.
+Thank you for reading!
+
+
+[^2]: I mostly follow [this tutorial](https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html).
+
+[^3]: Just like "Github" is a server service to store git repositories, guess what: "Docker Hub" is a hosting service to store Docker containers.
+
+[^4]: Here I quoted the docs (<https://docs.docker.com/build/concepts/dockerfile>) before having read them.
+
+[^5]: If you did not install the `buildx` package on Linux, you will see a legacy warning.
+
+[^6]: Reference to the film "La Grande Bellezza".
+
+[^7]: Daniel J. Walsh (2019): "How does rootless Podman work?" <https://opensource.com/article/19/2/how-does-rootless-podman-work>
diff --git a/content/tutorials/development_containers1/index.md b/content/tutorials/development_containers1/index.md
new file mode 100644
index 000000000..62dd6b4ff
--- /dev/null
+++ b/content/tutorials/development_containers1/index.md
@@ -0,0 +1,373 @@
+---
+title: "Containers: An Overview"
+description: "Introduction to containerization and the practical use of Docker-like tools."
+date: "2025-02-21" +authors: [falkmielke] +categories: ["development", "open science"] +tags: ["development", "open science", "docker", "containers"] +number-sections: false +params: + math: true +format: + html: + toc: true + html-math-method: katex + hugo-md: + toc: true + preserve_yaml: true + html-math-method: katex +output: + hugo-md: + preserve_yaml: true + variant: gfm+footnotes + html: + variant: gfm+footnotes +--- + + + +You might have heard about "containerization" with [**Docker**](https://docs.docker.com). +Docker has been labeled "the *Holy Grail* of reproducibility" in [The Open Science Manual by Claudio Zandonella Callegher and Davide Massidda (2023)](https://arca-dpss.github.io/manual-open-science/docker-chapter.html). +Although containerization is an immensely useful Open Science tool worth striving for, the *Holy Grail* is an inaccurate metaphor, because + +- (i) Unlike The Grail, Docker is easy to find and accessible. +- (ii) Docker alone does not make a reproducible workflow; some of its capability is occasionally confused with package version management. +- (iii) Docker has issues, some of them mitigated by configuration adjustment or switching to "Podman". + + +<figure> +<img src="../../images/tutorials/development_docker/Gemini_Generated_Image_ngoz1wngoz1wngoz.jpg" alt="build" /> +<figcaption aria-hidden="true">I could not resist generating<sup id="fnref:gemini"><a class="footnote-ref" href="#fn:gemini" role="doc-noteref">*</a></sup> a catchy image on this, just to make this tutorial seem a little less dull. </figcaption> +</figure> + + +Time to explore what containers really are, and what they are not. + + +# Overview + +There are many good applications for containers. + +One advantage of a container is its *mobility*: you can "bring it with you" to other workstations, host it for colleagues or readers, use cloud computing, mostly without having to worry about installation of the components. 
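That "bring it with you" aspect can be taken literally: an image can be exported to a plain file and re-imported elsewhere. A sketch (the image name `myimage` is a placeholder, and a running Docker daemon is assumed):

``` sh
# export an image to a tar archive on one machine ...
docker save -o myimage.tar myimage:latest
# ... transfer the file (USB stick, scp, ...), then import it on the other machine
docker load -i myimage.tar
```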
+Containers pay off in complicated server setups and distributed computing.
+
+Yet they are also a matter of good *open science* practice:
+you can document build instructions for a reproducible analysis environment,
+or store and publish a whole image right away.
+
+In this notebook, you will find **installation instructions**, <a href="#sec-commands" class="quarto-xref"><b>useful commands</b></a>, references, and a loose assembly of general and almost philosophical topics to prime you on the **complications and misconceptions** surrounding containerization.
+
+There are numerous useful build instructions and container images already out there, which you can **simply `pull` and `run`**.
+This is an easy, entry-level application of container software like Docker, [covered in an introductory tutorial](../../tutorials/development_containers2_run).
+
+A second step is to set up and deploy a **self-`build` custom container**, which I demonstrate step-by-step [in a slightly more advanced tutorial](../../tutorials/development_containers3_build).
+This is intended to be a rather general test case, enabling you to later configure more specific container solutions for your own purpose.
+For example, you will learn how to spin up an existing `rocker/rstudio` container, and even modify it with additional system components and libraries.
+
+For relevant INBO-specific use cases, make sure to [check out the `containbo` repository](https://github.com/inbo/containbo) which documents **even more tips and tricks** assembled during my humble (but mostly successful) attempts to get INBO R packages to run in a container environment.
+
+I also present **Podman** as a [full replacement for Docker](../../tutorials/development_containers4_podman), and recommend to give it a try.
+
+On Windows, installation, configuration, and management of containers run via the `docker desktop` app.
+However, this series of tutorials also covers (and in fact focuses on) the terminal-centered steps to be executed on a Linux computer or within a WSL. + +Generally, if you are an INBO user, it is recommended to contact and involve your ICT department for support with the setup. + +# General References + +I follow other tutorials available online, and try to capture their essence for an INBO context. +Hence, this series is just an assembly of other tutorials, with references - no original ideas to be found herein, but nevertheless some guidance. +Here is an incomplete list of online material which you might find helpful. + +- <https://docs.docker.com> +- <https://podman.io/docs>, <https://github.com/containers/podman/blob/main/docs/tutorials/podman-for-windows.md> +- <https://github.com/inbo/contaINBO> +- <https://wiki.archlinux.org/title/Podman> +- <https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html> +- <https://medium.com/@geeekfa/docker-compose-setup-for-a-python-flask-api-with-nginx-reverse-proxy-b9be09d9db9b> +- <https://testdriven.io/blog/dockerizing-flask-with-postgres-gunicorn-and-nginx> +- <https://arca-dpss.github.io/manual-open-science/docker-chapter.html> +- <https://do4ds.com/chapters/sec1/1-6-docker.html> +- <https://colinfay.me/docker-r-reproducibility> +- <https://solutions.posit.co/envs-pkgs/environments/docker> + + +<a id="sec-installation"></a> +# Installation + +The installation procedure [is documented here](https://docs.docker.com/install). + +Docker comes with the *Docker Desktop* app. +That app by itself is trivial and hardly worth a tutorial. + +## Microsoft Windows + +Navigate to [the download site for Docker on Windows](https://docs.docker.com/desktop/setup/install/windows-install). +Download the "App" (newspeak for: graphical user interface to a software tool). +Install it. 
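If you prefer the command line, the same installation can reportedly be done with the `winget` package manager; the package id below is an assumption, so verify it first with `winget search docker`:

``` sh
# install Docker Desktop via winget (package id assumed; run from an elevated terminal)
winget install --exact --id Docker.DockerDesktop
```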
+
+*Note for INBO users:* you might choose to select Hyper-V, instead of WSL, against Docker's recommendation (WSL is not working in our enterprise environment; however, we are trying to improve, and ICT might help).
+You probably do not have admin rights, which is good.
+To re-iterate: **ask our friendly ICT helpdesk for support right away.**
+
+<figure>
+<img src="../../images/tutorials/development_docker/docker_desktop1.jpg" alt="desktop app" />
+<figcaption aria-hidden="true">The Desktop App.</figcaption>
+</figure>
+
+Using a convenient app is possible with "Docker Desktop".
+On Windows, you can download and install it with administrator rights.
+On Linux, that same `docker-desktop` [is available for installation](https://docs.docker.com/desktop/setup/install/linux).
+Yet while automating some aspects, the app is not entirely transparent on telemetry and advertisement; some anti-features are included (e.g. a required login).
+This is unfortunate, because it makes the app less appealing to privacy-concerned users.
+
+The terminal aspect of Docker is entirely free and open source, and universally accessible.
+This is why the rest of this tutorial will focus on terminal access.
+
+## Terminal
+
+On the Windows terminal or Linux shell, you can install `docker` as a terminal tool.
+
+{{% callout note %}}
+On Windows, this comes bundled with the App; the steps below are not necessary.
+There might be ways to get around the Desktop App and facilitate installation, either via WSL2 or using [a Windows package manager called Chocolatey](https://en.wikipedia.org/wiki/Chocolatey).
+
+Either way, note that you need to run the Docker app or `docker` in a terminal *as administrator*.
+{{% /callout %}}
+
+More info about the installation on your specific Linux operating system [can be found here](https://docs.docker.com/engine/install).
+The procedure for Debian or Ubuntu-based distributions involves trusting Docker's GPG keys and adding an extra repository; [some caution is warranted](https://wiki.debian.org/DontBreakDebian).
+
+``` sh
+#| eval: false
+sudo apt update && sudo apt install docker-ce docker-buildx-plugin # debian-based
+# sudo pacman -Sy docker docker-buildx # Arch Linux
+```
+
+As you will notice, this installs a "CE" version of Docker, `docker-ce`.
+CE stands for "community edition", as opposed to "enterprise edition" ([cf. here](https://www.geeksforgeeks.org/docker-community-edition-vs-enterprise-edition)).
+Many features which you would take for granted in this kind of software (security, consistency, scalability) are handled differently in the two editions; thus it is worth knowing the difference and considering the alternatives.
+
+For users to be able to use Docker, they must be in the "docker" group.
+(Insert your username at `<your-username>`.)
+
+``` sh
+#| eval: false
+sudo usermod -a -G docker <your-username>
+```
+
+For this change to take effect, log off and log in again, and restart the Docker service if it was running.
+
+Containers are managed by system tasks (a "service" and a "socket") which need to be started.
+Most likely, your Linux uses `systemd`.
+Your system can start and stop that service automatically, by using `systemctl enable <...>`.
+However, due to [diverse](https://docs.docker.com/engine/security) [security](https://github.com/moby/moby/issues/9976) [pitfalls](https://snyk.io/blog/top-ten-most-popular-docker-images-each-contain-at-least-30-vulnerabilities), it is good practice to **not keep it enabled** permanently on your system (unless, of course, you use it all the time).
+
+On a `systemd` system, you can start and stop Docker on demand via the following commands (those will ask you for `sudo` authentication if necessary).
+ +``` sh +#| eval: false +systemctl start docker + +systemctl status docker # check status + +systemctl stop docker.socket +systemctl stop docker.service +``` + +For aficionados: docker actually runs multiple services: the docker service, the docker socket, and the [container daemon](https://www.docker.com/blog/containerd-vs-docker) `containerd`. + +You can check the Docker installation by confirming the version at which the service is running. + +``` sh +#| eval: false +docker --version +``` + +Congratulations: now the fun starts! + +With docker installed, the next step is to run a container image which someone else has prepared and hosted online, [which you can read about in the next tutorial](../../tutorials/development_containers2_run). + +# The Holy Grail? + +Yet to know what containers can achieve and what not, it is useful to understand their general workings, quirks, and relation to other tools. + +## Relation to Version Control and Version Management + +Back to the initial paradigm of reproducibility: +*What exactly is the Open Science aspect of containerization?* + +This question might have led to some confusion, and I would like to throw in a paragraph of clarification. +A crucial distinction lies in the preparation of *Dockerfiles* (i.e. build instructions) and the preservation of *images* (i.e. end products of a build process). + +One purpose of a Dockerfile may be that you document the exact components of your system environment. +You start at a base image (e.g. a `rocker`) and add additional software via Dockerfile layers. +This is good practice, and encouraged: if you publish an analysis, provide a tested container recipe with it. + +However, this alone does not solve the problem of version conflicts and deprecation. 
+Documenting the versions of packages you used is an extra step, for which [other tools are available](https://doi.org/10.1038/d41586-023-01469-0):
+
+- It is good practice to report the exact versions of the software used upon publication ([see here, for example](https://arca-dpss.github.io/manual-open-science/requirements-chapter.html)). This is best achieved via virtual environments.
+- Version control such as `git` will track the changes within your own texts, scripts, even version snapshots and Dockerfiles.
+- Finally, docker images can serve as a snapshot of a (virtual) machine on which your code would run.
+
+{{% callout note %}}
+The simple rule of thumb is: use all three methods, ideally all the time.
+
+Virtual environments.
+Version control.
+Snapshots.
+
+Get used to them.
+They are easy.
+They will save you time and trouble almost immediately.
+{{% /callout %}}
+
+But unless you use them already, you might require some starting points and directions: here we go.
+The second point, **version control**, is a fantastic tool to enable open science, and avoid personal trouble.
+It might have a steep learning curve, yet [there](https://rstudio.github.io/cheatsheets/git-github.pdf) [are](https://www.sourcetreeapp.com) [fantastic](https://magit.vc) [tools](https://www.sublimemerge.com) to get you started.
+You will [find starting points and help in other tutorials on this website](https://tutorials.inbo.be/tags/git).
+The other point, version documentation, is trivially achieved by manual storage of currently installed versions via `sessionInfo()` in R, or `pip freeze > versions.txt` for Python.
+A small step towards more professionalism is the use of **virtual environments**.
+Those exist for R ([renv](https://rstudio.github.io/renv/articles/renv.html)) or Python ([venv](https://docs.python.org/3/library/venv.html)).
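As a minimal sketch of the Python route (the directory name `.venv` is a convention, not a requirement):

``` sh
# create and activate a virtual environment, then record exact package versions
python3 -m venv .venv
. .venv/bin/activate
pip freeze > versions.txt
```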
+The `pak` library in R can [handle lock files conveniently](https://pak.r-lib.org/reference/lockfile_install.html) with `pak::lockfile_install()`.
+Then there is the integration of R, Python and system packages in `conda`-like tools ([e.g. micromamba](https://mamba.readthedocs.io/en/latest)).
+There are even system-level tools, for example [`nix` and `rix`](https://docs.ropensci.org/rix).
+
+The methods are not mutually exclusive:
+all Dockerfiles, build recipes and scripts to establish virtual environments should generally be subject to version control.
+
+However, documenting the exact tools and versions used in a project does not guarantee that these versions will be accessible to future investigators (like oneself, trying to reproduce an analysis five years later).
+This is where **Docker images** come in.
+Docker images are the actual artifacts which you create from the Dockerfile blueprints by the process of building.
+In the "tiny home" metaphor: your "image" is the physical (small, but real, DIY-achievement) home to live in, built from step-by-step instructions.
+Think of a Docker image as a virtual copy of your computer which you store for later re-activation.
+For example, a collection of images for specific analysis pipelines at INBO is preserved at [Docker Hub/inbobmk](https://hub.docker.com/u/inbobmk).
+We consider these "stable" versions because they could be re-activated no matter what crazy future updates will shatter the R community, which enables us to return to all details of previous analyses.
+
+Some confusion might arise from the fact that managing these image snapshots is achieved with the same vocabulary as version control: for example, you would ["commit"](https://docs.docker.com/reference/cli/docker/container/commit) updated versions and ["push"](https://docs.docker.com/reference/cli/docker/image/push) them to a container repository.
+
+Even more confusion might arise from the fact that you also find ready-made images online, e.g.
on [Docker Hub](https://hub.docker.com), or [Quay](https://quay.io), or elsewhere.
+These provide images of (recent) versions of working environments, intended as starting points for derived containers.
+Hence, be aware of the dual use case of images: (i) the dynamic, universal base image which improves efficiency and (ii) the static, derived, bespoke image which you created for your analysis (shared with the world for reproducibility).
+
+And, once more, those images are not a "holy grail" solution: they are not entirely system independent (e.g. processor architecture), and they might occupy a considerable amount of hard disk space (Dockerfile optimization is warranted).
+Ideally, to be a "full stack open science developer", you want to implement **a mixed strategy** consisting of virtual environments and containers, wrapped in version control and stored in a backup image.
+
+
+<a id="sec-rootless"></a>
+## "Because Roots Are Important": Rootless Mode[^2]
+
+One of the main criticisms of Docker is the necessity to run in a privileged user environment, which is indeed a security issue.
+This may refer to the system process requiring elevated privileges, or to users in the `docker` system group [effectively having superuser privileges](https://github.com/moby/moby/issues/9976).
+Because of the risk of privilege escalation in case of a container breakout, this situation worsens existing vulnerabilities, [of which there are some](https://snyk.io/blog/top-5-docker-security-vulnerabilities) in [Docker containers](https://www.docker.com/blog/container-security-and-why-it-matters).
+
+Historically, Docker could not run "rootless", i.e. without elevated privileges.
+[This seems to have changed](https://docs.docker.com/engine/security/rootless), according to Docker.
+Some caution is still warranted: the setup procedure requires downloading and running shell scripts (which must be checked); the daemon still builds on `systemd` (*usually* root level); some functionality is limited.
+
+On the other hand, there is Podman (cf. the [Podman tutorial](../../tutorials/development_containers4_podman)).
+It *used to* require almost the same extra steps as `docker-rootless` to work without root, but we found that these requirements are now met by default.
+It seems that, at the time of writing, Docker and Podman have identical capabilities in terms of rootless containerization.
+The remaining difference is that Podman seems to have more sensible default settings.
+
+It might therefore be worth considering Podman as an exchange for Docker.
+
+But, on that line, how about private repositories?
+More generally, how would we get (personal) data from our host machine to the container?
+
+## Data Exchange
+
+Arguably, among the trickier tasks when working with containers is file exchange.
+There are [several options available](https://forums.docker.com/t/best-practices-for-getting-code-into-a-container-git-clone-vs-copy-vs-data-container/4077):
+
+- `COPY` in the Dockerfile (or `ADD` [in appropriate cases](https://www.docker.com/blog/docker-best-practices-understanding-the-differences-between-add-and-copy-instructions-in-dockerfiles))
+- ["bind mounts"](https://docs.docker.com/engine/storage/bind-mounts)
+- [volumes](https://docs.docker.com/engine/storage/volumes)
+- R's own ways of installing from afar (e.g. `remotes::install_github()`)
+
+For the use case of [installing R packages from a private git repo](https://www.geeksforgeeks.org/how-to-clone-private-git-repo-with-dockerfile), there are several constraints:
+
+- It best happens at build time, to enable all the good stuff: `--rm`, sharing, ...
+- Better keep your credentials (e.g.
ssh keys, access tokens) off the container, both system-side and [on the R side](https://usethis.r-lib.org/articles/git-credentials.html).
+- On the other hand, updates can often happen by re-building.
+
+In this (and only this) situation, the simple solution is to copy a clone of the repository to the container, and then install it.
+The `git clone` should reside within the Dockerfile folder.
+Then the Dockerfile section can look like the following:
+
+```
+# copy the repo
+COPY my_private_repo /opt/my_private_repo
+
+# manually install dependencies
+RUN R -q -e 'install.packages("remotes", dependencies = TRUE)'
+
+# install package from folder
+RUN R -q -e 'install.packages("/opt/my_private_repo", repos = NULL, type = "source", dependencies = TRUE)'
+```
+
+This way of handling private repositories [seems to be good practice](https://stackoverflow.com/questions/23391839/clone-private-git-repo-with-dockerfile/55761914#55761914), for being simple, secure, and generally most feasible.
+
+The next best alternative would be mounting the `~/.ssh` folder from the host to the container via `-v`.
+
+You can find some more options [in the `containbo` repository](https://github.com/inbo/containbo).
+
+
+<a id="sec-commands"></a>
+# Useful Commands
+
+You will certainly encounter `docker --version`, `docker run`, and `docker build` in this series of tutorials, and there are certainly more settings and tweaks on these commands to learn about.
+
+There are other Docker commands which might help you out of a temporary misery.
+
+- First and foremost, `docker --help` will list the available commands and options.
+- `docker run -it --entrypoint /bin/bash <image>` or `docker run -it <image> /bin/bash` brings you to the shell of a container; you can update, upgrade, or just mess around. Try `bash` or `/bin/sh` as alternatives.
+- `docker images` will list your images in convenient table format; the `-q` flag returns only IDs.
+- `docker inspect <image-name or image-id>` brings up all the configuration details about a specific image; you can, for example, find out its Docker version and network IP address.
+- `docker ps` ("process status") will list all running containers; `docker stop $(docker ps -a -q)` will stop them **all**.
+- Be aware that docker images occupy a considerable amount of hard disk space. `docker rmi <image-name or image-id>` will remove an image; `docker rmi $(docker images -q)` will remove **all** your images. The command `docker system prune` interactively cleans up unused data; add `--all` to also remove unused images, and `--force` to skip the confirmation prompt. Of course, you get to keep the Dockerfiles.
+- `docker commit` and `docker diff` support the creation and maintenance of snapshots of processed images, which you could keep locally, or upload to an online storage such as Docker Hub.
+
+There are a gazillion more to choose and use.
+A more complete list can be found [here, for example](https://do4ds.com/chapters/append/cheatsheets.html#cheat-docker), and the [Docker docs](https://docs.docker.com/reference/cli/docker) are your go-to source.
+
+One more note on the `ENTRYPOINT`:
+It defines through which terminal or script the user will access the container.
+For example, `/bin/bash`, `/usr/bin/bash` or `/bin/sh` are common shells (the Linux terminal of the container).
+Rocker images usually enter into an R console, or monitor an RStudio server, via an `/init` script.
+The flask container above runs a script which hosts your website via Python.
+Anything is possible.
+You can define an entrypoint in the Dockerfile (i.e. set a default), or overwrite it on each `run`.
+
+# Summary
+
+In this series of tutorials, I demonstrate the basics of containerization with Docker and Podman.
+There are convenient GUI apps, and sophisticated terminal commands; the latter are much more powerful.
+This particular notebook assembled references, useful commands, information about the installation of Docker, and general considerations.
+
+This is the central node of a series of tutorials; the others are:
+- Running containers: [https://tutorials.inbo.be/tutorials/development_containers2_run](../development_containers2_run)
+- Building containers: [https://tutorials.inbo.be/tutorials/development_containers3_build](../development_containers3_build)
+- Advanced Build Recipes: <https://github.com/inbo/containbo>
+- Switching to Podman: [https://tutorials.inbo.be/tutorials/development_containers4_podman](../development_containers4_podman)
+
+Personally, I find the concept of containerization fascinating, and was surprised what a simple and useful trick it is.
+
+Containerization offers the advantages of modularity, configurability, transparency (open science: share your rocker file), shared use ...
+There are some manageable pitfalls with respect to admin rights and resource limitation.
+
+This was just a quick tour; I brushed over a lot of novel vocabulary with which you will have to familiarize yourself.
+Your head might be twisting in a swirl of containers by now.
+I hope you find this overview useful, nevertheless.
+Thank you for reading!
+
+
+<hr>
+<ol>
+<li id="fn:gemini" role="doc-endnote">
+<sup>*</sup> <p>Generated by Google Gemini (2025-02-21), modified. Prompt `I would love to have a comic-style image of a whale in a grail. The grail should be golden and shiny, resembling the holy grail. The whale on top is a reference to the docker logo (you may add sketchy little container blocks on its back).`
+<a href="#fnref:gemini" class="footnote-backref" role="doc-backlink">↩︎</a> </p>
+</li> </ol>
+
+[^2]: Reference to the film "La Grande Bellezza".
diff --git a/content/tutorials/development_containers1/index.qmd b/content/tutorials/development_containers1/index.qmd
new file mode 100644
index 000000000..9226bcb7b
--- /dev/null
+++ b/content/tutorials/development_containers1/index.qmd
@@ -0,0 +1,407 @@
+---
+title: "Containers: An Overview"
+description: "Introduction to containerization and the practical use of Docker-like tools."
+date: "2025-02-21"
+authors: [falkmielke]
+categories: ["development", "open science"]
+tags: ["development", "open science", "docker", "containers"]
+number-sections: false
+params:
+  math: true
+format:
+  html:
+    toc: true
+    html-math-method: katex
+  hugo-md:
+    toc: true
+    preserve_yaml: true
+    html-math-method: katex
+output:
+  hugo-md:
+    preserve_yaml: true
+    variant: gfm+footnotes
+  html:
+    variant: gfm+footnotes
+---
+
+
+You might have heard about "containerization" with [**Docker**](https://docs.docker.com).
+Docker has been labeled "the *Holy Grail* of reproducibility" in [The Open Science Manual by Claudio Zandonella Callegher and Davide Massidda (2023)](https://arca-dpss.github.io/manual-open-science/docker-chapter.html).
+Although containerization is an immensely useful Open Science tool worth striving for, the *Holy Grail* is an inaccurate metaphor, because
+
+- (i) Unlike The Grail, Docker is easy to find and accessible.
+- (ii) Docker alone does not make a reproducible workflow; some of its capability is occasionally confused with package version management.
+- (iii) Docker has issues, some of them mitigated by configuration adjustments or by switching to "Podman".
+
+<figure>
+<img src="../../images/tutorials/development_docker/Gemini_Generated_Image_ngoz1wngoz1wngoz.jpg" alt="build" />
+<figcaption aria-hidden="true">I could not resist generating<sup id="fnref:gemini"><a class="footnote-ref" href="#fn:gemini" role="doc-noteref">*</a></sup> a catchy image on this, just to make this tutorial seem a little less dull. 
</figcaption>
+</figure>
+
+
+
+
+Time to explore what containers really are, and what they are not.
+
+# Overview
+
+There are many good applications for containers.
+
+One advantage of a container is its *mobility*: you can "bring it with you" to other workstations, host it for colleagues or readers, or use cloud computing, mostly without having to worry about installation of the components.
+Containers pay off in complicated server setups and distributed computing.
+
+Yet they are also a matter of good *open science* practice:
+you can document build instructions for a reproducible analysis environment,
+or store and publish a whole image right away.
+
+
+In this notebook, you will find **installation instructions**, [**useful commands**](#sec-commands), references, and a loose assembly of general and almost philosophical topics to prime you on the **complications and misconceptions** surrounding containerization.
+
+
+There are numerous useful build instructions and container images already out there, which you can **simply `pull` and `run`**.
+This is an easy, entry-level application of container software like Docker, [covered in an introductory tutorial](../../tutorials/development_containers2_run).
+
+
+A second step is to set up and deploy a **self-`built` custom container**, which I demonstrate step-by-step [in a slightly more advanced tutorial](../../tutorials/development_containers3_build).
+This is intended to be a rather general test case, enabling you to later configure more specific container solutions for your own purpose.
+For example, you will learn how to spin up an existing `rocker/rstudio` container, and even modify it with additional system components and libraries. 
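+As a first taste, the entire `pull`-and-`run` workflow can be as short as two commands, shown here with the `rocker/rstudio` image used throughout this series (a sketch only; details and explanations of the flags follow in the tutorials linked above):
+
+```sh
+#| eval: false
+docker pull rocker/rstudio                  # fetch a ready-made image from Docker Hub
+docker run --rm -p 8787:8787 rocker/rstudio # start a disposable container from it
+```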
+
+
+For relevant INBO-specific use cases, make sure to [check out the `containbo` repository](https://github.com/inbo/containbo), which documents **even more tips and tricks** assembled during my humble (but mostly successful) attempts to get INBO R packages to run in a container environment.
+
+
+I also present **Podman** as a [full replacement for Docker](../../tutorials/development_containers4_podman), and recommend giving it a try.
+
+
+On Windows, installation, configuration, and management of containers runs via the `docker desktop` app.
+However, this series of tutorials also covers (and in fact focuses on) the terminal-centered steps to be executed on a Linux computer or within a WSL.
+
+Generally, if you are an INBO user, it is recommended to contact and involve your ICT department for support with the setup.
+
+
+# General References
+
+I follow other tutorials available online, and try to capture their essence for an INBO context.
+Hence, this series is just an assembly of other tutorials, with references - no original ideas to be found herein, but nevertheless some guidance.
+Here is an incomplete list of online material which you might find helpful. 
+ +- <https://docs.docker.com> +- <https://podman.io/docs>, <https://github.com/containers/podman/blob/main/docs/tutorials/podman-for-windows.md> +- <https://github.com/inbo/contaINBO> +- <https://wiki.archlinux.org/title/Podman> +- <https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html> +- <https://medium.com/@geeekfa/docker-compose-setup-for-a-python-flask-api-with-nginx-reverse-proxy-b9be09d9db9b> +- <https://testdriven.io/blog/dockerizing-flask-with-postgres-gunicorn-and-nginx> +- <https://arca-dpss.github.io/manual-open-science/docker-chapter.html> +- <https://do4ds.com/chapters/sec1/1-6-docker.html> +- <https://colinfay.me/docker-r-reproducibility> +- <https://solutions.posit.co/envs-pkgs/environments/docker> + + +<a id="sec-installation"></a> +# Installation + +The installation procedure [is documented here](https://docs.docker.com/install). + +Docker comes with the *Docker Desktop* app. +That app by itself is trivial and hardly worth a tutorial. + + +## Microsoft Windows + +Navigate to [the download site for Docker on Windows](https://docs.docker.com/desktop/setup/install/windows-install). +Download the "App" (newspeak for: graphical user interface to a software tool). +Install it. + + +*Note for INBO users:* you might choose to select Hyper-V, instead of WSL, against Docker's recommendation (WSL is not working in our enterprise environment; however, we are trying to improve and ICT might help). +You probably do not have admin rights, which is good. +To re-iterate: **ask our friendly ICT helpdesk for support right away.** + + + + +Using a convenient app is possible with "Docker Desktop". +On Windows, you can download and install it with administrator rights. +On Linux, that same `docker-desktop` [is available for installation](https://docs.docker.com/desktop/setup/install/linux). +Yet while automating some aspects, the app is not entirely transparent on telemetry and advertisement; some anti-features are included (e.g. required login). 
+This is unfortunate, because it makes the app less appealing to privacy-concerned users.
+
+
+The terminal aspect of Docker is entirely free and open source, and universally accessible.
+This is why the rest of this tutorial will focus on terminal access.
+
+
+## Terminal
+
+On the Windows terminal or Linux shell, you can install `docker` as a terminal tool.
+
+:::{.callout-note}
+On Windows, this comes bundled with the App; the steps below are not necessary.
+There might be ways to get around the Desktop App and simplify installation, either via WSL2 or using [a Windows package manager called Chocolatey](https://en.wikipedia.org/wiki/Chocolatey).
+
+Either way, note that you need to run the docker app or docker in a terminal *as administrator*.
+:::
+
+
+More info about the installation on your specific Linux operating system [can be found here](https://docs.docker.com/engine/install).
+The procedure for Debian or Ubuntu-based distributions involves trusting Docker's GPG keys and adding an extra repository; [some caution is warranted](https://wiki.debian.org/DontBreakDebian).
+
+```sh
+#| eval: false
+sudo apt update && sudo apt install docker-ce docker-buildx-plugin # debian-based
+# sudo pacman -Sy docker docker-buildx # Arch Linux
+```
+
+
+As you will notice, this installs a "CE" version of Docker, `docker-ce`.
+CE stands for "community edition", as opposed to "enterprise edition" ([cf. here](https://www.geeksforgeeks.org/docker-community-edition-vs-enterprise-edition)).
+Many features which you would take for granted in this kind of software (security, consistency, scalability) are handled differently in the two editions; thus it is worth knowing the difference and considering the alternatives.
+
+
+For users to be able to use Docker, they must be in the "docker" group.
+(Insert your username at `<your-username>`.) 
+
+```sh
+#| eval: false
+sudo usermod -a -G docker <your-username>
+```
+
+For this change to take effect, log off and log in again, and restart the Docker service if it was running.
+
+
+Containers are managed by system tasks (a "service" and a "socket") which need to be started.
+Most likely, your Linux uses `systemd`.
+Your system can start and stop that service automatically, by using `systemctl enable <...>`.
+However, due to [diverse](https://docs.docker.com/engine/security) [security](https://github.com/moby/moby/issues/9976) [pitfalls](https://snyk.io/blog/top-ten-most-popular-docker-images-each-contain-at-least-30-vulnerabilities), it is good practice to **not keep it enabled** permanently on your system (unless, of course, you use it all the time).
+
+
+On a `systemd` system, you can start and stop Docker on demand via the following commands (those will ask you for `sudo` authentication if necessary).
+
+```sh
+#| eval: false
+systemctl start docker
+
+systemctl status docker # check status
+
+systemctl stop docker.socket
+systemctl stop docker.service
+```
+
+
+For aficionados: Docker actually runs multiple services: the docker service, the docker socket, and the [container daemon](https://www.docker.com/blog/containerd-vs-docker) `containerd`.
+
+
+
+You can check the Docker installation by confirming the version at which the service is running.
+
+```sh
+#| eval: false
+docker --version
+```
+
+Congratulations: now the fun starts!
+
+
+With docker installed, the next step is to run a container image which someone else has prepared and hosted online, [which you can read about in the next tutorial](../../tutorials/development_containers2_run).
+
+
+# The Holy Grail?
+
+To know what containers can achieve and what they cannot, it is useful to understand their general workings, quirks, and relation to other tools. 
+ + +## Relation to Version Control and Version Management + +Back to the initial paradigm of reproducibility: +*What exactly is the Open Science aspect of containerization?* + +This question might have led to some confusion, and I would like to throw in a paragraph of clarification. +A crucial distinction lies in the preparation of *Dockerfiles* (i.e. build instructions) and the preservation of *images* (i.e. end products of a build process). + + +One purpose of a Dockerfile may be that you document the exact components of your system environment. +You start at a base image (e.g. a `rocker`) and add additional software via Dockerfile layers. +This is good practice, and encouraged: if you publish an analysis, provide a tested container recipe with it. + +However, this alone does not solve the problem of version conflicts and deprecation. +Documenting the versions of packages you used is an extra step, for which [other tools are available](https://doi.org/10.1038/d41586-023-01469-0): + +- It is good practice to report the exact versions of the software used upon publication ([see here, for example](https://arca-dpss.github.io/manual-open-science/requirements-chapter.html)). This is best achieved via virtual environments. +- Version control such as `git` will track the changes within your own texts, scripts, even version snapshots and Dockerfiles. +- Finally, docker images can serve as a snapshot of a (virtual) machine on which your code would run. + +:::{.callout-tip} +The simple rule of thumb is: use all three methods, ideally all the time. + +Virtual environments. +Version control. +Snapshots. + +Get used to them. +They are easy. +They will save you time and trouble almost immediately. +::: + + +But unless you use them already, you might require some starting points and directions: here we go. +The second point, **version control**, is a fantastic tool to enable open science, and avoid personal trouble. 
+It might have a steep learning curve, yet [there](https://rstudio.github.io/cheatsheets/git-github.pdf) [are](https://www.sourcetreeapp.com) [fantastic](https://magit.vc) [tools](https://www.sublimemerge.com) to get you started.
+You will [find starting points and help in other tutorials on this website](https://tutorials.inbo.be/tags/git).
+The other point, version documentation, is trivially achieved by manually storing the currently installed versions via `sessionInfo()` in R, or `pip freeze > versions.txt` for Python.
+A small step towards somewhat more professionalism is the use of **virtual environments**.
+Those exist for R ([renv](https://rstudio.github.io/renv/articles/renv.html)) or Python ([venv](https://docs.python.org/3/library/venv.html)).
+The `pak` library in R can [handle lock files conveniently](https://pak.r-lib.org/reference/lockfile_install.html) with `pak::lockfile_install()`.
+Then there is the integration of R, Python and system packages in `conda`-like tools ([e.g. micromamba](https://mamba.readthedocs.io/en/latest)).
+There are even system level tools, for example [`nix` and `rix`](https://docs.ropensci.org/rix).
+
+The methods are not mutually exclusive:
+all Dockerfiles, build recipes and scripts to establish virtual environments should generally be subject to version control.
+
+
+However, documenting the exact tools and versions used in a project does not guarantee that these versions will be accessible to future investigators (like oneself, trying to reproduce an analysis five years later).
+This is where **Docker images** come in.
+Docker images are the actual containers which you create from the Dockerfile blueprints by the process of building.
+In the "tiny home" metaphor: your "image" is the physical (small, but real, DIY-achievement) home to live in, built from step-by-step instructions.
+Think of a Docker image as a virtual copy of your computer which you store for later re-activation. 
+For example, a collection of images for specific analysis pipelines at INBO is preserved at [Docker Hub/inbobmk](https://hub.docker.com/u/inbobmk).
+We consider these "stable" versions because they could be re-activated no matter what crazy future updates will shatter the R community, which enables us to return to all details of previous analyses.
+
+
+Some confusion might arise from the fact that managing these image snapshots is achieved with the same vocabulary as version control; for example, you would ["commit"](https://docs.docker.com/reference/cli/docker/container/commit) updated versions and ["push"](https://docs.docker.com/reference/cli/docker/image/push) them to a container repository.
+
+Even more confusion might arise from the fact that you also find ready-made images online, e.g. on [Docker Hub](https://hub.docker.com), or [Quay](https://quay.io), or elsewhere.
+These provide images of (recent) versions of working environments, intended to serve as starting points for derived containers.
+Hence, be aware of the dual use case of images: (i) the dynamic, universal base image which improves efficiency and (ii) the static, derived, bespoke image which you created for your analysis (shared with the world for reproducibility).
+
+
+And, once more, those images are not a "holy grail" solution: they are not entirely system independent (e.g. processor architecture), and they might occupy a considerable amount of hard disk space (Dockerfile optimization is warranted).
+Ideally, to be a "full stack open science developer", you want to implement **a mixed strategy** consisting of virtual environments and containers, wrapped in version control and stored in a backup image.
+
+
+## "Because Roots Are Important": Rootless Mode[^6] {#sec-rootless}
+
+[^6]: Reference to the film "La Grande Bellezza".
+
+One of the main criticisms about Docker is the necessity to run in a privileged user environment, which is indeed a security issue. 
+This may refer to the system process requiring elevated privileges, or users in the `docker` system group [effectively having superuser privileges](https://github.com/moby/moby/issues/9976).
+Because of the risk of privilege escalation in case of a container breakout, this situation would worsen existing vulnerabilities, [of which there are some](https://snyk.io/blog/top-5-docker-security-vulnerabilities) in [Docker containers](https://www.docker.com/blog/container-security-and-why-it-matters).
+
+
+Historically, Docker could not run "rootless", i.e. without elevated privileges.
+[This seems to have changed](https://docs.docker.com/engine/security/rootless), according to Docker.
+Some caution is still warranted: the setup procedure requires downloading and running shell scripts (which must be checked); the daemon still builds on `systemd` (*usually* root level); some functionality is limited.
+
+
+On the other hand, there is Podman (cf. the [Podman tutorial](../../tutorials/development_containers4_podman)).
+It *used to* require almost the same extra steps as `docker-rootless` to work rootless, but we found that these requirements are now met by default.
+It seems that, at the time of writing, Docker and Podman have identical capabilities in terms of rootless containerization.
+The remaining difference is that Podman seems to have more sensible default settings.
+
+It might therefore be worth trying both tools and switching between them.
+
+
+
+But, on that note, how about private repositories?
+More generally, how would we get (personal) data from our host machine to the container?
+
+
+## Data Exchange
+
+Arguably, one of the trickier tasks when working with containers is file exchange. 
+There are [several options available](https://forums.docker.com/t/best-practices-for-getting-code-into-a-container-git-clone-vs-copy-vs-data-container/4077):
+
+- `COPY` in the Dockerfile (or `ADD` [in appropriate cases](https://www.docker.com/blog/docker-best-practices-understanding-the-differences-between-add-and-copy-instructions-in-dockerfiles))
+- ["bind mounts"](https://docs.docker.com/engine/storage/bind-mounts)
+- [volumes](https://docs.docker.com/engine/storage/volumes)
+- R's own ways of installing from afar (e.g. `remotes::install_github()`)
+
+
+For the use case of [installing R packages from a private git repo](https://www.geeksforgeeks.org/how-to-clone-private-git-repo-with-dockerfile), there are several constraints:
+
+- It best happens at build time, to enable all the good stuff: `--rm`, sharing, ...
+- Better keep your credentials (e.g. ssh keys, access tokens) off the container, both system-side and [on the R side](https://usethis.r-lib.org/articles/git-credentials.html).
+- On the other hand, updates can often happen by re-building.
+
+
+In this (and only this) situation, the simple solution is to copy a clone of the repository to the container, and then install it.
+The `git clone` should reside within the Dockerfile folder.
+Then the Dockerfile section can look like the following:
+
+```
+# copy the repo
+COPY my_private_repo /opt/my_private_repo
+
+# manually install dependencies
+RUN R -q -e 'install.packages("remotes", dependencies = TRUE)'
+
+# install package from folder
+RUN R -q -e 'install.packages("/opt/my_private_repo", repos = NULL, type = "source", dependencies = TRUE)'
+```
+
+This way of handling private repositories [seems to be good practice](https://stackoverflow.com/questions/23391839/clone-private-git-repo-with-dockerfile/55761914#55761914), for being simple, secure, and generally most feasible.
+
+The next best alternative would be mounting the `~/.ssh` folder from the host to the container via `-v`. 
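+A minimal sketch of that alternative (the image name `my-image` is a placeholder, and the target path `/root/.ssh` assumes the container process runs as `root` — adjust it to the container's actual user):
+
+```sh
+#| eval: false
+# mount the host's ssh configuration read-only into the container
+docker run --rm -it -v ~/.ssh:/root/.ssh:ro my-image
+```
+
+The `:ro` suffix keeps the mount read-only, so the container can use, but never modify, your keys.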
+
+You can find some more options [in the `containbo` repository](https://github.com/inbo/containbo).
+
+
+# Useful Commands {#sec-commands}
+
+You will certainly encounter `docker --version`, `docker run`, and `docker build` in this series of tutorials, and there are certainly more settings and tweaks on these commands to learn about.
+
+
+There are other Docker commands which might help you out of a temporary misery.
+
+- First and foremost, `docker --help` will list the available commands and options.
+- `docker run -it --entrypoint /bin/bash <image>` or `docker run -it <image> /bin/bash` brings you to the shell of a container; you can update, upgrade, or just mess around. Try `bash` or `/bin/sh` as alternatives.
+- `docker images` will list your images in convenient table format; the `-q` flag returns only IDs.
+- `docker inspect <image-name or image-id>` brings up all the configuration details about a specific image; you can, for example, find out its Docker version and network IP address.
+- `docker ps` ("process status") will list all running containers; `docker stop $(docker ps -a -q)` will stop them **all**.
+- Be aware that docker images occupy a considerable amount of hard disk space. `docker rmi <image-name or image-id>` will remove an image; `docker rmi $(docker images -q)` will remove **all** your images. The command `docker system prune` interactively cleans up unused data; add `--all` to also remove unused images, and `--force` to skip the confirmation prompt. Of course, you get to keep the Dockerfiles.
+- `docker commit` and `docker diff` support the creation and maintenance of snapshots of processed images, which you could keep locally, or upload to an online storage such as Docker Hub.
+
+There are a gazillion more to choose and use.
+A more complete list can be found [here, for example](https://do4ds.com/chapters/append/cheatsheets.html#cheat-docker), and the [Docker docs](https://docs.docker.com/reference/cli/docker) are your go-to source. 
+
+
+One more note on the `ENTRYPOINT`:
+It defines through which terminal or script the user will access the container.
+For example, `/bin/bash`, `/usr/bin/bash` or `/bin/sh` are common shells (the Linux terminal of the container).
+Rocker images usually enter into an R console, or monitor an RStudio server, via an `/init` script.
+A flask container, for example, would run a script which hosts your website via Python.
+Anything is possible.
+You can define an entrypoint in the Dockerfile (i.e. set a default), or overwrite it on each `run`.
+
+
+# Summary
+
+In this series of tutorials, I demonstrate the basics of containerization with Docker and Podman.
+There are convenient GUI apps, and sophisticated terminal commands; the latter are much more powerful.
+This particular notebook assembled references, useful commands, information about the installation of Docker, and general considerations.
+
+This is the central node of a series of tutorials; the others are:
+- Running containers: [https://tutorials.inbo.be/tutorials/development_containers2_run](../development_containers2_run)
+- Building containers: [https://tutorials.inbo.be/tutorials/development_containers3_build](../development_containers3_build)
+- Advanced Build Recipes: [https://github.com/inbo/containbo](https://github.com/inbo/containbo)
+- Switching to Podman: [https://tutorials.inbo.be/tutorials/development_containers4_podman](../development_containers4_podman)
+
+
+Personally, I find the concept of containerization fascinating, and was surprised what a simple and useful trick it is.
+
+Containerization offers the advantages of modularity, configurability, transparency (open science: share your rocker file), shared use ...
+There are some manageable pitfalls with respect to admin rights and resource limitation.
+
+This was just a quick tour; I brushed over a lot of novel vocabulary with which you will have to familiarize yourself.
+Your head might be twisting in a swirl of containers by now. 
+I hope you find this overview useful, nevertheless. +Thank you for reading! + + + +<hr> +<ol> +<li id="fn:gemini" role="doc-endnote"> +<sup>*</sup> <p>Generated by Google Gemini (2025-02-21), modified. Prompt `I would love to have a comic-style image of a whale in a grail. The grail should be golden and shiny, resembling the holy grail. The whale on top is a reference to the docker logo (you may add sketchy little container blocks on its back).` +<a href="#fnref:gemini" class="footnote-backref" role="doc-backlink">↩︎</a> </p> +</li> </ol> diff --git a/content/tutorials/development_containers1/notes_qmd.txt b/content/tutorials/development_containers1/notes_qmd.txt new file mode 100644 index 000000000..9eaf0a9fe --- /dev/null +++ b/content/tutorials/development_containers1/notes_qmd.txt @@ -0,0 +1,43 @@ + +steps to get a qmd to hugo markdown: + ++ export hugo-md: + quarto render <file>.qmd --to hugo-md + ++ include yaml + preserve_yaml: true +but double-check the yaml header: it does not always copy correctly (author, date, categories, tags). +also check data and description + ++ remove TOC + (usually unnecessary for short texts) + ++ callouts: https://rossabaker.com/configs/website/shortcodes/callout/ +{{% callout note %}} +{{% /callout %}} + + ++ section crosslinks: +<a id="sec-section"></a> +## Section + ++ figure captions +<img +src="path/to/figure.png" +id="fig-label" +alt="Figure 1: Caption text." /> +<figcaption>Figure 1: Caption text.</figcaption><br> + ++ equations + in yaml header: + params: + math: true + replace $s$ -> \\(s\\), $$\ldots$$ -> \\[\ldots\\] + cf. 
math https://gohugo.io/content-management/mathematics/
+  eqn with \\(\\) and \\[\\]
+
++ preview procedure:
+  rm tutorials -rf
+  unzip <zip>
+  python -m http.server 8887
+
diff --git a/content/tutorials/development_containers2_run/index.md b/content/tutorials/development_containers2_run/index.md
new file mode 100644
index 000000000..ab2c2f044
--- /dev/null
+++ b/content/tutorials/development_containers2_run/index.md
@@ -0,0 +1,240 @@
+---
+title: Running Existing Containers
+description: Pulling and running containers from an online container repository.
+date: "2025-02-21"
+authors: [falkmielke]
+categories: ["development", "open science"]
+tags: ["development", "open science", "docker", "containers"]
+number-sections: false
+params:
+  math: true
+format:
+  html:
+    toc: true
+    html-math-method: katex
+  hugo-md:
+    toc: true
+    preserve_yaml: true
+    html-math-method: katex
+output:
+  hugo-md:
+    preserve_yaml: true
+    variant: gfm+footnotes
+  html:
+    variant: gfm+footnotes
+---
+
+
+Docker is about assembling and working in containers.
+"Living" in containers.
+Or, rather, you can think of this as living in a ["tiny home", or "mobile home"](https://parametric-architecture.com/tiny-house-movement).
+(Let's call it a fancy caravan.)
+In the simple but comfortable case, you do not get to pick a general design or to choose all details of the interior: you just take that wheeled cabin "as is" from a tiny home reseller.
+
+<figure>
+<img src="../../images/tutorials/development_docker/docker_metaphor_tiny_space.jpg" alt="Black/white image of a tiny home as a metaphor for software containerization." />
+<figcaption aria-hidden="true">A tiny home close to "Gare Maritime", Brussels, February 2025.</figcaption>
+</figure>
+
+Just as a tiny home is a mini-version of an immobile house, a container can be thought of as a miniature computer which can be transferred to other computing environments. 
+The good news:
+there are a gazillion **Docker images available** on repositories like [Docker Hub](https://hub.docker.com) or [Quay](https://quay.io).
+
+This tutorial will show you how to use such "containers-to-go", thereby demonstrating some basic principles and vocabulary about containerization.
+I assume that you have [installed Docker](../../tutorials/development_containers1#sec-installation).
+This tutorial will stay on the more involved route of running Docker in the terminal (the Docker Desktop "app" is rather self-explanatory, and you can maneuver it easily with knowledge of terminal vocabulary).
+Once you master these first steps, you can proceed to [customize your container images](../../tutorials/development_containers3_build).
+You might also [consider Podman as a Docker alternative](../../tutorials/development_containers4_podman).
+
+## Example
+
+Because of the useful idea of bringing your computer environment along (think of benefits for distributed computing), container images of all kinds are abundant on the container repositories mentioned above.
+For example[^1], there are Docker images with [rstudio server](https://posit.co/download/rstudio-server) pre-installed:
+
+- <https://hub.docker.com/r/rocker/rstudio>
+
+{{% callout note %}}
+If you control containers via the desktop app, simply search, pull, and run it.
+{{% /callout %}}
+
+
+<figure>
+<img src="../../images/tutorials/development_docker/docker_desktop2.jpg" alt="desktop app: run" />
+<figcaption aria-hidden="true">Desktop App: run.</figcaption>
+</figure>
+
+If you are comfortable using the terminal, execute the following script (*Windows*: use an administrator terminal).
+If it does not find the resources locally, Docker will download and extract the image from Docker Hub[^2].
+
+``` sh
+docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD rocker/rstudio
+```
+
+- The `run` command will automatically `pull`, i.e. download an existing image; though you could also `pull` without running.
+- The `--rm` flag makes the Docker container non-permanent, i.e. disk space will be freed after you close the container (<a href="#sec-permanence" class="quarto-xref">Section 0.4</a>).
+- The port specified at `-p` is the one you use to access this local container server (the `-p` actually maps host- and container ports). You have to specify it explicitly, otherwise the host system will not let you pass (`:gandalf-meme:`).
+- The `-e` flag allows you to specify environment variables, in this case used to set a password for the RStudio server. But if you do not specify one, a random password will be generated and displayed upon startup (read the terminal output).
+
+<figure>
+<img src="../../images/tutorials/development_docker/docker_run.jpg" alt="run" />
+<figcaption aria-hidden="true">Docker run, on the terminal.</figcaption>
+</figure>
+
+You are now running (`run`) a `rocker/rstudio` server instance on your `localhost`, i.e. your computer.
+You can access it via a browser, going to <localhost:8787>, with the username `rstudio` and your chosen password.
+
+You can shut down the container with the keyboard shortcut `[ctrl]+[C]` (probably `[ctrl]+[Z] [Return]` on Windows).
+
+
+<a id="sec-mounting"></a>
+## File Access
+
+The downside of this approach is that your container is isolated (well... at least to a certain degree).
+
+Images can take up considerable storage space.
+Storing files locally, i.e. on the host machine, rather than in an unnecessarily bloated container, might be a good strategy.
+This can be achieved by mapping a virtual path on the container to a local drive on your computer.
+(Linux users will be familiar with the concept of "mounting" and "linking" storage locations.)
+Note that the technique is equally relevant when running the container locally, hence not exclusive to remote hosts.
+
+Docker `run` brings the `-v` flag for mounting volumes.
+Suppose you have an R project you would like to work on, stored, for example, in this path:
+
+- `/data/git/coding-club`
+
+Then you can link this to your container's home folder via the following command.
+
+``` sh
+# Windows syntax, mapping on `D:\data`
+docker run --rm -p 8787:8787 -v //d/data/git/coding-club:/home/rstudio/coding-club rocker/rstudio
+
+# Linux syntax
+docker run --rm -p 8787:8787 -v /data/git/coding-club:/home/rstudio/coding-club rocker/rstudio
+```
+
+Again, navigate to <localhost:8787>, *et voilà*, you can access your project and store files back in your regular folders.
+
+## Limitations
+
+This is a simple and quick way to run R and RStudio in a container.
+
+However, there are limitations:
+
+
+{{% callout note %}}
+
+- You have to live with the R packages provided in the container, or otherwise install them each time you access it...
+- ... unless you make your container permanent by omitting the `--rm` option. Note that this will cost considerable disk space, will not transfer to other computers (defeating the original purpose of Docker), and will demand occasional updates (<a href="#sec-permanence" class="quarto-xref">Section 0.4</a>).
+- You could alternatively add `--pull always` to `docker run`, which will check and pull new versions.
+- Speaking of updates: it is good practice to keep software up to date. Occasionally update or simply re-install your Docker image and R packages to get the latest versions.
+- You should make sure that the containers are configured correctly and securely. This is especially important with server components which expose your machine to the internet.
+- Because most containers contain a Linux system, user permissions are taken seriously, and the consequences might be confusing. There are guides online ([e.g. here](https://labex.io/tutorials/docker-how-to-handle-permissions-in-docker-415866)); there are example repositories (like the author's own struggle [here](https://github.com/inbo/containbo?tab=readme-ov-file#understanding-volumes) and [here](https://github.com/inbo/containbo/tree/main/emacs)); base images are well set up and one can normally get by with default users.
+- There is a performance penalty from using containers: in inaccurate layman's terms, they emulate (parts of a) "computer" inside your computer.
+{{% /callout %}}
+
+On the performance issue: I tested this on my laptop with matrix multiplication.
+
+``` r
+# https://cran.r-project.org/web/packages/rbenchmark/rbenchmark.pdf
+# install.packages("rbenchmark")
+library(rbenchmark)
+
+test <- function(){
+  # test from https://prdm0.github.io/ropenblas/#installation
+  m <- 1e4; n <- 1e3; k <- 3e2
+  X <- matrix(rnorm(m*k), nrow=m); Y <- matrix(rnorm(n*k), ncol=n)
+  X %*% Y
+}
+
+benchmark(test())
+```
+
+In the terminal:
+
+     test replications elapsed relative user.self sys.self user.child sys.child
+    1 test() 100 22.391 1 83.961 65.291 0 0
+
+In the container:
+
+     test replications elapsed relative user.self sys.self user.child sys.child
+    1 test() 100 26.076 1 102.494 153.89 0 0
+
+Now, the *good news* is that the difference is not by orders of magnitude.
+This indicates that the chosen rocker image integrated the more performant `blas` variant which is [recommended](https://pbs-assess.github.io/sdmTMB/index.html#installation) [elsewhere](https://prdm0.github.io/ropenblas/#installation) (`blas-openblas`).
+
+The *bad news* is that we still suffer a performance drop on the order of 20%, which is considerable.
+
+This is just a single snapshot on a laptop, and putatively `blas`-confounded.
+Feel free to systematically and scientifically repeat the tests on your own machine.
+
+
+<a id="sec-permanence"></a>
+## Container Permanence: The `--rm` Option
+
+As briefly touched upon above, `docker run` comes with the `--rm` option.
+This basically enables two separate workflows, i.e. usage paradigms.
+
+The first option, which is the default, is that your container is stored on the system permanently.
+This applies to the upstream images, which are downloaded upon first invocation of a container.
+But also, changes you apply while working in the container are stored persistently and will still be there when you log in again, using hard drive space of the host.
+Images may still be removed by manually running `docker rmi [...]` ([*cf.* "useful commands" in the overview tutorial](../../tutorials/development_containers1#sec-commands)).
+
+In contrast, with the second option, `docker run --rm [...]`, ad-hoc changes in the container are removed when the container is finished.
+Unless, of course, you mount a local volume with `docker run --rm -v [...]` (<a href="#sec-mounting" class="quarto-xref">Section 0.2</a>).
+However, contrary to a rather general intuition, starting a container with `--rm` will not require downloading the image a second time.
+
+You might want to test this for yourself.
+Consider the following series of commands to create a test file in the Docker home directory:
+
+``` sh
+docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
+echo "testing permanence." > ~/test.txt
+cat ~/test.txt
+exit
+```
+
+Re-connecting is instantaneous.
+However,
+
+``` sh
+docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
+cat ~/test.txt
+```
+
+will return:
+
+> cat: /root/test.txt: No such file or directory
+
+This behavior is desired (in the second workflow above): if you start up a fresh environment each time you work in Docker, you **ensure that your work pipeline is independent of prior changes on the system**.
+Whether this makes sense as a workflow has to be evaluated with respect to disk space requirements, updates, the option to build upon a customized Dockerfile, and reproducibility.
+
+You can "link in" folders for working files (note how you have to specify the full path to `new_home`, and that this container uses the root user by default):
+
+``` sh
+mkdir new_home
+docker run --name testing_permanence -v /data/containers/new_home:/root --rm -it docker.io/rocker/r-base bash
+echo "testing permanence." > ~/test.txt
+```
+
+Using `--rm` might not be desirable in every case.
+However, it is a valuable option for testing, good to have when disk space is scarce, or as a final check before publishing.
+Generally, I would consider it good practice to treat containers as volatile, thereby keeping them independent of the host machine as much as possible.
+
+# Summary
+
+Docker images are what you build from Dockerfile blueprints; the containers you work in are running instances of such images.
+In the "tiny home" metaphor: your "image" is the physical (small, but real, DIY-achievement) home to live in, built from step-by-step instructions.
+Think of a Docker image as a virtual copy of your computer which you store for later re-activation.
+
+Luckily, other people have prepared images for you.
+For example, a collection of images for specific analysis pipelines at INBO is preserved at [Docker Hub/inbobmk](https://hub.docker.com/u/inbobmk).
+We consider these "stable" versions because they could be re-activated no matter what crazy future updates shatter the R community, which enables us to return to all details of previous analyses.
+
+This tutorial provided introductory details on how to run such images.
+If you would like to take this further and customize your containers, proceed with [the next tutorial about the `build` command](../../tutorials/development_containers3_build).
+Those commands are practically identical [in Docker and Podman](../../tutorials/development_containers4_podman).
+
+An overview of the topic is [available here](../../tutorials/development_containers1).
+
+[^1]: I mostly follow [this tutorial](https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html).
+
+[^2]: Just like "Github" is a hosting service to store git repositories, guess what: "Docker Hub" is a hosting service to store Docker containers.
diff --git a/content/tutorials/development_containers2_run/index.qmd b/content/tutorials/development_containers2_run/index.qmd
new file mode 100644
index 000000000..0f7794cd3
--- /dev/null
+++ b/content/tutorials/development_containers2_run/index.qmd
@@ -0,0 +1,260 @@
+---
+title: "Running Existing Containers"
+description: "Pulling and running containers from an online container repository."
+date: "2025-02-21"
+authors: [falkmielke]
+categories: ["development", "open science"]
+tags: ["development", "open science", "docker", "containers"]
+number-sections: false
+params:
+  math: true
+format:
+  html:
+    toc: true
+    html-math-method: katex
+  hugo-md:
+    toc: true
+    preserve_yaml: true
+    html-math-method: katex
+output:
+  hugo-md:
+    preserve_yaml: true
+    variant: gfm+footnotes
+  html:
+    variant: gfm+footnotes
+---
+
+
+Docker is about assembling and working in containers.
+"Living" in containers.
+Or, rather, you can think of this as living in a ["tiny home", or "mobile home"](https://parametric-architecture.com/tiny-house-movement).
+(Let's call it a fancy caravan.)
+In the simple but comfortable case, you do not get to pick a general design or to choose all details of the interior: you just take that wheeled cabin "as is" from a tiny home reseller.
+
+
+
+
+Just as a tiny home is a mini-version of an immobile house, a container can be thought of as a miniature computer which can be transferred to other computing environments.
+The good news:
+there are a gazillion **Docker images available** on repositories like [Docker Hub](https://hub.docker.com) or [Quay](https://quay.io).
+
+
+This tutorial will show you how to use such "containers-to-go", thereby demonstrating some basic principles and vocabulary about containerization.
+I assume that you have [installed Docker](../tutorials/development_containers1#sec-installation).
+This tutorial will stay on the more involved route of running Docker in the terminal (the Docker Desktop "app" is rather self-explanatory, and you can maneuver it easily with knowledge of terminal vocabulary).
+Once you master these first steps, you can proceed to [customize your container images](../tutorials/development_containers3_build).
+You might also [consider Podman as a Docker alternative](../tutorials/development_containers4_podman).
+
+
+## Example
+
+Because of the useful idea of bringing your computer environment along (think of benefits for distributed computing), container images of all kinds are abundant on the container repositories mentioned above.
+For example[^1], there are Docker images with [rstudio server](https://posit.co/download/rstudio-server) pre-installed:
+
+- <https://hub.docker.com/r/rocker/rstudio>
+
+[^1]: I mostly follow [this tutorial](https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html).
+
+
+:::{.callout-note}
+If you control containers via the desktop app, simply search, pull, and run it.
+:::
+
+
+
+
+If you are comfortable using the terminal, execute the following script (*Windows*: use an administrator terminal).
+If it does not find the resources locally, Docker will download and extract the image from Docker Hub[^2].
+
+[^2]: Just like "Github" is a hosting service to store git repositories, guess what: "Docker Hub" is a hosting service to store Docker containers.
+
+```{sh}
+#| eval: false
+docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD rocker/rstudio
+```
+
+
+- The `run` command will automatically `pull`, i.e. download an existing image; though you could also `pull` without running.
+- The `--rm` flag makes the Docker container non-permanent, i.e. disk space will be freed after you close the container ([@sec-permanence]).
+- The port specified at `-p` is the one you use to access this local container server (the `-p` actually maps host- and container ports). You have to specify it explicitly, otherwise the host system will not let you pass (`:gandalf-meme:`).
+- The `-e` flag allows you to specify environment variables, in this case used to set a password for the RStudio server. But if you do not specify one, a random password will be generated and displayed upon startup (read the terminal output).
+
+
+
+
+You are now running (`run`) a `rocker/rstudio` server instance on your `localhost`, i.e. your computer.
+You can access it via a browser, going to <localhost:8787>, with the username `rstudio` and your chosen password.
+
+
+You can shut down the container with the keyboard shortcut `[ctrl]+[C]` (probably `[ctrl]+[Z] [Return]` on Windows).
+
+
+## File Access {#sec-mounting}
+
+The downside of this approach is that your container is isolated (well... at least to a certain degree).
+
+Images can take up considerable storage space.
+Storing files locally, i.e. on the host machine, rather than in an unnecessarily bloated container, might be a good strategy.
+This can be achieved by mapping a virtual path on the container to a local drive on your computer.
+(Linux users will be familiar with the concept of "mounting" and "linking" storage locations.)
+Note that the technique is equally relevant when running the container locally, hence not exclusive to remote hosts.
+
+
+Docker `run` brings the `-v` flag for mounting volumes.
+Suppose you have an R project you would like to work on, stored, for example, in this path:
+
+- `/data/git/coding-club`
+
+
+Then you can link this to your container's home folder via the following command.
+
+```{sh}
+#| eval: false
+# Windows syntax, mapping on `D:\data`
+docker run --rm -p 8787:8787 -v //d/data/git/coding-club:/home/rstudio/coding-club rocker/rstudio
+
+# Linux syntax
+docker run --rm -p 8787:8787 -v /data/git/coding-club:/home/rstudio/coding-club rocker/rstudio
+```
+
+Again, navigate to <localhost:8787>, *et voilà*, you can access your project and store files back in your regular folders.
+
+
+## Limitations
+
+This is a simple and quick way to run R and RStudio in a container.
+
+However, there are limitations:
+
+::: {.callout-note}
+- You have to live with the R packages provided in the container, or otherwise install them each time you access it...
+- ... unless you make your container permanent by omitting the `--rm` option. Note that this will cost considerable disk space, will not transfer to other computers (defeating the original purpose of Docker), and will demand occasional updates ([@sec-permanence]).
+- You could alternatively add `--pull always` to `docker run`, which will check and pull new versions.
+- Speaking of updates: it is good practice to keep software up to date. Occasionally update or simply re-install your Docker image and R packages to get the latest versions.
+- You should make sure that the containers are configured correctly and securely. This is especially important with server components which expose your machine to the internet.
+- Because most containers contain a Linux system, user permissions are taken seriously, and the consequences might be confusing. There are guides online ([e.g. here](https://labex.io/tutorials/docker-how-to-handle-permissions-in-docker-415866)); there are example repositories (like the author's own struggle [here](https://github.com/inbo/containbo?tab=readme-ov-file#understanding-volumes) and [here](https://github.com/inbo/containbo/tree/main/emacs)); base images are well set up and one can normally get by with default users.
+- There is a performance penalty from using containers: in inaccurate layman's terms, they emulate (parts of a) "computer" inside your computer.
+
+:::
+
+
+On the performance issue: I tested this on my laptop with matrix multiplication.
+
+```{r}
+#| eval: false
+# https://cran.r-project.org/web/packages/rbenchmark/rbenchmark.pdf
+# install.packages("rbenchmark")
+library(rbenchmark)
+
+test <- function(){
+  # test from https://prdm0.github.io/ropenblas/#installation
+  m <- 1e4; n <- 1e3; k <- 3e2
+  X <- matrix(rnorm(m*k), nrow=m); Y <- matrix(rnorm(n*k), ncol=n)
+  X %*% Y
+}
+
+benchmark(test())
+```
+
+In the terminal:
+
+```
+ test replications elapsed relative user.self sys.self user.child sys.child
+1 test() 100 22.391 1 83.961 65.291 0 0
+```
+
+In the container:
+
+```
+ test replications elapsed relative user.self sys.self user.child sys.child
+1 test() 100 26.076 1 102.494 153.89 0 0
+```
+
+
+Now, the *good news* is that the difference is not by orders of magnitude.
+This indicates that the chosen rocker image integrated the more performant `blas` variant which is [recommended](https://pbs-assess.github.io/sdmTMB/index.html#installation) [elsewhere](https://prdm0.github.io/ropenblas/#installation) (`blas-openblas`).
+
+The *bad news* is that we still suffer a performance drop on the order of 20%, which is considerable.
+
+
+This is just a single snapshot on a laptop, and putatively `blas`-confounded.
+Feel free to systematically and scientifically repeat the tests on your own machine.
+
+
+## Container Permanence: The `--rm` Option {#sec-permanence}
+
+As briefly touched upon above, `docker run` comes with the `--rm` option.
+This basically enables two separate workflows, i.e. usage paradigms.
+
+
+The first option, which is the default, is that your container is stored on the system permanently.
+This applies to the upstream images, which are downloaded upon first invocation of a container.
+But also, changes you apply while working in the container are stored persistently and will still be there when you log in again, using hard drive space of the host.
+Images may still be removed by manually running `docker rmi [...]` ([*cf.* "useful commands" in the overview tutorial](../../tutorials/development_containers1#sec-commands)).
+
+
+In contrast, with the second option, `docker run --rm [...]`, ad-hoc changes in the container are removed when the container is finished.
+Unless, of course, you mount a local volume with `docker run --rm -v [...]` ([@sec-mounting]).
+However, contrary to a rather general intuition, starting a container with `--rm` will not require downloading the image a second time.
+
+
+You might want to test this for yourself.
+Consider the following series of commands to create a test file in the Docker home directory:
+
+```{sh}
+#| eval: false
+docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
+echo "testing permanence." > ~/test.txt
+cat ~/test.txt
+exit
+```
+
+
+Re-connecting is instantaneous.
+However,
+
+```{sh}
+#| eval: false
+docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
+cat ~/test.txt
+```
+
+will return:
+
+> cat: /root/test.txt: No such file or directory
+
+
+This behavior is desired (in the second workflow above): if you start up a fresh environment each time you work in Docker, you **ensure that your work pipeline is independent of prior changes on the system**.
+Whether this makes sense as a workflow has to be evaluated with respect to disk space requirements, updates, the option to build upon a customized Dockerfile, and reproducibility.
+
+
+You can "link in" folders for working files (note how you have to specify the full path to `new_home`, and that this container uses the root user by default):
+
+```{sh}
+#| eval: false
+mkdir new_home
+docker run --name testing_permanence -v /data/containers/new_home:/root --rm -it docker.io/rocker/r-base bash
+echo "testing permanence." > ~/test.txt
+```
+
+
+Using `--rm` might not be desirable in every case.
+However, it is a valuable option for testing, good to have when disk space is scarce, or as a final check before publishing.
+Generally, I would consider it good practice to treat containers as volatile, thereby keeping them independent of the host machine as much as possible.
+
+
+# Summary
+
+Docker images are what you build from Dockerfile blueprints; the containers you work in are running instances of such images.
+In the "tiny home" metaphor: your "image" is the physical (small, but real, DIY-achievement) home to live in, built from step-by-step instructions.
+Think of a Docker image as a virtual copy of your computer which you store for later re-activation.
+
+Luckily, other people have prepared images for you.
+For example, a collection of images for specific analysis pipelines at INBO is preserved at [Docker Hub/inbobmk](https://hub.docker.com/u/inbobmk).
+We consider these "stable" versions because they could be re-activated no matter what crazy future updates shatter the R community, which enables us to return to all details of previous analyses.
+
+
+This tutorial provided introductory details on how to run such images.
+If you would like to take this further and customize your containers, proceed with [the next tutorial about the `build` command](../tutorials/development_containers3_build).
+Those commands are practically identical [in Docker and Podman](../tutorials/development_containers4_podman).
+
+An overview of the topic is [available here](../tutorials/development_containers1).
diff --git a/content/tutorials/development_containers3_build/index.md b/content/tutorials/development_containers3_build/index.md
new file mode 100644
index 000000000..7ac97c160
--- /dev/null
+++ b/content/tutorials/development_containers3_build/index.md
@@ -0,0 +1,294 @@
+---
+title: Building Custom Containers
+description: How to customize and extend containers with Dockerfiles and the `build` command.
+date: "2025-02-21"
+authors: [falkmielke]
+categories: ["development", "open science"]
+tags: ["development", "open science", "docker", "containers"]
+number-sections: false
+params:
+  math: true
+format:
+  html:
+    toc: true
+    html-math-method: katex
+  hugo-md:
+    toc: true
+    preserve_yaml: true
+    html-math-method: katex
+output:
+  hugo-md:
+    preserve_yaml: true
+    variant: gfm+footnotes
+  html:
+    variant: gfm+footnotes
+---
+
+
+By now, you [will have successfully installed](../../tutorials/development_containers1#sec-installation) Docker or [Podman](../../tutorials/development_containers4_podman).
+You hopefully succeeded in [running others' containers](../../tutorials/development_containers2_run), e.g. from a container repository.
+
+Next, it is time to customize your container.
+
+To give you a metaphor to work on: imagine you have a nice little DIY project for your garage workshop.
+This time, you would like to build your own [Matryoshka dolls](https://en.wikipedia.org/wiki/Matryoshka_doll) (матрёшка, stacking dolls, a great allegory for recursion).
+
+<figure>
+<img src="https://images.unsplash.com/photo-1586010135736-c16373adf060?q=80" alt="Photo of a set of Matryoshka dolls." />
+<figcaption aria-hidden="true">(Photo by <a href="https://unsplash.com/@ilmatar?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Iza Gawrych</a> on <a href="https://unsplash.com/photos/a-group-of-blue-and-gold-vases-sitting-on-top-of-a-table-oL3O2PybLoo?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a>)</figcaption>
+</figure>
+
+
+
+
+Like all good DIY, you do not fully start from scratch: you start with a blueprint which someone else has created, or general building instructions.
+You usually do not grow your own trees to get the wood, you buy wooden blocks of approximately the right size; neither do you mix paint from elemental ingredients, you assemble what others have to offer.
+But with those ingredients, you customize as you go and end up with a very individual creation which, ideally, is exactly what you had in mind.
+
+Customizing container images with the `build` command is the same business.
+Start from an image someone else prepared, as close as possible to your outcome.
+Add extra ingredients, few of which are innovative by themselves.
+Sprinkle in your own files for customization.
+The result is a container with a set of components which no one else might ever have used.
+
+{{% callout note %}}
+With Docker Desktop, you have the graphical interface for "builds".
+This might fall under the extended functionality which requires a login.
+
+Yet even without a login, you *can* proceed via a terminal, as below.
+Once you create a `Dockerfile` and build it, it will appear in the GUI.
+{{% /callout %}}
+
+<figure>
+<img src="../../images/tutorials/development_docker/docker_winbuild.jpg" alt="build on Windows" />
+<figcaption aria-hidden="true">Build on Windows.</figcaption>
+</figure>
+
+# Simple Example: A Webserver with Python/`flask`
+
+## Rationale
+
+Matryoshka dolls only work because the inner dolls are smaller than the ones covering them.
+In terms of software, size is a rather abstract metric, but you might think of different layers as "wrappers" to other elements.
+
+For example, [`flask`](https://palletsprojects.com/projects/flask) is a wrapper for other tools ("Werkzeug"), and it is a library within the Python ecosystem.
+In this chapter, you will learn how to wrap `flask` in a container.
+
+## Init: What is a `flask`?
+
+[Python `flask`](https://en.wikipedia.org/wiki/Flask_(web_framework)) is a library which allows you to execute Python scripts upon web access by users.
+Though I will not go into details, know that flask is a useful library for interactive website functions.
+For example, you can use flask to gather information a user provides in an html form, then process and store it wherever you like.
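To make that form-handling idea concrete, here is a minimal, hypothetical sketch (the `/greet` route and the `name` field are invented for this illustration; Flask's built-in test client lets us exercise the route without starting a server):

```python
from flask import Flask, request

app = Flask(__name__)

# hypothetical route: read a field submitted via an html form (POST request)
@app.route("/greet", methods=["POST"])
def greet():
    name = request.form.get("name", "INBO")
    return f"Hello, {name}!"

# exercise the route without a running server, via Flask's test client
client = app.test_client()
response = client.post("/greet", data={"name": "reader"})
print(response.get_data(as_text=True))  # Hello, reader!
```

This is only a sketch of the idea; the tutorial's actual container example below sticks to a simpler "hello" route.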
+
+I started from the following examples and tutorials to spin up a flask container, but provide modifications and comments on the steps.
+
+- <https://docs.docker.com/build/concepts/dockerfile>
+- <https://medium.com/@geeekfa/dockerizing-a-python-flask-app-a-step-by-step-guide-to-containerizing-your-web-application-d0f123159ba2>
+
+> **It all starts with a [Dockerfile](https://www.geeksforgeeks.org/what-is-dockerfile).**[^1]
+
+As you will see, the Dockerfile will give you all the design choices to create your own containers.
+I think of the Dockerfile as a script which provides all the instructions to set up your container, starting with `FROM` (i.e. which prior image you build upon) to `RUN`ning any type of commands.
+Not *any* type, really: we are working on (mysterious, powerful) Linux - don't fret, it is easier than you think!
+
+Now, to our `python/flask` example.
+A list of the official python containers is [available here](https://hub.docker.com/_/python).
+Note that you build every container upon the skeleton of an operating system: I chose [Alpine Linux](https://en.wikipedia.org/wiki/Alpine_Linux).
+(It's *en vogue*.)
+
+The Dockerfile resides in your working folder (yet it also defines a [`WORKDIR`](https://stackoverflow.com/a/51066379) from within which later commands are executed).
+
+- Navigate to a folder in which you intend to store your container(s), e.g. `cd C:\data\docker` (Windows) or `cd /data/docker` (Linux).
+- Create a file called `Dockerfile`: `touch Dockerfile`.
+- Edit the file in your favorite text editor (`vim Dockerfile`; Windows users probably use "notepad").
+- Paste and optionally modify the content below.
+
+<!-- -->
+
+```
+# Use the official Python image (Alpine Linux, Python 3)
+FROM python:3-alpine
+
+# install app dependencies
+RUN apk update && apk add --no-cache python3 py3-pip
+RUN pip install flask
+
+# install app
+COPY hello.py /
+
+# final configuration
+ENV FLASK_APP=hello
+EXPOSE 8000
+CMD ["flask", "run", "--host", "0.0.0.0", "--port", "8000"]
+```
+
+Note that the following `hello.py` file needs to be present in your working directory (you will be reminded by a friendly error message):
+
+``` python
+from flask import Flask
+app = Flask(__name__)
+
+@app.route("/")
+def hello():
+    return "Hello, INBO!"
+```
+
+With the `Dockerfile` and `hello.py` in place, you can build the container[^2].
+
+``` sh
+# on Windows, you are already in an administrator terminal
+docker build --pull -t my-flask .
+```
+
+On Linux, you might need to use `sudo` if the user is not in the `docker` group, like so: `sudo docker build --pull -t my-flask .`.
+Using `--pull` is good practice to ensure the download of the latest upstream containers; you could even use `--no-cache` to avoid previous downloads altogether.
+The `-t` parameter [will "tag" the image at build time](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image), auto-generating extra metadata.
+Do not forget the final dot ("."): it specifies the build context, a Linux shorthand reference to the current working directory (i.e. where your Dockerfile resides).
+
+<figure>
+<img src="../../images/tutorials/development_docker/docker_build.jpg" alt="build" />
+<figcaption aria-hidden="true">Docker build.</figcaption>
+</figure>
+
+List your available container images via the `docker images` command.
+
+You should now see a `python` image, which is the base alpine image we built upon.
+There is also a `my-flask`.
+Try it!
+
+``` sh
+#| eval: false
+docker run my-flask
+```
+
+The terminal should give you an IP and port; because flask runs in a container, `localhost:8000` will **not work** out-of-the-box.
+Instead, in my case, it was `http://172.17.0.2:8000`.
+(Sadly, although I could build and run this container on Windows, I did not get through via the browser :shrug: - but try mapping the port with `-p 8000:8000`.)
+
+{{% callout note %}}
+So far, so good.
+We have used an existing image and added `flask` on top of it.
+This works via writing a Dockerfile and building an image.
+{{% /callout %}}
+
+## Multiple Images: `compose` *versus* `build`
+
+The above works fine for most cases.
+However, if you want to assemble and combine multiple images, or build on base images from multiple sources, you need a level up.
+
+In that case `docker compose` is [the way to go](https://docs.docker.com/compose/gettingstarted).
+On Debian or Ubuntu, this extra functionality comes with the `docker-compose-plugin`.
+I have not needed to try this out yet, but will return here if that changes.
+
+# Application: RStudio With Packages
+
+## Rationale
+
+A Python flask might not be your kind of Matryoshka doll, if you are mainly concerned with heavy R scripting from within your familiar RStudio environment.
+To reiterate: containers are immensely flexible, and images are available for a multitude of situations.
+
+With the general tools presented above, we should be able to modify the `rocker/rstudio` server image for our purpose.
+
+Build recipes for some of the INBO packages you might want to include are collected in this repository:
+
+- <https://github.com/inbo/contaINBO>
+
+Contributions are much appreciated!
+
+## Dockerfile
+
+This use case is, in fact, well documented:
+
+- <https://rocker-project.org/use/extending.html>
+- <https://rocker-project.org/images/versioned/rstudio.html>
+- <https://davetang.org/muse/2021/04/24/running-rstudio-server-with-docker>
+
+The Rocker crew rocks!
+They prepared quite [a lot of useful images](https://hub.docker.com/u/rocker), including for example the `tidyverse` or geospatial packages.
+
+Note the syntax in `FROM`: it is `rocker/<image>:<version>`.
+
+```
+FROM rocker/rstudio:latest
+# (Use the rocker rstudio image)
+
+# update the system packages
+RUN apt update \
+    && apt upgrade --yes
+
+# git2rdata requires git
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    git libgit2-dev \
+    && apt-get clean
+
+# update pre-installed R packages
+# RUN Rscript -e 'update.packages(ask=FALSE)'
+
+# copy a `.Rprofile` to the container
+# available here: https://tutorials.inbo.be/installation/administrator/admin_install_r/Rprofile.site
+COPY docker/.Rprofile $R_HOME/etc/Rprofile.site
+
+# install packages via an R command (`R -q -e` or `Rscript -e`)
+# (a) from pre-configured repositories
+RUN Rscript -e 'install.packages("git2rdata")'
+
+# (b) via r-universe
+RUN R -q -e 'install.packages("watina", repos = c(inbo = "https://inbo.r-universe.dev", CRAN = "https://cloud.r-project.org"))'
+
+# (c) from GitHub
+RUN R -q -e 'install.packages("remotes")'
+RUN R -q -e 'remotes::install_github("inbo/INBOmd", dependencies = TRUE)'
+```
+
+It takes some puzzle work to get the dependencies right, e.g. with the `libgit2` dependency (try commenting out that line to get a feeling for build failure).
+However, there is hope: (i) the error output is quite instructive (at least for Linux users), (ii) building is incremental, so you can add packages step by step.
+It just takes patience.
+As a shortcut, consider using `pak` ([from r-lib](https://pak.r-lib.org)) or `r2u` ([apt repository](https://github.com/eddelbuettel/r2u)) to implicitly deal with the system dependencies.
+Generally, remember which system powers your container (Debian/Ubuntu), find help online, and document your progress.
+
+{{% callout note %}}
+Dockerfiles offer some room for optimization.
+For example, every `RUN` is a "layer"; you should put stable layers at the top and volatile layers later.
+In principle, it is recommended to combine layers as much as possible.
+
+More here: <https://docs.docker.com/build/building/best-practices>
+{{% /callout %}}
+
+Test the image:
+
+``` sh
+#| eval: false
+docker build -t test-rstudio .
+```
+
+Run it, as before:
+
+``` sh
+#| eval: false
+docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD test-rstudio
+```
+
+Another good practice is to extract modifications into separate scripts, `COPY` them into the image, and execute them at build time ([see here](https://stackoverflow.com/q/69167940), [and here](https://rocker-project.org/use/extending.html#install2.r)).
+This keeps them under more fine-grained version control on the host machine.
+As you know, [version control is key!](https://tutorials.inbo.be/tags/git)
+
+# Summary
+
+Like a Matryoshka doll, software often comes in *layers*, as I have tried to illustrate in the examples above.
+When designing and building Dockerfiles, you effectively craft your own DIY Matryoshka.
+This may involve tinkering - some sawdust will fall off on the sides - but often the end product is quite presentable.
+
+And that is one of the main purposes of a custom Docker image: you can store a given set of interrelated software building blocks for later use (reproducibility).
+Some of these sets are rather rough, abstract, or general (like the images you get on image repositories, which you can [simply pull and run](../../tutorials/development_containers2_run)).
+Others are bespoke, containing exact requirements for a given task. +Both functions are important building blocks of open science, and I elaborate more about this framework [in the main article on containerization](../../tutorials/development_containers1). +Docker is a specific implementation of the container concept, and you might also want to [try out Podman](../../tutorials/development_containers4_podman) as an alternative. + +Good luck with all your DIY projects, and thank you for reading! + +[^1]: Here I quoted the docs (<https://docs.docker.com/build/concepts/dockerfile>) before having read them. + +[^2]: If you did not install the `buildx` package on Linux, you will read a legacy warning. diff --git a/content/tutorials/development_containers3_build/index.qmd b/content/tutorials/development_containers3_build/index.qmd new file mode 100644 index 000000000..0d0dd4617 --- /dev/null +++ b/content/tutorials/development_containers3_build/index.qmd @@ -0,0 +1,317 @@ +--- +title: "Building Custom Containers" +description: "How to customize and extend containers with Dockerfiles and the `build` command." +date: "2025-02-21" +authors: [falkmielke] +categories: ["development", "open science"] +tags: ["development", "open science", "docker", "containers"] +number-sections: false +params: + math: true +format: + html: + toc: true + html-math-method: katex + hugo-md: + toc: true + preserve_yaml: true + html-math-method: katex +output: + hugo-md: + preserve_yaml: true + variant: gfm+footnotes + html: + variant: gfm+footnotes +--- + + +By now, you [will have successfully installed](../tutorials/development_containers1#sec-installation) Docker or [Podman](../tutorials/development_containers4_podman). +You hopefully succeeded in [running others' containers](../tutorials/development_containers2_run), e.g. from a container repository. + +Next, it is time to customize your container. 
+
+
+To give you a metaphor to work on: imagine you have a nice little DIY project for your garage workshop.
+This time, you would like to build your own [Matryoshka dolls](https://en.wikipedia.org/wiki/Matryoshka_doll) (матрёшка, stacking dolls, a great allegory for recursion).
+
+<figure>
+<img src="https://images.unsplash.com/photo-1586010135736-c16373adf060?q=80" alt="Photo of a set of Matryoshka dolls." />
+<figcaption aria-hidden="true">(Photo by <a href="https://unsplash.com/@ilmatar?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Iza Gawrych</a> on <a href="https://unsplash.com/photos/a-group-of-blue-and-gold-vases-sitting-on-top-of-a-table-oL3O2PybLoo?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a>)</figcaption>
+</figure>
+
+
+
+Like all good DIY, you do not fully start from scratch: you start with a blueprint which someone else has created, or general building instructions.
+You usually do not grow your own trees to get the wood: you buy wooden blocks of approximately the right size; neither do you mix paint from elemental ingredients: you assemble what others have to offer.
+But with those ingredients, you customize your build and end up with a very individual creation which, ideally, is exactly what you had in mind.
+
+
+Customizing container images with the `build` command is the same business.
+Start from an image someone else prepared, as close as possible to your desired outcome.
+Add extra ingredients, few of which are innovative.
+Sprinkle in your own files for customization.
+The result is a container with a set of components which no one else might ever have used.
+
+
+:::{.callout-note}
+With Docker Desktop, you have a graphical interface for "builds".
+This might fall under the extended functionality which requires a login.
+
+Yet even without a login, you *can* proceed via a terminal, as below.
+Once you create a `Dockerfile` and build it, it will appear in the GUI.
+:::
+
+
+
+
+# Simple Example: A Webserver with Python/`flask`
+
+## Rationale
+
+Matryoshka dolls only work because the internal dolls are smaller than the ones covering them.
+In terms of software, size is a rather abstract metric, but you might think of different layers as "wrappers" around other elements.
+
+For example, [`flask`](https://palletsprojects.com/projects/flask) is a wrapper for other tools ("Werkzeug"), and it is a library within the Python ecosystem.
+In this chapter, you will learn how to wrap `flask` in a container.
+
+
+## Init: What is a `flask`
+
+[Python `flask`](https://en.wikipedia.org/wiki/Flask_(web_framework)) is a library which allows you to execute Python scripts upon web access by users.
+Though I will not go into details, know that flask is a useful library for interactive website functions.
+For example, you can use flask to gather information a user provides in an html form, then process and store it wherever you like.
+
+
+I started from the following examples and tutorials to spin up a flask container, but provide modifications and comments on the steps.
+
+- <https://docs.docker.com/build/concepts/dockerfile>
+- <https://medium.com/@geeekfa/dockerizing-a-python-flask-app-a-step-by-step-guide-to-containerizing-your-web-application-d0f123159ba2>
+
+
+
+> **It all starts with a [Dockerfile](https://www.geeksforgeeks.org/what-is-dockerfile).**[^3]
+
+[^3]: Here I quoted the docs (<https://docs.docker.com/build/concepts/dockerfile>) before having read them.
+
+
+As you will see, the Dockerfile will give you all the design choices to create your own containers.
+I think of the Dockerfile as a script which provides all the instructions to set up your container, from `FROM` (i.e. which base image you build upon) to `RUN`ning any type of commands.
+Not *any* type, really: we are working on (mysterious, powerful) Linux - don't fret, it is easier than you think!
+
+
+On to our `python/flask` example.
+A list of the official python containers is [available here](https://hub.docker.com/_/python).
+Note that you build every container upon the skeleton of an operating system: I chose [Alpine Linux](https://en.wikipedia.org/wiki/Alpine_Linux).
+(It's *en vogue*.)
+
+
+The Dockerfile resides in your working folder (yet it also defines a [`WORKDIR`](https://stackoverflow.com/a/51066379) from within which later commands are executed).
+
+- Navigate to a folder in which you intend to store your container(s), e.g. `cd C:\data\docker` (Windows) or `cd /data/docker` (Linux).
+- Create a file called `Dockerfile`: `touch Dockerfile`.
+- Edit the file in your favorite text editor (`vim Dockerfile`; Windows users probably use "notepad").
+- Paste and optionally modify the content below.
+
+```
+# Use the official Python image (Alpine Linux, Python 3)
+FROM python:3-alpine
+
+# install app dependencies
+RUN apk update && apk add --no-cache python3 py3-pip
+RUN pip install flask
+
+# install app
+COPY hello.py /
+
+# final configuration
+ENV FLASK_APP=hello
+EXPOSE 8000
+CMD ["flask", "run", "--host", "0.0.0.0", "--port", "8000"]
+```
+
+
+Note that the following `hello.py` file needs to be present in your working directory (you will be reminded by a friendly error message):
+
+```python
+#| eval: false
+from flask import Flask
+app = Flask(__name__)
+
+@app.route("/")
+def hello():
+    return "Hello, INBO!"
+```
+
+
+With the `Dockerfile` and `hello.py` in place, you can build the container [^4].
+
+```sh
+#| eval: false
+# on Windows, you are already in an administrator terminal
+docker build --pull -t my-flask .
+```
+
+[^4]: If you did not install the `buildx` package on Linux, you will read a legacy warning.
+
+On Linux, you might need to use `sudo` if your user is not in the `docker` group, like so: `sudo docker build --pull -t my-flask .`.
+Using `--pull` is good practice to ensure the download of the latest upstream images; you could even add `--no-cache` to rebuild all layers from scratch.
+The `-t` parameter [will "tag" the image at build time](https://docs.docker.com/get-started/docker-concepts/building-images/build-tag-and-publish-an-image), auto-generating extra metadata.
+Also note the final dot ("."): it is not optional, but the *build context* - a shorthand for the current working directory (i.e. where your Dockerfile resides), whose contents are sent to the build.
+
+
+
+
+
+List your available container images via the `docker images` command.
+
+You should now see a `python` image, which is the base alpine image we built upon.
+There is also a `my-flask`.
+Try it!
+
+```sh
+#| eval: false
+docker run my-flask
+```
+
+The terminal should give you an IP and port; because flask runs in a container, `localhost:8000` will **not work** out-of-the-box.
+Instead, in my case, it was `http://172.17.0.2:8000`.
+(Sadly, although I could build and run this container on Windows, I did not get through via the browser :shrug: - but try mapping the port with `-p 8000:8000`.)
+
+
+:::{.callout-note}
+So far, so good.
+We have used an existing image and added `flask` on top of it.
+This works via writing a Dockerfile and building an image.
+:::
+
+
+## Multiple Images: `compose` *versus* `build`
+
+The above works fine for most cases.
+However, if you want to assemble and combine multiple images, or build on base images from multiple sources, you need a level up.
+
+In that case `docker compose` is [the way to go](https://docs.docker.com/compose/gettingstarted).
+On Debian or Ubuntu, this extra functionality comes with the `docker-compose-plugin`.
+I have not needed to try this out yet, but will return here if that changes.
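+
+I have not tested `compose` myself, so take this as a hedged sketch: a minimal `compose.yaml` for the flask image above might look roughly as follows (the service name `web` is an assumption, not taken from the referenced tutorials).
+
+```yaml
+# compose.yaml - build the local Dockerfile and publish the flask port
+services:
+  web:               # hypothetical service name
+    build: .         # use the Dockerfile in the current directory
+    ports:
+      - "8000:8000"  # host:container, matching the EXPOSE above
+```
+
+With this file in place, `docker compose up` should build the image and start the service in one step.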
+
+
+# Application: RStudio With Packages
+
+## Rationale
+
+A Python flask might not be your kind of Matryoshka doll, if you are mainly concerned with heavy R scripting from within your familiar RStudio environment.
+To reiterate: containers are immensely flexible, and images are available for a multitude of situations.
+
+With the general tools presented above, we should be able to modify the `rocker/rstudio` server image for our purpose.
+
+
+Build recipes for some of the INBO packages you might want to include are collected in this repository:
+
+- <https://github.com/inbo/contaINBO>
+
+Contributions are much appreciated!
+
+
+## Dockerfile
+
+This use case is, in fact, well documented:
+
+- <https://rocker-project.org/use/extending.html>
+- <https://rocker-project.org/images/versioned/rstudio.html>
+- <https://davetang.org/muse/2021/04/24/running-rstudio-server-with-docker>
+
+The Rocker crew rocks!
+They prepared quite [a lot of useful images](https://hub.docker.com/u/rocker), including for example the `tidyverse` or geospatial packages.
+
+
+Note the syntax in `FROM`: it is `rocker/<image>:<version>`.
+
+```
+FROM rocker/rstudio:latest
+# (Use the rocker rstudio image)
+
+# update the system packages
+RUN apt update \
+    && apt upgrade --yes
+
+# git2rdata requires git
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    git libgit2-dev \
+    && apt-get clean
+
+# update pre-installed R packages
+# RUN Rscript -e 'update.packages(ask=FALSE)'
+
+# copy a `.Rprofile` to the container
+# available here: https://tutorials.inbo.be/installation/administrator/admin_install_r/Rprofile.site
+COPY docker/.Rprofile $R_HOME/etc/Rprofile.site
+
+# install packages via an R command (`R -q -e` or `Rscript -e`)
+# (a) from pre-configured repositories
+RUN Rscript -e 'install.packages("git2rdata")'
+
+# (b) via r-universe
+RUN R -q -e 'install.packages("watina", repos = c(inbo = "https://inbo.r-universe.dev", CRAN = "https://cloud.r-project.org"))'
+
+# (c) from GitHub
+RUN R -q -e 'install.packages("remotes")'
+RUN R -q -e 'remotes::install_github("inbo/INBOmd", dependencies = TRUE)'
+```
+
+It takes some puzzle work to get the dependencies right, e.g. with the `libgit2` dependency (try commenting out that line to get a feeling for build failure).
+However, there is hope: (i) the error output is quite instructive (at least for Linux users), (ii) building is incremental, so you can add packages step by step.
+It just takes patience.
+As a shortcut, consider using `pak` ([from r-lib](https://pak.r-lib.org)) or `r2u` ([apt repository](https://github.com/eddelbuettel/r2u)) to implicitly deal with the system dependencies.
+Generally, remember which system powers your container (Debian/Ubuntu), find help online, and document your progress.
+
+
+:::{.callout-note}
+Dockerfiles offer some room for optimization.
+For example, every `RUN` is a "layer"; you should put stable layers at the top and volatile layers later.
+In principle, it is recommended to combine layers as much as possible.
+
+More here: <https://docs.docker.com/build/building/best-practices>
+:::
+
+
+Test the image:
+
+```sh
+#| eval: false
+docker build -t test-rstudio .
+```
+
+
+Run it, as before:
+
+```sh
+#| eval: false
+docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD test-rstudio
+```
+
+
+Another good practice is to extract modifications into separate scripts, `COPY` them into the image, and execute them at build time ([see here](https://stackoverflow.com/q/69167940), [and here](https://rocker-project.org/use/extending.html#install2.r)).
+This keeps them under more fine-grained version control on the host machine.
+As you know, [version control is key!](https://tutorials.inbo.be/tags/git)
+
+
+
+# Summary
+
+Like a Matryoshka doll, software often comes in *layers*, as I have tried to illustrate in the examples above.
+When designing and building Dockerfiles, you effectively craft your own DIY Matryoshka.
+This may involve tinkering - some sawdust will fall off on the sides - but often the end product is quite presentable.
+
+
+And that is one of the main purposes of a custom Docker image: you can store a given set of interrelated software building blocks for later use (reproducibility).
+Some of these sets are rather rough, abstract, or general (like the images you get on image repositories, which you can [simply pull and run](../tutorials/development_containers2_run)).
+Others are bespoke, containing exact requirements for a given task.
+Both functions are important building blocks of open science, and I elaborate more about this framework [in the main article on containerization](../tutorials/development_containers1).
+Docker is a specific implementation of the container concept, and you might also want to [try out Podman](../tutorials/development_containers4_podman) as an alternative.
+
+
+Good luck with all your DIY projects, and thank you for reading!
diff --git a/content/tutorials/development_containers4_podman/index.md b/content/tutorials/development_containers4_podman/index.md
new file mode 100644
index 000000000..ffba769da
--- /dev/null
+++ b/content/tutorials/development_containers4_podman/index.md
@@ -0,0 +1,185 @@
+---
+title: Containers with Podman
+description: 'Podman: a drop-in alternative to Docker.'
+date: "2025-02-21"
+authors: [falkmielke]
+categories: ["development", "open science"]
+tags: ["development", "open science", "docker", "containers"]
+number-sections: false
+params:
+  math: true
+format:
+  html:
+    toc: true
+    html-math-method: katex
+  hugo-md:
+    toc: true
+    preserve_yaml: true
+    html-math-method: katex
+output:
+  hugo-md:
+    preserve_yaml: true
+    variant: gfm+footnotes
+  html:
+    variant: gfm+footnotes
+---
+
+
+
+In this cluster of tutorials, you might have [gotten a general overview of containers](../../tutorials/development_containers1), and installed Docker.
+You can find instructions on [how to run existing images](../../tutorials/development_containers2_run), and take this further to [building custom containers](../../tutorials/development_containers3_build).
+And during the installation and use of Docker, you might have been annoyed by the mandatory administrator mode, or quirks of the Desktop App.
+Or you might value fully free and open source software, like I do.
+
+Luckily, Docker is not a monolith.
+There are alternative approaches to containerization which mitigate some of the Docker limitations and disadvantages.
+In this tutorial, I will present [Podman](https://podman.io), a Docker alternative which I personally use the most (besides occasionally turning to ["buildah"](https://buildah.io)).
+
+
+<figure>
+<img src="https://podman.io/images/raw/characters/seal-diving.png" alt="The podman mascot, a stylized comic seal, jumping into the water." />
+<figcaption aria-hidden="true">Podman - let's dive in!
(<a href="https://podman.io/features">Image source: the Podman website.</a>)</figcaption>
+</figure>
+
+
+# Podman
+
+Podman might be the most prominent Docker alternative.
+Vocabulary is marginally different: a group of containers is a "pod", they run on a "machine", and this FOSS tool helps you to manage them with the `podman` command.
+
+One major advantage of Podman is that it can be configured to run **"rootless"**, i.e. without administrator rights [^1].
+A second advantage is that it is "all community", fully Free and Open Source: it does not promote an "enterprise edition".
+
+Podman is [well documented](https://podman.io/docs/installation).
+As so often, another reliable source is the [Arch Linux wiki on Podman](https://wiki.archlinux.org/title/Podman), no matter which Linux you are on.
+Windows users have succeeded in running Podman through a WSL.
+
+{{% callout note %}}
+For Windows, there is a convenient "Podman Desktop" GUI which guides you through the installation and setup, including WSL instantiation.
+It is intuitive, transparent (telemetry opt-out), and backed by Red Hat.
+
+Unfortunately, it relies on Windows Subsystem for Linux (WSL), which is not available for INBO users at the moment.
+
+:(
+
+We are working on it.
+{{% /callout %}}
+
+# Setup
+
+The instructions below were tested on Arch Linux, but generalize easily.
+
+I follow the `podman` installation instructions for Arch Linux, to set up a **rootless container environment**.
+
+Installation:
+
+``` sh
+#| eval: false
+pacman -Sy podman podman-docker passt
+```
+
+The last one, `passt` (providing `pasta`, yum!), is required for rootless network access.
+Optionally, there is `podman-compose`.
+
+Originally, Podman was designed to run *only if you are root*, just like Docker.
+However, we experienced that it now comes in *rootless* configuration by default ([further instructions](https://man.archlinux.org/man/podman.1#Rootless_mode)).
+Just to be safe, I briefly list the major configuration steps.
+
+The first step is to confirm a required kernel setting: check that `kernel.unprivileged_userns_clone` is set to `1`.
+
+``` sh
+#| eval: false
+sysctl kernel.unprivileged_userns_clone
+```
+
+Then, configure "subordinate user IDs".
+Details differ between Linux distributions; with some luck, your username is already present in these lists:
+
+``` sh
+#| eval: false
+cat /etc/subuid
+cat /etc/subgid
+```
+
+If not, you can be admitted to the club of subordinates with the command:
+
+``` sh
+#| eval: false
+usermod --add-subuids 100000-165535 --add-subgids 100000-165535 <username>
+podman system migrate
+```
+
+We note some useful commands on the way: `podman system ...` and `podman info`.
+You might immediately check "native rootless overlays" (this concerns how filesystems are mounted in the container):
+
+``` sh
+#| eval: false
+podman info | grep -i overlay
+```
+
+Then, networking: pods might need to communicate with each other and with the outside world.
+And, of course, container storage: make sure you know where your containers are stored.
+These and more settings are in `/etc/containers/containers.conf` and `/etc/containers/storage.conf`; make sure to scan and edit them to your liking.
+
+# Usage
+
+You can use images from `docker.io` with Podman.
+The only difference from Docker is the explicit mention of the source, `docker.io`.
+For example:
+
+``` sh
+#| eval: false
+podman search docker.io/alpine
+podman pull docker.io/alpine # download the image
+podman run -it docker.io/alpine # start it and connect to the container
+exit
+```
+
+Except for the prefix, everything you [can read in our `docker run` tutorial](../../tutorials/development_containers2_run) still applies.
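+
+As an aside, you can inspect how container UIDs map into your subordinate range by entering the rootless user namespace with `podman unshare`; the annotated output shape below is my sketch, not verbatim.
+
+``` sh
+#| eval: false
+# inspect the UID mapping of the rootless user namespace
+podman unshare cat /proc/self/uid_map
+# columns: UID inside the container, corresponding host UID, range length;
+# container UID 0 maps to your own UID, higher UIDs map into /etc/subuid
+```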
+
+# Limitations
+
+Note that at least some `docker.io` images will not work: I actually experienced issues with the "rootless Docker image":
+
+``` sh
+#| eval: false
+# podman run --rm -it docker.io/docker:25.0-dind-rootless
+```
+
+However, it is logical that this one does not work: it builds a (root-level) Docker which is supposed to contain a rootless Docker ([*cf.* the overview tutorial](../../tutorials/development_containers1#sec-rootless)).
+The outer Docker layer requires root, which Podman cannot provide.
+
+If you understand why, congratulations: you have achieved a basic understanding of containers and user privileges :)
+There might be yet other images which do not work by default and require additional tinkering in Podman, due to its altered design.
+Most use cases are covered, for example a containerized R environment.
+
+# Podman Rocker
+
+From here, **Podman is a full drop-in replacement for Docker**; the difference is that you are not forced to grant host system root privileges to containers.
+This means that you can simply apply [everything I showed about the `build` command](../../tutorials/development_containers3_build) by exchanging `docker` for `podman`.
+
+Any Dockerfile should work, with the mentioned mini-adjustment to `FROM`.
+And you can use any Docker image; `docker.io/rocker/rstudio` [is available](https://rocker-project.org/use/rootless-podman.html) (don't forget to specify the port).
+You may even write `docker` in the terminal: it will alias to `podman` (via the `podman-docker` package on Linux, or a shell alias).
+
+``` sh
+#| eval: false
+podman run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD -v /data/git/coding-club:/root/coding-club docker.io/rocker/rstudio
+```
+
+There is another subtle change: the default user to log in to `rstudio` is not `rstudio`, but `root`, because for some reason RStudio needs to have root rights in the container.
+You had those before anyway, but now they are confined to within the pod.
+There might be workarounds, which I will explore.
+
+# Summary
+
+To summarize the Podman experience:
+
+- **Dockerfiles like the one above will build equally well on Podman, with only micro-adjustments.**
+- You can even stick to the `docker` commands thanks to the `podman-docker` package.
+- There is Podman Desktop, if you like clicking.
+- Podman is everything Docker is, just minimally different, more secure, and fully FOSS.
+
+Kudos to the Podman devs!
+
+[^1]: Daniel J. Walsh (2019): "How does rootless Podman work?" <https://opensource.com/article/19/2/how-does-rootless-podman-work>
diff --git a/content/tutorials/development_containers4_podman/index.qmd b/content/tutorials/development_containers4_podman/index.qmd
new file mode 100644
index 000000000..7fbb37f3d
--- /dev/null
+++ b/content/tutorials/development_containers4_podman/index.qmd
@@ -0,0 +1,200 @@
+---
+title: "Containers with Podman"
+description: "Podman: a drop-in alternative to Docker."
+date: "2025-02-21"
+authors: [falkmielke]
+categories: ["development", "open science"]
+tags: ["development", "open science", "docker", "containers"]
+number-sections: false
+params:
+  math: true
+format:
+  html:
+    toc: true
+    html-math-method: katex
+  hugo-md:
+    toc: true
+    preserve_yaml: true
+    html-math-method: katex
+output:
+  hugo-md:
+    preserve_yaml: true
+    variant: gfm+footnotes
+  html:
+    variant: gfm+footnotes
+---
+
+
+In this cluster of tutorials, you might have [gotten a general overview of containers](../../tutorials/development_containers1), and installed Docker.
+You can find instructions on [how to run existing images](../../tutorials/development_containers2_run), and take this further to [building custom containers](../../tutorials/development_containers3_build).
+And during the installation and use of Docker, you might have been annoyed by the mandatory administrator mode, or quirks of the Desktop App.
+Or you might value fully free and open source software, like I do.
+
+
+Luckily, Docker is not a monolith.
+There are alternative approaches to containerization which mitigate some of the Docker limitations and disadvantages.
+In this tutorial, I will present [Podman](https://podman.io), a Docker alternative which I personally use the most (besides occasionally turning to ["buildah"](https://buildah.io)).
+
+
+<figure>
+<img src="https://podman.io/images/raw/characters/seal-diving.png" alt="The podman mascot, a stylized comic seal, jumping into the water." />
+<figcaption aria-hidden="true">Podman - let's dive in! (<a href="https://podman.io/features">Image source: the Podman website.</a>)</figcaption>
+</figure>
+
+
+
+# Podman
+
+Podman might be the most prominent Docker alternative.
+Vocabulary is marginally different: a group of containers is a "pod", they run on a "machine", and this FOSS tool helps you to manage them with the `podman` command.
+
+One major advantage of Podman is that it can be configured to run **"rootless"**, i.e. without administrator rights [^5].
+A second advantage is that it is "all community", fully Free and Open Source: it does not promote an "enterprise edition".
+
+
+[^5]: Daniel J. Walsh (2019): "How does rootless Podman work?" <https://opensource.com/article/19/2/how-does-rootless-podman-work>
+
+
+Podman is [well documented](https://podman.io/docs/installation).
+As so often, another reliable source is the [Arch Linux wiki on Podman](https://wiki.archlinux.org/title/Podman), no matter which Linux you are on.
+Windows users have succeeded in running Podman through a WSL.
+
+:::{.callout-note}
+For Windows, there is a convenient "Podman Desktop" GUI which guides you through the installation and setup, including WSL instantiation.
+It is intuitive, transparent (telemetry opt-out), and backed by Red Hat.
+
+Unfortunately, it relies on Windows Subsystem for Linux (WSL), which is not available for INBO users at the moment.
+
+:(
+
+We are working on it.
+:::
+
+
+# Setup
+
+The instructions below were tested on Arch Linux, but generalize easily.
+
+I follow the `podman` installation instructions for Arch Linux, to set up a **rootless container environment**.
+
+
+Installation:
+
+```sh
+#| eval: false
+pacman -Sy podman podman-docker passt
+```
+
+The last one, `passt` (providing `pasta`, yum!), is required for rootless network access.
+Optionally, there is `podman-compose`.
+
+
+Originally, Podman was designed to run *only if you are root*, just like Docker.
+However, we experienced that it now comes in *rootless* configuration by default ([further instructions](https://man.archlinux.org/man/podman.1#Rootless_mode)).
+Just to be safe, I briefly list the major configuration steps.
+
+
+The first step is to confirm a required kernel setting: check that `kernel.unprivileged_userns_clone` is set to `1`.
+
+```sh
+#| eval: false
+sysctl kernel.unprivileged_userns_clone
+```
+
+Then, configure "subordinate user IDs".
+Details differ between Linux distributions; with some luck, your username is already present in these lists:
+
+```sh
+#| eval: false
+cat /etc/subuid
+cat /etc/subgid
+```
+
+If not, you can be admitted to the club of subordinates with the command:
+
+```sh
+#| eval: false
+usermod --add-subuids 100000-165535 --add-subgids 100000-165535 <username>
+podman system migrate
+```
+
+
+We note some useful commands on the way: `podman system ...` and `podman info`.
+You might immediately check "native rootless overlays" (this concerns how filesystems are mounted in the container):
+
+```sh
+#| eval: false
+podman info | grep -i overlay
+```
+
+
+Then, networking: pods might need to communicate with each other and with the outside world.
+And, of course, container storage: make sure you know where your containers are stored.
+These and more settings are in `/etc/containers/containers.conf` and `/etc/containers/storage.conf`; make sure to scan and edit them to your liking.
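+
+As an aside, you can inspect how container UIDs map into your subordinate range by entering the rootless user namespace with `podman unshare`; the annotated output shape below is my sketch, not verbatim.
+
+```sh
+#| eval: false
+# inspect the UID mapping of the rootless user namespace
+podman unshare cat /proc/self/uid_map
+# columns: UID inside the container, corresponding host UID, range length;
+# container UID 0 maps to your own UID, higher UIDs map into /etc/subuid
+```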
# Usage

You can use images from `docker.io` with Podman.
The only difference from Docker is the explicit mention of the registry, `docker.io`.
For example:

```sh
#| eval: false
podman search docker.io/alpine
podman pull docker.io/alpine # download the image
podman run -it docker.io/alpine # start a container and attach to it
exit
```


Except for the registry prefix, everything you [can read in our `docker run` tutorial](../../tutorials/development_containers2_run) still applies.


# Limitations

Note that at least some `docker.io` images will not work: I experienced issues with the "rootless Docker image":

```sh
#| eval: false
# podman run --rm -it docker.io/docker:25.0-dind-rootless
```

However, it makes sense that this one fails: it builds a (root-level) Docker which is supposed to contain a rootless Docker ([*cf.* the overview tutorial](../../tutorials/development_containers1#sec-rootless)).
The outer Docker layer requires root, which Podman cannot provide.

If you understand this case, congratulations: you have achieved a basic understanding of containers and user privileges :)
There might be other images which do not work by default and require additional tinkering in Podman, due to its different design.
Most use cases are covered, however, for example a containerized R environment.


# Podman Rocker

From here on, **Podman is a full drop-in replacement for Docker**; you are just not forced to grant containers root privileges on the host system.
This means that you can simply apply [everything I showed about the `build` command](../../tutorials/development_containers3_build) by exchanging `docker` for `podman`.


Any Dockerfile should work, with the aforementioned mini-adjustment to `FROM`.
And you can use any Docker image; `docker.io/rocker/rstudio` [is available](https://rocker-project.org/use/rootless-podman.html) (don't forget to specify the port).
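As a minimal sketch of such a build: the extra system library (`libxml2-dev`) and the image tag (`my-rstudio`) below are arbitrary examples, not part of the rocker documentation.

```sh
#| eval: false
# write a minimal Containerfile: rocker/rstudio plus one extra system library
cat > Containerfile <<'EOF'
FROM docker.io/rocker/rstudio
RUN apt-get update \
    && apt-get install -y --no-install-recommends libxml2-dev \
    && rm -rf /var/lib/apt/lists/*
EOF

# build and run it with podman instead of docker
podman build -t my-rstudio .
podman run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD my-rstudio
```

Apart from the registry prefix in `FROM`, this is exactly the workflow from the `build` tutorial.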
You may even type `docker` in the terminal: it will be aliased to `podman` (on Linux via the `podman-docker` package, elsewhere via a shell alias).

```sh
#| eval: false
podman run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD -v /data/git/coding-club:/root/coding-club docker.io/rocker/rstudio
```

There is another subtle change: the default user to log in to `rstudio` is not `rstudio`, but `root`, because the RStudio server needs root rights inside the container.
You effectively had those rights before anyway, but now they are confined to the container.
There might be workarounds, which I will explore.


# Summary

To summarize the Podman experience:

- **Dockerfiles will build equally well on Podman, except for micro-adjustments such as the registry prefix.**
- You can even stick to the `docker` commands, thanks to the `podman-docker` package.
- There is Podman Desktop, if you like clicking.
- Podman is everything Docker is, just minimally different, more secure, and fully FOSS.


Kudos to the Podman devs! diff --git a/layouts/shortcodes/callout.html b/layouts/shortcodes/callout.html new file mode 100644 index 000000000..67e6997fb --- /dev/null +++ b/layouts/shortcodes/callout.html @@ -0,0 +1,4 @@ +<div class="callout callout-{{ .Get 0 }}" role="note"> + <div class="callout-title">{{ printf "callouts.%s" (.Get 0) | i18n | strings.FirstUpper }}</div> + {{ .Inner | markdownify | emojify }} +</div> diff --git a/static/css/custom.css b/static/css/custom.css index 8236045f0..78b935ba0 100644 --- a/static/css/custom.css +++ b/static/css/custom.css @@ -1,5 +1,38 @@ /* Custom CSS */ +/* callout boxes */ +/* cf. 
https://rossabaker.com/configs/website/shortcodes/callout/ */ +.callout { + margin: 1.5rem 0; + padding: 1em; + border-inline-start: .25em solid; +} + +.callout-title { + font-weight: bolder; + margin-bottom: 1em; +} + +.callout-note { + border-color: #366196; + background: #b4c9e4; +} + +.callout-emphasize { + border-color: #993368; + background: #e6b2cd; +} + +/* captions */ +/* cf. https://thesynack.com/posts/markdown-captions */ +figcaption { + font-style: italic; + font-size: small; + padding: 0px; + text-align: center; +} + + /* SIDEBAR */ /* Add gray background to sidebar */ diff --git a/static/images/tutorials/development_docker/Gemini_Generated_Image_ngoz1wngoz1wngoz.jpg b/static/images/tutorials/development_docker/Gemini_Generated_Image_ngoz1wngoz1wngoz.jpg new file mode 100644 index 000000000..79a65a0d9 Binary files /dev/null and b/static/images/tutorials/development_docker/Gemini_Generated_Image_ngoz1wngoz1wngoz.jpg differ diff --git a/static/images/tutorials/development_docker/docker_build.jpg b/static/images/tutorials/development_docker/docker_build.jpg new file mode 100644 index 000000000..d9c826a5d Binary files /dev/null and b/static/images/tutorials/development_docker/docker_build.jpg differ diff --git a/static/images/tutorials/development_docker/docker_desktop1.jpg b/static/images/tutorials/development_docker/docker_desktop1.jpg new file mode 100644 index 000000000..eddaae034 Binary files /dev/null and b/static/images/tutorials/development_docker/docker_desktop1.jpg differ diff --git a/static/images/tutorials/development_docker/docker_desktop2.jpg b/static/images/tutorials/development_docker/docker_desktop2.jpg new file mode 100644 index 000000000..aa786399b Binary files /dev/null and b/static/images/tutorials/development_docker/docker_desktop2.jpg differ diff --git a/static/images/tutorials/development_docker/docker_metaphor_tiny_space.jpg b/static/images/tutorials/development_docker/docker_metaphor_tiny_space.jpg new file mode 100644 index 
000000000..63e6b07cd Binary files /dev/null and b/static/images/tutorials/development_docker/docker_metaphor_tiny_space.jpg differ diff --git a/static/images/tutorials/development_docker/docker_run.jpg b/static/images/tutorials/development_docker/docker_run.jpg new file mode 100644 index 000000000..683c50e69 Binary files /dev/null and b/static/images/tutorials/development_docker/docker_run.jpg differ diff --git a/static/images/tutorials/development_docker/docker_winbuild.jpg b/static/images/tutorials/development_docker/docker_winbuild.jpg new file mode 100644 index 000000000..d4c55067d Binary files /dev/null and b/static/images/tutorials/development_docker/docker_winbuild.jpg differ