Skip to content

azure: RHEL-10 specific customization#3

Closed
lzap wants to merge 3 commits intomainfrom
init-azure1
Closed

azure: RHEL-10 specific customization#3
lzap wants to merge 3 commits intomainfrom
init-azure1

Conversation

@lzap
Copy link
Collaborator

@lzap lzap commented Mar 6, 2026

The Containerfiles were generated from our distrodefs and this commit fine-tunes them for the initial version we want to start with.


All Containerfile files in this repo were created using one-time script from our distrodef YAML definitions (package includes/excludes, services, files, directories and kernel arguments). Since these are definitions for package-based images and because we are building something different here and we do not necessary need parity, I am opening this PR so we can discuss what to remove, what to keep. This first PR is only for Azure, I will do follow-ups for other image types.

I am assuming this repo is new for all of you, I suggest you read the main README first. But in short, these Containerfiles start with an incorrect FROM verb which is missing tag, this is because they are always built with the correct --from argument. Each distribution (Fedora, CentOS Stream 9/10) has a set of 4 files per each image type. Depending on the outcome of these PR discussions, we might end up with less files but this is currently how the CICD pipeline expects everything to be set. For each PR, all combinations of OSes, versions, arches and image types are built and for merged commits these are also pushed to quay.io (RHEL is only built we will do this downstream only).

Now, each Containerfile start with this:

ARG CONTAINERFILE=Containerfile
COPY $CONTAINERFILE /root/Containerfile
COPY azure /root

The idea is to ship the Containerfile itself and everything it needs in the bootc image itself in /root so users can rebuild their instances easily. Then all necessary packages for the particular image type are installed. Only then, resource files are copied over (to ensure directories are already created by the package management tool). Then various customizations are done and finally, services are enabled.

In the initial commit of this PR, we are starting with an extreme change - I am changing from our distrodefs which include and exclude a ton of packages and services to the opposite extreme: minimum possible Containerfile as shown in the bootc-examples repo. With one actual change - data loss service is kept which I already find useful to have. Or not? Let's find out!

@achilleas-k @croissanne @supakeen @thozza @ondrejbudai @lucasgarfield @ochosi

Edit: Tests are red, well, will fix later :-)

Copy link
Member

@thozza thozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. In general, I think that this direction makes sense. I like the minimal additional packages installed.

Here are my thoughts:

  • Changes to Containerfile.fedora-qcow2 seem to be wrong and unrelated to the PR. The Fedora Container file is being changed to inherit from centos image, which is almost certainly not correct.
  • I would propose that we keep the additional package set as minimal as done in this PR, but for the OS configuration, we actually use whatever is done in our distro definitions. This means things like modprobe config, adding some drop-in configs and units. Basically https://github.com/osbuild/images/blob/4d68eefc5c417eb48d35e80a7ba7900ded94879f/data/distrodefs/rhel-10/imagetypes.yaml#L521-L704 for el10 as an example. Our image definitions OS config is already done as it should be, so we should keep it unless we run into issues. For package set, that's a different story and the approach in this PR is IMO correct.
  • We can hardly get away with using the same config files (those in the azure/ directory) for all distros and their version. We will need to have configs specific for Fedora, RHEL-9, RHEL-10, etc.
  • I would advice that before we move on, we add a CI pipeline which will build bootable disk images from these derived container images and boot-test them. I don't think that there is any other way to verify that the Containerfiles are correct for their purpose.

[install]
# See also:
# - https://github.com/coreos/fedora-coreos-config/blob/testing-devel/platforms.yaml
# - https://github.com/osbuild/images/blob/63a1eead26a7c802dbcebe863439f591be6dc6e5/pkg/distro/rhel9/qcow2.go#L159
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this file to reference the azure image type, not qcow2.

FWIW, the kargs seem to be wrong. Originally, there were console=tty1 console=ttyS0 earlyprintk=ttyS0, but now there is "console=tty0", "console=ttyS0,115200n8", "earlyprintk=ttyS0".

The kernel args don't even match what is written on https://learn.microsoft.com/en-us/azure/virtual-machines/linux/create-upload-generic?source=recommendations#general-linux-system-requirements.

Also, net.iframes=0 should not be set blindly on all distros. At the minimum, el10 based distros move to predictable NIC naming.

If you somehow combine the information from the links above, it would make sense to explain how you ended up with the kargs list in this file, because I can't really tell how you did it...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I blindly copy-pasted from the example repo without too much thinking about it.

@lzap lzap changed the title azure: replace distrodefs with azure specific files azure: RHEL-10 specific customization Mar 12, 2026
@lzap
Copy link
Collaborator Author

lzap commented Mar 12, 2026

  • Changes to Containerfile.fedora-qcow2 seem to be wrong and unrelated to the PR. The Fedora Container file is being changed to inherit from centos image, which is almost certainly not correct.

Fixed, I added this file accidentally.

  • I would propose that we keep the additional package set as minimal as done in this PR

Ok I am going to review them one by one and remove everything that is not an official Microsoft recommendation.

but for the OS configuration, we actually use whatever is done in our distro definitions. This means things like modprobe config, adding some drop-in configs and units. Basically https://github.com/osbuild/images/blob/4d68eefc5c417eb48d35e80a7ba7900ded94879f/data/distrodefs/rhel-10/imagetypes.yaml#L521-L704 for el10 as an example. Our image definitions OS config is already done as it should be, so we should keep it unless we run into issues. For package set, that's a different story and the approach in this PR is IMO correct.

Gonna review these, again, everything that I will be able to find as Microsoft recommendation will go in. I will create a comment in the file that will lay out some principles and explain why something is added.

  • We can hardly get away with using the same config files (those in the azure/ directory) for all distros and their version. We will need to have configs specific for Fedora, RHEL-9, RHEL-10, etc.

Yeah but I did not want to create subdirectory for each version, it can work if we prefix or suffix those files:

$ tree gce
gce
└── etc
    └── yum.repos.d
        ├── google-cloud-el10.repo
        └── google-cloud-el9.repo

3 directories, 2 files
  • I would advice that before we move on, we add a CI pipeline which will build bootable disk images from these derived container images and boot-test them. I don't think that there is any other way to verify that the Containerfiles are correct for their purpose.

I will be testing these changes manually for now, but I created a ticket to do boot tests.

I am gonna push soon, I am removing EL9, Fedora and the irrelevant file from the patch and let's focus on EL10 first, I will test it properly and once we agree on this I am gonna proceed with other versions (and types).

@lzap
Copy link
Collaborator Author

lzap commented Mar 12, 2026

Problem: package cloud-init-24.4-8.el9.noarch from appstream requires policycoreutils-python3 is a CentOS9 Stream problem. The reproducer is:

FROM quay.io/centos-bootc/centos-bootc:stream9

RUN dnf install -y \
        "cloud-init" \
        "@Server"

Anyways, I pushed some updates, gonna test them, but feel free to review.

Edit: The error turned out to be an issue when the base container has a newer dependency that is on mirrors. I think this is a problem for upstream and it can be mitigated with --nobest option, RH CDN is more robust. It is worth noting that the trigger is @Server, installing just cloud-init does work so it should naturally disappear once I finish with "minimizing" package sets in all Containerfiles. To fix tests, I can simply remove all package groups from all other Containerfiles because I think we agreed we want to give up parity for smallest package sets?

@lzap
Copy link
Collaborator Author

lzap commented Mar 13, 2026

Ok I am running into issues with the CICD pipeline I am closing this PR, I will start this repo over with a new and hopefully better pipeline via Schutzbot / GitLab with native Fedora/RHEL runners on RHOS. This will vastly simplify it.

I will basically delete all files and start over in a new PR where we can discuss stuff one file after another. We can continue the discussion, I already incorporated all your points @thozza so the first image type we will discuss will be Azure.

@lzap lzap closed this Mar 13, 2026
@lzap lzap deleted the init-azure1 branch March 13, 2026 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants