engineering: Native /etc/default/grub editor for Azure Linux 4.0 #642

Draft
Britel wants to merge 7 commits into main from user/britel/azl4-1-grub-native

Britel (Collaborator) commented May 12, 2026

Stacked change — needs PRs 2 and 3 to actually fire on AZL4.

This PR adds the foundation (native /etc/default/grub editor + AZL4 distro detection + a new GRUB update arm), but the new code path is gated on host_os_release, which is always AZL3 in the MOS installer environment today. PR-3 flips the gate to use the image distro, which is what actually routes AZL4 installs through the new path. PR-2 fixes ESP/EFI vendor directory discovery for AZL4.
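
To make the gating concrete, here is a minimal sketch of how an os-release string might map to a release variant and then to a GRUB update arm. It is an illustration under stated assumptions, not Trident's code: `detect_distro`, `grub_update_arm`, both `Other` variants, the `ID=azurelinux` check, and the sample inputs are hypothetical. Only the `AzL3`/`AzL4` variant names and the `Distro::AzureLinux(AzureLinuxRelease::AzL4)` gate come from this PR.

```rust
/// Illustrative mirror of the PR's release variants; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AzureLinuxRelease {
    AzL3,
    AzL4,
    Other,
}

/// Illustrative distro type; only the Azure Linux variant matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Distro {
    AzureLinux(AzureLinuxRelease),
    Other,
}

/// Hypothetical parser: classify os-release text by its ID and VERSION_ID keys.
/// Assumes Azure Linux reports `ID=azurelinux`.
pub fn detect_distro(os_release: &str) -> Distro {
    let mut id = "";
    let mut version_id = "";
    for line in os_release.lines() {
        if let Some((key, value)) = line.trim().split_once('=') {
            let value = value.trim().trim_matches('"');
            match key {
                "ID" => id = value,
                "VERSION_ID" => version_id = value,
                _ => {}
            }
        }
    }
    if id != "azurelinux" {
        return Distro::Other;
    }
    // Only the major version decides the variant: "4.0" and "4.1" are both AzL4.
    let major = version_id.split('.').next().unwrap_or("");
    Distro::AzureLinux(match major {
        "3" => AzureLinuxRelease::AzL3,
        "4" => AzureLinuxRelease::AzL4,
        _ => AzureLinuxRelease::Other,
    })
}

/// Which GRUB update arm a given distro would take in this sketch.
pub fn grub_update_arm(distro: Distro) -> &'static str {
    match distro {
        Distro::AzureLinux(AzureLinuxRelease::AzL4) => "native",
        Distro::AzureLinux(AzureLinuxRelease::AzL3) => "osmodifier",
        _ => "other",
    }
}
```

Fed the host's os-release (AZL3 in the MOS installer today), the sketch selects the osmodifier arm even when the image being installed is AZL4. Fed the image's os-release, which is the change PR-3 makes, it selects the native arm.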

| Stacked change | Compare link |
| --- | --- |
| PR-2: ESP layouts (auto-disable noprefix, generic EFI vendor dir) | user/britel/azl4-1-grub-native...user/britel/azl4-2-esp-layouts |
| PR-3: BLS cmdline extraction + image-distro detection | user/britel/azl4-1-grub-native...user/britel/azl4-3-configure-bls |
| PR-4: Native hostname carry-over (independent of this stack) | main...user/britel/azl4-4-osconfig-hostname |
| PR-5: testimages.py registration + AZL4 scenarios (independent of this stack) | main...user/britel/azl4-5-vm-testimage |

Summary

Adds Azure Linux 4.0 support for native /etc/default/grub manipulation, replacing the OS Modifier dependency for the GRUB defaults editing step. AZL4 uses Fedora-derived BLS (Boot Loader Spec) conventions, which OS Modifier currently does not handle.

What this PR does

  • New module osutils::grub_defaults — pure-Rust reader/editor for /etc/default/grub plus a grub2-mkconfig wrapper. Preserves comment lines, blank lines, and original quote style. Atomic write via sibling temp file + fsync + POSIX rename so a crash mid-write doesn't brick the boot path.
  • AzureLinuxRelease::AzL4 distro variant with parser recognition for Azure Linux 4.x os-release strings.
  • update_grub_config_native() wired as a new arm in the GRUB config update flow, gated by Distro::AzureLinux(AzureLinuxRelease::AzL4). Auto-detects whether the target image uses GRUB_CMDLINE_LINUX (AZL3/Ubuntu convention) or GRUB_CMDLINE_LINUX_DEFAULT (AZL4/Fedora convention) and updates the right variable.
  • BLS-aware best-effort cmdline extraction: extract_cmdline_from_grub_cfg returns Err on a BLS-only grub.cfg (the AZL4 native format); the call site treats that as a graceful "no preserved args" rather than failing the whole install. SELinux preservation falls through to defaults.
  • Tests — 10 unit tests covering the parser, editor, comment preservation, AZL4-specific BLS-only grub.cfg behavior, and the atomic write helper (3 dedicated tests covering create, replace, and no-leftover-tempfiles).
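
The editor behavior listed above (comments and blank lines kept, quote style reused, cmdline variable auto-detected) can be sketched roughly as follows. This is a simplified illustration, not the `osutils::grub_defaults` implementation: `GrubDefaultsSketch`, `strip_quotes`, and `cmdline_variable` are hypothetical names, and preferring `GRUB_CMDLINE_LINUX_DEFAULT` when both variables exist is an assumption about the detection order.

```rust
/// Line-preserving view of an /etc/default/grub-style file (illustrative only).
pub struct GrubDefaultsSketch {
    lines: Vec<String>,
}

/// Split a raw value into its quote character ("" if unquoted) and inner text.
fn strip_quotes(raw: &str) -> (&'static str, &str) {
    for quote in ["\"", "'"] {
        // The length guard keeps a lone quote character from being sliced.
        if raw.len() >= 2 && raw.starts_with(quote) && raw.ends_with(quote) {
            return (quote, &raw[1..raw.len() - 1]);
        }
    }
    ("", raw)
}

impl GrubDefaultsSketch {
    pub fn parse(content: &str) -> Self {
        Self { lines: content.lines().map(str::to_owned).collect() }
    }

    /// Index of the `KEY=...` assignment line, ignoring comment lines.
    fn find(&self, key: &str) -> Option<usize> {
        self.lines.iter().position(|line| {
            let t = line.trim_start();
            !t.starts_with('#')
                && t.strip_prefix(key).map_or(false, |rest| rest.starts_with('='))
        })
    }

    /// Value of `key` with one layer of surrounding quotes removed.
    pub fn get(&self, key: &str) -> Option<&str> {
        let line = &self.lines[self.find(key)?];
        let (_, raw) = line.split_once('=')?;
        Some(strip_quotes(raw.trim()).1)
    }

    /// Replace or append `key`, reusing the quote character already on that line.
    pub fn set(&mut self, key: &str, value: &str) {
        match self.find(key) {
            Some(i) => {
                let quote = self.lines[i]
                    .split_once('=')
                    .map_or("\"", |(_, raw)| strip_quotes(raw.trim()).0);
                // An unquoted line cannot hold a value containing whitespace.
                let quote = if quote.is_empty() && value.contains(char::is_whitespace) {
                    "\""
                } else {
                    quote
                };
                self.lines[i] = format!("{key}={quote}{value}{quote}");
            }
            None => self.lines.push(format!("{key}=\"{value}\"")),
        }
    }

    /// Serialize back to text; untouched lines come out exactly as they went in.
    pub fn render(&self) -> String {
        let mut out = self.lines.join("\n");
        out.push('\n');
        out
    }
}

/// Pick the cmdline variable the image already uses (assumed precedence).
pub fn cmdline_variable(defaults: &GrubDefaultsSketch) -> &'static str {
    if defaults.find("GRUB_CMDLINE_LINUX_DEFAULT").is_some() {
        "GRUB_CMDLINE_LINUX_DEFAULT"
    } else {
        "GRUB_CMDLINE_LINUX"
    }
}
```

Keeping the file as a vector of raw lines, and rewriting only the one assignment that changes, is what lets comments and blank lines survive a round trip without a full shell parser.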

Behavior with PR-1 alone (without PRs 2 and 3)

| Target install | Code path | Outcome |
| --- | --- | --- |
| AZL3 → AZL3 | AZL3 arm (existing osmodifier) | Unchanged. Identical to today. |
| AZL3 → AZL4 (current AZL4 install case) | AZL3 arm (existing osmodifier) | Unchanged. Whatever happens today still happens. |
| AZL4 → anything (no one builds AZL4 MOS today) | new AZL4 native arm | New code fires. |

Risk to AZL3 installs: zero. The AZL3 arm is untouched.

Testing

| Suite | Result |
| --- | --- |
| cargo test -p osutils --lib | 139 / 139 pass |
| cargo test -p trident --lib | 363 / 363 pass |
| cargo check --workspace --tests | clean |
| cargo clippy -p osutils --tests | clean |
| cargo clippy -p trident --tests | 1 pre-existing warning in osimage/cosi/mod.rs:1456 (unrelated) |

E2E integration test on AZL4 VM (using the full stack PR-1 + 2 + 3 + 4 + test infra):

  • Clean install completed: servicingState: clean-install-finalized → provisioned after commit
  • /etc/default/grub written atomically with both GRUB_CMDLINE_LINUX and GRUB_CMDLINE_LINUX_DEFAULT populated; root partuuid baked into _DEFAULT as expected
  • BLS entry generated at /boot/loader/entries/<machine-id>-<kernel-version>.conf
  • Kernel boots cleanly: 6.18.5-1.8.azl4~20260420.x86_64
  • pytest -m base against trident_configurations/base-azl4: 3 passed, 1 failed (failure is pre-existing test_users KeyError on empty os.users; tracked separately)
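
For readers unfamiliar with BLS entries such as the one generated above, the following sketch shows how kernel arguments might be read from an entry's `options` lines. It assumes the usual BLS Type #1 layout of one `key value` pair per line. `bls_kernel_args` is a hypothetical helper, not a function from this PR, and the sample entry in the usage note is made up. Splitting on whitespace is a deliberate simplification, so quoted values containing spaces are not handled.

```rust
/// Collect kernel arguments from every `options` line of a BLS entry.
/// Each argument becomes `(key, Some(value))` for `key=value`, or `(key, None)`
/// for a bare flag such as `quiet`.
pub fn bls_kernel_args(entry: &str) -> Vec<(&str, Option<&str>)> {
    entry
        .lines()
        // Keep only lines whose first word is exactly `options`.
        .filter_map(|line| line.trim().strip_prefix("options"))
        .filter(|rest| rest.starts_with(char::is_whitespace))
        // Naive whitespace split; quoted values with spaces are out of scope.
        .flat_map(|rest| rest.split_whitespace())
        .map(|arg| match arg.split_once('=') {
            // Split on the first `=` only, so `root=PARTUUID=...` stays intact.
            Some((key, value)) => (key, Some(value)),
            None => (arg, None),
        })
        .collect()
}
```

Given an entry containing the line `options root=PARTUUID=1234-abcd ro quiet`, the function yields `root` with value `PARTUUID=1234-abcd`, followed by the bare flags `ro` and `quiet`.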

Deep multi-model review

This PR went through a 9-agent adversarial review (3 roles × 3 models: Claude Opus 4.7, GPT-5.5, Claude Sonnet 4.6) using the deep-review-multi-model protocol. Two blocking findings were identified and fixed in this branch:

  1. extract_cmdline_from_grub_cfg previously propagated errors with ? at the call site, which would have hard-failed on any BLS-only grub.cfg (the AZL4 native format). Now best-effort.
  2. fs::write for /etc/default/grub was non-atomic. Now uses temp file + fsync + POSIX rename.
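
The two fixes can be sketched together as follows. This is a hedged illustration rather than the code in this branch: `extract_cmdline_sketch`, `preserved_args`, `write_then_rename`, and `atomic_write_sketch` are hypothetical names, the extractor only looks for a line starting with `linux `, and the temp-file naming scheme is an assumption. The sketch follows the temp file + fsync + rename sequence described above and does not additionally fsync the parent directory.

```rust
use std::collections::HashMap;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Stand-in extractor: errors when grub.cfg has no `linux` line, which is what
/// a BLS-only grub.cfg looks like.
pub fn extract_cmdline_sketch(grub_cfg: &str) -> Result<HashMap<String, String>, String> {
    let line = grub_cfg
        .lines()
        .map(str::trim)
        .find(|l| l.starts_with("linux "))
        .ok_or_else(|| "no linux line in grub.cfg".to_string())?;
    // Skip the `linux` keyword and the kernel path; the rest are kernel args.
    Ok(line
        .split_whitespace()
        .skip(2)
        .map(|arg| {
            let (key, value) = arg.split_once('=').unwrap_or((arg, ""));
            (key.to_string(), value.to_string())
        })
        .collect())
}

/// Best-effort call site: a failed extraction becomes "no preserved args"
/// instead of propagating with `?`.
pub fn preserved_args(grub_cfg: &str) -> HashMap<String, String> {
    extract_cmdline_sketch(grub_cfg).unwrap_or_default()
}

/// Write and fsync the temp file, then rename it over the target.
fn write_then_rename(tmp: &Path, path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = File::create(tmp)?;
    file.write_all(contents)?;
    file.sync_all()?; // flush data and metadata before the rename
    drop(file);
    // Same directory, so the rename replaces the target atomically on POSIX.
    fs::rename(tmp, path)
}

/// Replace `path` with `contents` via a sibling temp file.
pub fn atomic_write_sketch(path: &Path, contents: &[u8]) -> io::Result<()> {
    let name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    // Hypothetical naming scheme: hidden sibling tagged with the process id.
    let tmp = path.with_file_name(format!(".{}.tmp.{}", name.to_string_lossy(), std::process::id()));
    let result = write_then_rename(&tmp, path, contents);
    if result.is_err() {
        let _ = fs::remove_file(&tmp); // do not leave a stray temp file behind
    }
    result
}
```

Because the temp file lives in the same directory as the target, a crash before the rename leaves the old file intact, and a crash after it leaves the new one. Readers should never observe a truncated /etc/default/grub.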

Non-blocking findings (a BLS test that exercises inline logic rather than production code, set() always quoting, shell_split being a plain split_whitespace, recovery-menuentry brace counting, and the unquote() length guard) are tracked in the deep-review report and queued for a follow-up commit.

Co-authored-by

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Britel and others added 7 commits May 8, 2026 15:18
…nipulation

Implements direct read/write/update of /etc/default/grub and kernel
command line arguments, replacing the external os-modifier dependency
for GRUB configuration management.

Includes:
- GrubDefaults struct for parsing and modifying /etc/default/grub
- Kernel cmdline arg parsing, updating, and removal
- grub.cfg kernel arg extraction (for initial value seeding)
- grub2-mkconfig wrapper for config regeneration
- 7 unit tests covering all operations

This is the foundation for eliminating the azurelinux-image-tools-osmodifier
dependency, enabling Trident to manage GRUB config natively on any distro
(including AZL4 where os-modifier is not available).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add AzureLinuxRelease::AzL4 variant for VERSION_ID 4.x detection.
Extend GRUB config update path to accept both AzL3 and AzL4.
Add AzL4 initramfs naming pattern for functional tests.
Add test case using real AZL4 Alpha 2 os-release data (ID_LIKE=fedora).
Add AZL4 mock os-release for test utilities.

Note: GRUB update path reuses AzL3 logic via os-modifier. The os-modifier
tool is not yet available in AZL4 repos. This will need resolution before
AZL4 A/B updates can work end-to-end.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add update_grub_config_native() in grub.rs that uses the new grub_defaults
module instead of the external os-modifier binary. Feature-gated by distro:

- AzL3: continues using os-modifier (unchanged behavior)
- AzL4: uses native /etc/default/grub manipulation + grub2-mkconfig

The native path handles all the same kernel args as os-modifier:
- Root device
- SELinux mode (selinux=, enforcing=)
- dm-verity (rd.systemd.verity, systemd.verity_root_data/hash)
- Overlay filesystem (rd.overlayfs=)
- cloud-init network-config disable (for netplan)

This eliminates the azurelinux-image-tools-osmodifier dependency for AZL4,
which is not available in the AZL4 Alpha 2 package repos.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…AULT

AZL4 (Fedora-based with BLS) uses GRUB_CMDLINE_LINUX_DEFAULT while
AZL3 uses GRUB_CMDLINE_LINUX. The native GRUB update path now detects
which variable exists in /etc/default/grub and uses it.

Also updated grub_defaults API: get_cmdline_args(), update_cmdline_args(),
and remove_cmdline_args() now take the variable name as a parameter,
making them usable for any GRUB variable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
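
As a rough illustration of the argument-level operations this commit parameterizes by variable name, the sketch below updates and removes kernel arguments in the value of whichever variable was detected. `set_args` and `remove_args` are hypothetical helpers that work on a plain cmdline string. They are not the signatures of update_cmdline_args() or remove_cmdline_args(), and they split on whitespace only.

```rust
/// Set or replace `key=value` arguments in a cmdline string, keeping the
/// position of arguments that already exist and appending new ones.
pub fn set_args(cmdline: &str, updates: &[(&str, &str)]) -> String {
    let mut args: Vec<String> = cmdline.split_whitespace().map(str::to_owned).collect();
    for (key, value) in updates {
        let prefix = format!("{key}=");
        let new_arg = format!("{key}={value}");
        match args.iter().position(|arg| arg.starts_with(prefix.as_str())) {
            Some(i) => args[i] = new_arg,
            None => args.push(new_arg),
        }
    }
    args.join(" ")
}

/// Drop every argument whose key (the part before the first `=`) is in `keys`.
pub fn remove_args(cmdline: &str, keys: &[&str]) -> String {
    cmdline
        .split_whitespace()
        .filter(|arg| {
            let key = arg.split_once('=').map_or(*arg, |(k, _)| k);
            !keys.contains(&key)
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, applying `set_args` with `selinux=1` and `root=PARTUUID=abcd` to `quiet selinux=0 ro` replaces the SELinux argument in place and appends the root argument at the end.
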
Add 3 tests for the AZL4 boot configuration:
- test_azl4_grub_cmdline_linux_default: Validates the full update flow
  with GRUB_CMDLINE_LINUX_DEFAULT, BLS config preservation, and
  Trident kernel arg injection (root, selinux, overlayfs)
- test_azl4_bls_entry_extraction: Confirms BLS-only grub.cfg correctly
  reports no linux lines (BLS entries are separate files)
- test_extract_cmdline_from_bls_entry: Validates parsing of BLS entry
  options lines (format from real AZL4 Alpha 2 system)

Test data sourced from live AZL4 Alpha 2 VM inspection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…/grub

Two fixes from PR-1 deep multi-model review:

1. extract_cmdline_from_grub_cfg is now best-effort at the call site.
   AZL4 ships GRUB_ENABLE_BLSCFG=true; fully-migrated systems will not
   have a top-level 'linux' line in grub.cfg, causing extraction to
   fail. current_args is only used for SELinux preservation fallback,
   so an empty map is a safe default. The native /etc/default/grub
   path is the authoritative source of kernel args either way.

2. atomic_write() replaces fs::write() for /etc/default/grub. Writes
   to a sibling temp file, fsyncs, and renames atomically. A crash
   during write of this boot-critical file could otherwise produce a
   truncated/empty file and brick the next grub2-mkconfig.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three unit tests for the new atomic_write helper:
- creates a new file
- replaces an existing file
- leaves no leftover temp files on success

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>