Talos tries (and fails) to apply invalid file patches #9550

Open
PrivatePuffin opened this issue Oct 22, 2024 · 2 comments

@PrivatePuffin

Bug Report

Description

When we add files to a machine config like this:

  files:
    - path: "/etc/cri/conf.d/20-customization.part"
      permissions: 0
      content: |
        [plugins."io.containerd.grpc.v1.cri"]
          enable_unprivileged_ports = true
          enable_unprivileged_icmp = true
        [plugins."io.containerd.grpc.v1.cri".containerd]
          discard_unpacked_layers = false
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          discard_unpacked_layers = false

    - path: "/etc/nfsmount.conf"
      permissions: 420
      content: |
        [ NFSMount_Global_Options ]
        nfsvers=4.2
        hard=True
        noatime=True
        nodiratime=True
        rsize=131072
        wsize=131072
        nconnect=8

Note the missing "operation" (`op`) key.

On apply, Talos (logically) fails to apply the files because the operation key is missing, and the system ends up in a boot loop.
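
For comparison, here is a sketch of the same two entries with the operation key set. It assumes the `op` field described in the Talos machine config reference (`create`, `append`, `overwrite`). Which operation fits each file is an assumption and is not stated in this report: `create` is assumed for the new containerd config part, and `overwrite` for `/etc/nfsmount.conf`.

```yaml
machine:
  files:
    - path: "/etc/cri/conf.d/20-customization.part"
      permissions: 0
      op: create # assumption: the file does not exist yet, so it is created
      content: |
        [plugins."io.containerd.grpc.v1.cri"]
          enable_unprivileged_ports = true
          enable_unprivileged_icmp = true
        [plugins."io.containerd.grpc.v1.cri".containerd]
          discard_unpacked_layers = false
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          discard_unpacked_layers = false

    - path: "/etc/nfsmount.conf"
      permissions: 420 # decimal 420 == octal 0644
      op: overwrite # assumption: replaces any existing file at this path
      content: |
        [ NFSMount_Global_Options ]
        nfsvers=4.2
        hard=True
        noatime=True
        nodiratime=True
        rsize=131072
        wsize=131072
        nconnect=8
```

The only change from the failing config is the added `op` line on each entry; paths, permissions, and contents are unchanged.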

Expected Behavior

Any one of the following should have happened to limit the impact of this issue:

A. Validate the file patches so that they are at least valid patches
B. If a patch fails, don't keep rebooting and retrying a file patch that is already known to be broken
C. Revert the apply if a file patch is broken

However, none of these options happen.
So we end up with a broken system instead.

Logs

(Screenshot attached: Screenshot_2024-10-22_at_19 24 16)

Environment

  • Talos version: 1.8.1
  • Kubernetes version: 1.31
  • Platform: Irrelevant
@smira
Member

smira commented Oct 23, 2024

The bug here is that the machine config validation is probably incomplete.

The workaround is to apply the previous machine config; since apid is still running, this will fix the issue.

@PrivatePuffin
Author

> The bug here is that the machine config validation is probably incomplete.
>
> The workaround is to apply the previous machine config; since apid is still running, this will fix the issue.

I'm aware of how to reverse the issue, no worries... These are test runs/test machines/test users :)
I'm reporting this mostly to get it fixed more globally.
