Talos 1.8.0+ initial boot fails in phase meta (6/12) #9776

Closed
smauermann opened this issue Nov 21, 2024 · 14 comments

Comments

@smauermann

smauermann commented Nov 21, 2024

Bug Report

Hi team, I am failing to boot any Talos v1.8.0+ ISO from USB on an HP Elitedesk 800 g5 Mini (i5 9500T with vPro). The boot process fails in the meta phase at step 6/12.

I would really love to switch to Talos but I am at a loss right now on how to proceed. I would love any hints!

Description

Up to and including v1.7.7, I can boot just fine and apply the machine configs to get a functioning Kubernetes cluster. However, if I try to boot any Talos version from v1.8.0 onward, the boot fails. Please see a screenshot of the failed boot process below.

I have tried various other images to verify that nothing is wrong with the node: Debian, Proxmox, CoreOS, and of course different Talos versions.

Besides the different versions, I have experimented with different extensions, namely intel-ucode, i915-ucode, mei, and util-linux-tools, including all permutations of the selected extensions.

Interestingly, I was able to create a cluster with 1.7.7 and upgrade to 1.8.0. Another upgrade to 1.8.3 (no extensions) failed, though: the screen went black after the reboot and the node never came up. I am now trying this again with different extensions.

EDIT: I was able to upgrade from 1.7.7 (without any extensions) to 1.8.3 with the following extensions:

customization:
    systemExtensions:
        officialExtensions:
            - siderolabs/i915-ucode
            - siderolabs/intel-ucode
            - siderolabs/mei
            - siderolabs/util-linux-tools
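
For anyone trying to reproduce this, here is a minimal sketch of how a schematic like the one above might map to an installer image, assuming it is submitted to the Image Factory at factory.talos.dev. The helper name `factory_installer_image` and the `<schematic-id>` and `<node-ip>` placeholders are illustrative and do not come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: build the installer image reference for a given
# Image Factory schematic ID and Talos version.
factory_installer_image() {
    schematic_id="${1:?usage: factory_installer_image <schematic-id> <version>}"
    version="${2:?usage: factory_installer_image <schematic-id> <version>}"
    printf 'factory.talos.dev/installer/%s:%s\n' "$schematic_id" "$version"
}

# Assumed workflow (placeholders in angle brackets must be filled in):
#   1. Save the schematic as schematic.yaml and upload it; the JSON
#      response contains the schematic ID:
#        curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics
#   2. Upgrade the node to the image that bundles the extensions:
#        talosctl upgrade --nodes <node-ip> \
#            --image "$(factory_installer_image <schematic-id> v1.8.3)"
```

The function only formats the image reference; nothing is uploaded or upgraded until the commented commands are run by hand.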

Logs

"Screenshot" of the boot failure:
talos-boot-fail

Environment

  • Talos version: v1.8.3
  • Kubernetes version: na
  • Platform: metal on HP Elitedesk 800 g5 Mini (i5 9500T with vPro)
@smira
Member

smira commented Nov 21, 2024

There might be a mix of several issues here. Talos 1.8 has an unfortunate side effect for those with i915 hardware: the i915-ucode extension must be included, otherwise the Linux kernel fails to boot (this will be fixed in 1.9+).

As for the error in the screenshot above, it is certainly a bug, but I don't understand how it ends up this way.

Does the disk contain any previous Talos install when booting from an ISO (USB)?

@smira
Member

smira commented Nov 21, 2024

Oh yeah, I misread the picture. I guess you might have a META partition somewhere on the disk.

Moreover, it might be related to an incomplete wipe of the system disk. Please try wiping the disks before installation.

@smauermann
Author

Hi @smira, thanks for your swift reply! I shredded both internal disks before installing Talos, and I performed a wipe via the disks machine config during the install of 1.7.7. I was pretty sure that I had nuked everything before the installation. Is there any way of checking for extraneous META partitions?
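
One possible way to check, sketched under the assumption that the machine is booted into a generic live Linux (not Talos) with `lsblk` and `awk` available: list the GPT partition labels on every attached disk, USB sticks included, and filter for labels Talos uses. The function name `filter_talos_partitions` is hypothetical, and the label list (META, STATE, EPHEMERAL) is an assumption about what a Talos install leaves behind.

```shell
#!/bin/sh
# Hypothetical helper: print partitions whose GPT partition label looks
# like one written by a Talos install.
# Reads "NAME PARTLABEL" pairs on stdin, which is the format produced by
# `lsblk -rno NAME,PARTLABEL` (unlabelled devices have an empty 2nd field).
filter_talos_partitions() {
    awk '$2 ~ /^(META|STATE|EPHEMERAL)$/ { print "/dev/" $1 " " $2 }'
}

# Usage on the live system (covers internal disks and the boot medium):
#   lsblk -rno NAME,PARTLABEL | filter_talos_partitions
```

Empty output would suggest that no leftover Talos partitions are visible to the kernel; any line printed names a device worth wiping before the next attempt.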

@smauermann
Author

Also, I'm happy to hear that the i915 issue will be fixed with the next minor version. Keep up the great work.

@smira
Member

smira commented Nov 21, 2024

I don't see the logs, but I wonder if there's a message somewhere further up from the VolumeManager controller about a META partition being found (there shouldn't be one).

@smauermann
Author

I did not observe such a message, but then again the logs fly past pretty quickly and all I could capture is in the screenshot above 😄

@smira
Member

smira commented Nov 21, 2024

One option is to record a video; that sometimes makes it possible to read individual messages.

@smauermann
Author

Would a talosctl reset get rid of any META partitions that could mess with any subsequent installs?

@eugene-marchanka

I suspect I have a similar issue with an ASRock Z690D4U-2L2T/G5 motherboard.
It was solved by building and booting from an image built with @smauermann's parameters 👍🏻
I spent 2 days trying to get logs from SOL without success.
SOL works fine up to the point where Talos starts to boot 😞

@smira
Member

smira commented Nov 25, 2024

Would a talosctl reset get rid of any META partitions that could mess with any subsequent installs?

Yes, this should reset fully.
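
A minimal sketch of what such a reset could look like, assuming a standalone node that is reachable over the Talos API. The wrapper name `reset_node` and the `TALOSCTL` override are hypothetical; `--graceful=false` is an assumption that fits a node which cannot leave a cluster cleanly and may not be right for a healthy multi-node cluster.

```shell
#!/bin/sh
# Hypothetical wrapper around `talosctl reset` for a single node.
# WARNING: a reset wipes the node; double-check the address first.
reset_node() {
    node="${1:?usage: reset_node <node-ip>}"
    # TALOSCTL can be overridden (e.g. TALOSCTL=echo) to preview the
    # arguments without touching the node.
    ${TALOSCTL:-talosctl} --nodes "$node" reset --graceful=false --reboot
}

# Preview only:  TALOSCTL=echo reset_node <node-ip>
# Real reset:    reset_node <node-ip>
```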

@eugene-marchanka

I fixed IPMI console redirection on my system by simply deleting all console parameters from the GRUB menu and adding a console parameter matching my BIOS settings at the end.

I used the Talos ISO image from GitHub, which was not modified by Talos Factory.

The full boot log with the kernel panic is attached.

@smira let me know, please, if this is different so I can open separate issue 🙏🏻

superlogics-talos-1-8-3-boot-crash.txt

@smira
Member

smira commented Nov 26, 2024

@smira let me know, please, if this is different so I can open separate issue 🙏🏻

Do you have the i915-ucode system extension installed? If not, this is a known issue. It is fixed in 1.9 by moving the i915 driver out of the base image.

@smira
Member

smira commented Nov 27, 2024

Should be fixed in #9810

smira closed this as completed Nov 27, 2024

@smauermann
Author

smauermann commented Nov 28, 2024

I am happy to report that I have solved the boot problem by (drumroll) formatting the thumb drive I was using to boot Talos 😄 I guess I had a META partition on the thumb drive from a previous install and that was tripping up my boot attempts.
