Machine Without Valid Node #1885

edubach · 2024-02-13T11:44:17Z

Discussed in #1878

Originally posted by edubach, February 1, 2024
Hi,

When I try to add machines of a specific model (DELL C6420) to a cluster running version 4.14.0-0.okd-2024-01-06-084517, the nodes are never associated with the bare metal hosts, even though the hosts are shown as provisioned in the "Bare metal hosts" section.
After some time, the Console starts showing alerts such as:

MachineWithoutValidNode
1 Feb 2024, 21:41
If the machine never became a node, you should diagnose the machine related failures.
If the node was deleted from the API, you may delete the machine if appropriate.

MachineWithNoRunningPhase
1 Feb 2024, 21:41
The machine has been without a Running or Deleting phase for more than 60 minutes.
The machine may not have been provisioned properly from the infrastructure provider, or
it might have issues with CertificateSigningRequests being approved.
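Since the second alert points at CertificateSigningRequests, one quick check is whether any CSRs are still waiting for a decision. The sketch below is a minimal helper, assuming the `oc` CLI is installed and logged in; `pending_csrs` and `fetch_csrs` are hypothetical names. It relies on the Kubernetes convention that a decided CSR carries an `Approved`, `Denied` or `Failed` entry in `status.conditions`.

```python
import json
import subprocess

# A CSR counts as decided once status.conditions holds one of these types.
DECISION_TYPES = {"Approved", "Denied", "Failed"}


def pending_csrs(csr_list: dict) -> list[str]:
    """Return the names of CSRs in an `oc get csr -o json` List with no decision."""
    pending = []
    for item in csr_list.get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        if not any(c.get("type") in DECISION_TYPES for c in conditions):
            pending.append(item["metadata"]["name"])
    return pending


def fetch_csrs() -> dict:
    """Fetch all CSRs (assumes `oc` is installed and logged in to the cluster)."""
    result = subprocess.run(
        ["oc", "get", "csr", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

For example, `print(pending_csrs(fetch_csrs()))` lists the candidates; each one can then be reviewed and, if it really belongs to one of the new machines, approved with `oc adm certificate approve <name>`. If the list is empty, the CSR path is probably not the problem here.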

Logging in to the machines, I can see:

Feb 01 21:51:31 localhost systemd[1]: Starting ostree-boot-complete.service - OSTree Complete Boot... 
Feb 01 21:51:32 localhost ostree[1696]: error: ostree-finalize-staged.service failed on previous boot: Child process exited with code 1 
Feb 01 21:51:32 localhost systemd[1]: ostree-boot-complete.service: Main process exited, code=exited, status=1/FAILURE 
Feb 01 21:51:32 localhost systemd[1]: ostree-boot-complete.service: Failed with result 'exit-code'. 
Feb 01 21:51:32 localhost systemd[1]: Failed to start ostree-boot-complete.service - OSTree Complete Boot.

# journalctl -b -1 -u ostree-finalize-staged 
Data from the specified boot (-1) is not available: No such boot ID in journal

# rpm-ostree status 
State: idle 
Deployments: 
● fedora:fedora/x86_64/coreos/stable 
                 Version: 38.20230609.3.0 (2023-06-26T21:56:57Z) 
                  Commit: 248366c65732b30ae0dbd96be8b75db46f08f428f68254ee14ac52cb39f82240 
            GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
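The status above lists a single deployment, still on the stock FCOS 38.20230609.3.0 image, which is consistent with a staged deployment never having been finalized. A minimal sketch for checking this across several nodes, assuming `rpm-ostree status --json` emits a `deployments` array whose entries carry the `booted`, `staged` and `version` keys (field names assumed from current rpm-ostree releases; `summarize_deployments` and `fetch_status` are hypothetical helpers):

```python
import json
import subprocess


def summarize_deployments(status: dict) -> dict:
    """Summarize parsed `rpm-ostree status --json` output (key names assumed)."""
    deployments = status.get("deployments", [])
    booted = next((d for d in deployments if d.get("booted")), None)
    return {
        "booted_version": booted.get("version") if booted else None,
        "staged_present": any(d.get("staged") for d in deployments),
        "deployment_count": len(deployments),
    }


def fetch_status() -> dict:
    """Run on the node itself (for example over SSH); requires rpm-ostree."""
    result = subprocess.run(
        ["rpm-ostree", "status", "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

On a node in the state shown above, `summarize_deployments(fetch_status())` would be expected to report one deployment, booted on the FCOS 38 version, with no staged deployment left behind.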

# journalctl -u ostree-finalize-staged.service 
Feb 01 21:48:45 node115 systemd[1]: Finished ostree-finalize-staged.service - OSTree Finalize Staged Deployment. 
Feb 01 21:48:54 node115 systemd[1]: Stopping ostree-finalize-staged.service - OSTree Finalize Staged Deployment... 
Feb 01 21:48:55 node115 ostree[4609]: Finalizing staged deployment 
Feb 01 21:48:58 node115 ostree[4609]: Copying /etc changes: 21 modified, 0 removed, 145 added 
Feb 01 21:48:58 node115 ostree[4609]: Copying /etc changes: 21 modified, 0 removed, 145 added 
Feb 01 21:48:58 node115 ostree[4618]: error parsing semanage configuration file: syntax error 
Feb 01 21:48:58 node115 ostree[4618]: semodule: Could not create semanage handle 
Feb 01 21:48:58 node115 ostree[4609]: error: Child process exited with code 1 
Feb 01 21:48:58 node115 systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited, status=1/FAILURE 
Feb 01 21:48:58 node115 systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'. 
Feb 01 21:48:58 node115 systemd[1]: Stopped ostree-finalize-staged.service - OSTree Finalize Staged Deployment.

There are no obvious errors in /etc/selinux/semanage.conf.
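Because `semodule` fails with a parse error, a mechanical check of the file can catch things a visual inspection misses, such as an unclosed block or a stray line. The sketch below assumes a simplified grammar based on semanage.conf(5): blank lines, `#` comments, `key = value` lines, and non-nesting `[name]` ... `[end]` blocks. It is not libsemanage's real parser, so a clean result does not prove the file is valid; `check_semanage_conf` and `check_file` are hypothetical names.

```python
import re
from pathlib import Path

# Simplified, assumed grammar; libsemanage's actual parser may be stricter.
KEY_VALUE = re.compile(r"^[A-Za-z0-9_-]+\s*=\s*\S.*$")
SECTION = re.compile(r"^\[([^\]]+)\]$")


def check_semanage_conf(text: str) -> list[str]:
    """Return human-readable problems found in semanage.conf text."""
    problems = []
    open_block = None  # (name, line number) of the block currently open
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        m = SECTION.match(line)
        if m:
            name = m.group(1).strip()
            if name == "end":
                if open_block is None:
                    problems.append(f"line {lineno}: [end] without an open block")
                open_block = None
            elif open_block is not None:
                problems.append(
                    f"line {lineno}: [{name}] opened inside [{open_block[0]}] "
                    f"(opened on line {open_block[1]})"
                )
            else:
                open_block = (name, lineno)
            continue
        if not KEY_VALUE.match(line):
            problems.append(f"line {lineno}: not a 'key = value' line: {line!r}")
    if open_block is not None:
        problems.append(f"line {open_block[1]}: [{open_block[0]}] is never closed with [end]")
    return problems


def check_file(path: str = "/etc/selinux/semanage.conf") -> list[str]:
    """Read and check a semanage.conf file on disk."""
    return check_semanage_conf(Path(path).read_text())
```

Running `print(check_file())` on an affected node prints an empty list when nothing in this simplified grammar is violated.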

It seems that the provisioning was indeed incomplete, with several packages missing:

kubelet.service: Failed to schedule restart job: Unit crio.service not found.

Feb 01 23:55:36 node117 configure-ovs.sh[9174]: /usr/local/bin/configure-ovs.sh: line 395: ovs-vsctl: command not found
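To confirm which pieces are actually absent on a given node, a small check for the command and unit named in these log lines can help. This is a minimal sketch: the unit directory list is an assumption covering the usual Fedora locations (see systemd.unit(5) for the full search path), and `systemctl cat crio.service` remains the authoritative lookup. `missing_commands` and `missing_units` are hypothetical helpers.

```python
import shutil
from pathlib import Path

# Common systemd unit directories on Fedora-based hosts (assumed, not exhaustive).
UNIT_DIRS = ("/etc/systemd/system", "/run/systemd/system", "/usr/lib/systemd/system")


def missing_commands(names, path=None):
    """Return the commands from `names` not found on PATH (or on `path` if given)."""
    return [n for n in names if shutil.which(n, path=path) is None]


def missing_units(units, unit_dirs=UNIT_DIRS):
    """Return the unit names with no unit file in any of `unit_dirs`."""
    return [u for u in units
            if not any((Path(d) / u).exists() for d in unit_dirs)]
```

On the node above, `missing_commands(["ovs-vsctl"])` and `missing_units(["crio.service"])` would be expected to return both names, matching the log errors.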

Can someone please help me troubleshoot this?

Thanks.


JaimeMagiera · 2024-08-15T14:01:07Z

Hi,

We are not working on FCOS builds of OKD any more. Please see these documents...

https://okd.io/blog/2024/06/01/okd-future-statement
https://okd.io/blog/2024/07/30/okd-pre-release-testing

We will be providing documentation on upgrading clusters from 4.15 FCOS to 4.16 SCOS. In terms of clusters that are older, you may be able to get help from community members. I'll convert this to a discussion to facilitate that.

Many thanks,

Jaime

okd-project locked and limited conversation to collaborators Aug 15, 2024

JaimeMagiera converted this issue into discussion #2007 Aug 15, 2024
