This issue was moved to a discussion.

Machine Without Valid Node #1885

Closed
edubach opened this issue Feb 13, 2024 Discussed in #1878 · 1 comment

edubach commented Feb 13, 2024

Discussed in #1878

Originally posted by edubach February 1, 2024
Hi,

When trying to add machines of a specific model (Dell C6420) to a cluster running version 4.14.0-0.okd-2024-01-06-084517, the nodes are never associated with the bare-metal hosts, even though the hosts are shown as provisioned in the "Bare metal hosts" section.
After some time, the Console starts to show alerts like:

MachineWithoutValidNode (1 Feb 2024, 21:41)
If the machine never became a node, you should diagnose the machine related failures.
If the node was deleted from the API, you may delete the machine if appropriate.

MachineWithNoRunningPhase (1 Feb 2024, 21:41)
The machine has been without a Running or Deleting phase for more than 60 minutes.
The machine may not have been provisioned properly from the infrastructure provider, or it might have issues with CertificateSigningRequests being approved.
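
Since the second alert mentions CertificateSigningRequests, one quick check is whether any requests are stuck in Pending. The following is a minimal sketch, not a definitive procedure: `pending_csrs` is a hypothetical helper name, and it assumes the default `oc get csr` table layout, where the last column is CONDITION.

```shell
# Hypothetical helper: read `oc get csr` table output on stdin and print
# the names of requests whose last column (CONDITION) is exactly "Pending".
# Assumes the default table layout with a single header row.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

On a workstation logged in to the cluster, it could be used as `oc get csr | pending_csrs`. If a host never starts its kubelet, no request may show up at all, so an empty result does not by itself rule out a provisioning problem.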

Logging in to the machines, I can see:

Feb 01 21:51:31 localhost systemd[1]: Starting ostree-boot-complete.service - OSTree Complete Boot... 
Feb 01 21:51:32 localhost ostree[1696]: error: ostree-finalize-staged.service failed on previous boot: Child process exited with code 1 
Feb 01 21:51:32 localhost systemd[1]: ostree-boot-complete.service: Main process exited, code=exited, status=1/FAILURE 
Feb 01 21:51:32 localhost systemd[1]: ostree-boot-complete.service: Failed with result 'exit-code'. 
Feb 01 21:51:32 localhost systemd[1]: Failed to start ostree-boot-complete.service - OSTree Complete Boot.
# journalctl -b -1 -u ostree-finalize-staged 
Data from the specified boot (-1) is not available: No such boot ID in journal

# rpm-ostree status 
State: idle 
Deployments: 
● fedora:fedora/x86_64/coreos/stable 
                 Version: 38.20230609.3.0 (2023-06-26T21:56:57Z) 
                  Commit: 248366c65732b30ae0dbd96be8b75db46f08f428f68254ee14ac52cb39f82240 
            GPGSignature: Valid signature by 6A51BBABBA3D5467B6171221809A8D7CEB10B464
# journalctl -u ostree-finalize-staged.service 
Feb 01 21:48:45 node115 systemd[1]: Finished ostree-finalize-staged.service - OSTree Finalize Staged Deployment. 
Feb 01 21:48:54 node115 systemd[1]: Stopping ostree-finalize-staged.service - OSTree Finalize Staged Deployment... 
Feb 01 21:48:55 node115 ostree[4609]: Finalizing staged deployment 
Feb 01 21:48:58 node115 ostree[4609]: Copying /etc changes: 21 modified, 0 removed, 145 added 
Feb 01 21:48:58 node115 ostree[4609]: Copying /etc changes: 21 modified, 0 removed, 145 added 
Feb 01 21:48:58 node115 ostree[4618]: error parsing semanage configuration file: syntax error 
Feb 01 21:48:58 node115 ostree[4618]: semodule: Could not create semanage handle 
Feb 01 21:48:58 node115 ostree[4609]: error: Child process exited with code 1 
Feb 01 21:48:58 node115 systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited, status=1/FAILURE 
Feb 01 21:48:58 node115 systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'. 
Feb 01 21:48:58 node115 systemd[1]: Stopped ostree-finalize-staged.service - OSTree Finalize Staged Deployment.
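
To spot the failing units without reading the whole journal, the "Failed with result" lines shown above can be filtered out. This is only a sketch: `failed_units` is a hypothetical helper name, and it assumes the journal lines follow the same `systemd[PID]: UNIT: Failed with result ...` shape as in the excerpts.

```shell
# Hypothetical helper: read journal text on stdin and print each systemd unit
# that logged a "Failed with result" line, one unit per line, deduplicated.
failed_units() {
  sed -n 's/^.*systemd\[[0-9]*\]: \(.*\): Failed with result .*$/\1/p' | sort -u
}
```

On an affected host this could be run as `journalctl -b --no-pager | failed_units`. For the excerpts above it would list `ostree-boot-complete.service` and `ostree-finalize-staged.service`.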

No obvious errors are present in /etc/selinux/semanage.conf.

It seems that the provisioning was indeed incomplete, with several packages missing:

kubelet.service: Failed to schedule restart job: Unit crio.service not found.

Feb 01 23:55:36 node117 configure-ovs.sh[9174]: /usr/local/bin/configure-ovs.sh: line 395: ovs-vsctl: command not found
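
Both messages point at binaries that are absent from the booted deployment. The `rpm-ostree status` output above lists a single Fedora CoreOS 38.20230609.3.0 deployment, which may mean the staged deployment was never finalized. A small sketch for confirming which commands are missing; `missing_commands` is a hypothetical helper name, and the list of commands to pass is an assumption based on the errors above.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

On an affected host, `missing_commands crio ovs-vsctl kubelet` would print whichever of the three cannot be found.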

Can someone please help me troubleshoot this?

Thanks.

@JaimeMagiera
Contributor

Hi,

We are not working on FCOS builds of OKD any more. Please see these documents...

https://okd.io/blog/2024/06/01/okd-future-statement
https://okd.io/blog/2024/07/30/okd-pre-release-testing

We will be providing documentation on upgrading clusters from 4.15 FCOS to 4.16 SCOS. In terms of clusters that are older, you may be able to get help from community members. I'll convert this to a discussion to facilitate that.

Many thanks,

Jaime

@okd-project okd-project locked and limited conversation to collaborators Aug 15, 2024
@JaimeMagiera JaimeMagiera converted this issue into discussion #2007 Aug 15, 2024
