@tthvo
Member

@tthvo tthvo commented Oct 31, 2025

- What I did

This updates the master and worker kubelet service templates to set default values for KUBELET_NODE_IPS.

  • DualStack: default to 0.0.0.0
  • DualStackIPv6Primary: default to ::

This sets the kubelet --node-ip argument (to 0.0.0.0 or ::) when dual-stack support is enabled on cloud providers, where the node IP is not known beforehand.
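As a rough illustration of the defaults described above, the mapping can be sketched in Python. This is not the template change itself: the function names and the dictionary are hypothetical, and only the mode names and the values 0.0.0.0 and :: come from this description. The sketch also shows that each default is just the unspecified address of one IP family.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table; mode names and values are taken from the PR description.
DEFAULT_NODE_IPS = {
    "DualStack": "0.0.0.0",
    "DualStackIPv6Primary": "::",
}


def node_ip_flag(ip_family_mode: str) -> Optional[str]:
    """Return the --node-ip argument for a dual-stack mode, or None for other modes."""
    default = DEFAULT_NODE_IPS.get(ip_family_mode)
    if default is None:
        return None
    return f"--node-ip={default}"


def preferred_family(node_ip: str) -> int:
    """Return 4 or 6, the IP version of the given address."""
    return ipaddress.ip_address(node_ip).version


if __name__ == "__main__":
    for mode in ("DualStack", "DualStackIPv6Primary"):
        value = DEFAULT_NODE_IPS[mode]
        unspecified = ipaddress.ip_address(value).is_unspecified
        print(mode, node_ip_flag(mode), "IPv%d" % preferred_family(value), unspecified)
```

Both defaults are unspecified addresses, so the value carries no concrete node address, only an address family.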

- Why I did it

When investigating failures related to dual-stack support on AWS, I noticed that kubelet ran without the --node-ip=<any-id> argument. As a result, the CNI never came online and complained that the node was missing an InternalIP address. For example, a failed attempt produced the following errors:

| Component | Failed log |
| --- | --- |
| ovnkube-controller container | F0903 17:41:46.149835 5622 ovnkube.go:138] failed to run ovnkube: [failed to start network controller: failed to start default network controller - while waiting for any node to have zone: "i-041d879bce674db11.ec2.internal", error: context canceled, failed to start node network controller: failed to init default node network controller: i-041d879bce674db11.ec2.internal doesn't have an address with type InternalIP or ExternalIP] |
| kubelet | Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" |
| kube-rbac-proxy-crio container | failed to initialize certificate reloader: error loading certificates: error loading certificate: open /var/lib/kubelet/pki/kubelet-server-current.pem: no such file or directory |

After some research and experimentation, I determined that the kubelet --node-ip argument is necessary. For dual-stack, it must be set to 0.0.0.0 or to :: (IPv6-primary). Once the argument was set, the node was assigned an InternalIP address and the CNI progressed successfully.
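One way to check the outcome described above is to look at the addresses a Node object reports. The sketch below assumes the Node has been fetched as JSON (for example with a command such as `oc get node <name> -o json`) and parsed into a dictionary. The helper name and both sample nodes are hypothetical, and the sample addresses are documentation-range values, not data from the failed run.

```python
def node_has_address(node: dict) -> bool:
    """Return True if the node reports an InternalIP or ExternalIP address."""
    addresses = node.get("status", {}).get("addresses", [])
    return any(
        entry.get("type") in ("InternalIP", "ExternalIP") and bool(entry.get("address"))
        for entry in addresses
    )


# Hypothetical node that only reports a hostname, similar to the failing case.
node_without_ip = {
    "status": {"addresses": [{"type": "Hostname", "address": "example-node"}]}
}

# Hypothetical node that reports dual-stack InternalIP addresses.
node_with_ip = {
    "status": {
        "addresses": [
            {"type": "InternalIP", "address": "192.0.2.10"},
            {"type": "InternalIP", "address": "2001:db8::10"},
            {"type": "Hostname", "address": "example-node"},
        ]
    }
}

if __name__ == "__main__":
    print(node_has_address(node_without_ip))
    print(node_has_address(node_with_ip))
```

The condition mirrors the ovnkube error in the table, which fails when no address of type InternalIP or ExternalIP is present.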

- How to verify it

Tested with openshift/installer#9930. Alternatively, the installer can lay down an environment file to set the env var (for example, openshift/installer@9fa264d), but I think that is quite hacky 😞

- Description for the changelog

Update the master and worker kubelet service templates to set default values for KUBELET_NODE_IPS (i.e. 0.0.0.0 for DualStack and :: for DualStackIPv6Primary)

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 31, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 31, 2025

@tthvo: This pull request references CORS-4208 which is a valid jira issue.

@openshift-ci
Contributor

openshift-ci bot commented Oct 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tthvo
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tthvo
Member Author

tthvo commented Oct 31, 2025

/cc @sadasu @patrickdillon

@tthvo
Member Author

tthvo commented Oct 31, 2025

Not sure if I am doing the right thing 😓, but with openshift/installer#9930 this change worked as expected. It has only been tested on AWS.

Pending confirmations for other platforms 👀 PTAL 🙏

@tthvo
Member Author

tthvo commented Nov 3, 2025

/retest

@sadasu
Contributor

sadasu commented Nov 5, 2025

/cc @cybertron and @mkowalski Could you PTAL? This is required for adding DualStack support for AWS and Azure.

@cybertron
Member

While it feels a little weird to set KUBELET_NODE_IPS to a single value, since all we're really doing is telling it whether to prefer v4 or v6, I think this should be okay. It is also worth noting that for the on-prem platforms we override these values anyway, so it shouldn't affect us. Just to be sure, though:

/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
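
The override behaviour mentioned above can be modelled as a simple precedence rule: a value supplied by the platform wins, and the template default is used only when nothing is supplied. The Python sketch below is an assumption about that precedence for illustration, not the actual unit-file mechanism, and the function name is hypothetical.

```python
def resolve_node_ips(env: dict, template_default: str) -> str:
    """Return the platform-provided KUBELET_NODE_IPS, or the template default if unset or empty."""
    value = env.get("KUBELET_NODE_IPS", "")
    return value if value else template_default


if __name__ == "__main__":
    # No override: the dual-stack default from the template applies.
    print(resolve_node_ips({}, "0.0.0.0"))
    # Hypothetical on-prem override with concrete documentation-range addresses.
    print(resolve_node_ips({"KUBELET_NODE_IPS": "192.0.2.10,2001:db8::10"}, "0.0.0.0"))
```

Under this model, a platform that already sets the variable never sees the new default.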

@sadasu
Contributor

sadasu commented Nov 6, 2025

Are the changes to the on-prem files made to maintain consistency? Tests prove that the changes are fine. I am leaning towards not making any changes to the on-prem files, even if they are harmless.

This updates the master and worker kubelet service templates to set the
defaults KUBELET_NODE_IPS.
- DualStack: default to "0.0.0.0"
- DualStackIPv6Primary: default to "::"
@tthvo
Member Author

tthvo commented Nov 6, 2025

Are the changes to the on-prem files made to maintain consistency? Tests prove that the changes are fine. I am leaning towards not making any changes to the on-prem files, even if they are harmless.

Right, it was done for consistency. I have now removed the changes to the on-prem unit files, as suggested 👍

@tthvo
Member Author

tthvo commented Nov 6, 2025

Thanks everyone for the reviews and insights! I addressed the comments just now. PTAL again 🙏

@openshift-ci
Contributor

openshift-ci bot commented Nov 7, 2025

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-openstack | a113f54 | link | false | /test e2e-openstack |
| ci/prow/bootstrap-unit | 07ae9ad | link | false | /test bootstrap-unit |
| ci/prow/okd-scos-e2e-aws-ovn | 07ae9ad | link | false | /test okd-scos-e2e-aws-ovn |
