This appears to be a chicken-and-egg problem caused by choosing MetalLB as the VIP provider for the API server. When you use the HelmChart custom resource to install something, rke2 runs a Helm install job on startup, and as far as I can see there is no way to add tolerations to the pod that job spawns. Thus, these pods will never schedule if you taint your control plane nodes. Since the API server VIP never comes up, agent nodes can't join, and the pods can't be scheduled to the tainted control plane nodes either.
A simple workaround is:
Log in to the initializer node and remove its taint
Edit the MetalLB controller deployment and speaker daemonset once they're created so that they tolerate your taint
Reapply the taint
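For the second step, the pod templates of both workloads need a toleration that matches the taint. A minimal sketch follows, assuming the taint uses the `CriticalAddonsOnly` key that rke2 suggests (an assumption; substitute your own key). The fragment would sit in the pod template of the controller deployment and of the speaker daemonset, alongside any tolerations already present:

```yaml
# Fragment of a Deployment/DaemonSet spec, not a complete manifest.
# Assumption: the control plane taint uses the key CriticalAddonsOnly.
spec:
  template:
    spec:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists   # no value or effect given, so any value and any effect of this key is tolerated
```

Manual edits like this may be overwritten the next time the chart is reconciled, so this is only a stopgap.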
This can still cause problems. My biggest reason for having the taint was to keep Longhorn off the control plane nodes: it runs privileged and mounts the host path /var/lib/longhorn, which on the agent nodes is a larger, separate disk. Cleaning that up after the fact is a bit of a hassle. This, too, can be worked around by giving Longhorn a nodeSelector.
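As a rough sketch of the nodeSelector approach, the Longhorn Helm chart values could look like the following. The label `example.com/storage-node` is hypothetical and would have to be applied to the agent nodes yourself; check the value keys against the chart version you deploy.

```yaml
# Helm values for the Longhorn chart (sketch, not a complete values file).
# Assumption: agent nodes carry the hypothetical label example.com/storage-node=true.
longhornManager:
  nodeSelector:
    example.com/storage-node: "true"
longhornDriver:
  nodeSelector:
    example.com/storage-node: "true"
longhornUI:
  nodeSelector:
    example.com/storage-node: "true"
# Longhorn's system-managed components (instance managers, CSI pods) are
# controlled by a separate Longhorn setting that is not shown here.
```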
Real solutions could be:
If rke2 supports this, allow passing tolerations to the Helm install jobs so they'll run on tainted nodes (this would still require adding a HelmChartConfig for MetalLB so its pods will also run on the control plane)
Use kube-vip instead of MetalLB since it can run as static pods, which won't be affected by taints (this also better aligns with Harvester, another SUSE product that uses kube-vip to provide a bare metal load balancer for the Kubernetes API server)
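For the first option, the HelmChartConfig part could be sketched as below. This assumes the HelmChart resource is named `metallb` in the `kube-system` namespace (the HelmChartConfig must match both) and that the taint key is `CriticalAddonsOnly`; both are assumptions. It only covers MetalLB's own pods, not the Helm install job.

```yaml
# Sketch of a HelmChartConfig overriding MetalLB chart values.
# Assumptions: HelmChart name "metallb", namespace "kube-system",
# taint key CriticalAddonsOnly.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: metallb
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
    speaker:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
```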
Thanks. It does appear that that would fix it, too. It would not tolerate arbitrary taints, but it would tolerate the "CriticalAddonsOnly" taint that rke2 suggests, which is the one I was trying to use.
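For reference, this taint is usually set in the server's rke2 configuration file. A sketch, assuming the default config path and the `NoExecute` effect used in the rke2 documentation's example:

```yaml
# /etc/rancher/rke2/config.yaml on a server (control plane) node.
# Assumption: the NoExecute effect, as in the rke2 documentation's example.
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"
```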