Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty spec.template.spec.version makes ClusterClass fail #354

Closed
anmazzotti opened this issue Jul 1, 2024 · 3 comments
Closed

Empty spec.template.spec.version makes ClusterClass fail #354

anmazzotti opened this issue Jul 1, 2024 · 3 comments
Assignees
Labels
kind/support Categorizes issue or PR as a support question priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@anmazzotti
Copy link
Contributor

What happened:
[A clear and concise description of what the bug is.]

In my ClusterClass, the RKE2ControlPlaneTemplate is defined as follows (note: no spec.template.spec.version):

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlaneTemplate
metadata:
  name: rke2-control-plane
spec:
  template:
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: ElementalMachineTemplate
        name: rke2-control-plane
      serverConfig:
        disableComponents:
          kubernetesComponents:
            - cloudController
      nodeDrainTimeout: 2m
      registrationMethod: "control-plane-endpoint"
      rolloutStrategy:
        type: "RollingUpdate"
        rollingUpdate:
          maxSurge: 1

Cluster definition carries the version in spec.topology.version as expected:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cni: ${CLUSTER_NAME}-crs-0
    crs: "true"
  name: ${CLUSTER_NAME}
  namespace: ${NAMESPACE}
spec:
  clusterNetwork:
    services:
      cidrBlocks: ${SERVICE_CIDR:=["10.96.0.0/12"]}
    pods:
      cidrBlocks: ${POD_CIDR:=["10.244.0.0/16"]}
    serviceDomain: ${SERVICE_DOMAIN:="cluster.local"}
  topology:
    class: rke2
    version: "v${KUBERNETES_VERSION:=1.30.1}+rke2r1"
    controlPlane:
      metadata: {}
      replicas: ${CONTROL_PLANE_MACHINE_COUNT:=1}
    workers:
      machineDeployments:
      - class: rke2-default-worker
        name: md-0
        replicas: ${WORKER_MACHINE_COUNT:=1}
    variables:
    - name: controlPlaneEndpointHost
      value: ${CONTROL_PLANE_ENDPOINT_HOST:=""}
    - name: controlPlaneEndpointPort
      value: ${CONTROL_PLANE_ENDPOINT_PORT:=6443}
    - name: vipInterface
      value: ${VIP_INTERFACE:=eth0}

However application fails with:

The RKE2ControlPlaneTemplate "rke2-control-plane" is invalid: spec.template.spec.version: Invalid value: "": spec.template.spec.version in body should match 'v(\d\.\d{2}\.\d)\+rke2r\d'

What did you expect to happen:

No failure. The version should be read from the Cluster.spec.topology.version.

How to reproduce it:

See above or check out the reference

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • rke provider version: 0.3.0
  • OS (e.g. from /etc/os-release):
@anmazzotti anmazzotti added kind/bug Something isn't working needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 1, 2024
@salasberryfin salasberryfin self-assigned this Jul 2, 2024
@salasberryfin salasberryfin moved this from CAPI Backlog to In Progress (8 max) in CAPI & Hosted Kubernetes providers (EKS/AKS/GKE) Jul 2, 2024
@salasberryfin salasberryfin moved this from In Progress (8 max) to CAPI Backlog in CAPI & Hosted Kubernetes providers (EKS/AKS/GKE) Jul 2, 2024
@salasberryfin salasberryfin removed their assignment Jul 2, 2024
@furkatgofurov7
Copy link
Contributor

furkatgofurov7 commented Jul 17, 2024

Resurfacing the discussion details and my findings we had offline on slack:

I am a bit unsure why not just set spec.version on RKE2ControlPlane and use that in your case? My argument is, that field is required in CC API contract (ref), this specifically:

Impact on the controlPlane providers:

the provider implementers are required to implement the ControlPlaneTemplate type (e.g. KubeadmControlPlaneTemplate etc.).
it is also important to notice that:
ClusterClass and managed topologies can work only with control plane providers implementing support for the spec.version field; Additionally, it is required to provide support for the status.version field reporting the minimum API server version in the cluster as required by the control plane contract.

So, I mean version should be read from spec.template.spec.version by design as in the logs you are seeing.

On the other note, I also previously mentioned it, CAPRKE2 API diverged from KCP in terms of what fields they/we have and not quiet following it, since CC was designed with KCP in mind. To align with KCP, we need to re-organize/shuffle our API fields to make our API (fields) match with KCP and that would need probably an API bump. I think, generally this issue is not really a bug but a neat, since we are used to KCPs API structure, however, CAPRKE2 API is different and it has the reasons why it is like that I believe.

@furkatgofurov7 furkatgofurov7 added kind/support Categorizes issue or PR as a support question priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed kind/bug Something isn't working needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 17, 2024
@furkatgofurov7 furkatgofurov7 self-assigned this Jul 17, 2024
@anmazzotti
Copy link
Contributor Author

anmazzotti commented Jul 21, 2024

@furkatgofurov7 I think this is still a bug.

The main problem is that the validation is too strict and doesn't allow to write a CC that uses this version as a variable.
For example this will fail with the same error I posted above:

    - name: rKE2ControlPlaneTemplate
      definitions:
        - selector:
            apiVersion: controlplane.cluster.x-k8s.io/v1beta1
            kind: RKE2ControlPlaneTemplate
            matchResources:
              controlPlane: true
          jsonPatches:
            - op: add
              path: "/spec/template/spec/version"
              valueFrom:
                variable: k8sVersion

Am I doing anything wrong? op: create also does not work.
Do you have any example of a RKE2 Cluster Class using this version field that works?

It is the same problem as #343, just on a different field.
The strict validation prevents the Cluster Class from working, unless you hardcode all the values of course.

Again, maybe I'm doing it wrong, not sure, if so please update the CC example in this repo.

Thank you.

@anmazzotti
Copy link
Contributor Author

anmazzotti commented Jul 24, 2024

As mentioned by @alexander-demicev , the version validation was tweaked in one of the latest releases.
I tested with 0.5.0 and I can confirm it works now.
I had to bring the k8s version variable inside the Cluster Class definition, but it does work once that is done. Reference

I don't want to take the liberty to close the issue, but it can be closed from my point of view.
Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/support Categorizes issue or PR as a support question priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Development

No branches or pull requests

4 participants