Webhook validation for Topology NodeDeletionTimeout and NodeDrainTimeout #7104

Open
killianmuldoon opened this issue Aug 22, 2022 · 15 comments · May be fixed by #11257
Labels
area/clusterclass Issues or PRs related to clusterclass help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Contributor

NodeDeletionTimeout and NodeDrainTimeout were added to Topology managed clusters in #7098 and #6278. Currently the values of these fields are not validated on creation, and validation is instead done when the templates are turned into objects.

This lack of up-front validation led to the unexpected failure in #7047. We could do some basic validation in the webhook on object creation to ensure these values are correctly formatted and within a given range before creation.
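A minimal sketch of what such a range check could look like, under stated assumptions: `validateTimeout` and `maxTimeout` are hypothetical names, and the 24h upper bound is purely illustrative since no range has been agreed on in this issue. A real webhook would more likely return `field.ErrorList` entries than a plain `error`.

```go
package main

import (
	"fmt"
	"time"
)

// maxTimeout is a hypothetical upper bound chosen for illustration only;
// this issue does not define a valid range.
const maxTimeout = 24 * time.Hour

// validateTimeout rejects negative durations and durations above maxTimeout.
// fieldPath is only used to build a readable error message.
func validateTimeout(fieldPath string, d time.Duration) error {
	if d < 0 {
		return fmt.Errorf("%s: must not be negative, got %s", fieldPath, d)
	}
	if d > maxTimeout {
		return fmt.Errorf("%s: must not exceed %s, got %s", fieldPath, maxTimeout, d)
	}
	return nil
}
```

Calling this for both `nodeDrainTimeout` and `nodeDeletionTimeout` at admission time would surface a bad value when the object is created rather than later during reconcile.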

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 22, 2022
@killianmuldoon
Contributor Author

/area topology

@sbueringer
Member

What would be the valid range for those fields?

@killianmuldoon
Contributor Author

We don't have these defined right now in the machine webhook (and I don't know if there's any need to), but defining a min/max is an optional part of this.

I think the main part is to ensure that we do enough validation to catch errors like #7047 on object creation, instead of during the reconcile.

@sbueringer
Member

sbueringer commented Aug 22, 2022

Yup. The problem is that metav1.Duration just has type "string" as OpenAPI schema, right?

If it also used the duration format, OpenAPI validation would probably handle it for us (via: // +kubebuilder:validation:Format)?

https://github.com/kubernetes/apiextensions-apiserver/blob/master/pkg/apiserver/validation/formats.go#L49

But given the recent trend, we would implement it in the webhook instead of using the marker. (The format's godoc suggests we should use time.ParseDuration.)
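For illustration, a format check built on `time.ParseDuration` could look roughly like the sketch below. `validateDurationString` is a hypothetical helper, not existing Cluster API code, and it assumes the webhook has access to the raw string value.

```go
package main

import (
	"fmt"
	"time"
)

// validateDurationString reports an error if raw is not a valid Go duration
// string such as "10s" or "1h30m". fieldPath is only used in the message.
func validateDurationString(fieldPath, raw string) error {
	if _, err := time.ParseDuration(raw); err != nil {
		return fmt.Errorf("%s: invalid duration %q: %w", fieldPath, raw, err)
	}
	return nil
}
```

With this check, values like "10s" and "1h30m" pass, while a bare "10" (no unit) or free text such as "ten minutes" is rejected.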

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 20, 2022
@sbueringer
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 21, 2022
@fabriziopandini
Member

/triage accepted
/remove-kind feature
/kind bug

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. kind/bug Categorizes issue or PR as related to a bug. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. kind/feature Categorizes issue or PR as related to a new feature. labels Nov 22, 2022
@killianmuldoon killianmuldoon added the area/clusterclass Issues or PRs related to clusterclass label May 4, 2023
@fabriziopandini
Member

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 12, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 3, 2024
@k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Aug 1, 2024
@sbueringer
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 21, 2024
@Dhairya-Arora01
Contributor

/assign

@JoelSpeed
Contributor

What happens to existing users who have persisted bad values when we update the validation here? Has ratcheting validation been considered at all?

@sbueringer
Member

I think it was not considered

@JoelSpeed
Contributor

Ratcheting validation exists directly within the API server from Kube 1.30, but since we need to support older versions, ratcheting can either be implemented in a webhook or with a couple of well-crafted CEL transition rules (though these aren't perfect, as they don't cover the create case).

Without ratcheting, this has the potential to break users on upgrade: they wouldn't be able to write anything to the object until the broken values in these fields were fixed.
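As a rough sketch of the webhook variant, ratcheting for a single duration field could look like the code below. `validateRatcheted` is a hypothetical helper; it assumes the webhook can compare the old and new raw values, and it treats an unparseable or negative duration as "bad" purely for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// validateRatcheted validates newRaw, but on update tolerates a value that is
// unchanged from oldRaw even if it is invalid. That keeps objects persisted
// before the validation existed writable, while still rejecting new or
// changed bad values. On create (isUpdate == false) oldRaw is ignored.
func validateRatcheted(fieldPath, oldRaw, newRaw string, isUpdate bool) error {
	if isUpdate && oldRaw == newRaw {
		return nil // unchanged: do not block unrelated writes
	}
	d, err := time.ParseDuration(newRaw)
	if err != nil {
		return fmt.Errorf("%s: invalid duration %q: %w", fieldPath, newRaw, err)
	}
	if d < 0 {
		return fmt.Errorf("%s: must not be negative, got %s", fieldPath, d)
	}
	return nil
}
```

The design choice here is that only the field being changed is held to the new rules, so an upgrade does not immediately lock users out of objects that already carry a bad value.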

@sbueringer
Member

> Ratcheting validation exists directly within the API server from Kube 1.30

If it's enabled by default, it could be okay to just wait until 1.30 is the minimum supported version (Cluster API v1.10; basically, we could then merge in December).

7 participants