Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface errors when a PodTemplate or a ProvReq is invalid #3025

Open
3 tasks
alculquicondor opened this issue Sep 11, 2024 · 6 comments · May be fixed by #3056
Open
3 tasks

Surface errors when a PodTemplate or a ProvReq is invalid #3025

alculquicondor opened this issue Sep 11, 2024 · 6 comments · May be fixed by #3056
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@alculquicondor
Copy link
Contributor

What would you like to be added:

When a ProvReq or PodTemplate is rejected by a webhook, Kueue just logs the error and continues retrying. These errors will not be visible to end-users and they might just interpret them as "kueue is stuck". We should communicate these errors in the Workload object, maybe even produce an event?

Why is this needed:

A cloud provider could have a webhook to validate PodTemplates created for ProvisioningRequests.

These errors need to be surfaced to users so they can fix any problem about them.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@alculquicondor alculquicondor added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 11, 2024
@kannon92
Copy link
Contributor

In jobset, we did look at https://github.com/kubernetes-sigs/kubectl-validate to help with validation of these fields.

Not sure if this would be helpful here as you could try creating these objects and if they fail they you bubble of the error as a condition or event.

@alculquicondor
Copy link
Contributor Author

The error could come from a webhook, for which kubectl-validate wouldn't help.

@mimowo
Copy link
Contributor

mimowo commented Sep 12, 2024

+1 for the feature

The starting point could be to record any ProvReq creation errors here, or a level up errors in the check's message.

@IrvingMg
Copy link
Contributor

/assign

@mimowo
Copy link
Contributor

mimowo commented Sep 19, 2024

I see the event PR, but wondering if this is enough.

In particular, events are temporary objects, so it is not clear if an admin would notice them. OTOH the ProvReq creation is most likely re-attempted, so the event will be generated continuously, so should be easy to notice.

We could also explore the option of exposing the error as a status in the Message field in the AdmissionCheckState:

Message string `json:"message" protobuf:"bytes,6,opt,name=message"`
.

Any opinions on that?

@alculquicondor
Copy link
Contributor Author

Yes, a status message is more important. An event is nice-to-have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants