Description
What happened?
When stackit-cert-manager-webhook is configured with expired credentials or is otherwise misconfigured, cert-manager retries the Challenge object indefinitely (without any exponential backoff!).
This happens in all error cases where the returned error contains any changing details (e.g., timestamp, request ID, ...), as the error returned by the webhook is persisted in the Challenge object (status.reason) and cert-manager reconciles the entire Challenge object if it detects any change (including the entire .status!).
This is also stated here in a comment inside the acmechallenges Sync function. Sadly, this entire thing isn't properly documented anywhere else 😔
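The loop described above can be illustrated with a toy model. This is only a sketch, not cert-manager's actual code: `challengeStatus`, `needsResync`, and `webhookError` are hypothetical names, and the error text is made up for illustration. It shows that a reason string carrying a changing detail always differs from the previous one, while a stable reason does not.

```go
package main

import "fmt"

// challengeStatus is a toy stand-in for the Challenge's .status.
// Only the reason field is modeled here (assumption for illustration).
type challengeStatus struct {
	Reason string
}

// needsResync mimics the behavior described in the issue:
// any difference in the status counts as a change and triggers another sync.
func needsResync(previous, current challengeStatus) bool {
	return previous != current
}

// webhookError builds a hypothetical upstream error message that embeds
// a changing detail (here a request ID).
func webhookError(requestID string) string {
	return fmt.Sprintf("failed fetching zone: unauthorized (request ID %s)", requestID)
}

func main() {
	// Two consecutive failures with different request IDs: the reason changes every time.
	first := challengeStatus{Reason: webhookError("req-1")}
	second := challengeStatus{Reason: webhookError("req-2")}
	fmt.Println(needsResync(first, second)) // true

	// A stable reason does not change between attempts.
	stable := challengeStatus{Reason: "failed fetching zone"}
	fmt.Println(needsResync(stable, stable)) // false
}
```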
How can we reproduce this?
We noticed this when someone tried to use a removed service account key, so:
- Create new project
- Create a new service account (no need to actually add it to a project)
- Create a new service account key (save it for later use, then delete the key again)
- Deploy cert-manager
- Deploy `stackit-cert-manager-webhook` (`helm install stackit-cert-manager-webhook -n cert-manager stackit-cert-manager-webhook/stackit-cert-manager-webhook --set stackitSaAuthentication.enabled=true` and create the `cert-manager/stackit-sa-authentication` secret)
- Create an issuer and certificate
Observe the issue:
- Check the events of the `Challenge` resource
- `kubectl get challenges.acme.cert-manager.io -w` (see ~4 changes per second)
- Check the cert-manager logs
- Check the `stackit-cert-manager-webhook` logs
Additional context
To properly fix this, we must sanitize every error case where we don't have control of the error. We can still log the "original" error, so we should just state the general thing that failed and optionally tell the user that they should check the stackit-cert-manager-webhook logs (e.g., "failed fetching zone. See the stackit-cert-manager-webhook logs for more details.").
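A minimal sketch of what such sanitizing could look like, assuming a small helper that wraps every uncontrolled error. `sanitizeError` and `simulateUpstreamError` are hypothetical names, not part of the webhook's existing code; the returned message follows the example wording above.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// sanitizeError (hypothetical helper) logs the original error in full and
// returns a stable error that contains no changing details, so the text
// persisted in the Challenge stays identical across retries.
func sanitizeError(action string, err error) error {
	if err == nil {
		return nil
	}
	// The full original error is still available in the webhook logs.
	log.Printf("%s: %v", action, err)
	return errors.New(action + ". See the stackit-cert-manager-webhook logs for more details.")
}

// simulateUpstreamError (hypothetical) stands in for an API error that
// embeds a changing detail such as a request ID.
func simulateUpstreamError(requestID string) error {
	return fmt.Errorf("unauthorized (request ID %s)", requestID)
}

func main() {
	first := sanitizeError("failed fetching zone", simulateUpstreamError("req-1"))
	second := sanitizeError("failed fetching zone", simulateUpstreamError("req-2"))

	// Both sanitized messages are identical even though the originals differ.
	fmt.Println(first.Error() == second.Error())
}
```

With a helper like this, the message returned to cert-manager depends only on the action that failed, while the request ID or timestamp remains visible in the logs.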
Search
- I did search for other open and closed issues before opening this.
Code of Conduct
- I agree to follow this project's Code of Conduct