✨ Add ability to register runnables as pre-start hooks #2044
base: main
Conversation
Hi @terinjokes. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
terinjokes force-pushed the branch from bebdc3e to 782eb75.
Can you elaborate? Why not use a sync.Once at the beginning of the reconciliation? What you want to do is definitely already possible today, albeit a bit cumbersome. I am not convinced this is a use case common enough to warrant adding first-class support.
Thanks @alvaroaleman for taking a look at this PR. Unfortunately, I've found that using sync.Once inside the reconciler has problems with both ergonomics and correctness.

Ergonomics

Currently the reconciler function has a clear scope: it processes a single request to reconcile objects in the cluster. This makes testing it very easy. By attempting to add more functionality to the reconciler function, we have made it more difficult to test: now we must ensure the once function is safe to run in tests, be able to swap the function for one that is safe, reuse a once that is already done across tests, or split our reconcile function into two. I'm not a fan of any of these options; they all complicate the testing of the controller.

Correctness

Attempting to use sync.Once here also has correctness problems. The function passed to Do cannot return an error, so a failure in the one-time setup cannot flow back through the normal reconcile result. We can attempt to work around this by panicking inside the once function, but unless one sets the controller option to recover panics, we are left handling that failure path ourselves. This becomes even harder to do correctly if you run multiple workers with MaxConcurrentReconciles:
This allows other workers to continue past the once while the panic is being handled in the panicking goroutine, and potentially take destructive actions due to incomplete work by the once function. This pull request, which is roughly 15 lines, runs the prestart hooks before starting workers, so there are no synchronization issues across workers to deal with. A prestart hook returning an error stops the controller, with the error returned from its Start method. This follows the normal flow control for errors that users of controller-runtime expect, and eliminates the possibility of handling the error case incorrectly.
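To make the correctness concern concrete, here is a minimal sketch (my own illustration, not code from this PR) of the sync.Once workaround; ClusterStateReconciler and buildInitialState are made-up names:

```go
package example

import (
	"context"
	"sync"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ClusterStateReconciler is an illustrative reconciler that needs one-time,
// cluster-wide setup before it can handle individual requests.
type ClusterStateReconciler struct {
	client.Client

	once    sync.Once
	onceErr error
}

func (r *ClusterStateReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// sync.Once.Do takes a func() with no return value, so the setup error
	// has to be smuggled out through a field and re-checked on every call.
	r.once.Do(func() {
		r.onceErr = r.buildInitialState(ctx)
	})
	if r.onceErr != nil {
		// The Once is already marked done: the setup will never be retried,
		// and every worker keeps returning this same error until the process
		// restarts. Panicking inside the Once instead has its own problems,
		// as described above.
		return ctrl.Result{}, r.onceErr
	}

	// ... normal per-request reconciliation ...
	return ctrl.Result{}, nil
}

// buildInitialState stands in for the one-time, cluster-wide computation
// (e.g. an address allocation map).
func (r *ClusterStateReconciler) buildInitialState(ctx context.Context) error {
	return nil
}
```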
I've seen this problem often when we need to compute things across the entire cluster before being able to reconcile individual resources. For the current project I'm working on, I need to compute address allocation maps so I don't allocate the same address twice and cause havoc in networking. I've had use cases where I would have used a hook, and in their absence I usually end up using client-go instead, but that's been a miserable experience. As people continue to build on Kubernetes, especially when combined with Crossplane and kcp, I expect even more people to need to do things between winning a leader election and starting reconcilers.
Right, a sync.Once doesn't work, but you can make it work with a lock. I am open to the idea of running code after leader election has been won and before a controller is started, but I dislike the idea of tying that to the controller. I'd suggest that you implement something like this (controller-runtime/pkg/manager/manager.go, lines 339 to 344 at 5378660) in the manager, as the manager is the entity that is responsible for managing the lifecycle of components.
Note that depending on your use case, a syncingSource might also work for you.
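For context, the manager-level mechanism being referred to looks roughly like this today (my sketch; precomputeState is a made-up name): a Runnable whose NeedLeaderElection returns true is only started by the manager after it has won the election, though, as noted later in the thread, the manager does not guarantee it finishes before the leader-election controllers start.

```go
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// precomputeState is an illustrative runnable holding the work to do after
// the election is won.
type precomputeState struct {
	run func(context.Context) error
}

// Start satisfies manager.Runnable; the manager calls it when it starts this
// runnable's group.
func (p *precomputeState) Start(ctx context.Context) error {
	return p.run(ctx)
}

// NeedLeaderElection satisfies manager.LeaderElectionRunnable, so the manager
// only starts this runnable after it has become the leader.
func (p *precomputeState) NeedLeaderElection() bool { return true }

var (
	_ manager.Runnable               = &precomputeState{}
	_ manager.LeaderElectionRunnable = &precomputeState{}
)

// Usage sketch: mgr.Add(&precomputeState{run: buildAllocationMap})
```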
The easiest way to achieve what you are doing here (running code before a given controller starts) would btw be to construct one through controller.NewUnmanaged, wrap it, and add the wrapper to the manager.
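My reading of that suggestion, as a rough sketch (the names are mine; the NewUnmanaged signature shown is the one from the controller-runtime versions this thread discusses and may differ in newer releases):

```go
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// setupThenStart wraps an unmanaged controller: when the manager starts this
// runnable (only after leader election, because NeedLeaderElection is true),
// the setup function runs first and the controller starts afterwards.
type setupThenStart struct {
	setup func(context.Context) error
	ctrl  controller.Controller
}

func (s *setupThenStart) Start(ctx context.Context) error {
	if err := s.setup(ctx); err != nil {
		return err
	}
	return s.ctrl.Start(ctx)
}

func (s *setupThenStart) NeedLeaderElection() bool { return true }

// addWrappedController builds the controller without registering it with the
// manager, then adds the wrapper instead.
func addWrappedController(mgr manager.Manager, r reconcile.Reconciler, setup func(context.Context) error) error {
	// Signature as of the versions discussed in this thread; check your release.
	c, err := controller.NewUnmanaged("example", mgr, controller.Options{Reconciler: r})
	if err != nil {
		return err
	}
	// Set up watches on c here with c.Watch(...), just as for a controller
	// created with controller.New or the builder.
	return mgr.Add(&setupThenStart{setup: setup, ctrl: c})
}
```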
I put it in the controller because the content of a prestart hook, in my experience, is tied to the content of the reconciler. The controller is responsible for starting and maintaining watches, and gives us a spot to run between the caches being synced and the reconciler function. Putting it in the manager, which I also considered, means it runs separately from a controller setting up watches and waiting for caches to synchronize, and would have to do everything by directly calling the API server. This seems like a waste if we're also trying to cache the very same resources. It would also be a waste if the controller fails to synchronize the caches: it will exit with an error.
No, it will start the caches first.
Be that as it may for your use case, but then the next person is going to come along and want it independent of a specific controller, and then we have two options and people will ask which to use when, and I don't want that. Controller-runtime is supposed to make the standard use case easy and the advanced use case possible. Running code in leader election before any controllers start is not possible today; doing it before a specific controller starts is, by either wrapping it or using a syncingSource, hence no need for this.
My understanding is that a controller's watches are started and its caches synced in the controller's Start method (controller-runtime/pkg/internal/controller/controller.go, lines 181 to 218 at 44c5d50).

While the manager has a "Caches" runnable group, which would get started before the leader-election controllers, I can't find anything in controller-runtime that uses it, and only two structs implement the interface that group selects on.

Since the watches are started and the caches synchronized by the controller, I do not understand your proposed solution of wrapping the controller. From my understanding, it's not currently possible to run code between the watches being started and synchronized and the reconcile function being called. This pull request adds that possibility. If I'm misunderstanding something, do you have pseudo-code to point me in the right direction?

I'm happy to rename the function and expand the Godocs to reduce confusion.
After the cache is started, getting any informer from it will result in an informer for that given resource being created if it doesn't exist yet. The client you get from the manager uses the cache for reading and auto-creates any missing informers, meaning you can just use it after the cache has started and everything will work.
It is used by the cluster that the manager wraps, which is how the cache ends up in that group.
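As an illustration of that point (my example, not code from the thread): a plain runnable added to the manager can read through the cache-backed client, and listing a type that has no informer yet transparently creates and syncs one.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// buildAllocationMap stands in for the cluster-wide pre-computation. The
// manager's default client reads from the cache, and listing Services here
// creates and syncs the Service informer on first use if it does not exist.
func buildAllocationMap(ctx context.Context, c client.Client) error {
	var services corev1.ServiceList
	if err := c.List(ctx, &services); err != nil {
		return err
	}
	// ... record which addresses are already allocated ...
	return nil
}

// addPrecompute registers the step with the manager. A runnable that does not
// implement LeaderElectionRunnable is, by default, only started after leader
// election is won.
func addPrecompute(mgr manager.Manager) error {
	return mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
		return buildAllocationMap(ctx, mgr.GetClient())
	}))
}
```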
Okay, now I understand what you were trying to explain. I can take this pull request in two different directions now. Do you have a preference?
That sounds fine to me
No. The second option would be a radical change to the builder.
I disagree that this would be a radical change to the builder: the use case is already there for its tests. But I'll table it and open a proposal at a future time. It's not needed for me right now since you've selected option one.
terinjokes force-pushed the branch from 782eb75 to 04eb9b5.
Edit: I'm not entirely happy with the current way these hooks work, so I'm going to take another go at it.
terinjokes force-pushed the branch from de6797a to 3b513d7.
This is ready for another review, when you have a moment.
terinjokes force-pushed the branch from 3b513d7 to fea0031.
Rebased changes on top of 0.17 and the tests are passing locally.

/remove-lifecycle stale
/retest
terinjokes force-pushed the branch from e48fcc2 to 00fb43a.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
I've rebased again onto the main branch.

/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
@terinjokes Can I help in any way to get this PR merged? It has already been a while since you started this PR, but this feature is also needed for a situation I'm currently in. I could get some more hands on it if necessary. Please let me know, and thanks for your work :)
I was about to rebase for 0.19; I've been busy the last 24 hours. Other than that, I'm not sure why it's not being reviewed or merged.
terinjokes force-pushed the branch from 3ea050d to 07216dd.
@terinjokes: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests.

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

- Mark this PR as fresh with /remove-lifecycle rotten
- Close this PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
/remove-lifecycle rotten
When implementing a controller that uses leader election, there may be
work that needs to be done after winning the election but before
processing enqueued requests. For example, a controller may need to
build up an internal mapping of the current state of the cluster before
it can begin reconciling.
This changeset adds support for adding prestart hooks to
controller-runtime's manager implementation. This hook runs after the
manager has been elected leader, immediately before the leader election
controllers are started.
Related #607
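The exact API added by this PR is not reproduced in this thread. Purely to illustrate the shape described in the commit message, usage might look something like the sketch below; preStartHookAdder and AddPreStartHook are stand-in names for whatever the changeset actually adds, not part of controller-runtime.

```go
package example

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// preStartHookAdder is a hypothetical interface sketching the capability the
// commit message describes; the method name is a stand-in, not a real
// controller-runtime API.
type preStartHookAdder interface {
	manager.Manager
	AddPreStartHook(manager.Runnable) error
}

func run(mgr preStartHookAdder) error {
	// The hook would run after the manager wins leader election and before
	// the leader-election controllers start; a returned error stops the
	// manager before any reconciler processes a request.
	if err := mgr.AddPreStartHook(manager.RunnableFunc(func(ctx context.Context) error {
		// e.g. build the address allocation map discussed above
		return nil
	})); err != nil {
		return err
	}

	// ... register controllers with the builder as usual ...

	return mgr.Start(ctrl.SetupSignalHandler())
}
```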