Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define own WatchErrorHandler for toolscache.Reflector #2987

Open
alexanderstephan opened this issue Oct 18, 2024 · 5 comments
Open

Define own WatchErrorHandler for toolscache.Reflector #2987

alexanderstephan opened this issue Oct 18, 2024 · 5 comments

Comments

@alexanderstephan
Copy link

alexanderstephan commented Oct 18, 2024

client-go sets a default handler for watching errors func DefaultWatchErrorHandler(r *Reflector, err error), see https://github.com/kubernetes/client-go/blob/23900f492969db045e8a36396852381fd189569b/tools/cache/reflector.go#L154. Right now, if it is included in the controller-runtime it will use its logger klog by default. This leads to inconsistent log output by default. Therefore, I propose that the controller-runtime should overwrite this method, i.e., use its own handler.

This handler could look something like this:

func WatchErrorHandler(r *toolscache.Reflector, err error) {
	log := ctrl.Log.WithValues("name", r.Name(), "description", r.TypeDescription())
	switch {
	case apierrors.IsResourceExpired(err) || apierrors.IsGone(err):
		// Don't set LastSyncResourceVersionUnavailable - LIST call with ResourceVersion=RV already
		// has a semantic that it returns data at least as fresh as provided RV.
		// So first try to LIST with setting RV to resource version of last observed object.
		log.Error(err, "watch closed")
	case err == io.EOF:
		// watch closed normally
	case err == io.ErrUnexpectedEOF:
		log.Error(err, "watch closed with unexpected EOF")
	default:
		utilruntime.HandleError(fmt.Errorf("%s: Failed to watch %v: %v", r.Name(), r.TypeDescription(), err))
	}
}

As you can see, we would use a logger with two fields of the reflector now. I exposed these fields recently (see kubernetes/client-go@ce42c29), so the controller-runtime can essentially redefine the default handler of client-go, as the default handler also logs these fields.

I would like to implement this, if it gets your approval.

@sbueringer
Copy link
Member

sbueringer commented Oct 18, 2024

If I see correctly this part is currently refactored upstream: kubernetes/kubernetes#127709

EDIT: Sorry linked the wrong PR, this is the right one: https://github.com/kubernetes/kubernetes/pull/126387/files#diff-9ccdf713e010f73dbebd01e936cb0077fc63e4f5ab941d865ded42da219d84ec

(cc @alvaroaleman)

@troy0820
Copy link
Member

I see your point. When implementing this PR to be able to override the WatchErrorHandler it didn't occur to me that the klog would get in the way of the controller-runtime's logging, etc. Implementing this seems doable and allowing this to be set to the default we provide or the overridden function in the cache options.

However, we run into the problem of trying to stay within parity with upstream not using the upstream default handler. Maintaining that seems like a lot of drift but still doable.

@alexanderstephan
Copy link
Author

Thanks, so my take-away is that this seems reasonable to implement.

@sbueringer So, I should wait until the refactoring is over to avoid merge conflicts? Or, maybe start and use the branch of the PR as basis for now?

@sbueringer
Copy link
Member

sbueringer commented Oct 22, 2024

I could be absolutely misunderstanding this, but I was wondering if the problem still exists if we use the error handler that takes a context (and the logger out of the context).

I would recommend waiting until the PR in upstream is merged and we bumped to a Kubernetes version that has the change. Then we can check if we can simply switch to the error handler that takes a context.

(if they merge the upstream PR we'll have the change in CR directly after the next Kubernetes alpha/beta release which shouldn't take long)

@alexanderstephan
Copy link
Author

Yes, okay seems good, let's wait then. Using the context here would also make sense to me. Also, we don't have this drifting problem that was mentioned before, from my understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants