
Make maxRetries and maxSleep configurable in goleak #93

Open
kerthcet opened this issue Feb 3, 2023 · 9 comments · Fixed by #94

Comments

@kerthcet
Contributor

kerthcet commented Feb 3, 2023

This is for flexibility, so users can adjust the behaviour to their own scenarios. Or do we have any other concerns?
/assign

@prashantv
Collaborator

Can you provide some more context on where you'd like to adjust the max retries/sleep behaviour? Is there a reason the current defaults don't work well?

Rather than exposing the underlying values as-is, would it be better to expose a single value for total maximum timeout (or perhaps use a context)?
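
For illustration, a single total-timeout knob in goleak's functional-options style might look like the sketch below; WithTimeout and the surrounding types are hypothetical stand-ins, not the library's actual API.

package goleaksketch

import "time"

// options collects the knobs for a leak check; only the hypothetical
// timeout is shown here.
type options struct {
	timeout time.Duration
}

// Option configures a leak check, mirroring goleak's functional-option
// style.
type Option func(*options)

// WithTimeout bounds the total time the check waits for goroutines to
// exit before reporting a leak.
func WithTimeout(d time.Duration) Option {
	return func(o *options) { o.timeout = d }
}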

sywhang added a commit that referenced this issue Feb 13, 2023
Make goleak options more flexible to adapt to users' varied scenarios

Signed-off-by: Kante Yin <[email protected]>

Fix #93

---------

Signed-off-by: Kante Yin <[email protected]>
Co-authored-by: Sung Yoon Whang <[email protected]>
Co-authored-by: Abhinav Gupta <[email protected]>
@kerthcet
Contributor Author

Can you provide some more context on where you'd like to adjust the max retries/sleep behaviour?

The problem is that the detection duration is not long enough; also see https://github.com/kubernetes/kubernetes/pull/115456/files#r1097270472

Yes, I agree that leaking implementation details doesn't smell good, and I'm OK with exposing a single value like Timeout. The problem is that we have too many factors here: maxRetries + maxSleep together already behave like a timeout. A corner case: suppose maxRetries × maxSleep caps the retry budget at 2s; if we then set the timeout to 5s, it still doesn't work, because the attempts are exhausted long before the deadline is reached.

So can we just remove maxRetries and leave the number of retry attempts unbounded? We would then have a default timeout instead, which would be exposed for configuration. cc @prashantv
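
A minimal sketch of that shape -- attempts unbounded, the loop bounded only by a deadline; the names here are illustrative, not goleak's internals:

package goleaksketch

import "time"

// verifyWithDeadline retries a leak check until it passes or the deadline
// expires, instead of capping the number of attempts. check stands in for
// goleak's stack-scanning step.
func verifyWithDeadline(check func() error, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for i := 0; ; i++ {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return err // deadline hit; report the last leak seen
		}
		// Exponential backoff from 1µs, capped at 100ms so a long
		// timeout doesn't mean long gaps between scans.
		d := 100 * time.Millisecond
		if i < 17 { // 1µs << 17 already exceeds the cap
			d = time.Duration(int(time.Microsecond) << uint(i))
		}
		time.Sleep(d)
	}
}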

@kerthcet
Contributor Author

Any suggestions, @prashantv?

@abhinav
Collaborator

abhinav commented Feb 17, 2023

One possible option re: @prashantv's point here:

Instead of exposing two parameters "maxRetries" and "maxSleep", perhaps we can expose a single RetryPolicy type/option?

Strawman:

type RetryPolicy interface {
  Attempt(num int) (delay time.Duration, ok bool)
}

It takes the attempt number and reports the sleep duration for that attempt, or false if we shouldn't retry anymore.

The default retry policy would look something like this; it wouldn't be exported -- at least initially:

type defaultRetryPolicy struct {
	maxRetries int
	maxSleep   time.Duration
}

func (p *defaultRetryPolicy) Attempt(i int) (time.Duration, bool) {
	if i >= p.maxRetries {
		return 0, false
	}

	// Exponential backoff starting at 1µs, capped at maxSleep.
	d := time.Duration(int(time.Microsecond) << uint(i))
	if d > p.maxSleep {
		d = p.maxSleep
	}
	return d, true
}
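
For context, the loop inside goleak that consumes such a policy might look roughly like this sketch (check stands in for the actual stack scan, and RetryPolicy is the strawman interface above):

func verify(check func() error, policy RetryPolicy) error {
	for i := 0; ; i++ {
		err := check()
		if err == nil {
			return nil // no leaks found
		}
		d, ok := policy.Attempt(i)
		if !ok {
			return err // policy gave up; report the last leak
		}
		time.Sleep(d)
	}
}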

@prashantv @sywhang WDYT?

meta: We should probably re-open this issue if we're still discussing this.

@kerthcet
Contributor Author

But this wouldn't solve my problem, would it? What I want is to increase the retry attempts or the retry timeout, so +1 for a timeout option.
/reopen

@sywhang sywhang reopened this Feb 17, 2023
@sywhang
Contributor

sywhang commented Feb 17, 2023

Reopened the issue.

@abhinav I think that's probably a good strawman to begin with.
Not sure how I feel about the exact interface/naming (yet) but it's a good start.

@kerthcet would it not? If we exposed a WithRetryPolicy(RetryPolicy) option that lets you specify the exact implementation of RetryPolicy to set your retry attempt count or timeout, I think that should be enough to give you what you need. Is there anything I'm missing here?
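
For example (all names hypothetical, building on the strawman interface above), a deadline-based policy passed via the proposed WithRetryPolicy option:

// deadlinePolicy retries every 10ms until a wall-clock deadline passes.
type deadlinePolicy struct {
	deadline time.Time
}

func (p *deadlinePolicy) Attempt(num int) (time.Duration, bool) {
	if time.Now().After(p.deadline) {
		return 0, false // out of time; stop retrying
	}
	return 10 * time.Millisecond, true
}

// Hypothetical usage in a test:
//   policy := &deadlinePolicy{deadline: time.Now().Add(5 * time.Second)}
//   defer goleak.VerifyNone(t, goleak.WithRetryPolicy(policy))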

@kerthcet
Contributor Author

Ah, got the idea: users can implement the interface themselves to decide whether to retry or not; that should work. One question: does it really make sense to set the retry attempts, and how would we determine the exact value? And yes, I originally exposed these two values because I didn't want to change much of the package; I should have reconsidered that at the time. :(

@prashantv
Collaborator

I like the general idea of a retry policy, though I think the most common use is going to be bumping the timeout, so having a default option to bump the timeout seems useful (internally it would determine how many attempts, the sleep per attempt, etc.).

A couple of thoughts on the RetryPolicy interface:

  • Should it return a sleep time.Duration, or just rely on the policy sleeping itself if it wants? That would make the interface return only a boolean about whether to retry: Retry(..) bool
  • Would it be useful to pass in the error, for more flexible retry policies? (Both tweaks are sketched below.)
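
Sketching both of those tweaks together (illustrative only), the interface would shrink to a single method that sleeps internally and can branch on the failure:

// RetryPolicy, revised per the two points above: the policy performs its
// own sleep if it wants backoff, and sees the error from the last attempt
// so it can retry selectively.
type RetryPolicy interface {
	// Retry reports whether to run the check again after attempt num
	// failed with err.
	Retry(num int, err error) bool
}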

sywhang added a commit that referenced this issue Oct 24, 2023
…ns (#94)" (#116)

This reverts commit 751da59.

Per discussion in #93, we don't want to release this change as-is.
@anitschke

I also came here looking for a way to control maxRetries and maxSleep; however, my use case seems a little different from the others mentioned in this issue.

It seems like most other people on this thread would like to extend the timeout to give goroutines more time to join before considering it an issue.

In my case I was trying to fix a bug where a goroutine leaked for a long time but eventually joined. Before fixing the bug, I tried to write a failing unit test that verifies the goroutine has joined, using goleak.VerifyNone(t). However, in this unit test I was surprised to find that the test passed even though I know the goroutine leaks! It turns out that in my unit test the goroutine was only leaking for a few hundred milliseconds. As a result, goleak.VerifyNone(t) found the leak on the first try, but an eventual retry succeeded once the leak went away. I could artificially inflate the time that the goroutine leaks for in the unit test to a few seconds, but the downside is that this would also inflate testing time. I would prefer an option that disables retries altogether.

From my point of view, a goroutine leaking for even a few hundred milliseconds is still a logic error, as there should be some code that waits for the goroutine to properly join.
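
Under the strawman RetryPolicy interface above, this use case would be a trivial policy (an illustrative sketch, assuming the usual time import):

// noRetryPolicy makes the check strict: any goroutine still running at the
// first scan is reported as a leak, even one that exits moments later.
type noRetryPolicy struct{}

func (noRetryPolicy) Attempt(num int) (time.Duration, bool) {
	return 0, false // never retry
}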
