-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
owner: fix bug of owner manager hang #58622
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @tiancaiamao. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@@ -364,13 +364,13 @@ func (m *ownerManager) campaignLoop(campaignContext context.Context) { | |||
select { | |||
case <-m.etcdSes.Done(): | |||
m.logger.Info("etcd session done, refresh it") | |||
if err2 := m.refreshSession(util2.NewSessionRetryUnlimited, ManagerSessionTTL); err2 != nil { | |||
if err2 := m.refreshSession(campaignContext, util2.NewSessionRetryUnlimited, ManagerSessionTTL); err2 != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core of the fix.
We should pass the campaignContext
.
The code is:
campaignContext, cancel := context.WithCancel(m.ctx)
go campaignLoop(campaignContext)
cancel()
If we use m.ctx
in the campaign loop, cancel()
cannot cancel the ctx!
ownerMgr.etcdSes.Close() | ||
|
||
// This should not hang. | ||
ownerMgr.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the fix, the test hang... because the old code use wrong ctx and cancel in owner manager Close() does make campaign loop break.
@tiancaiamao: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #58622 +/- ##
================================================
+ Coverage 73.1072% 73.5190% +0.4117%
================================================
Files 1676 1676
Lines 463352 463975 +623
================================================
+ Hits 338744 341110 +2366
+ Misses 103774 102004 -1770
- Partials 20834 20861 +27
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a bug, we purposed do that, see
Lines 439 to 444 in 42d4fae
// Note: we must use manager's context to create session. If we use campaign | |
// context and the context is cancelled, the created session cannot be closed | |
// as session close depends on the context. | |
// One drawback is that when you want to break the campaign loop, and the campaign | |
// loop is refreshing the session, it might wait for a long time to return, it | |
// should be fine as long as network is ok, and acceptable to wait when not. |
it's ok to wait, except on shutdown where the context should be cancelled
/hold |
What problem does this PR solve?
Issue Number: close #58620
Problem Summary:
What changed and how does it work?
The owner compaign loop use wrong ctx for refressSession, so
owner.Close()
may hang sometimes.Before the fix, the behavior is:
Call
Close()
may hang, unless we pass actx
toNewOwnerManager
and cancel that ctx:This is not what I expected. Close() should not hang!
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.