-
Notifications
You must be signed in to change notification settings - Fork 751
affinity: add policy, metrics collection and tests #9999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Skipping CI for Draft Pull Request. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #9999 +/- ##
==========================================
+ Coverage 78.32% 78.41% +0.09%
==========================================
Files 511 511
Lines 68323 68419 +96
==========================================
+ Hits 53514 53651 +137
+ Misses 10923 10883 -40
+ Partials 3886 3885 -1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
01afe2f to
a463d78
Compare
Signed-off-by: lhy1024 <[email protected]>
a463d78 to
6cc401a
Compare
liyishuai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid overloading types.
Clarify "kind" instead of "level".
|
@liyishuai: adding LGTM is restricted to approvers and reviewers in OWNERS files. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
liyishuai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing part in my previous change request
|
@liyishuai: adding LGTM is restricted to approvers and reviewers in OWNERS files. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
After discussion with @liyishuai , this is the only place where |
| func (m *Manager) startAvailabilityCheckLoop() { | ||
| interval := getAvailabilityCheckInterval() | ||
| ticker := time.NewTicker(interval) | ||
| go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need a waitgroup to wait for this goroutine to exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't perform any critical operations, so could we add a TODO statement for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, pls add todolist avoid the running goroutine exist even if the server is closed .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
| defer m.RUnlock() | ||
| for _, storeID := range voterStoreIDs { | ||
| condition, ok := m.unavailableStores[storeID] | ||
| if ok && (!condition.affectsLeaderOnly() || storeID == leaderStoreID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If store 3 is not the leader but has the evit leader label, it should return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's reasonable. We do not check the evict leader for peers who are not the leader.
pkg/schedule/affinity/policy.go
Outdated
| m.RUnlock() | ||
|
|
||
| // Log after releasing lock | ||
| for _, entry := range logEntries { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have 100 nodes that are unavailable, we need to log 100 logs every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs will only be generated whenever the overall status changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will change the print quantity.
okJiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
| } | ||
| newAvailability := maxCondition.groupAvailability() | ||
| if newAvailability != groupInfo.GetAvailability() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some comments to explain why when availability changing, unavailableStore != 0 represents that group has a new unavailableStore, otherwise represents group becomes health.
This contains a great deal of underlying logical relationships. Perhaps modifying the code could make this section simpler and easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor it
| } | ||
| newAvailability := maxCondition.groupAvailability() | ||
| if newAvailability != groupInfo.GetAvailability() { | ||
| groupAvailabilityChanges[groupInfo.ID] = newAvailability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes can contain oldAvailbility and newAvailability. And the log level can be confirmed by compare the oldAvailbility and newAvailability(compare can be wrapped to a function). This makes the code clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduce unnecessary log
| if !isUnavailableStoresChanged { | ||
| m.RUnlock() | ||
| return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the store remains unchanged, the group will persist in a degraded state without transitioning to expired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always use GetAvailability to retrieve the availability, which contains the relevant processing logic. We don't update it using GetAvailability because it might be called within a read lock.
Signed-off-by: lhy1024 <[email protected]>
a6bacea to
544bcfe
Compare
[LGTM Timeline notifier]Timeline:
|
|
/test pull-unit-test-next-gen-2 |
10 similar comments
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/test pull-unit-test-next-gen-2 |
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
/test pull-integration-realcluster-test |
1 similar comment
|
/test pull-integration-realcluster-test |
ref tikv#9764 Signed-off-by: ti-chi-bot <[email protected]>
|
In response to a cherrypick label: new pull request created to branch |
| // which are split into three groups separated by degradedBoundary. | ||
| type storeCondition int | ||
|
|
||
| const ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to add a comment about the sequence matter.
|
|
||
| // GetNewAvailability uses the given unavailableStores to compute a new groupAvailability. | ||
| // Note that this function does not update runtimeGroupInfo. | ||
| func (g *runtimeGroupInfo) GetNewAvailability(unavailableStores map[uint64]storeCondition) groupAvailability { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge GetNewAvailability and GetAvailability?
| // GetNewAvailability uses the given unavailableStores to compute a new groupAvailability. | ||
| // Note that this function does not update runtimeGroupInfo. | ||
| func (g *runtimeGroupInfo) GetNewAvailability(unavailableStores map[uint64]storeCondition) groupAvailability { | ||
| maxCondition := storeAvailable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxCondition is not readable
| } | ||
|
|
||
| // checkStoresAvailability checks the availability status of stores and invalidates groups with unavailable stores. | ||
| func (m *Manager) checkStoresAvailability() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be checkGroupAvailability?
| affinityRegionCount.Set(float64(m.affinityRegionCount)) | ||
| } | ||
|
|
||
| func (m *Manager) generateUnavailableStores() map[uint64]storeCondition { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func (m *Manager) generateUnavailableStores() map[uint64]storeCondition { | |
| func (m *Manager) collectUnavailableStores() map[uint64]storeCondition { |
ref tikv#9764 Signed-off-by: lhy1024 <[email protected]> Co-authored-by: 混沌DM <[email protected]>
ref tikv#9764 Signed-off-by: lhy1024 <[email protected]> Co-authored-by: 混沌DM <[email protected]> Signed-off-by: lhy1024 <[email protected]>

What problem does this PR solve?
Issue Number: Ref #9764
What is changed and how does it work?
Check List
Tests
Release note