Skip to content

Conversation

@lhy1024
Copy link
Contributor

@lhy1024 lhy1024 commented Dec 4, 2025

What problem does this PR solve?

Issue Number: Ref #9764

What is changed and how does it work?

Check List

Tests

  • Unit test

Release note

None.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Dec 4, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 4, 2025
@codecov
Copy link

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 74.28571% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.41%. Comparing base (8c350af) to head (66fb84f).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9999      +/-   ##
==========================================
+ Coverage   78.32%   78.41%   +0.09%     
==========================================
  Files         511      511              
  Lines       68323    68419      +96     
==========================================
+ Hits        53514    53651     +137     
+ Misses      10923    10883      -40     
+ Partials     3886     3885       -1     
Flag Coverage Δ
unittests 78.41% <74.28%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Dec 5, 2025
@lhy1024 lhy1024 marked this pull request as ready for review December 9, 2025 07:43
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 9, 2025
Copy link
Contributor

@liyishuai liyishuai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid overloading types.
Clarify "kind" instead of "level".

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Dec 9, 2025

@liyishuai: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Avoid overloading types.
Clarify "kind" instead of "level".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

@liyishuai liyishuai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing part in my previous change request

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Dec 9, 2025

@liyishuai: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

missing part in my previous change request

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 9, 2025

Avoid overloading types. Clarify "kind" instead of "level".

After discussion with @liyishuai , this is the only place where logEntry is needed, and there are no immediate plans for expansion. If a series of similar entries at the same level are added in the future, we can follow this approach by adding a kind to distinguish them and then move them into logutil.

func (m *Manager) startAvailabilityCheckLoop() {
interval := getAvailabilityCheckInterval()
ticker := time.NewTicker(interval)
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need a waitgroup to wait for this goroutine to exit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't perform any critical operations, so could we add a TODO statement for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, pls add todolist avoid the running goroutine exist even if the server is closed .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

defer m.RUnlock()
for _, storeID := range voterStoreIDs {
condition, ok := m.unavailableStores[storeID]
if ok && (!condition.affectsLeaderOnly() || storeID == leaderStoreID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If store 3 is not the leader but has the evit leader label, it should return an error?

Copy link
Member

@HunDunDM HunDunDM Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's reasonable. We do not check the evict leader for peers who are not the leader.

m.RUnlock()

// Log after releasing lock
for _, entry := range logEntries {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have 100 nodes that are unavailable, we need to log 100 logs every time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs will only be generated whenever the overall status changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will change the print quantity.

Copy link
Member

@okJiang okJiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use switch instead of if-else in this function?

Image

}
}
newAvailability := maxCondition.groupAvailability()
if newAvailability != groupInfo.GetAvailability() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments to explain why when availability changing, unavailableStore != 0 represents that group has a new unavailableStore, otherwise represents group becomes health.

This contains a great deal of underlying logical relationships. Perhaps modifying the code could make this section simpler and easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactor it

}
newAvailability := maxCondition.groupAvailability()
if newAvailability != groupInfo.GetAvailability() {
groupAvailabilityChanges[groupInfo.ID] = newAvailability
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes can contain oldAvailbility and newAvailability. And the log level can be confirmed by compare the oldAvailbility and newAvailability(compare can be wrapped to a function). This makes the code clearer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce unnecessary log

Comment on lines +159 to +213
if !isUnavailableStoresChanged {
m.RUnlock()
return false, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the store remains unchanged, the group will persist in a degraded state without transitioning to expired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always use GetAvailability to retrieve the availability, which contains the relevant processing logic. We don't update it using GetAvailability because it might be called within a read lock.

Signed-off-by: lhy1024 <[email protected]>
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Dec 10, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-12-10 07:42:33.545372313 +0000 UTC m=+1026898.359149885: ☑️ agreed by bufferflies.
  • 2025-12-10 08:01:49.223760121 +0000 UTC m=+1028054.037537703: ☑️ agreed by okJiang.

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

10 similar comments
@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-unit-test-next-gen-2

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/retest

2 similar comments
@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/retest

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/retest

@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-integration-realcluster-test

1 similar comment
@lhy1024
Copy link
Contributor Author

lhy1024 commented Dec 10, 2025

/test pull-integration-realcluster-test

@ti-chi-bot ti-chi-bot bot merged commit f0b6d8b into tikv:master Dec 10, 2025
39 of 42 checks passed
@lhy1024 lhy1024 deleted the affinity-pr2C branch December 10, 2025 13:27
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Dec 10, 2025
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #10047.

// which are split into three groups separated by degradedBoundary.
type storeCondition int

const (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add a comment about the sequence matter.


// GetNewAvailability uses the given unavailableStores to compute a new groupAvailability.
// Note that this function does not update runtimeGroupInfo.
func (g *runtimeGroupInfo) GetNewAvailability(unavailableStores map[uint64]storeCondition) groupAvailability {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge GetNewAvailability and GetAvailability?

// GetNewAvailability uses the given unavailableStores to compute a new groupAvailability.
// Note that this function does not update runtimeGroupInfo.
func (g *runtimeGroupInfo) GetNewAvailability(unavailableStores map[uint64]storeCondition) groupAvailability {
maxCondition := storeAvailable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxCondition is not readable

}

// checkStoresAvailability checks the availability status of stores and invalidates groups with unavailable stores.
func (m *Manager) checkStoresAvailability() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be checkGroupAvailability?

affinityRegionCount.Set(float64(m.affinityRegionCount))
}

func (m *Manager) generateUnavailableStores() map[uint64]storeCondition {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (m *Manager) generateUnavailableStores() map[uint64]storeCondition {
func (m *Manager) collectUnavailableStores() map[uint64]storeCondition {

HunDunDM added a commit to HunDunDM/pd that referenced this pull request Dec 11, 2025
lhy1024 added a commit to lhy1024/pd that referenced this pull request Dec 14, 2025
ref tikv#9764

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: 混沌DM <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants