feat(scaler): scale out by group and wait until pods are scheduled #4907

Open
wants to merge 3 commits into
base: master

liubog2008
Member

@liubog2008 liubog2008 commented Feb 24, 2023

What problem does this PR solve?

If there are 3 zones and 9 replicas of TiKV with PodSpreadConstraints, pods may not be scheduled as expected, because all TiKV pods are created at the same time and are therefore not scheduled in order.

As a result, pods may be scheduled as shown below, and TiKV will then fail to scale down by more than 1 replica.

  • zone-a: 0, 1, 2
  • zone-b: 3, 4, 5
  • zone-c: 6, 7, 8

What is changed and how does it work?

Code changes

  • Has Go code change
  • Has CI related scripts change

Tests

  • Unit test
  • E2E test
  • Manual test
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.


@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@codecov-commenter

codecov-commenter commented Feb 24, 2023

Codecov Report

Merging #4907 (c28550e) into master (136086c) will decrease coverage by 0.24%.
The diff coverage is 64.58%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4907      +/-   ##
==========================================
- Coverage   59.44%   59.21%   -0.24%     
==========================================
  Files         227      231       +4     
  Lines       25835    28974    +3139     
==========================================
+ Hits        15358    17157    +1799     
- Misses       9019    10282    +1263     
- Partials     1458     1535      +77     
Flag       Coverage Δ
e2e        21.68% <29.16%> (?)
unittest   59.41% <65.21%> (-0.03%) ⬇️

@liubog2008
Member Author

/test pull-e2e-kind


if updateReplicasAndDeleteSlots {
	setReplicasAndDeleteSlotsByFinished(scalingOutFlag, newSet, oldSet, ordinals, finishedOrdinals)
} else {
	resetReplicas(newSet, oldSet)
Member

I'm not sure whether resetReplicas(newSet, oldSet) is also needed for the return controller.RequeueErrorf("tikv.ScaleOut, cluster %s/%s ready to scale out, wait for next round", tc.GetNamespace(), tc.GetName()) path at L122.

@csuzhangxc
Member

Is this feature controlled by scalePolicy.scaleOutParallelism?

@ti-chi-bot
Member

@liubog2008: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
