CA: refactor PredicateChecker into ClusterSnapshot #7497
Open
towca
wants to merge
9
commits into
kubernetes:master
from
towca:jtuznik/dra-predicate-snapshot
+2,190
−1,722
k8s-ci-robot
added
kind/cleanup
Categorizes issue or PR as related to cleaning up code, process, or technical debt.
cncf-cla: yes
Indicates the PR's author has signed the CNCF CLA.
size/XXL
Denotes a PR that changes 1000+ lines, ignoring generated files.
labels
Nov 14, 2024
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: towca
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
k8s-ci-robot
added
the
approved
Indicates a PR has been approved by an approver from all required OWNERS files.
label
Nov 14, 2024
towca
force-pushed
the
jtuznik/dra-predicate-snapshot
branch
2 times, most recently
from
November 14, 2024 15:34
ed9232e
to
27420ef
Compare
/hold
k8s-ci-robot
added
the
do-not-merge/hold
Indicates that a PR should not merge because someone has issued a /hold command.
label
Nov 14, 2024
towca
force-pushed
the
jtuznik/dra-predicate-snapshot
branch
2 times, most recently
from
November 19, 2024 14:13
e377759
to
d84511f
Compare
k8s-ci-robot
added
the
needs-rebase
Indicates a PR cannot be merged because it has merge conflicts with HEAD.
label
Nov 19, 2024
/assign @BigDarkClown
towca
force-pushed
the
jtuznik/dra-predicate-snapshot
branch
from
November 19, 2024 14:35
d84511f
to
d78b5d8
Compare
k8s-ci-robot
removed
the
needs-rebase
Indicates a PR cannot be merged because it has merge conflicts with HEAD.
label
Nov 19, 2024
towca
added a commit
to towca/autoscaler
that referenced
this pull request
Nov 20, 2024
towca
added a commit
to towca/autoscaler
that referenced
this pull request
Nov 20, 2024
DONOTSUBMIT
…hecker This decouples PredicateChecker from the Framework initialization logic, and allows creating multiple PredicateChecker instances while only initializing the framework once. This commit also fixes how CA integrates with Framework metrics. Instead of being Registered, the metrics are now only Initialized, so that CA doesn't expose scheduler metrics. The initialization is also moved from multiple different places to the Handle constructor.
To handle DRA properly, scheduling predicates will need to be run whenever Pods are scheduled in the snapshot.

PredicateChecker always needs a ClusterSnapshot to work, and ClusterSnapshot scheduling methods need to run the predicates first. So it makes most sense to have PredicateChecker be a dependency for ClusterSnapshot implementations, and move the PredicateChecker methods to ClusterSnapshot.

This commit mirrors PredicateChecker methods in ClusterSnapshot (with the exception of FitsAnyNode, which isn't used anywhere and is trivial to do via FitsAnyNodeMatching). Further commits will remove the PredicateChecker interface and move the implementation under clustersnapshot.

Dummy methods are added to current ClusterSnapshot implementations to get the tests to pass. Further commits will actually implement them.

PredicateError is refactored into a broader SchedulingError so that the ClusterSnapshot methods can return a single error that the callers can use to distinguish between a failing predicate and other, unexpected errors.
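To illustrate the last point, here is a minimal, self-contained sketch of how a single error type could let callers tell a failing predicate apart from an unexpected error. This is not the code from this PR: the `FailureType` constants, the struct fields, and the `IsPredicateFailure` helper are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// FailureType records why scheduling failed.
// Hypothetical type for this sketch; not taken from the PR.
type FailureType int

const (
	// FailingPredicate means a scheduler predicate/filter rejected the pod.
	FailingPredicate FailureType = iota
	// InternalError means something unexpected went wrong.
	InternalError
)

// SchedulingError is an illustrative stand-in for the error described above.
// The fields are assumptions made for the example.
type SchedulingError struct {
	Type      FailureType
	Predicate string // name of the failing predicate, if any
	Reason    string
}

// Error implements the error interface.
func (e *SchedulingError) Error() string {
	if e.Type == FailingPredicate {
		return fmt.Sprintf("predicate %q failed: %s", e.Predicate, e.Reason)
	}
	return "internal error: " + e.Reason
}

// IsPredicateFailure reports whether err is (or wraps) a SchedulingError
// caused by a failing predicate rather than an unexpected error.
func IsPredicateFailure(err error) bool {
	var se *SchedulingError
	return errors.As(err, &se) && se.Type == FailingPredicate
}

func main() {
	predErr := fmt.Errorf("scheduling pod: %w", &SchedulingError{
		Type:      FailingPredicate,
		Predicate: "NodeResourcesFit",
		Reason:    "insufficient cpu",
	})
	otherErr := &SchedulingError{Type: InternalError, Reason: "node missing from snapshot"}

	for _, err := range []error{predErr, otherErr} {
		if IsPredicateFailure(err) {
			fmt.Println("pod does not fit:", err)
		} else {
			fmt.Println("unexpected error:", err)
		}
	}
}
```

With a shape like this, a caller can treat "the pod doesn't fit" as a normal outcome while still surfacing real failures, without needing two separate return values.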
PredicateSnapshot implements the ClusterSnapshot methods that need to run predicates on top of a SnapshotBase. testsnapshot pkg is introduced, providing functions abstracting away the snapshot creation for tests. ClusterSnapshot tests are moved near PredicateSnapshot, as it'll be the only "full" implementation.
…he SnapshotBase change
For DRA, this component will have to call the Reserve phase in addition to just checking predicates/filters. The new version also makes more sense in the context of PredicateSnapshot, which is the only context now.
towca
force-pushed
the
jtuznik/dra-predicate-snapshot
branch
from
November 21, 2024 18:48
d78b5d8
to
e4d5002
Compare
Labels
approved
Indicates a PR has been approved by an approver from all required OWNERS files.
area/cluster-autoscaler
cncf-cla: yes
Indicates the PR's author has signed the CNCF CLA.
do-not-merge/hold
Indicates that a PR should not merge because someone has issued a /hold command.
kind/cleanup
Categorizes issue or PR as related to cleaning up code, process, or technical debt.
size/XXL
Denotes a PR that changes 1000+ lines, ignoring generated files.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler.
To handle DRA properly, scheduling predicates/filters must be run every time a pod is scheduled to a node inside the snapshot (so that the DRA scheduler plugin can compute the necessary allocation). The current code structure doesn't make this requirement obvious, and we risk future changes breaking DRA behavior (e.g. new logic that schedules pods inside the snapshot gets added, but doesn't check the predicates). This PR refactors the code so that running predicates is the default behavior when scheduling pods inside the snapshot.
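The idea can be sketched with a toy snapshot whose only way of placing a pod runs the predicates first. This is a simplified illustration, not the Cluster Autoscaler API: `Pod`, `Node`, `Predicate`, `Snapshot`, and `fitsCPU` are hypothetical types and functions invented for the example.

```go
package main

import "fmt"

// Pod and Node are deliberately tiny stand-ins for the real API objects.
type Pod struct {
	Name     string
	CPUMilli int64
}

type Node struct {
	Name                string
	AllocatableCPUMilli int64
	Pods                []Pod
}

// Predicate returns a non-empty reason when the pod does not fit the node.
type Predicate func(pod Pod, node *Node) string

// Snapshot holds the predicates as a dependency, so there is no way to
// schedule a pod without checking them.
type Snapshot struct {
	nodes      map[string]*Node
	predicates []Predicate
}

func NewSnapshot(predicates ...Predicate) *Snapshot {
	return &Snapshot{nodes: map[string]*Node{}, predicates: predicates}
}

// AddNode stores a copy of the node in the snapshot.
func (s *Snapshot) AddNode(n Node) {
	s.nodes[n.Name] = &n
}

// SchedulePod runs every predicate and only then adds the pod to the node.
func (s *Snapshot) SchedulePod(pod Pod, nodeName string) error {
	node, ok := s.nodes[nodeName]
	if !ok {
		return fmt.Errorf("node %q not found in snapshot", nodeName)
	}
	for _, check := range s.predicates {
		if reason := check(pod, node); reason != "" {
			return fmt.Errorf("pod %q does not fit node %q: %s", pod.Name, nodeName, reason)
		}
	}
	node.Pods = append(node.Pods, pod)
	return nil
}

// fitsCPU is a toy predicate comparing requested CPU with what is left.
func fitsCPU(pod Pod, node *Node) string {
	var used int64
	for _, p := range node.Pods {
		used += p.CPUMilli
	}
	if used+pod.CPUMilli > node.AllocatableCPUMilli {
		return "insufficient cpu"
	}
	return ""
}

func main() {
	s := NewSnapshot(fitsCPU)
	s.AddNode(Node{Name: "n1", AllocatableCPUMilli: 1000})
	fmt.Println(s.SchedulePod(Pod{Name: "a", CPUMilli: 600}, "n1"))
	fmt.Println(s.SchedulePod(Pod{Name: "b", CPUMilli: 600}, "n1"))
}
```

Because the snapshot exposes no scheduling method that skips the predicates, new logic that schedules pods inside it gets the check by construction rather than by convention.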
Summary of changes:
Which issue(s) this PR fixes:
The CA/DRA integration is tracked in kubernetes/kubernetes#118612; this PR is just one part of the implementation.
Special notes for your reviewer:
The first commit in the PR is just a squash of #7466 and #7479, and it shouldn't be a part of this review. The PR will be rebased on top of master after the others are merged.
This is intended to be a no-op refactor. It was extracted from #7350 after #7447, #7466, and #7479. This should be the last refactor PR; the next ones will introduce actual DRA logic.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: