update node info processors to include unschedulable nodes #8520
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: elmiko
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
i'm working on adding more unit tests for this behavior, but i wanted to share this solution so we could start talking about it.
Force-pushed from a0ebb28 to 3270172
i've rewritten this patch to use all nodes as the secondary value instead of using a new list of ready unschedulable nodes.
i need to do a little more testing on this locally, but i think this is fine for review.
  // Last resort - unready/unschedulable nodes.
- for _, node := range nodes {
+ // we want to check not only the ready nodes, but also ready unschedulable nodes.
+ for _, node := range append(nodes, allNodes...) {
i'm not sure that this is appropriate to append these. theoretically the `allNodes` should already contain `nodes`. i'm going to test this out using just `allNodes`.
due to filtering that happens in `obtainNodeLists`, we need to combine both lists of nodes here.
Force-pushed from 3270172 to cb2649a
i updated the argument names in the
it seems like the update to the mixed node processor needs a little more investigation.
Force-pushed from cb2649a to fd53c0b
it looks like we need both the
This change updates the `Process` function of the node info processor interface so that it can accept a second list of nodes. The second list contains all the nodes that are not in the first list. This will allow the mixed node info processor to properly detect unready and unschedulable nodes for use as templates.
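For illustration, here is a trimmed-down Go sketch of the signature change described above. This is not the actual cluster-autoscaler interface: the real `TemplateNodeInfoProvider.Process()` carries additional parameters and a richer return type, which are omitted or simplified here.

```go
// Sketch only: a stand-in for the node info processor interface, focused on
// the second node list described in this change.
package sketch

import apiv1 "k8s.io/api/core/v1"

// TemplateNodeInfoProvider is simplified here to show just the node arguments.
type TemplateNodeInfoProvider interface {
	// Process receives the ready nodes plus a second list (allNodes) so that
	// unready/unschedulable nodes can still be used as template candidates.
	Process(readyNodes, allNodes []*apiv1.Node) error
}
```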
Force-pushed from fd53c0b to 906a939
rebased
@jackfrancis @towca any chance at a review here?
// we want to check not only the ready nodes, but also ready unschedulable nodes.
// this needs to combine readyNodes and allNodes due to filtering that occurs at
// a higher level.
for _, node := range append(readyNodes, allNodes...) {
Two things:
Isn't `readyNodes` a subset of `allNodes`? In which case this will range over the nodes in `readyNodes` twice.
Also, how do we know the diff of `allNodes` - `readyNodes` are nodes of type `Ready` + `Unschedulable`? Aren't there going to be other types of nodes not classified as `readyNodes` in that set (for example, various flavors of `NotReady` nodes)?
> Isn't readyNodes a subset of allNodes? In which case this will range over the nodes in readyNodes twice.

`readyNodes` is not a pure subset of `allNodes`, there is some filtering that occurs to remove some of the nodes in `readyNodes` from `allNodes`.
if you change this line to only use `allNodes` you will see some unit tests fail.

> Also, how do we know the diff of allNodes - readyNodes are nodes of type Ready + Unschedulable? Aren't there going to be other types of nodes not classified as readyNodes in that set (for example, various flavors of NotReady nodes)?

i did look at how `readyNodes` and `allNodes` are created, there is filtering that happens in this function https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/static_autoscaler.go#L993

i have run a series of tests using `allNodes` and also using a version of this patch that specifically creates a `readyUnschedulableNodes` list. after running both tests, and putting them through CI, i am more convinced that using `allNodes` here is the appropriate thing to do. adding a new lister for "ready unschedulable" nodes did not change the results of my testing, and it makes the code more complicated. this is why i went to using the `allNodes` approach.
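For context, a minimal sketch of what the "ready unschedulable" filtering mentioned above could look like. This is a hypothetical helper, not the code from this PR, and it checks the Ready condition directly rather than going through the autoscaler's `kube_util.GetReadinessState()`.

```go
// Hypothetical helper illustrating the "ready unschedulable" list that was
// prototyped and then dropped in favor of passing allNodes.
package sketch

import apiv1 "k8s.io/api/core/v1"

// readyUnschedulableNodes returns nodes that are cordoned (Spec.Unschedulable)
// but still report a true Ready condition.
func readyUnschedulableNodes(allNodes []*apiv1.Node) []*apiv1.Node {
	var result []*apiv1.Node
	for _, node := range allNodes {
		if !node.Spec.Unschedulable {
			continue
		}
		for _, cond := range node.Status.Conditions {
			if cond.Type == apiv1.NodeReady && cond.Status == apiv1.ConditionTrue {
				result = append(result, node)
				break
			}
		}
	}
	return result
}
```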
> Aren't there going to be other types of nodes not classified as readyNodes in that set (for example, various flavors of NotReady nodes)?

one more point about this, looking at the function in the mixed node info processor, we can see that there are other conditions than just "ready unschedulable" that are checked for. i think the original intent of this function was to look at all nodes.
@towca does this seem like the right path forward to you as well?
> there is some filtering that occurs to remove some of the nodes in readyNodes from allNodes.

Should we be more precise about how we append `allNodes` to `readyNodes` to avoid duplicates? Or are we confident that the effort to de-dup is equivalent to or more costly than duplicate processing?
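For reference, a de-dup keyed on node name could look roughly like the sketch below (illustration only, not code from this PR); whether the extra map allocation is worth it versus processing a few nodes twice is the open question here.

```go
// Sketch: combine readyNodes and allNodes while skipping duplicates by name.
package sketch

import apiv1 "k8s.io/api/core/v1"

func combineNodeLists(readyNodes, allNodes []*apiv1.Node) []*apiv1.Node {
	seen := make(map[string]struct{}, len(readyNodes)+len(allNodes))
	combined := make([]*apiv1.Node, 0, len(readyNodes)+len(allNodes))
	for _, list := range [][]*apiv1.Node{readyNodes, allNodes} {
		for _, node := range list {
			if _, ok := seen[node.Name]; ok {
				continue // already added from the other list
			}
			seen[node.Name] = struct{}{}
			combined = append(combined, node)
		}
	}
	return combined
}
```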
I attempted a bit of archaeology and found the PR that added the filtering by readiness to this logic: #72. This is pretty bizarre, because even way back then it seems that this logic would only see Ready Nodes:
- `ScaleUp()` gets passed `readyNodes`: `scaledUp, err := ScaleUp(autoscalingContext, unschedulablePodsToHelp, readyNodes, daemonsets)`
- Which it then passes straight through to this logic: `nodeInfos, err := GetNodeInfosForGroups(nodes, context.CloudProvider, context.ClientSet,`

In any case, IMO the most readable change would be to:
- Start passing `allNodes` instead of `readyNodes` to `TemplateNodeInfoProvider.Process()` without changing the signature. This is what the interface definition suggests anyway.
- At the beginning of `MixedTemplateNodeInfoProvider.Process()`, group the passed `allNodes` into good and bad candidates utilizing `isNodeGoodTemplateCandidate()`. Then iterate over the good ones in the first loop, and over the bad ones in the last loop (see the sketch after this comment).

This should work because:
- `readyNodes` should be a subset of `allNodes`, so the logic should see all the same Nodes as before plus additional ones. This is a bit murky because the Node lists are modified by `CustomResourcesProcessor` after being listed. `CustomResourcesProcessor` implementations should only remove Nodes from `readyNodes` and hack their Ready condition in `allNodes`. This is what the in-tree implementations do; if an out-of-tree implementation breaks the assumption they might not be subsets, but IMO this isn't a supported case and such implementations should be ready for things breaking.
- The set of conditions checked in `isNodeGoodTemplateCandidate()` (ready && stable && schedulable && !toBeDeleted) is a superset of the conditions by which `readyNodes` are filtered from `allNodes` (ready && schedulable). Both places use the same `kube_util.GetReadinessState()` function for determining the `ready` part; the schedulable part is just checking the same field on the Node.
- Based on the two points above, if a Node is in `allNodes` but not `readyNodes`, `isNodeGoodTemplateCandidate()` should always return `false` for it. So all the Nodes from `allNodes - readyNodes` should be categorized as bad candidates like we want.
- And if a Node is in `readyNodes`, it should also be in `allNodes`, and the result of `isNodeGoodTemplateCandidate()` should be identical for both versions. So the good candidates determined from `allNodes` via `isNodeGoodTemplateCandidate()` should be exactly the same as those determined from `readyNodes` like we do now.
Does that make sense? @x13n could you double-check my logic here?
@elmiko IIUC you attempted something like this and got unit test failures? Could you describe what kind? I could definitely see the tests just being too coupled to the current implementation.
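A minimal sketch of the grouping idea proposed above. The predicate signature is simplified here; the real `isNodeGoodTemplateCandidate()` in the mixed provider takes additional arguments beyond the bare node.

```go
// Sketch: split allNodes into good and bad template candidates up front, then
// use the good ones in the first loop and the bad ones only as a last resort.
package sketch

import apiv1 "k8s.io/api/core/v1"

func groupTemplateCandidates(allNodes []*apiv1.Node, isGoodCandidate func(*apiv1.Node) bool) (good, bad []*apiv1.Node) {
	for _, node := range allNodes {
		if isGoodCandidate(node) {
			good = append(good, node)
		} else {
			bad = append(bad, node)
		}
	}
	return good, bad
}
```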
> Should we be more precise about how we append allNodes to readyNodes to avoid duplicates? Or are we confident that the effort to de-dup is equivalent or more costly than duplicate processing?

at this point in the processing, i don't think the duplicates are an issue.
> @elmiko IIUC you attempted something like this and got unit test failures? Could you describe what kind? I could definitely see the tests just being too coupled to the current implementation.

i will need to run the unit tests again, but essentially, if only `allNodes` is used in the final clause of the mixed node infos processor's `Process` function, then a few tests fail. my impression is that the filtering occurring with the custom node processor or the processor that filters out nodes with startup taints is causing the issues.

i can certainly take another look at passing only `allNodes` to `Process`. i didn't want to break anything else though XD
this is one of the main failures i see when changing to use only `allNodes`:

generated by running `go test ./...` in the `cluster-autoscaler/core` directory.
--- FAIL: TestScaleUpToMeetNodeGroupMinSize (0.00s)
orchestrator_test.go:1684:
Error Trace: /home/mike/dev/kubernetes-autoscaler/cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go:1684
Error: Received unexpected error:
could not compute total resources: No node info for: ng1
Test: TestScaleUpToMeetNodeGroupMinSize
orchestrator_test.go:1685:
Error Trace: /home/mike/dev/kubernetes-autoscaler/cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go:1685
Error: Should be true
Test: TestScaleUpToMeetNodeGroupMinSize
orchestrator_test.go:1686:
Error Trace: /home/mike/dev/kubernetes-autoscaler/cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go:1686
Error: Not equal:
expected: 1
actual : 0
Test: TestScaleUpToMeetNodeGroupMinSize
panic: runtime error: index out of range [0] with length 0 [recovered]
panic: runtime error: index out of range [0] with length 0
goroutine 81 [running]:
testing.tRunner.func1.2({0x2acda20, 0xc0003fd5a8})
/home/mike/sdk/go1.24.0/src/testing/testing.go:1734 +0x21c
testing.tRunner.func1()
/home/mike/sdk/go1.24.0/src/testing/testing.go:1737 +0x35e
panic({0x2acda20?, 0xc0003fd5a8?})
/home/mike/sdk/go1.24.0/src/runtime/panic.go:787 +0x132
k8s.io/autoscaler/cluster-autoscaler/core/scaleup/orchestrator.TestScaleUpToMeetNodeGroupMinSize(0xc000d16e00)
/home/mike/dev/kubernetes-autoscaler/cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go:1687 +0x11fa
testing.tRunner(0xc000d16e00, 0x2e31b28)
/home/mike/sdk/go1.24.0/src/testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 1
/home/mike/sdk/go1.24.0/src/testing/testing.go:1851 +0x413
FAIL k8s.io/autoscaler/cluster-autoscaler/core/scaleup/orchestrator 0.044s
--- FAIL: TestDeltaForNode (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x246c737]
goroutine 98 [running]:
testing.tRunner.func1.2({0x27e4140, 0x4c18f50})
/home/mike/sdk/go1.24.0/src/testing/testing.go:1734 +0x21c
testing.tRunner.func1()
/home/mike/sdk/go1.24.0/src/testing/testing.go:1737 +0x35e
panic({0x27e4140?, 0x4c18f50?})
/home/mike/sdk/go1.24.0/src/runtime/panic.go:787 +0x132
k8s.io/autoscaler/cluster-autoscaler/simulator/framework.(*NodeInfo).Node(...)
/home/mike/dev/kubernetes-autoscaler/cluster-autoscaler/simulator/framework/infos.go:66
k8s.io/autoscaler/cluster-autoscaler/core/scaleup/resource.(*Manager).DeltaForNode(0xc000ab1ce0, 0xc000aa6008, 0x0, {0x30df800, 0xc000703700})
/home/mike/dev/kubernetes-autoscaler/cluster-autoscaler/core/scaleup/resource/manager.go:64 +0x57
k8s.io/autoscaler/cluster-autoscaler/core/scaleup/resource.TestDeltaForNode(0xc000103500)
/home/mike/dev/kubernetes-autoscaler/cluster-autoscaler/core/scaleup/resource/manager_test.go:79 +0x5e5
testing.tRunner(0xc000103500, 0x2ddcaa0)
/home/mike/sdk/go1.24.0/src/testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 1
/home/mike/sdk/go1.24.0/src/testing/testing.go:1851 +0x413
FAIL k8s.io/autoscaler/cluster-autoscaler/core/scaleup/resource 0.029s
? k8s.io/autoscaler/cluster-autoscaler/core/test [no test files]
ok k8s.io/autoscaler/cluster-autoscaler/core/utils 0.025s
FAIL
actually, looking at this failure again, i think it's due to my change in the test.
i can put together a patch like this and give it some tests.
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR adds a new lister for ready unschedulable nodes; it also connects that lister to a new parameter in the node info processor's `Process` function. This change enables the autoscaler to use unschedulable, but otherwise ready, nodes as a last resort when creating node templates for scheduling simulation.
Which issue(s) this PR fixes:
Fixes #8380
Special notes for your reviewer:
I'm not sure if this is the best way to solve this problem, but I am proposing this for further discussion and design.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: