KEP-3953: Node Resource Hot Plug #3955
Conversation
/assign @mrunalp @SergeyKanzhelev @klueska

/cc
PRR shadow:
The PRR looks good for alpha.
Thank you for answering more than you needed.
We still need SIG approval, but the PRR is looking good.

Thank you for the review. I have updated the KEP to address your review comments, and yes, I am looking for a SIG-Node review as well.
> - https://github.com/kubernetes/kubernetes/issues/125579
> - https://github.com/kubernetes/kubernetes/issues/127793
>
> Hence, it is necessary to handle capacity updates gracefully across the cluster, rather than resetting the cluster components to achieve the same outcome.
I think it is worth acknowledging the (slowly?) growing bare-metal user base. There is a lively subset of users running Kubernetes on bare metal, often for critical use cases (an easy example: telcos). On bare metal, restarting a node takes a nontrivial amount of time (minutes on big machines). Restarting the kubelet in these cases causes nontrivial disruptions.
Thank you for pointing this out; I have added it to the KEP.
> #### Story 2
>
> As a Kubernetes Application Developer, I want the kernel to optimize system performance by making better use of local resources when a node is resized, so that my applications run faster with fewer disruptions. This is achieved when there are
In which cases would I, as a Kubernetes Application Developer, do that? This feels to me more like a correction of provisioned resources. It also feels like we are trying to emphasize vertical scalability versus horizontal scalability. Perhaps we want to explore the interaction with in-place pod resize? (I am not sure whether you did below.)
I agree that this story is loosely tied to the Application Developer; I have updated the story to relate to an Application Performance Analyst.
> As a Cluster administrator, I want to resize a Kubernetes node dynamically, so that I can quickly hot plug resources without waiting for new nodes to join the cluster.
I'm still very sympathetic to the proposal and I personally like it, but I still feel this particular angle is not strong enough. Yes, we have bugs. Yes, they are annoying. But echoing the comment above from @thockin, these are bugs we should fix anyway, and these are improvements we should have anyway. A safer and faster kubelet restart is a win even if we implement resource hotplug.
> ### Notes/Constraints/Caveats (Optional)
>
> ### Risks and Mitigations
Excellent point. I think the assumption is that adding resources is purely additive (e.g. CPU IDs don't change; you get more CPU IDs, like appending to a slice, and the existing ones keep their meaning). This should be called out explicitly as an assumption (if it is indeed an assumption).
> 2. Identify Nodes Affected by Hotplug:
>    * By flagging a Node as being impacted by hotplug, the Cluster Autoscaler could revert to a less reliable but more conservative "scale from 0 nodes" logic.
>
> Given that this KEP and autoscaler are inter-related, the above approaches were discussed in the community with relevant stakeholders, and have decided approaching this problem through the former route.
I guess "former" is approach 1? Let's call it out unambiguously ("approaching this problem using approach 1").
Yep, explicitly mentioned it now to avoid confusion.
> is lesser than the initial capacity of the node. This is only to point at the fact that the resources have shrunk on the node and may need attention/intervention.
>
> Once the node has transitioned to the NotReady state, it will be reverted to the ReadyState once when the node's capacity is reconfigured to match or exceed the last valid configuration.
> In this case, valid configuration refers to a state which can either be previous hot-plug capacity or the initial capacity in case there was no history of hotplug.
Would a kubelet restart make the node transition to Ready again?
Yes, it will transition to the Ready state. Although we store the node's initial allocatable values in the Node object, once the kubelet restarts, the initial values become the node's current values.
This is an interim solution until we support hot unplug.
And it is possible that at this point some of the pods get "removed" from the node if they don't fit anymore.
Thank you for handling the hot unplug case. PRR lgtm, rest is up to the SIG. /approve
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: deads2k, Karthik-K-N. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
Co-authored-by: kishen-v <[email protected]>
Good remark. Maybe we should check whether any CPU (IDs) have been removed or the memory of a (NUMA) node has decreased, at least if the Topology Manager and friends have been enabled(?)
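The check suggested above could be sketched roughly as follows. This is a hypothetical illustration, not actual kubelet code: `capacitySnapshot` and `detectShrink` are invented names, and the idea is simply to compare the previous set of online CPU IDs and per-NUMA-node memory against the current one.

```go
package main

import "fmt"

// capacitySnapshot is a hypothetical view of node resources: the set of
// online CPU IDs and memory bytes per NUMA node.
type capacitySnapshot struct {
	CPUs       map[int]bool
	NUMAMemory map[int]uint64
}

// detectShrink reports whether any previously online CPU ID disappeared
// or any NUMA node's memory decreased between two snapshots. A purely
// additive change (new CPU IDs, more memory) is not a shrink.
func detectShrink(old, cur capacitySnapshot) bool {
	for id := range old.CPUs {
		if !cur.CPUs[id] {
			return true // a CPU ID was removed or offlined
		}
	}
	for node, mem := range old.NUMAMemory {
		if cur.NUMAMemory[node] < mem {
			return true // a NUMA node's memory shrank
		}
	}
	return false
}

func main() {
	old := capacitySnapshot{
		CPUs:       map[int]bool{0: true, 1: true},
		NUMAMemory: map[int]uint64{0: 8 << 30},
	}
	cur := capacitySnapshot{
		CPUs:       map[int]bool{0: true, 1: true, 2: true},
		NUMAMemory: map[int]uint64{0: 8 << 30},
	}
	fmt.Println(detectShrink(old, cur)) // false: purely additive change
	cur.CPUs = map[int]bool{0: true, 2: true}
	fmt.Println(detectShrink(old, cur)) // true: CPU 1 disappeared
}
```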
> As a Cluster administrator, I want to resize a Kubernetes node dynamically, so that I can quickly hot plug resources without waiting for new nodes to join the cluster.
Yeah, I think the biggest gap is the timing and mechanism of the kubelet restart to have it work seamlessly. For example, in a cloud environment a node admin can scale up the node, but would need to know when to restart the kubelet; the cloud SDK could do so, but it doesn't necessarily know Kubernetes is running there. On the bare-metal side, if someone goes to a rack and hot plugs something, there is nothing that would react to that. Having the kubelet be reactive to these changes means less manual work.
I do think in principle the kubelet could just be restarted, though. It just wouldn't be seamless.
This is a good point and should be captured in the KEP.
> As the hot-unplug events are not completely handled in this KEP, in such cases, it is imperative to move the node to the NotReady state when the current capacity of the node
> is lesser than the initial capacity of the node. This is only to point at the fact that the resources have shrunk on the node and may need attention/intervention.
>
> Once the node has transitioned to the NotReady state, it will be reverted to the ReadyState once when the node's capacity is reconfigured to match or exceed the last valid configuration.
In case of a hot-unplug event, I'd be curious to understand (and I think we should capture here) what would happen to already running workloads if there are no longer enough resources available to accommodate them.
In the case of already running workloads, if there are not enough resources available to accommodate them post hot-unplug, the workload may underperform, transition to the "Pending" state, or get migrated to a suitable node that meets the workload's resource requirements.
I have updated the same in the KEP as well. Thank you.
Pods that don't "fit" anymore will be removed from the node. AFAIU, currently on a kubelet restart, the kubelet iterates over pods from oldest to newest and kicks out pods that don't fit. I think with this KEP we have the opportunity to improve the heuristics, taking into account pod priority, pod QoS class, etc.
Co-authored-by: kishen-v <[email protected]>
> `min(0, 1000 - (1000*containerMemoryRequest)/initialMemoryCapacity)`
> - Post up-scale any failure in resync of Resource managers may be lead to incorrect or rejected allocation, which can lead to underperformed or rejected workload.
> - To mitigate the risks adequate tests should be added to avoid the scenarios where failure to resync resource managers can occur.
How are we going to add tests for this? It's not clear to me how we'll programmatically change the resources for automated tests right now. We may need to consult with SIG Testing.
Hey @haircommander,
The idea we have right now for unit testing is to mock the sysfs structure and modify the online file. We could structure the tests around the same. For example, flipping /sys/bus/cpu/devices/cpuX/online to 0 or 1 disables or enables the CPU, and the change is expected to be caught by the kubelet during a poll cycle.
Ref:
CPU Hotplug: https://docs.kernel.org/core-api/cpu_hotplug.html#using-cpu-hotplug
Memory Hotplug: https://docs.kernel.org/admin-guide/mm/memory-hotplug.html#onlining-and-offlining-memory-blocks
> - Handling downsize events
>   - Though, there is no support through this KEP to handle an event of node-downsize, it's the onus of the cluster administrator to resize responsibly to avoid disruption as it lies out of the kubernetes realm.
>   - However, in a situation of downsize an error mode is returned by the kubelet and the node is marked as `NotReady`.
If an admin downsizes and then increases resources afterwards, does the node return to Ready, or stay NotReady until it's restarted? I feel it's the latter, but we should probably expand on this.
The current PoC implementation returns the node to the Ready state if the same resources (or more) become available again. The node is put into the NotReady state, but other parts of the kubelet are not aware of the change (e.g. resource managers are not re-initialized). This roughly matches the existing behavior of the kubelet not being aware of changes in node capacity.
Example of resizing a resource's capacity: 10 (Ready) -> 5 (NotReady) -> 20 (Ready) -> 10 (NotReady) -> 15 (NotReady) -> 20 (Ready).