aks-preview: Support VMSS agent pool VM size resize via nodepool update #9732

Open
wenhug wants to merge 2 commits into Azure:main from wenhug:wenhuang/agentpool-vmsize-resize

wenhug commented Mar 26, 2026

Summary

Enable changing the VM size (SKU) of an existing VMSS-based agent pool via az aks nodepool update --node-vm-size <new-size>.

When the user changes the VM size of a VMSS node pool, the AKS RP performs a rolling upgrade:

  1. Surge new nodes with the target VM size
  2. Cordon and drain old nodes
  3. Delete old nodes
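
The three steps above can be pictured with a small in-memory simulation. This is only an illustrative sketch, not the RP's implementation: `Node`, `rolling_resize`, and the `max_surge` parameter are hypothetical names invented for this example, and pod rescheduling is reduced to moving list entries.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality so list.remove() targets the exact node
class Node:
    name: str
    vm_size: str
    schedulable: bool = True
    pods: list = field(default_factory=list)


def rolling_resize(nodes, target_size, max_surge=1):
    """Replace every node not on target_size, max_surge nodes at a time (hypothetical model)."""
    pool = list(nodes)
    old_nodes = [n for n in pool if n.vm_size != target_size]
    counter = 0
    while old_nodes:
        batch, old_nodes = old_nodes[:max_surge], old_nodes[max_surge:]

        # 1. Surge: add one new node with the target VM size per old node in the batch.
        new_nodes = []
        for _ in batch:
            new_nodes.append(Node(f"new-{counter}", target_size))
            counter += 1
        pool.extend(new_nodes)

        # 2. Cordon and drain: stop scheduling on the old node, move its pods away.
        for old, new in zip(batch, new_nodes):
            old.schedulable = False
            new.pods.extend(old.pods)
            old.pods.clear()

        # 3. Delete the drained old nodes.
        for old in batch:
            pool.remove(old)
    return pool
```

In this model the pool never drops below its original node count, because replacement capacity is added before any old node is drained.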

This is a preview feature that requires:

  • AFEC registration: Microsoft.ContainerService/AgentPoolVMSSResize
  • RP internal toggle: enable-agentpool-vmsize-resize (currently enabled for E2E + Canary)

The --node-vm-size parameter already existed on nodepool update for VirtualMachines pool autoscaler updates. This PR extends it to also work for VMSS pools.

Usage

# Resize VM size for a VMSS node pool
az aks nodepool update \
  -g MyResourceGroup \
  -n nodepool1 \
  --cluster-name MyManagedCluster \
  --node-vm-size Standard_D4s_v3

RP-side validation

The RP validates the resize request and blocks incompatible combinations:

  • DiskControllerType (SCSI vs NVMe)
  • CPU Architecture (x64 vs ARM64)
  • Confidential Computing (SNP)
  • Hypervisor Generation (V1 vs V2)
  • Combined with K8s version upgrade or node count change
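
As a rough illustration of the checks listed above, the sketch below compares two VM sizes trait by trait. The validation itself lives in the RP and is not part of this PR; `SkuTraits`, `resize_blockers`, and the trait values are hypothetical and simplified (for example, a real SKU may support more than one hypervisor generation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuTraits:
    """Hypothetical, simplified description of a VM size."""
    disk_controller: str     # "SCSI" or "NVMe"
    cpu_arch: str            # "x64" or "Arm64"
    confidential: bool       # e.g. SNP-based confidential computing
    hyperv_generation: str   # "V1" or "V2"


def resize_blockers(current, target, *, k8s_version_changed=False, node_count_changed=False):
    """Return the reasons a resize from current to target would be rejected."""
    reasons = []
    if current.disk_controller != target.disk_controller:
        reasons.append("DiskControllerType differs")
    if current.cpu_arch != target.cpu_arch:
        reasons.append("CPU architecture differs")
    if current.confidential != target.confidential:
        reasons.append("Confidential computing support differs")
    if current.hyperv_generation != target.hyperv_generation:
        reasons.append("Hypervisor generation differs")
    if k8s_version_changed or node_count_changed:
        reasons.append("Resize combined with K8s version upgrade or node count change")
    return reasons
```

An empty list means none of the modeled incompatibilities apply; the actual error messages and the exact rules are decided by the RP.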

Changes

  • agentpool_decorator.py: Add update_vm_size() method for VMSS pools and integrate it into update_agentpool_profile_preview()
  • _params.py: Mark --node-vm-size as is_preview=True for nodepool update
  • _help.py: Update help text and add VMSS resize CLI example
  • test_agentpool_decorator.py: Add unit tests for update_vm_size

Test plan

  • Unit tests added for update_vm_size (both Standalone and ManagedCluster modes)
  • E2E test exists in AKS RP repo (Scenario_VMSS_VMSize_Resize)
  • Manual validation with preview AFEC registration

Enable changing the VM size (SKU) of an existing VMSS-based agent pool
via `az aks nodepool update --node-vm-size <new-size>`. The RP performs
a rolling upgrade (surge new nodes, drain old, delete old) to replace
nodes with the new VM size.

This preview feature requires:
- AFEC registration: Microsoft.ContainerService/AgentPoolVMSSResize
- RP internal toggle: enable-agentpool-vmsize-resize

Changes:
- agentpool_decorator.py: add update_vm_size() for VMSS pools and call
  it in update_agentpool_profile_preview()
- _params.py: mark --node-vm-size as is_preview for nodepool update
- _help.py: update help text and add VMSS resize example
- test_agentpool_decorator.py: add unit tests for update_vm_size
Copilot AI review requested due to automatic review settings March 26, 2026 21:14
azure-client-tools-bot-prd bot commented Mar 26, 2026

❌ Azure CLI Extensions Breaking Change Test
❌ managedcleanroom

| rule | cmd_name | rule_message | suggest_message |
| --- | --- | --- | --- |
| 1006 - ParaAdd | managedcleanroom collaboration add-collaborator | cmd managedcleanroom collaboration add-collaborator added parameter email | please remove parameter email for cmd managedcleanroom collaboration add-collaborator |
| 1007 - ParaRemove | managedcleanroom collaboration add-collaborator | cmd managedcleanroom collaboration add-collaborator removed parameter object_id | please add back parameter object_id for cmd managedcleanroom collaboration add-collaborator |
| 1007 - ParaRemove | managedcleanroom collaboration add-collaborator | cmd managedcleanroom collaboration add-collaborator removed parameter tenant_id | please add back parameter tenant_id for cmd managedcleanroom collaboration add-collaborator |
| 1007 - ParaRemove | managedcleanroom collaboration add-collaborator | cmd managedcleanroom collaboration add-collaborator removed parameter user_identifier | please add back parameter user_identifier for cmd managedcleanroom collaboration add-collaborator |
| 1006 - ParaAdd | managedcleanroom collaboration create | cmd managedcleanroom collaboration create added parameter consortium_type | please remove parameter consortium_type for cmd managedcleanroom collaboration create |
| 1006 - ParaAdd | managedcleanroom collaboration create | cmd managedcleanroom collaboration create added parameter user_identity | please remove parameter user_identity for cmd managedcleanroom collaboration create |
| 1007 - ParaRemove | managedcleanroom collaboration create | cmd managedcleanroom collaboration create removed parameter collaborators | please add back parameter collaborators for cmd managedcleanroom collaboration create |
| ⚠️ 1006 - ParaAdd | managedcleanroom collaboration update | cmd managedcleanroom collaboration update added parameter consortium_type | |
| ⚠️ 1006 - ParaAdd | managedcleanroom collaboration update | cmd managedcleanroom collaboration update added parameter user_identity | |
| ⚠️ 1006 - ParaAdd | managedcleanroom consortium create | cmd managedcleanroom consortium create added parameter consortium_type | |
| ⚠️ 1006 - ParaAdd | managedcleanroom consortium update | cmd managedcleanroom consortium update added parameter consortium_type | |

@azure-client-tools-bot-prd

Hi @wenhug,
Please write the description of changes which can be perceived by customers into HISTORY.rst.
If you want to release a new extension version, please update the version in setup.py as well.

yonzhan commented Mar 26, 2026

Thank you for your contribution! We will review the pull request and get back to you soon.

@github-actions

The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR.

Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions).
After that please run the following commands to enable git hooks:

pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>

@github-actions

CodeGen Tools Feedback Collection

Thank you for using our CodeGen tool. We value your feedback and would like to know how we can improve our product. Please take a few minutes to fill out our CodeGen survey.

github-actions bot commented Mar 26, 2026

Hi @wenhug

Release Suggestions

Module: aks-preview

  • Update VERSION to 19.0.0b29 in src/aks-preview/setup.py

Notes


Copilot AI left a comment

Pull request overview

This PR extends az aks nodepool update --node-vm-size to support resizing VMSS-based agent pools (preview), aligning CLI behavior with the RP’s rolling-replacement resize flow.

Changes:

  • Add a VMSS-aware update_vm_size() path to the nodepool update decorator and wire it into update_agentpool_profile_preview().
  • Mark --node-vm-size as a preview parameter for aks nodepool update and update CLI help text/examples accordingly.
  • Add unit tests validating update_vm_size() behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/aks-preview/azext_aks_preview/agentpool_decorator.py | Adds update_vm_size() and invokes it during preview nodepool update assembly. |
| src/aks-preview/azext_aks_preview/_params.py | Marks --node-vm-size as preview for aks nodepool update. |
| src/aks-preview/azext_aks_preview/_help.py | Updates help text and adds a VMSS resize example. |
| src/aks-preview/azext_aks_preview/tests/latest/test_agentpool_decorator.py | Adds unit tests for update_vm_size(). |

Comment on lines +1865 to +1871
# Skip for VirtualMachines pools - they handle VM size via autoscaler path
if self.context.get_vm_set_type() == CONST_VIRTUAL_MACHINES:
    return agentpool

node_vm_size = self.context.raw_param.get("node_vm_size")
if node_vm_size:
    agentpool.vm_size = node_vm_size
Copilot AI Mar 26, 2026

update_auto_scaler_properties() in this same decorator still raises InvalidArgumentValueError whenever raw_param['node_vm_size'] is not None for VMSS pools ("Updating VM size is not supported for virtual machine scale set agentpools."). With VMSS resize now supported via --node-vm-size, this guard will block the new behavior if update_auto_scaler_properties() is invoked as part of the update flow (e.g., by the default update profile). Consider removing or narrowing that validation so VMSS pools can accept --node-vm-size without throwing.

Suggested change
-# Skip for VirtualMachines pools - they handle VM size via autoscaler path
-if self.context.get_vm_set_type() == CONST_VIRTUAL_MACHINES:
-    return agentpool
-node_vm_size = self.context.raw_param.get("node_vm_size")
-if node_vm_size:
-    agentpool.vm_size = node_vm_size
+vm_set_type = self.context.get_vm_set_type()
+# Skip for VirtualMachines pools - they handle VM size via autoscaler path
+if vm_set_type == CONST_VIRTUAL_MACHINES:
+    return agentpool
+# Only apply direct VM size changes for VMSS pools
+if vm_set_type == CONST_VIRTUAL_MACHINE_SCALE_SETS:
+    node_vm_size = self.context.raw_param.get("node_vm_size")
+    if node_vm_size:
+        # Apply the new VM size to the agent pool
+        agentpool.vm_size = node_vm_size
+        # Clear the raw_param value so downstream autoscaler validation
+        # (which still checks node_vm_size for VMSS) does not reject
+        # this supported VM size update.
+        self.context.raw_param["node_vm_size"] = None

    vm_size="Standard_D4s_v3"
)
self.assertEqual(dec_agentpool_2, ground_truth_agentpool_2)

Copilot AI Mar 26, 2026

The new unit tests cover VMSS behavior, but they don't cover the VirtualMachines pool branch where update_vm_size() should be a no-op (VM size changes are handled via the autoscaler update path). Add a test case with type=CONST_VIRTUAL_MACHINES and node_vm_size set to ensure update_vm_size() does not mutate agentpool.vm_size for VMs pools.

Suggested change
# Test case 3: VirtualMachines pool with node_vm_size provided (should be no-op)
dec_3 = AKSPreviewAgentPoolUpdateDecorator(
    self.cmd,
    self.client,
    {"node_vm_size": "Standard_D4s_v3"},
    self.resource_type,
    self.agentpool_decorator_mode,
)
agentpool_3 = self.create_initialized_agentpool_instance(
    vm_size="Standard_D2s_v3"
)
# For VirtualMachines pools, update_vm_size should be a no-op; VM size changes are
# handled via the autoscaler update path.
agentpool_3.type = CONST_VIRTUAL_MACHINES
dec_3.context.attach_agentpool(agentpool_3)
dec_agentpool_3 = dec_3.update_vm_size(agentpool_3)
ground_truth_agentpool_3 = self.create_initialized_agentpool_instance(
    vm_size="Standard_D2s_v3"
)
ground_truth_agentpool_3.type = CONST_VIRTUAL_MACHINES
self.assertEqual(dec_agentpool_3, ground_truth_agentpool_3)

Address review comments:

1. Remove the InvalidArgumentValueError in update_auto_scaler_properties()
   that blocked --node-vm-size for VMSS pools. This check was added when
   --node-vm-size only supported VirtualMachines pools, but now VMSS
   pools support VM size resize via rolling upgrade.

2. Add test case for VirtualMachines pool to verify update_vm_size() is
   a no-op (VMs pools handle VM size via the autoscaler update path).

3. Add HISTORY.rst entry for the new feature.
wenhug commented Mar 26, 2026

Addressed the two Copilot review comments:

Comment 1 (blocker in update_auto_scaler_properties): Good catch! Removed the InvalidArgumentValueError that rejected --node-vm-size for VMSS pools. That guard was added when --node-vm-size was VMs-pool-only, but now VMSS pools support VM size resize via rolling upgrade.

Comment 2 (VMs pool test): Added a third test case verifying update_vm_size() is a no-op for VirtualMachines pools (handles both Standalone and ManagedCluster decorator modes).

Also added a HISTORY.rst entry per the bot's request.
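
For readers without the diff at hand, the behavior described in this comment can be approximated with a standalone stand-in. This is a sketch under stated assumptions, not the code in agentpool_decorator.py: the free function `update_vm_size` below takes the raw parameters as a plain dict instead of going through the decorator context, the agent pool is a `SimpleNamespace`, and the two constants are redefined locally with assumed values.

```python
from types import SimpleNamespace

# Assumed values, redefined locally for this sketch.
CONST_VIRTUAL_MACHINES = "VirtualMachines"
CONST_VIRTUAL_MACHINE_SCALE_SETS = "VirtualMachineScaleSets"


def update_vm_size(agentpool, raw_param):
    """Stand-in for the decorator method: apply --node-vm-size to non-VMs pools only."""
    # VirtualMachines pools handle VM size via the autoscaler update path: no-op here.
    if agentpool.type == CONST_VIRTUAL_MACHINES:
        return agentpool
    node_vm_size = raw_param.get("node_vm_size")
    if node_vm_size:
        agentpool.vm_size = node_vm_size
    return agentpool


# VMSS pool: the requested size is applied.
vmss_pool = SimpleNamespace(type=CONST_VIRTUAL_MACHINE_SCALE_SETS, vm_size="Standard_D2s_v3")
update_vm_size(vmss_pool, {"node_vm_size": "Standard_D4s_v3"})

# VirtualMachines pool: the same request leaves vm_size untouched.
vms_pool = SimpleNamespace(type=CONST_VIRTUAL_MACHINES, vm_size="Standard_D2s_v3")
update_vm_size(vms_pool, {"node_vm_size": "Standard_D4s_v3"})
```

The three unit test cases mentioned above map onto this stand-in as: VMSS pool with a size given, VMSS pool with no size given, and VirtualMachines pool with a size given.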

@FumingZhang

/azp run

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@FumingZhang FumingZhang left a comment

lgtm


Pending
+++++++
* `az aks nodepool update`: Support `--node-vm-size` to resize VM size of an existing VMSS-based agent pool (preview). Requires AFEC registration `Microsoft.ContainerService/AgentPoolVMSSResize`.
Is it possible to bypass the feature flag validation via custom header? If so, please add a scenario test to ensure the change works as expected.
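
One way such a scenario test could invoke the command is sketched below. It assumes the common AKS preview pattern of passing `AKSHTTPCustomFeatures=<feature>` through `--aks-custom-headers`; whether the RP honors that header for AgentPoolVMSSResize is exactly the open question in this comment and is not confirmed anywhere in this thread. `build_resize_command` is a hypothetical helper that only builds the command string.

```python
import shlex


def build_resize_command(resource_group, cluster_name, nodepool_name, vm_size, feature=None):
    """Build an `az aks nodepool update` command line (hypothetical helper)."""
    args = [
        "az", "aks", "nodepool", "update",
        "-g", resource_group,
        "--cluster-name", cluster_name,
        "-n", nodepool_name,
        "--node-vm-size", vm_size,
    ]
    if feature:
        # Assumption: the preview feature can be requested per call via a custom header.
        args += ["--aks-custom-headers", f"AKSHTTPCustomFeatures={feature}"]
    return shlex.join(args)


command = build_resize_command(
    "MyResourceGroup",
    "MyManagedCluster",
    "nodepool1",
    "Standard_D4s_v3",
    feature="Microsoft.ContainerService/AgentPoolVMSSResize",
)
```

A live scenario test would run the resulting command against a test cluster and then check that the node pool reports the new VM size.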

@FumingZhang

Please resolve merge conflict and rebase/merge from main to pass the CI checks, @wenhug
