@pahud pahud commented Nov 12, 2025

Issue # (if applicable)

Closes #35914.

Reason for this change

Users cannot create EKS managed node groups with newer GPU and accelerator instance types (Trainium TRN1/TRN1N/TRN2 and P5/P5E/P5EN series). The isGpuInstanceType() validation function does not recognize these instance types, causing validation errors that block deployment.

Description of changes

Updated the isGpuInstanceType() function in packages/aws-cdk-lib/aws-eks/lib/private/nodegroup.ts to recognize six additional GPU and accelerator instance classes:

Trainium Instances (AWS Neuron accelerators):

  • TRN1, TRN1N, TRN2

P5 Series GPUs (NVIDIA H100 on P5, NVIDIA H200 on P5E and P5EN):

  • P5, P5E, P5EN

The implementation adds these instance classes to the existing knownGpuInstanceTypes array, maintaining logical grouping (P-series together, TRN-series as new group). This enables proper AMI type selection (AL2023_X86_64_NEURON for Trainium, AL2023_X86_64_NVIDIA for P5).
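The membership check described above can be illustrated with a minimal, dependency-free sketch. It is not the code in nodegroup.ts: the names ACCELERATED_INSTANCE_CLASSES and isAcceleratedInstanceType are hypothetical, the set lists only the six classes named in this PR (the real array also holds the previously supported classes), and instance types are modeled as plain strings rather than ec2.InstanceType objects.

```typescript
// Hypothetical stand-in for the knownGpuInstanceTypes array; only the six
// classes added by this PR are listed here.
const ACCELERATED_INSTANCE_CLASSES: ReadonlySet<string> = new Set([
  'p5', 'p5e', 'p5en',     // P5 series
  'trn1', 'trn1n', 'trn2', // Trainium
]);

// Returns true when the class part of an instance type string such as
// 'trn1.2xlarge' (the text before the first dot) is in the set.
function isAcceleratedInstanceType(instanceType: string): boolean {
  const [instanceClass = ''] = instanceType.toLowerCase().split('.');
  return ACCELERATED_INSTANCE_CLASSES.has(instanceClass);
}
```

Matching on the whole class name, rather than on a prefix, keeps unrelated classes that merely start with the same characters from being treated as accelerated.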

Before:

cluster.addNodegroupCapacity('TrainiumNodes', {
  instanceTypes: [new ec2.InstanceType('trn1.2xlarge')],
});
// Error: The specified AMI does not match the instance types architecture

After:

cluster.addNodegroupCapacity('TrainiumNodes', {
  instanceTypes: [new ec2.InstanceType('trn1.2xlarge')],
});
// Works correctly - AL2023_X86_64_NEURON AMI automatically selected

cluster.addNodegroupCapacity('P5Nodes', {
  instanceTypes: [new ec2.InstanceType('p5.48xlarge')],
});
// Works correctly - AL2023_X86_64_NVIDIA AMI automatically selected
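The AMI types named in the comments above follow the pairing stated in this description: Trainium classes with AL2023_X86_64_NEURON and P5 classes with AL2023_X86_64_NVIDIA. A rough sketch of that pairing follows, assuming a hypothetical helper expectedAmiType that is not part of aws-cdk-lib and that takes plain string inputs. How the library itself arrives at a default AMI type is not shown here.

```typescript
type AcceleratedAmiType = 'AL2023_X86_64_NEURON' | 'AL2023_X86_64_NVIDIA';

// Hypothetical lookup built only from the pairings stated in this PR.
const AMI_TYPE_BY_CLASS: ReadonlyMap<string, AcceleratedAmiType> = new Map<string, AcceleratedAmiType>([
  ['trn1', 'AL2023_X86_64_NEURON'],
  ['trn1n', 'AL2023_X86_64_NEURON'],
  ['trn2', 'AL2023_X86_64_NEURON'],
  ['p5', 'AL2023_X86_64_NVIDIA'],
  ['p5e', 'AL2023_X86_64_NVIDIA'],
  ['p5en', 'AL2023_X86_64_NVIDIA'],
]);

// Returns the AMI type this PR associates with the instance class, or
// undefined for classes the PR does not mention.
function expectedAmiType(instanceType: string): AcceleratedAmiType | undefined {
  const [instanceClass = ''] = instanceType.toLowerCase().split('.');
  return AMI_TYPE_BY_CLASS.get(instanceClass);
}
```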

Describe any new or updated permissions being added

N/A - This change only updates client-side validation logic. No IAM permissions or resource access changes.

Description of how you validated changes

  • Unit tests: Added 6 new instance types to the GPU instance type test array in nodegroup.test.ts. All 365 unit tests pass with zero regressions.
  • Integration tests: Not required - this is a validation-only change. Existing integration test patterns (integ.eks-inference-nodegroup.ts) demonstrate the correct usage for accelerator instances.
  • Build verification: TypeScript compilation, linting, and full build all pass successfully.
  • Test coverage: Maintained at 48.61% (pre-existing baseline).

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

- Add support for newer AWS GPU instance types (P5, P5E, P5EN)
- Include Trainium (TRN1, TRN1N, TRN2) instance classes in GPU detection
- Update comments to clarify GPU instance type categories
- Add references to AWS documentation for GPU instance types
- Expand test coverage for new GPU instance types

Improves GPU instance type detection for EKS nodegroups to support the latest AWS machine learning and graphics-focused instances.

@aws-cdk-automation aws-cdk-automation requested a review from a team November 12, 2025 18:45
@github-actions github-actions bot added the bug, effort/medium, and p2 labels Nov 12, 2025
@mergify mergify bot added the contribution/core label Nov 12, 2025
@pahud pahud marked this pull request as ready for review November 12, 2025 18:47
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review label Nov 13, 2025

Development

Successfully merging this pull request may close these issues.

(aws-eks): Update isGpuInstanceType to support more recent instance types
