eks-controller: Nil pointer panic in updateComputeConfig + false drift on elasticLoadBalancing for non–Auto Mode clusters #2619

@demikl

Describe the bug

Two problems observed with the ACK EKS controller when EKS Auto Mode is not in use:

  1. Nil pointer panic (runtime error: invalid memory address or nil pointer dereference) in updateComputeConfig when the controller attempts an Auto Mode related update while .spec.computeConfig is absent (which is expected for non–Auto Mode clusters).
  2. Persistent reconcile drift because eks:DescribeCluster may include kubernetesNetworkConfig.elasticLoadBalancing.enabled = false while the Cluster CR spec omits elasticLoadBalancing entirely. The controller treats “absent” vs “false” as a real difference and invokes updateComputeConfig unnecessarily.

Details

  • Some EKS clusters’ eks:DescribeCluster output omits kubernetesNetworkConfig.elasticLoadBalancing. Others include: elasticLoadBalancing: { enabled: false }.
  • None of the clusters actually use EKS Auto Mode.
  • Because the condition that triggers the update path checks for any diff under Spec.ComputeConfig / Spec.StorageConfig / Spec.KubernetesNetworkConfig, a diff caused solely by elasticLoadBalancing (false vs absent) leads to an updateComputeConfig invocation even when spec.computeConfig is nil.
  • This resulted in repeated reconcile attempts and a panic.
  • We rolled back to controller version v1.5.4 (pre Auto Mode support) to avoid crashes; v1.5.5 and v1.6+ exhibit the issue.
  • An AWS Support ticket is open regarding the non-deterministic presence of elasticLoadBalancing in eks:DescribeCluster for clusters not using Auto Mode.
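
To make the failure chain above concrete, here is a minimal sketch of how an "absent vs false" diff can lead to a nil dereference. The types and the functions `naiveELBDiff` and `unsafeUpdateComputeConfig` are hypothetical, simplified stand-ins written for this illustration; they are not the controller's actual code, which I have only inferred from the panic and the observed behavior.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Cluster spec types.
// The real ACK-generated types carry many more fields.
type ComputeConfig struct {
	Enabled *bool
}

type ElasticLoadBalancing struct {
	Enabled *bool
}

type KubernetesNetworkConfig struct {
	ElasticLoadBalancing *ElasticLoadBalancing
}

type ClusterSpec struct {
	ComputeConfig           *ComputeConfig
	KubernetesNetworkConfig *KubernetesNetworkConfig
}

// elbEnabledPtr walks the optional chain and returns nil if any level is absent.
func elbEnabledPtr(s *ClusterSpec) *bool {
	if s == nil || s.KubernetesNetworkConfig == nil ||
		s.KubernetesNetworkConfig.ElasticLoadBalancing == nil {
		return nil
	}
	return s.KubernetesNetworkConfig.ElasticLoadBalancing.Enabled
}

// naiveELBDiff treats "absent" and "enabled: false" as different values,
// which is the drift described in this issue.
func naiveELBDiff(desired, observed *ClusterSpec) bool {
	d, o := elbEnabledPtr(desired), elbEnabledPtr(observed)
	if d == nil || o == nil {
		return d != o // true when exactly one side is absent
	}
	return *d != *o
}

// unsafeUpdateComputeConfig dereferences ComputeConfig without a nil check.
// The recover exists only so this demo can report the panic instead of crashing.
func unsafeUpdateComputeConfig(desired *ClusterSpec) (enabled bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	enabled = *desired.ComputeConfig.Enabled // panics when ComputeConfig is nil
	return enabled, nil
}

func main() {
	f := false
	desired := &ClusterSpec{} // CR spec omits computeConfig and elasticLoadBalancing
	observed := &ClusterSpec{ // DescribeCluster returned enabled=false
		KubernetesNetworkConfig: &KubernetesNetworkConfig{
			ElasticLoadBalancing: &ElasticLoadBalancing{Enabled: &f},
		},
	}

	if naiveELBDiff(desired, observed) {
		_, err := unsafeUpdateComputeConfig(desired)
		fmt.Println("update path taken, error:", err)
	}
}
```

In this sketch the drift alone is enough to enter the update path, and the update path is the only place that touches the nil `ComputeConfig`.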

Impact

  • Crash loops (nil pointer) in updateComputeConfig
  • Unnecessary UpdateClusterConfig attempts
  • Forces divergent CR specs (adding elasticLoadBalancing.enabled: false to silence drift) even though the feature is unused.

Expected Behavior

  • No panic when .spec.computeConfig is absent.
  • Absence of elasticLoadBalancing and elasticLoadBalancing.enabled=false should be treated as equivalent for non–Auto Mode clusters.
  • Auto Mode update path should run only when the cluster is actually in Auto Mode.

Suggested Direction

  • Introduce explicit Auto Mode detection: treat a cluster as Auto Mode only if spec.computeConfig.enabled == true (or if a documented status/field from eks:DescribeCluster confirms it).
  • Skip updateComputeConfig entirely when not Auto Mode.
  • Normalize or ignore the “absent vs false” elasticLoadBalancing difference when Auto Mode is not enabled.
  • Document the authoritative indicator of Auto Mode (if one exists) for controllers.

Environment

  • Some EKS clusters return eks:DescribeCluster without kubernetesNetworkConfig.elasticLoadBalancing.
  • Other clusters return the same section with: "elasticLoadBalancing": { "enabled": false }.
  • We do not have EKS Auto Mode enabled on any cluster.
  • An AWS Support ticket is open to understand why the API response is inconsistent and when (or if) the field should appear by default.
  • EKS Controller versions: v1.5.5 and v1.6+ (we rolled back to v1.5.4 to avoid crash loops).
  • Region: eu-west-3.
  • Kubernetes version/platform: 1.32 / eks.16.
