
[BUG] - Kubernetes node pool node_quantity state drift when auto-scaler enabled #472

Open
AdamJacobMuller opened this issue Mar 13, 2024 · 2 comments
Labels
wontfix This will not be worked on

@AdamJacobMuller

Hi,

Describe the bug
If I create a vke cluster like:

resource "vultr_kubernetes" "k8" {
    region  = "ewr"
    label   = "vke-test"
    version = "v1.28.2+1"

    node_pools {
        node_quantity = 1
        plan          = "vc2-1c-2gb"
        label         = "vke-nodepool"
        auto_scaler   = true
        min_nodes     = 1
        max_nodes     = 2
    }
} 

Every time terraform runs, if my cluster has scaled up from 1 node to 2, terraform sees the difference and "fixes" node_quantity so that the cluster scales down to 1 node. The autoscaler then sees that 1 node is not enough and scales my cluster back up to 2 nodes.

This is very disruptive for workflows and workloads that depend on the autoscaler.

To Reproduce
Steps to reproduce the behavior:

  1. create a cluster with terraform with the autoscaler enabled
  2. deploy enough workload to require the cluster to scale up to more than node_quantity nodes
  3. run terraform plan/apply again
  4. watch the cluster scale down to node_quantity and then back up to max_nodes (or whatever satisfies your workload)

Expected behavior

if auto_scaler == true:
    set min_nodes and max_nodes only
else:
    set node_quantity

Additional context

Thank you kindly.

@optik-aper optik-aper self-assigned this Apr 15, 2024
@optik-aper optik-aper changed the title [BUG] - vke - autoscaler and node_quantity conflict [BUG] - Kubernetes node pool node_quantity state drift when auto-scaler enabled Apr 15, 2024
@optik-aper optik-aper added wontfix This will not be worked on and removed bug labels Apr 19, 2024
@optik-aper
Member

optik-aper commented Apr 19, 2024

My feeling is that removing the value updates would create a workflow expectation which is too opinionated for a provider. Not only would you have to silence/ignore the quantity updates, you'd have to ignore the new node_pools[...].nodes elements as well. That goes against the spirit of the provider.

Have you tried using the lifecycle rules to ignore_changes automatically? Here's an example for the two forms that a node pool resource takes in our provider:

resource "vultr_kubernetes" "k8" {
    region  = "ewr"
    label   = "vke-test"
    version = "v1.29.2+1"

    node_pools {
        node_quantity = 3
        plan          = "vc2-1c-2gb"
        label         = "vke-nodepool"
        auto_scaler   = true
        min_nodes     = 1
        max_nodes     = 3
    }

    lifecycle {
      ignore_changes = [node_pools]
    }
} 

resource "vultr_kubernetes_node_pools" "k8-np" {
  cluster_id = vultr_kubernetes.k8.id
  node_quantity = 3
  plan          = "vc2-1c-2gb"
  label         = "vke-nodepool-2"
  # auto_scaler   = true
  # min_nodes     = 1
  # max_nodes     = 3
 
  lifecycle {
    ignore_changes = [node_quantity]  
  }
}

With these settings, any updates that come to node_pools in the vultr_kubernetes resource are recorded in the terraform state file automatically instead of producing a plan that reverts them. Same with the vultr_kubernetes_node_pools value for node_quantity.

@AdamJacobMuller
Author

Hi @optik-aper,

Thanks for the lifecycle tip, I didn't know you could do that.

Specifically, doing ignore_changes = [node_pools[0].node_quantity] is great and solves my immediate issue.
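
For anyone else hitting this, here's a minimal sketch of that targeted lifecycle block, assuming the same single-node-pool layout from the config above (so node_pools[0] is the only pool):

resource "vultr_kubernetes" "k8" {
    region  = "ewr"
    label   = "vke-test"
    version = "v1.28.2+1"

    node_pools {
        node_quantity = 1
        plan          = "vc2-1c-2gb"
        label         = "vke-nodepool"
        auto_scaler   = true
        min_nodes     = 1
        max_nodes     = 2
    }

    lifecycle {
        # Only the autoscaled node count is ignored; the other node_pools
        # attributes (plan, label, min/max) stay managed by terraform.
        ignore_changes = [node_pools[0].node_quantity]
    }
}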

With regards to the original issue, I still think the way this provider handles things is wrong, though (and if you look at other providers for kubernetes clusters, they seem to agree).

In my mind there are two modes for things:

A) you're using auto_scaler = true, in which case you should specify min_nodes and max_nodes (and it should refuse to accept node_quantity)
B) you're using auto_scaler = false, in which case you should specify node_quantity (and it should refuse to accept min_nodes and max_nodes)

This behaviour mirrors how GCP (just the provider I'm most familiar with) works, for example.

Also, keep in mind that this is exactly how your web UI works right now: if I pick autoscale, I specify min/max; if I pick manual, I specify node quantity.
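
For illustration, a rough sketch of what those two mutually exclusive modes could look like as configuration if the provider enforced them. This is the proposal, not current provider behaviour, and the resource names/labels are just examples:

# Mode A (proposed): autoscaled pool - specify min_nodes/max_nodes, omit node_quantity
resource "vultr_kubernetes" "k8-autoscaled" {
    region  = "ewr"
    label   = "vke-autoscaled"
    version = "v1.28.2+1"

    node_pools {
        plan        = "vc2-1c-2gb"
        label       = "vke-nodepool"
        auto_scaler = true
        min_nodes   = 1
        max_nodes   = 2
        # node_quantity omitted; under the proposal the provider would reject it here
    }
}

# Mode B (proposed): manual pool - specify node_quantity, omit min_nodes/max_nodes
resource "vultr_kubernetes" "k8-manual" {
    region  = "ewr"
    label   = "vke-manual"
    version = "v1.28.2+1"

    node_pools {
        plan          = "vc2-1c-2gb"
        label         = "vke-nodepool"
        auto_scaler   = false
        node_quantity = 2
        # min_nodes/max_nodes omitted; under the proposal the provider would reject them here
    }
}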
