Adding second NIC with ip_config hangs cloud_init / terraform ( tofu ) #1592

Open
VGerris opened this issue Oct 13, 2024 · 4 comments
Labels
🐛 bug Something isn't working

Comments


VGerris commented Oct 13, 2024

Describe the bug
When two networks are configured and the second one also has an ip_config (for example with dhcp set), terraform doesn't finish.

To Reproduce
Steps to reproduce the behavior:

  1. Create a terraform file with snippet like:
initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
  }

  network_device {
    bridge = "vmbr1"
  }

  network_device {
    bridge = "vmbr1"
    vlan_id = "56"
  }
This works, also on a first run.
  1. Run tofu apply
  2. VM gets created with 2 NICs
  3. Run tofu destroy
    Add a second ip_config block for the second interface:
initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
    # next part is added and applies to second NIC
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
  }
  1. Run tofu apply
  2. See error - it hangs

Please also provide a minimal Terraform configuration that reproduces the issue.

initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
    # next part is added and applies to second NIC
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }
  }
    
  network_device {
    bridge = "vmbr1"
  }

  network_device {
    bridge = "vmbr1"
    vlan_id = "56"
  }

and the output of terraform|tofu apply.

VM creating ....

Expected behavior
I would expect tofu to continue, even though the IP may not be fetched.

Additional context
Add any other context about the problem here.

This may be related to cloud-init:
https://forum.proxmox.com/threads/assign-multiple-ip-to-vm-using-cloud-init.116259/

And the other provider may have something similar and a solution:
Telmate/terraform-provider-proxmox#1015

Ideally the IP gets assigned, but when this is not possible because of how cloud-init works, simply continuing and reporting the issue seems like a good solution.

  • Single or clustered Proxmox: clustered
  • Proxmox version: 8.2
  • Provider version (ideally it should be the latest version): latest
  • Terraform/OpenTofu version: 1.8.3
  • OS (where you run Terraform/OpenTofu from): Ubuntu 24.04
  • Debug logs (TF_LOG=DEBUG terraform apply):
@VGerris VGerris added the 🐛 bug Something isn't working label Oct 13, 2024

VGerris commented Oct 13, 2024

Some additional findings.

It seems the network is configured by netplan and cloud-init puts the configuration in :

/etc/netplan/50-cloud-init.yaml

The snippet:

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

results in something like:

    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: "bc:24:11:c8:de:82"
          dhcp4: true

When the second snippet is added in the main.tf file as described above, the cloud-init file gets the correct info in it for the second network and everything works fine after boot.

So the problem only occurs at first creation, not when adding it later.
That leads me to believe something in the code is not prepared to handle multiple network configs.
If I turn debugging on, one of the last calls regarding networking seems to be:

https://github.com/bpg/terraform-provider-proxmox/blob/main/proxmoxtf/provider/provider.go#L251

I suspect that may be where the issue starts.
I am not familiar with Go, so I will see how far I get.

Has anyone else seen this? Or better yet, can someone with Go knowledge check whether the issue starts there?

I could start by looking at what the nodeAddress is. Can someone point me to instructions on how to deploy the provider with updated code? Thank you.


VGerris commented Oct 14, 2024

I have been investigating further. Since the creation with terraform never finished and the settings I put in cloud-init did not give me access to the VM, I modified an image to have a root account so I can log in at creation time.
Investigating the machine showed that the netplan config looked good, but somehow the network was not set up properly to reach the internet, even though both NICs got a DHCP address (from different servers).

The terraform process is actually waiting for qemu-tools to reply. When I fix the network by using dhcpcd, then install the package and start it, terraform continues and everything looks as expected.
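
How long the provider waits for the guest agent can be limited in the `agent` block of the VM resource. The fragment below is a minimal, untested sketch: the resource name `ubuntu_vm` and the node name `pve` are placeholders, the `timeout` value is only an example, and what apply does once the timeout expires has not been verified here.

```hcl
resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  node_name = "pve" # placeholder node name

  agent {
    # With the agent enabled, the provider waits for the QEMU guest agent
    # inside the VM to report its network interfaces.
    enabled = true
    # Example value only: upper bound on that wait.
    timeout = "2m"
  }

  # ... remaining arguments (disk, initialization, network_device) omitted
}
```

This does not fix the routing problem in the guest; it only bounds the time spent waiting while debugging it.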

This seems to indicate that cloud-init somehow cannot set up routing properly with 2 NICs, but also that the problem may be solvable if it can be fixed in the cloud-init script. The best solution would be to find out why cloud-init has an issue completing properly, and perhaps even fix it there. As linked above, some people say that it is not supposed to provide access for automation. I tend to disagree, because my automation may run from another network and the VM needs the internet (which is my current setup and why I encountered this behavior).

So far I have tried netplan apply and adding ipv6 = false, without consistent success.
It would be great if anyone can help find the network cause of this; a possible workaround would then be to include the proper commands in the cloud-init script.

Another workaround I used before is to get the 2nd interface from terraform and then use Ansible to run dhcpcd on the interface, but that doesn't 'stick' either.
In that case I get the NIC like this:

output "vm_nic_2_name" {
  value = proxmox_virtual_environment_vm.ubuntu_vm.network_interface_names[2]
}

and then in the script that runs Ansible:
sed "s/nic1_replace/$(tofu -chdir=$BASEDIR/terraform-proxmox output vm_nic_2_name | sed -nr 's|.*"(.*)".*|\1|p')/g" inventory_template1.yml > inventory.yml

which sets an Ansible var that is used like:

    - name: Run dhcpcd on second NIC
      ansible.builtin.command: dhcpcd {{ nic_1 }}
      register: nic

I am going to look a bit further into the best way to have the network configured properly and will post what I find. In the meantime, help and tips are appreciated :)


VGerris commented Oct 14, 2024

Based on info on netplan and some reading, I found an acceptable workaround.

In the cloud-config snippet, write to a file with netplan config:

    write_files:
      - path: /etc/netplan/99-network-config.yaml
        permissions: "0600"
        owner: root
        content: |
          network:
            version: 2
            ethernets:
              ens19:
                dhcp4: true
                match:
                  name: "ens19"
                mtu: 1500
                set-name: "eth1"

Then at the top of runcmd add:

    runcmd:
        - netplan apply
        - .....

Creation takes a bit longer, and for some reason so does the apt update command, but this configures both interfaces the same way as the double ip_config snippet, with working internet and thus working qemu-tools.

Perhaps it would be good to add this to the docs.
That's the best I can do for now without spending a lot more time, which is scarce currently :).

This relies on the name of the interface. I am not aware of a way to get the MAC or name beforehand so it can be used dynamically, but it's good enough for me.
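
One possible way around the interface-name dependency: the provider's `network_device` block accepts a `mac_address` argument, so the MAC of the second NIC could be pinned in Terraform and then matched in netplan, the same way the generated 50-cloud-init.yaml matches eth0 by `macaddress`. The fragment below is an untested sketch; the MAC value, the local name `nic2_mac` and the node name `pve` are placeholders, not values from this setup.

```hcl
locals {
  # Hypothetical fixed MAC address for the second NIC (placeholder value)
  nic2_mac = "BC:24:11:00:00:56"
}

resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  node_name = "pve" # placeholder node name

  network_device {
    bridge = "vmbr1"
  }

  network_device {
    bridge      = "vmbr1"
    vlan_id     = 56
    mac_address = local.nic2_mac # pinned, so it is known before first boot
  }

  # ... remaining arguments (disk, initialization, agent) omitted
}

# The same local could then be templated into the netplan file, e.g.
#   match:
#     macaddress: "${lower(local.nic2_mac)}"
```

With the MAC fixed, the netplan entry would no longer depend on the guest naming the device `ens19`.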

Any improvements are welcome and I can make a PR for the docs if that's appreciated.
Thank you all for maintaining this terraform provider, it is pretty awesome!


VGerris commented Oct 15, 2024

It turns out something more is needed, because a route is added by default.
There is an option to skip that:

                dhcp4-overrides:
                  use-routes: false
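
Combined with the netplan file from the previous comment, the cloud-config snippet could look roughly like this when defined through `proxmox_virtual_environment_file`. This is an untested sketch: the `datastore_id`, `node_name` and `file_name` values are placeholders, the interface name `ens19` is carried over from the earlier comment, and only the `netplan apply` entry of `runcmd` is shown.

```hcl
resource "proxmox_virtual_environment_file" "cloud_config" {
  content_type = "snippets"
  datastore_id = "local" # placeholder datastore
  node_name    = "pve"   # placeholder node name

  source_raw {
    file_name = "cloud-config.yaml" # placeholder file name
    data      = <<-EOF
      #cloud-config
      write_files:
        - path: /etc/netplan/99-network-config.yaml
          permissions: "0600"
          owner: root
          content: |
            network:
              version: 2
              ethernets:
                ens19:
                  dhcp4: true
                  # Do not install routes received via DHCP on this NIC
                  dhcp4-overrides:
                    use-routes: false
                  match:
                    name: "ens19"
                  mtu: 1500
                  set-name: "eth1"
      runcmd:
        - netplan apply
      EOF
  }
}
```

The `dhcp4-overrides` key sits at the same level as `dhcp4`, under the interface entry.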

Now when I use a snippet like:

    ip_config {
      ipv4 {
        address = "192.168.56.20/24"
        gateway = "192.168.56.1"
      }
    }

I also get a default route set, and as a consequence the same problem as with 2 dhcp snippets.
In this case the workaround is a bit simpler: remove that route before anything else:

    runcmd:
        - ip r del default via 192.168.56.1
        - apt update

If the use-routes: false option could be made part of the resource:
https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_vm#ip_config
it might well be a solution for this behavior: simply set that option in the ip_config snippet.

As the documentation says, it is probably better to omit the gateway; then no route is added and everything works as expected.
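
A minimal sketch of that last variant, assuming the first NIC keeps DHCP and supplies the default route while the second NIC gets the static address from the snippet above without a gateway (a fragment of the VM resource, not a complete configuration):

```hcl
initialization {
  # First NIC: DHCP, expected to provide the default route
  ip_config {
    ipv4 {
      address = "dhcp"
    }
  }

  # Second NIC: static address without a gateway, so no second default route
  ip_config {
    ipv4 {
      address = "192.168.56.20/24"
    }
  }

  user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
}
```

The `ip_config` blocks apply to the `network_device` blocks in order, so the second block here belongs to the NIC on VLAN 56.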

Projects
Status: 🕚 Sometime