Error: error updating VM: received an HTTP 500 response - Reason: can't lock file '/var/lock/qemu-server/lock-101.conf' - got timeout #995
Comments
Same behavior here. Maybe we could add a "sequentially" option, or a parallelism setting? Especially for slow disk setups, where cloning a VM from a template can take ages.
@Qarasique Terraform and OpenTofu support the -parallelism CLI argument, which controls how many resources are applied concurrently. @CultureLinux Thanks for the report! The 20s timeout is suspicious, and I think other people have mentioned something like that in the past, though I can't find a reference. We should definitely take a look at how it is propagated. This issue could also be specific to the underlying storage type. Other users had similar issues on ZFS-backed storage in #831 and #868, so there could be IO bottlenecks on the PVE host, especially if the clone source image and the VMs are on the same physical drive.
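For illustration, here is a minimal sketch (not taken from this thread) of the kind of configuration where clones fan out concurrently; the resource and block names follow the bpg provider, but the node name, template ID, and VM names are hypothetical. Concurrency is limited on the CLI, e.g. `terraform apply -parallelism=2` (or `tofu apply -parallelism=2`).

```hcl
# Hypothetical example: several VMs cloned from one template in a single apply.
# Terraform/OpenTofu clone these concurrently unless the run is started with a
# lower limit, e.g. `terraform apply -parallelism=2`.
resource "proxmox_virtual_environment_vm" "worker" {
  for_each  = toset(["01", "02", "03"])
  name      = "worker-${each.key}"
  node_name = "pve1" # hypothetical node name

  clone {
    vm_id = 9000 # hypothetical template VM ID
  }
}
```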
A small update on this issue. I switched the storage destination from ssh-front (directory type, SSD) to lvm-lvm (LVM-Thin, HDD).
It seems really weird that the faster storage breaks and not the slow one. https://pve.proxmox.com/wiki/Storage#_storage_configuration Still an issue in my mind, but tell me if it's also the case for you.
A final update! Sorry for creating a ticket for this.
I don't know, lads. It seems to me that this issue was closed prematurely. Just today I tried to spin up 7 VMs simultaneously via a for_each loop and got similar-looking errors after just 2 were created. Terraform:
PVE GUI:
I can only say that all resources were created successfully once we set parallelism=2. Neither of the solutions mentioned in this issue or in #868 is exactly a solution, but coincidentally we independently went with the parallelism setting too as a current workaround.
It contradicts what QEMU says in the virtio-blk vs virtio-scsi issue, but they didn't mention virtio-single, so maybe it just didn't exist at that time? Those are the default timeouts, by the way (at least as shown to me on plan; I did not specify any of them in my manifests), but our tasks crashed in less than a minute:
I don't understand that stuff completely myself, but it looks like there are two HW-disk-related settings, "scsi_hardware" and "disk.interface", and it's a bit confusing. When you attach a disk in the GUI, it clearly shows your chosen SCSI Controller type if the disk's interface is "SCSI [scsi0]", but when you choose the "VirtIO Block [virtio0]" interface it shows nothing, even though the previously chosen SCSI Controller type is still present. Does that mean "virtio" is an interface and a controller type at the same time, and if you choose it then "scsi_hardware" simply doesn't apply? No clue.

I didn't want to open a new issue, but all in all it looks like there's still some kind of problem present. For what it's worth, it may be some kind of IO congestion, either because our SSDs are slow or because all clone operations are done on the same node (we want these VMs on that specific node). It may also be related to older versions of Terraform (1.5.7), bpg (0.46.1), and PVE (8.0.3), but I don't think so.

On a side note, maybe the provider's default settings should be changed to match PVE's defaults (since 7.3). Maybe not right now, but I think this disparity will only grow with time. Talking about these:
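For illustration only (this is not the list of defaults referenced above), here is a hedged sketch of how the two settings discussed sit next to each other in a bpg provider VM resource; all names and values below are assumptions:

```hcl
# Hedged sketch of scsi_hardware vs. disk.interface (values are assumptions).
resource "proxmox_virtual_environment_vm" "example" {
  name      = "example" # hypothetical
  node_name = "pve1"    # hypothetical

  # Controller model used by SCSI-attached disks; newer PVE GUI defaults
  # reportedly favor "virtio-scsi-single" over the older "virtio-scsi-pci".
  scsi_hardware = "virtio-scsi-single"

  disk {
    datastore_id = "local-lvm" # hypothetical
    interface    = "scsi0"     # attached through the controller above
    size         = 20
  }

  # A disk with interface = "virtio0" would instead be a VirtIO Block device,
  # which bypasses the SCSI controller, so scsi_hardware would not apply to it.
}
```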
Changing defaults is not super straightforward in the current implementation and may lead to various side effects. I'm planning to address most of this in #1231 and reduce provider-defined defaults to a minimum.
Hello,
First of all, thanks for your amazing provider 👍
Describe the bug
When cloning multiple VMs, the execution gets an HTTP 500 error while trying to acquire the lock, even with all timeout_* set to 600.
Using the same main.tf always works with only one VM (<20 sec).
Tested on the latest versions of OpenTofu/Terraform/Proxmox/bpg provider.
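As a rough, hypothetical sketch of the setup described above (the original reproduction config is not reproduced here): the attribute names follow the bpg provider, and every value is an assumption.

```hcl
# Hypothetical illustration: two VMs cloned from the same template, with the
# clone/create timeouts raised to 600s (the other timeout_* attributes can be
# set the same way).
resource "proxmox_virtual_environment_vm" "clone" {
  count     = 2
  name      = "clone-${count.index}" # hypothetical
  node_name = "pve1"                 # hypothetical

  clone {
    vm_id = 9000 # hypothetical template VM ID
  }

  timeout_clone  = 600
  timeout_create = 600
}
```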
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The apply should only time out after 600 seconds, not after 20 seconds.
Screenshots
Additional context
TF_LOG=DEBUG terraform apply