File-based disk-only VM snapshot with KVM as hypervisor #10632
Conversation
@blueorangutan package
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```
@@             Coverage Diff              @@
##               main   #10632      +/-   ##
============================================
+ Coverage     16.41%   16.58%   +0.17%
- Complexity    13629    13940     +311
============================================
  Files          5702     5743      +41
  Lines        503405   509243    +5838
  Branches      60976    61886     +910
============================================
+ Hits          82626    84472    +1846
- Misses       411594   415334    +3740
- Partials       9185     9437     +252
```

Flags with carried forward coverage won't be shown.
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@blueorangutan package
@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13204
@rohityadavcloud @sureshanaparti @weizhouapache could we run the CI? |
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
[SF] Trillian test result (tid-13177)
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@blueorangutan package
@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13686
@blueorangutan package
@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 13710
@blueorangutan package
@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 13715
Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 13740
@blueorangutan package
@JoaoJandre, outstanding work! I haven’t gone through all the code yet, nor have I performed more advanced tests. However, here are the basic ones I’ve already done.
Test descriptions
The following tests were performed to verify basic disk-only VM snapshot creation, deletion and reversion workflows:
- Deployed a VM
- Created snapshot `s1`
- Created snapshot `s2`
- Tried to revert the VM to the `s1` snapshot. An error was thrown, as expected, because the VM was in the `Running` state
- Successfully reverted the VM to the `s1` snapshot
- Initialized the VM again and created snapshot `s3`
- Stopped the VM and reverted it to the snapshot `s2`
- Performed the above step for the `s1` and `s3` snapshots
- Deleted snapshot `s1`. Since it had two children (`s2` and `s3`), verified that it was marked as `Hidden` in the DB and that it remained stored in the storage:
```
select * from vm_snapshots where uuid = '76a90156-a9c8-4ff4-a99b-ae5473c36923'\G
*************************** 1. row ***************************
                 id: 27
               uuid: 76a90156-a9c8-4ff4-a99b-ae5473c36923
               name: i-2-26-VM_VS_20250615141733
       display_name: s1
        description: NULL
              vm_id: 26
         account_id: 2
          domain_id: 1
service_offering_id: 1
   vm_snapshot_type: Disk
              state: Hidden
             parent: NULL
            current: 0
       update_count: 8
            updated: 2025-06-15 14:22:37
            created: 2025-06-15 14:17:33
            removed: NULL
1 row in set (0.001 sec)
```
- Deleted the `s2` snapshot
- Reverted the VM to the `s3` snapshot
- Created snapshot `s4` and successfully reverted the VM to it
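The deletion behavior exercised above (a snapshot that still has children is hidden rather than removed from storage) can be sketched with a small model. This is purely illustrative Python, not the actual CloudStack implementation; only the state names mirror the DB values seen in the tests:

```python
# Illustrative model of the observed deletion rule for disk-only VM
# snapshots: a snapshot with multiple children in the qcow2 backing chain
# cannot be physically removed, so it is marked Hidden; a leaf snapshot
# can be deleted outright. (Not real CloudStack code.)
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    name: str
    state: str = "Ready"
    children: list = field(default_factory=list)

def delete(snapshot: Snapshot) -> str:
    """Return the resulting state, mirroring the behavior seen in the tests."""
    if len(snapshot.children) >= 2:
        # Multiple deltas still back onto this file: keep it on storage.
        snapshot.state = "Hidden"
    else:
        # Leaf (or single-child) snapshot: the file can actually go away.
        snapshot.state = "Removed"
    return snapshot.state

s2, s3 = Snapshot("s2"), Snapshot("s3")
s1 = Snapshot("s1", children=[s2, s3])
print(delete(s1))  # s1 has two children -> "Hidden"
print(delete(s2))  # leaf -> "Removed"
```

This matches the DB state shown below for `s1`: `state: Hidden`, while its file stays on primary storage so the children's backing chains remain valid.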
The following tests were performed to verify volume resizing behavior alongside disk-only VM snapshots:
- Deployed a VM
- Created snapshot `s1`
- Successfully resized the VM's root disk from 8 GiB to 20 GiB:
```
qemu-img info -U 777614fc-47bd-4d77-a904-19ad3a817eb1
image: 777614fc-47bd-4d77-a904-19ad3a817eb1
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 3.38 MiB
cluster_size: 65536
backing file: /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/66cbefff-a06b-4ea9-b3e0-aa7dacc0301a
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
Child node '/file':
    filename: 777614fc-47bd-4d77-a904-19ad3a817eb1
    protocol type: file
    file length: 3.44 MiB (3604480 bytes)
    disk size: 3.38 MiB
```
- Created snapshot `s2`
- Reverted the VM to the `s1` snapshot and verified that the root disk size changed accordingly
- Reverted the VM to the `s2` snapshot and verified that the root disk size changed accordingly
The following tests were performed to verify the behavior of disk-only VM snapshots with VMs with multiple volumes.
- Verified that it was not possible to attach new volumes to VMs that already had VM snapshots
- Deployed a VM with a root disk and a data disk
- Wrote data to both disks and created snapshot `s1`
- Verified that it was not possible to detach the data disk, because the VM already had VM snapshots
- Wrote data to both disks and created snapshot `s2`
- Stopped the VM and reverted to `s1`
- Stopped the VM and reverted to `s2`
Other tests:
- Verified that it was not possible to create VM snapshots with memory when the VM already had disk-only snapshots
- After deleting the disk-only VM snapshots, verified that it was possible to create VM snapshots with memory
- Verified that it was not possible to create disk-only VM snapshots when the VM already had VM snapshots with memory
- Verified the creation of disk-only VM snapshots with the `Quiesce Instance` parameter defined as `true`
@JoaoJandre, when testing some more advanced workflows of creation, reversion and deletion of disk-only VM snapshots, I created the following scenario:
- Created snapshots `s1`, `s2` and `s3`
- Reverted to snapshot `s2` and created snapshot `s4`

Then, I deleted snapshot `s2` and verified that it was marked as `Hidden` in the DB. Deleted snapshot `s1` and reverted to snapshot `s3`:

When reverting to the `s3` snapshot, the operation failed:
The logs seem to indicate an error when creating the new delta for the volume:
```
2025-06-15 15:29:11,565 DEBUG [resource.wrapper.LibvirtRevertDiskOnlyVMSnapshotCommandWrapper] (AgentRequest-Handler-1:[]) (logid:59368419) Creating new delta for volume [4b80abb2-562c-4325-b68f-3655e933daeb] as part of the disk-only VM snapshot revert process for VM [i-2-14-VM].
2025-06-15 15:29:11,565 DEBUG [utils.script.Script] (AgentRequest-Handler-1:[]) (logid:59368419) Executing command [qemu-img create -f qcow2 -F qcow2 -b /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/d0dd008e-9458-4995-b9ca-e0f41bb73556 /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd ].
2025-06-15 15:29:11,584 WARN [utils.script.Script] (AgentRequest-Handler-1:[]) (logid:59368419) Execution of process [8348] for command [qemu-img create -f qcow2 -F qcow2 -b /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/d0dd008e-9458-4995-b9ca-e0f41bb73556 /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd ] failed.
2025-06-15 15:29:11,584 DEBUG [utils.script.Script] (AgentRequest-Handler-1:[]) (logid:59368419) Exit value of process [8348] for command [qemu-img create -f qcow2 -F qcow2 -b /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/d0dd008e-9458-4995-b9ca-e0f41bb73556 /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd ] is [1].
2025-06-15 15:29:11,584 WARN [utils.script.Script] (AgentRequest-Handler-1:[]) (logid:59368419) Process [8348] for command [qemu-img create -f qcow2 -F qcow2 -b /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/d0dd008e-9458-4995-b9ca-e0f41bb73556 /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd ] encountered the error: [qemu-img: /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd: Could not open backing file: Could not open backing file: Could not open '/mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/4b80abb2-562c-4325-b68f-3655e933daeb': No such file or directoryCould not open backing image.].
2025-06-15 15:29:11,584 ERROR [resource.wrapper.LibvirtRevertDiskOnlyVMSnapshotCommandWrapper] (AgentRequest-Handler-1:[]) (logid:59368419) Exception while reverting disk-only VM snapshot for VM [i-2-14-VM]. Deleting leftover deltas. org.apache.cloudstack.utils.qemu.QemuImgException: qemu-img: /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd: Could not open backing file: Could not open backing file: Could not open '/mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/4b80abb2-562c-4325-b68f-3655e933daeb': No such file or directoryCould not open backing image.
	at org.apache.cloudstack.utils.qemu.QemuImg.create(QemuImg.java:268)
	at org.apache.cloudstack.utils.qemu.QemuImg.create(QemuImg.java:200)
	at org.apache.cloudstack.utils.qemu.QemuImg.create(QemuImg.java:297)
	at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRevertDiskOnlyVMSnapshotCommandWrapper.execute(LibvirtRevertDiskOnlyVMSnapshotCommandWrapper.java:71)
	at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRevertDiskOnlyVMSnapshotCommandWrapper.execute(LibvirtRevertDiskOnlyVMSnapshotCommandWrapper.java:43)
	--
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
2025-06-15 15:29:11,591 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-1:[]) (logid:59368419) Seq 1-6557804007404863671: { Ans: , MgmtId: 169551026600426, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.Answer":{"result":"false","details":"Exception: org.apache.cloudstack.utils.qemu.QemuImgException
Message: qemu-img: /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd: Could not open backing file: Could not open backing file: Could not open '/mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/4b80abb2-562c-4325-b68f-3655e933daeb': No such file or directoryCould not open backing image.
Stack: org.apache.cloudstack.utils.qemu.QemuImgException: qemu-img: /mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/f9fa34c3-993a-47d1-a7aa-6c08ec23f9dd: Could not open backing file: Could not open backing file: Could not open '/mnt/10d28cdf-71a7-33ad-802e-f4ec9042e4fd/4b80abb2-562c-4325-b68f-3655e933daeb': No such file or directoryCould not open backing image.
	at org.apache.cloudstack.utils.qemu.QemuImg.create(QemuImg.java:268)
	at org.apache.cloudstack.utils.qemu.QemuImg.create(QemuImg.java:200)
	at org.apache.cloudstack.utils.qemu.QemuImg.create(QemuImg.java:297)
```
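The failure mode the logs point to — creating a new delta whose backing file was already physically deleted — can be reproduced in miniature with plain files. This is a toy model only; `create_delta` below is a hypothetical stand-in for the agent's `qemu-img create -b` call, not real CloudStack or QEMU code:

```python
# Toy reproduction of the reported error: a qcow2 overlay cannot be
# created on top of a backing file that no longer exists on disk.
import os
import tempfile

def create_delta(directory, name, backing):
    """Hypothetical stand-in for `qemu-img create -f qcow2 -b <backing> <name>`:
    like qemu-img, it fails when the backing file is missing."""
    backing_path = os.path.join(directory, backing)
    if not os.path.exists(backing_path):
        raise RuntimeError(f"Could not open backing file: {backing_path}")
    open(os.path.join(directory, name), "w").close()
    return name

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "s3-delta"), "w").close()   # snapshot s3's delta
    # Reverting to s3 works while its delta is still present:
    create_delta(d, "new-top", backing="s3-delta")
    # But if a delta in the chain is physically deleted instead of hidden...
    os.remove(os.path.join(d, "s3-delta"))
    try:
        create_delta(d, "new-top-2", backing="s3-delta")
    except RuntimeError as e:
        print("revert failed:", e)
```

This mirrors the scenario in the logs, where the new delta for the volume references `4b80abb2-…` but that file is gone from primary storage.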
The full workflow execution logs are uploaded below:
@bernardodemarco thank you for the tests, I'll check it out as soon as possible.
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 13793
@bernardodemarco I've fixed the reported errors and validated that the use case you reported is working. Could you check?
@blueorangutan package
@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13819
@sureshanaparti could we run the CI here?
@blueorangutan LLtest |
@DaanHoogland a [LL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
Description
This PR implements the spec available at #9524. For more information regarding it, please read the spec.
Furthermore, the following changes that are not contemplated in the spec were added:
- A `snapshot.merge.timeout` agent property was added. It is only considered if `libvirt.events.enabled` is true;
- If `libvirt.events.enabled` is true, ACS will register to gather events from Libvirt and will collect information on the process, providing a progress report in the logs. If the configuration is false, the old process is used.
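For reference, the two properties described above would live in the KVM agent's `agent.properties` file. The values below are illustrative only; the actual defaults and units are defined by the PR, not here:

```properties
# Enable Libvirt event registration so snapshot merge progress is
# reported in the agent logs (illustrative values, not defaults)
libvirt.events.enabled=true
# Timeout for snapshot merge operations; only honored when
# libvirt.events.enabled=true
snapshot.merge.timeout=600
```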
How Has This Been Tested?
Basic Tests
I created a test VM to carry out the tests below. Additionally, after performing the relevant operations, the VM's XML and the storage were inspected to verify whether the snapshots existed.
Snapshot Creation
The tests below were also repeated with the VM stopped.
Snapshot Reversion
Snapshot Removal
Advanced Tests
Deletion Test
All tests were carried out with the VM stopped.
The snapshot was marked as hidden and was not removed from storage.
Snapshot s3 was removed normally. Snapshot s2 was merged with snapshot s4.
Snapshot s4 was marked as hidden and was not removed from storage.
Snapshot s5 was removed normally. Snapshot s4 was merged with the delta of the VM's volume.
Reversion Test
Snapshot s1 was marked as hidden and was not removed from storage.
Concurrent Test
I created 4 VMs and took a VM snapshot of each. Then, I triggered the removal of all of them at the same time. All snapshots were removed simultaneously and successfully.
Test with Multiple Volumes
I created a VM with one datadisk and attached 8 more datadisks (10 volumes in total), took two VM snapshots, and then removed them one at a time. The snapshots were removed successfully.
Tests Changing the `snapshot.merge.timeout` Config

Tests Related to Volume Resize with Disk-Only VM Snapshots on KVM
qemu-img info
qemu-img info
The last two tests were repeated on a VM with several snapshots, so that a merge between snapshots was performed. The result was the same.
Tests Related to Events:
- Verified in the `cloud.usage_event` table that the resize event was correctly triggered, and also observed via the GUI that the account's resource limit was updated.
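A check of this kind can be done with a query along these lines (illustrative sketch only; the exact column set of `cloud.usage_event` may differ between CloudStack versions):

```sql
-- Illustrative: list the most recent usage events to confirm the
-- resize event was recorded (column names may vary by version)
SELECT id, type, created
  FROM cloud.usage_event
 ORDER BY created DESC
 LIMIT 5;
```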