Add Support for Migration Statistics API Call #43

Open
phip1611 wants to merge 11 commits into cyberus-technology:gardenlinux from phip1611:poc-migration-statistics

@phip1611 phip1611 commented Nov 25, 2025

TL;DR

This extends the internal API with a vm_progress function and adds a vm.migration-progress HTTP endpoint including support in ch-remote via ch-remote migration-process to query the latest migration progress.

Motivation

Live, continuously updated information about a running live migration is very important for debugging, development, and monitoring. Verbose logging can't achieve this: there is a clear desire to expose that information in structured form to the outside, and also to avoid spammy logs.

The most interesting part is the pre-copy phase where we get information on each new memory iteration. The first version is rather coarse-grained with one update per memory iteration. More to follow.

The ch driver in libvirt will use this information to populate its
virsh domjobinfo output.

Further, the endpoint will be useful for querying information about a
previously failed or canceled live migration.

Prerequisites

The two major prerequisites were an async internal API and an async HTTP thread.

Steps to Merge

  • test it locally using ch-remote
  • add libvirt-tests testcase and verify everything works
  • finish refactoring for "non-blocking send-migration" (2026-02-03)
  • deploy it on a node in SAP land and see if it works
  • merge this when we know the libvirt part is also fine (which I think is already 99% the case)

@phip1611 phip1611 self-assigned this Nov 25, 2025
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from e29356d to 4101b11 Compare November 28, 2025 08:46
@phip1611 phip1611 force-pushed the poc-migration-statistics branch 5 times, most recently from f49e577 to fdf5858 Compare December 9, 2025 11:42
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from f612cf6 to e9a3321 Compare December 15, 2025 14:24
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from e9a3321 to ecb5f45 Compare January 8, 2026 14:31
@phip1611 phip1611 changed the base branch from gardenlinux-v48 to gardenlinux January 8, 2026 14:32
@phip1611 phip1611 force-pushed the poc-migration-statistics branch 3 times, most recently from e6c80dd to fe8cd0d Compare January 12, 2026 16:30
@phip1611 phip1611 marked this pull request as ready for review January 12, 2026 16:30
@phip1611 phip1611 changed the title WIP XXX Migration Statistics Add Support for Migration Statistics API Call Jan 12, 2026
@phip1611 phip1611 marked this pull request as draft January 12, 2026 16:35
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from fe8cd0d to a87fe95 Compare January 12, 2026 16:41
@phip1611 phip1611 requested a review from tpressure January 12, 2026 16:42
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from a87fe95 to 6d68c0c Compare January 12, 2026 16:43
/// [live-migration protocol]: super::protocol
#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
pub struct MigrationProgressAndStatus {
/// UNIX timestamp of the start of the live-migration process.
Member Author

Please note that this structure will be public API forever, and once we deploy it, it will be hard to change.

@phip1611 phip1611 force-pushed the poc-migration-statistics branch from 6d68c0c to c934e0e Compare January 13, 2026 10:36
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from ceb915e to 5b63ea5 Compare January 22, 2026 14:27
@amphi amphi left a comment

Looks good, just a few small comments and nits.

vmm/src/lib.rs Outdated
// Give management software a chance to fetch the migration state.
// The VMM already executes on the other side and keeping Cloud Hypervisor running for a
// couple of more seconds is fine.
info!("Sleeping five seconds before shutting off.");

I think this waiting, and maybe also the mark_as_finished() a few lines above, should happen in check_migration_result. Otherwise the management software would see the mark_as_failed very late.

Member Author

Agree about the sleep. Not sure about the other. Other thoughts?

Member Author

Update: all of this needs to happen at the end of the migration thread, not in check_migration_result. Otherwise, the API thread is blocked and API users can never query the latest state.
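
A minimal sketch of that ordering, using only the standard library: the worker thread records the final state as its very last step, and a query only takes a short lock. `MigrationState`, `SharedState`, `spawn_migration`, and `query` are hypothetical names for illustration and are not taken from this PR's code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical migration states; the real type in the PR may differ.
#[derive(Clone, Debug, PartialEq)]
pub enum MigrationState {
    Ongoing,
    Finished,
    Failed(String),
}

/// Shared slot with the latest state; `None` means no migration was started yet.
pub type SharedState = Arc<Mutex<Option<MigrationState>>>;

/// Spawns the migration worker. The final state is written as the last step
/// of the worker itself, so no API call has to block to make it visible.
pub fn spawn_migration<F>(state: SharedState, work: F) -> thread::JoinHandle<()>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    // Mark the migration as ongoing before the worker starts.
    *state.lock().unwrap() = Some(MigrationState::Ongoing);
    thread::spawn(move || {
        let final_state = match work() {
            Ok(()) => MigrationState::Finished,
            Err(e) => MigrationState::Failed(e),
        };
        // End of the migration thread: publish the outcome.
        *state.lock().unwrap() = Some(final_state);
    })
}

/// What an API thread would call: a short lock, a clone, and done.
pub fn query(state: &SharedState) -> Option<MigrationState> {
    state.lock().unwrap().clone()
}
```

With this shape, a query issued while the worker is still running returns `Ongoing` right away, and a failure becomes visible as soon as the worker ends.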

@phip1611 phip1611 force-pushed the poc-migration-statistics branch 3 times, most recently from a83f554 to 7ee8334 Compare January 27, 2026 14:50
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from 7ee8334 to f937d87 Compare January 29, 2026 12:10
@phip1611 phip1611 requested review from amphi and scholzp January 29, 2026 12:10
@phip1611 phip1611 force-pushed the poc-migration-statistics branch 2 times, most recently from 679c6f1 to 307475e Compare February 2, 2026 13:03
@phip1611 phip1611 marked this pull request as draft February 3, 2026 16:20
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from 307475e to f32b6f7 Compare February 3, 2026 16:24
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
The logging is neither spammy nor costly (iterations take seconds to
dozens of minutes) and is clearly a win for us when debugging.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
This is the first commit in a series of commits to introduce a new API
endpoint in Cloud Hypervisor to report progress and live-insights about
an ongoing live migration.

Having live and frequently refreshing statistics/metrics about an
ongoing live migration is especially interesting for debugging and
monitoring. For the first time, we will be able to see how
live-migrations behave and create benchmarking infrastructure around it.

The ch driver in libvirt will use this information to populate its
`virsh domjobinfo` output.

We will add a new API endpoint to query information. Further, the
endpoint will be interesting to query information about a previously
failed or canceled live migration.

What makes this API endpoint special is that it is the first endpoint
that requires the "asynchronization" of the API, i.e., handling more
than one API request in parallel. This needs support at least in the
HTTP API and the internal API, because the "SendMigration" call is
long-running and still active while someone queries the new endpoint.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
This is part of the commit series to enable live updates about an
ongoing live migration. See the first commit for an introduction.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
This is part of the commit series to enable live updates about an
ongoing live migration. See the first commit for an introduction.

In this commit, we add the HTTP endpoint to export ongoing VM
live-migration progress.

This work was made possible because of the following fundamental
prerequisites:
- internal API was made async
- http thread was made async

This way, one can send requests to fetch the latest state without
blocking in any code path of the API.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
This is part of the commit series to enable live updates about an
ongoing live migration. See the first commit for an introduction.

This commit prepares for avoiding naming clashes in the following commits.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
This is part of the commit series to enable live updates about an
ongoing live migration. See the first commit for an introduction.

This commit actually brings all the functionality together. The first
version has the limitation that we populate the latest snapshot only
once per memory iteration, although the pre-copy phase is the most
interesting part by far. In a follow-up, we can make this more
fine-grained.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
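
To illustrate the once-per-iteration granularity described in this commit message, here is a hedged sketch of a simulated pre-copy loop that publishes exactly one snapshot per memory iteration. Everything in it (`IterationSnapshot`, its fields, `precopy`, and the `threshold` stop condition) is an assumption made for illustration; the actual structure and loop in the PR may look different.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical per-iteration snapshot; field names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct IterationSnapshot {
    pub iteration: u64,
    pub bytes_sent_total: u64,
    pub dirty_bytes_remaining: u64,
}

/// Simulated pre-copy loop. `dirty_per_iteration[i]` is the number of bytes
/// that have to be sent in iteration `i`. The loop stops once the bytes left
/// for the next iteration are at or below `threshold`.
/// Returns the number of iterations that were executed.
pub fn precopy(
    dirty_per_iteration: &[u64],
    threshold: u64,
    latest: &Arc<Mutex<Option<IterationSnapshot>>>,
) -> u64 {
    let mut total = 0u64;
    let mut iterations = 0u64;
    for (i, &dirty) in dirty_per_iteration.iter().enumerate() {
        // Pretend all dirty bytes of this iteration were transmitted.
        total += dirty;
        let remaining = dirty_per_iteration.get(i + 1).copied().unwrap_or(0);
        iterations = i as u64 + 1;
        // Coarse-grained update: one snapshot per memory iteration.
        *latest.lock().unwrap() = Some(IterationSnapshot {
            iteration: iterations,
            bytes_sent_total: total,
            dirty_bytes_remaining: remaining,
        });
        if remaining <= threshold {
            break;
        }
    }
    iterations
}
```

A more fine-grained follow-up would update the shared snapshot inside an iteration as well, for example after each transmitted chunk.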
This is part of the commit series to enable live updates about an
ongoing live-migration. See the first commit for an introduction.

There isn't really an error that can happen when we query this endpoint.
A previous snapshot may either be there or not. It also doesn't make
sense here to check if the current VM is running, as users should always
be able to query information about the past (failed or canceled) live
migration.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
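
The "no error, just present or absent" semantics from this commit message could be modeled roughly as follows. This is a sketch under assumptions: `Vmm`, `Snapshot`, `migration_progress`, and `describe` are hypothetical names, and only the idea of a start timestamp is borrowed from the doc comment shown earlier in this PR.

```rust
/// Hypothetical snapshot of the last known migration state.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    /// UNIX timestamp of the start of the live migration.
    pub start_timestamp_unix: u64,
    /// Illustrative state label, e.g. "ongoing", "failed", "canceled".
    pub state: &'static str,
}

/// Hypothetical, heavily reduced VMM state.
pub struct Vmm {
    pub vm_running: bool,
    pub last_migration: Option<Snapshot>,
}

impl Vmm {
    /// Infallible query: it deliberately ignores `vm_running`, so that
    /// information about a past (failed or canceled) migration stays available.
    pub fn migration_progress(&self) -> Option<Snapshot> {
        self.last_migration.clone()
    }
}

/// Turns the optional snapshot into a human-readable line.
pub fn describe(progress: &Option<Snapshot>) -> String {
    match progress {
        Some(s) => format!(
            "migration started at {} is {}",
            s.start_timestamp_unix, s.state
        ),
        None => "no migration recorded".to_string(),
    }
}
```

Returning an `Option` instead of a `Result` keeps the "nothing happened yet" case out of the error path.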
This is part of the commit series to enable live updates about an
ongoing live migration. See the first commit for an introduction.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from b171d24 to 6b18d8f Compare February 4, 2026 19:08
@phip1611 phip1611 requested a review from Coffeeri February 5, 2026 12:43
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from 6b18d8f to 002ac3f Compare February 5, 2026 13:30

// Wait for migration to finish
loop {
let response = simple_api_full_command_and_response(

Do we really want that? As far as I know ch-remote is just a tool that makes using the REST API easier, thus I would expect that it behaves like the REST API. If I understand correctly, you don't even give the user the possibility to not see the statistics etc. when using ch-remote?

Member Author

This would have to be discussed with upstream. I have a tendency to handle it this way, as it keeps the blocking semantics in ch-remote.

Fair question, tho!

@phip1611 phip1611 marked this pull request as ready for review February 5, 2026 14:19
Time has proven that the previous design was not optimal. Now, the
SendMigration call no longer blocks for the duration of the migration.
Instead, it just triggers the migration. Using the new MigrationProgress
endpoint, management software can query the state of the migration and
also find information about failed migrations.

A new `keep_alive` parameter for SendMigration will keep the VMM alive
and usable after the migration to ensure management software can fetch
the final state. The management software is then supposed to send a
ShutdownVmm command.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
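
From the perspective of management software, the flow described in this commit message might look like the following sketch: trigger the migration with `keep_alive`, poll the progress until a final state shows up, then shut the VMM down. The `VmmApi` trait, the `Progress` enum, `migrate_and_wait`, and the `ScriptedVmm` test double are all hypothetical; they do not mirror the real HTTP API or ch-remote code. Sending the shutdown only after a successful migration is likewise an assumption of this sketch.

```rust
use std::collections::VecDeque;
use std::thread;
use std::time::Duration;

#[derive(Clone, Debug, PartialEq)]
pub enum Progress {
    Ongoing,
    Finished,
    Failed(String),
}

/// Hypothetical client-side view of the three calls involved.
pub trait VmmApi {
    fn send_migration(&mut self, keep_alive: bool) -> Result<(), String>;
    fn migration_progress(&mut self) -> Option<Progress>;
    fn shutdown_vmm(&mut self) -> Result<(), String>;
}

/// Triggers the migration (non-blocking), polls until a final state is
/// reported, and shuts the VMM down after a successful migration.
pub fn migrate_and_wait<A: VmmApi>(
    api: &mut A,
    poll_interval: Duration,
) -> Result<Progress, String> {
    // keep_alive = true: the VMM stays around so the final state can be fetched.
    api.send_migration(true)?;
    let outcome = loop {
        match api.migration_progress() {
            // No snapshot yet or still running: wait and ask again.
            Some(Progress::Ongoing) | None => thread::sleep(poll_interval),
            Some(done) => break done,
        }
    };
    if outcome == Progress::Finished {
        api.shutdown_vmm()?;
    }
    Ok(outcome)
}

/// Test double that replays a fixed sequence of progress answers and then
/// keeps repeating the last one. It must be given at least one answer.
pub struct ScriptedVmm {
    pub responses: VecDeque<Progress>,
    pub shutdown_called: bool,
}

impl VmmApi for ScriptedVmm {
    fn send_migration(&mut self, _keep_alive: bool) -> Result<(), String> {
        Ok(())
    }
    fn migration_progress(&mut self) -> Option<Progress> {
        if self.responses.len() > 1 {
            self.responses.pop_front()
        } else {
            self.responses.front().cloned()
        }
    }
    fn shutdown_vmm(&mut self) -> Result<(), String> {
        self.shutdown_called = true;
        Ok(())
    }
}
```

A production client would likely also want a timeout around the polling loop, which is omitted here for brevity.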
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
@phip1611 phip1611 force-pushed the poc-migration-statistics branch from 002ac3f to f66f9ca Compare February 5, 2026 14:25
4 participants