Add QEMU on Windows to CI #3475

Open: arixmkii wants to merge 1 commit into master

arixmkii
Contributor

@arixmkii arixmkii commented Apr 26, 2025

For now it uses additional templates, because the mounts are incompatible.

This is probably not for 1.1.0.

It is possible to use default.yaml for Windows with the changes from #3318 (this would need a rebase first, but I checked it using a rebased patch in a forked repo; example run: https://github.com/arixmkii/qcw/actions/runs/14681726480/job/41205771124).

@arixmkii
Contributor Author

arixmkii commented Apr 26, 2025

time="2025-04-26T17:20:45Z" level=fatal msg="failed to validate YAML file "C:\\a\\lima\\lima\\templates\\experimental\\default-windows.yaml": can't parse builtin Lima version "cfbffd8": cfbffd8 is not in dotted-tri format"

make/git on Windows resolves the version incorrectly. I will check it (there are no such issues when checkout and build are done with msys2 tools). Update: fixed.
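
The error above says the embedded version string was a bare commit hash (cfbffd8) rather than a dotted version. As a rough illustration of what a "dotted-tri" check involves, here is a minimal sketch. isDottedTri is a hypothetical helper written for this note, not Lima's actual parser, and its handling of a leading "v" and of "-"/"+" suffixes is an assumption.

```go
package main

import (
	"strconv"
	"strings"
)

// isDottedTri reports whether v looks like MAJOR.MINOR.PATCH with numeric
// components. A leading "v" and a trailing "-suffix" or "+suffix" are
// tolerated. Hypothetical helper for illustration only; it is not the
// validation code used by Lima.
func isDottedTri(v string) bool {
	v = strings.TrimPrefix(v, "v")
	// Drop any suffix such as "-12-gcfbffd8" or "+meta" before splitting.
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return false
	}
	for _, p := range parts {
		// Each component must be a non-empty unsigned decimal number.
		if _, err := strconv.ParseUint(p, 10, 64); err != nil {
			return false
		}
	}
	return true
}
```

With this sketch, isDottedTri("cfbffd8") is false while isDottedTri("1.1.0") is true, which matches the shape of the failure in the log: a build that embeds only the abbreviated commit hash cannot pass such a check.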

Another topic to check: using Chocolatey to install QEMU, because the msys2 QEMU installation feels slow.

@arixmkii arixmkii marked this pull request as draft April 26, 2025 17:23
@arixmkii
Contributor Author

I would probably need to move mounts-windows under _default so that the validation script does not fail.

@arixmkii
Contributor Author

The Chocolatey QEMU package is not well maintained, so I chose winget instead, which is a great alternative. One known limitation is that winget is not available out of the box on Windows Server 2022, so the workflow uses a hacky action to add it. That action is now archived, and it will not be needed at all after the migration to Windows Server 2025; a comment in the workflow highlights this.

@arixmkii arixmkii marked this pull request as ready for review April 28, 2025 18:23
@arixmkii
Contributor Author

@jandubois @AkihiroSuda I would like to know your opinions on how reasonable it is to extend CI to support this (without overloading CI or increasing costs significantly). From my side there is no rush, and I can see reasons to postpone this until #3316 is addressed (via a refresh of #3318 or other means). It might also be reasonable to wait for the migration to Windows Server 2025, to avoid using the now-archived https://github.com/Cyberboss/install-winget action.

I authored it now to confirm the proof of concept and to potentially provide a reference starting point for its introduction.

@@ -175,6 +175,44 @@ jobs:
$env:_LIMA_WINDOWS_EXTRA_PATH = 'C:\Program Files\Git\usr\bin'
bash.exe -c "./hack/test-templates.sh templates/experimental/wsl2.yaml"

windows-qemu:
Member

Can we now drop these lines?

if runtime.GOOS == "windows" && runtime.GOARCH == "amd64" {
	// https://github.com/lima-vm/lima/pull/3487#issuecomment-2846253560
	// > #931 intentionally prevented the code from setting it to max when running on Windows,
	// > and kept it at qemu64.
	//
	// TODO: remove this if "max" works with the latest qemu
	defaultX8664 = "qemu64"
}

Contributor Author

In my experience, "max" just didn't work well with WHPX acceleration. I tested it on 3 different machines in the past, and I was only able to make it work by disabling specific CPU features, which were different on every machine. It was not a user-friendly default. I can do some canary testing to see how it works now with newer QEMU/Windows versions and whether the failures are as common as they were before.
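
For readers following this thread, the selection logic being discussed can be restated as a small pure function. This is only a sketch: defaultX8664CPU is a hypothetical name, and the "max" fallback for other platforms is an assumption inferred from the quoted comment, not a copy of Lima's code.

```go
package main

// defaultX8664CPU returns the QEMU CPU model to use for x86_64 guests.
// Hypothetical helper mirroring the quoted snippet: Windows/amd64 hosts
// stay on "qemu64", and every other host is assumed to default to "max".
func defaultX8664CPU(goos, goarch string) string {
	if goos == "windows" && goarch == "amd64" {
		// Kept conservative because "max" was reported to be unreliable
		// with WHPX acceleration in this thread.
		return "qemu64"
	}
	return "max"
}
```

A caller would pass runtime.GOOS and runtime.GOARCH. Taking them as parameters keeps the function testable on any host, so dropping the Windows special case later would amount to removing a single branch.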

@arixmkii
Contributor Author

arixmkii commented May 2, 2025

I tried to limit both Windows builds to the windows-2025 standard runner. The QEMU job failed with errors mounting SSHFS. I have observed this instability before with standard runners; the failures definitely recur and can persist across job restarts. The WSL2 job hit an error and the test is now in a locked state (it will be terminated after the 30-minute timeout, because I can't cancel it manually). WSL2 was also less stable on standard runners than on the Lima 8-core runner, but there I mostly saw errors from systemd; this one is new.

I will restart the build with both jobs set to windows-2025-8-cores to compare.

Signed-off-by: Arthur Sengileyev <[email protected]>
@arixmkii
Contributor Author

arixmkii commented May 2, 2025

It didn't help for QEMU:

time="2025-05-02T19:26:57Z" level=info msg="[hostagent] :/c/Users/runneradmin: No such file or directory"
time="2025-05-02T19:26:57Z" level=warning msg="[hostagent] failed to confirm whether /c/Users/runneradmin [remote] is successfully mounted" error="failed to execute script \"wait-for-remote-ready\": stdout=\"\", stderr=\"mux_client_request_session: read from master failed: Connection reset by peer\\r\\nControlSocket /c/Users/runneradmin/.lima/default/ssh.sock already exists, disabling multiplexing\\r\\nsshfs does not seem to be mounted on /c/Users/runneradmin\\n\": exit status 1"

SSHFS is weird on Windows in general and inside runners specifically. Here are some insights from testing this in GH runners for a month or so. It always (or almost always) failed to mount $TEMP, but most of the time managed to mount $HOME. As for $TEMP: if the mount was attempted but did not succeed, the integration tests still pass.

While troubleshooting the $TEMP issue locally, I managed to replicate it on my dev machine, and the fix was to clean out the contents of the $TEMP folder. It looked like sftp-server might be sensitive to the folder contents, but I didn't test this in detail.

I'm thinking I will test the standard runners and disable the mount tests on the Windows platform, with a comment noting that they are flaky, which they indeed are.

I will experiment in my repo on isolated examples and then update this PR again.

@jandubois
Member

> It always (or almost always) failed to mount $TEMP, but most of the time managed to mount $HOME

Is this just another instance of #302? Because $TEMP will be located at $HOME\AppData\Local\Temp?

I always thought the issue was the overlap in the guest, but maybe the overlap on the host is the real problem?

At the time I filed #302 we did not yet have support for specifying a different mountPoint, so it was impossible to tell which side was causing the issue.
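
The overlap hypothesis is easy to check mechanically: is one host mount source nested inside another? A minimal sketch follows. normalizeWinPath and isUnder are hypothetical helpers written for this note. They compare paths as case-insensitive strings and do not resolve symlinks or junctions, so they are only an approximation of what the host filesystem would report.

```go
package main

import (
	"path"
	"strings"
)

// normalizeWinPath converts a Windows-style path to a cleaned, lower-case,
// slash-separated form so that two paths can be compared as plain strings.
// Hypothetical helper; symlinks and junctions are not resolved.
func normalizeWinPath(p string) string {
	p = strings.ReplaceAll(p, `\`, "/")
	return strings.ToLower(path.Clean(p))
}

// isUnder reports whether child is the same directory as parent or is
// nested somewhere inside it.
func isUnder(parent, child string) bool {
	parent = normalizeWinPath(parent)
	child = normalizeWinPath(child)
	if parent == child {
		return true
	}
	// Require a separator after the parent so that "runner" is not
	// treated as a parent of "runneradmin".
	return strings.HasPrefix(child, strings.TrimSuffix(parent, "/")+"/")
}
```

For example, isUnder(`C:\Users\runneradmin`, `C:\Users\runneradmin\AppData\Local\Temp`) returns true, which is the host-side overlap in question. Whether that overlap is what actually breaks the SSHFS mount is still the open question of this thread.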
