Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WSL checks and run unit tests in CI #25522

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

l0rd
Copy link
Member

@l0rd l0rd commented Mar 10, 2025

The output of wsl --status has changed with newer versions of Windows and WSL. It's output can still be parsed to figure out if the WSL cli is installed and if the WSL feature is enabled but the strings to look for have changed.

The checks have been fixed, some unit tests added and the unit tests are now part of the (cirrus) CI checks on Windows too.

Fixes #25234

Does this PR introduce a user-facing change?

None

Copy link
Contributor

openshift-ci bot commented Mar 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: l0rd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2025
@mheon
Copy link
Member

mheon commented Mar 10, 2025

Not going to lie, I kind of hate string-parsing error messages, but I'm assuming there isn't another way?

@baude
Copy link
Member

baude commented Mar 10, 2025

@l0rd is it possible to begin to migrate support to only wsl2 ?

@l0rd
Copy link
Member Author

l0rd commented Mar 11, 2025

Not going to lie, I kind of hate string-parsing error messages, but I'm assuming there isn't another way?

I hate that too. We introduced that a couple of years ago, and in the meantime, the output message of wsl.exe --status has (of course) changed, and our fragile output parsing got broken. The alternative is using a command that requires admin privileges (dism). Still, we don't want to require that for machine init, so my approach has been to fix the current issue by testing wsl --status with as many versions of the WSL cli as possible and adding the corresponding unit tests.

In the future, we may want to investigate how wsl.exe itself retrieves information about Windows features without requiring admin privileges. But even if we find the mechanism used by wsl.exe there may be no guarantee that it will work with future versions of Windows because it may not be a "public" API (dism is the tool to check Windows features).

@l0rd
Copy link
Member Author

l0rd commented Mar 11, 2025

@l0rd is it possible to begin to migrate support to only wsl2 ?

That's already the case: if the WSL CLI is configured to use version 1 (that means that it uses the underlying WSL Windows feature rather than the Virtual Machine Platform) we try to change it to use version 2. And, if it that fails, machine init fails. This behavior has probably been introduced with the release of v5.0.

@l0rd
Copy link
Member Author

l0rd commented Mar 11, 2025

cc @BlackHole1

winmake.ps1 Outdated
function Local-Unit {
Build-Ginkgo
$skippackages="pkg\machine\apple,pkg\machine\applehv,pkg\machine\e2e,pkg\machine\libkrun,pkg\machine\provider,pkg\machine\proxyenv,pkg\machine\qemu"
Run-Command "./test/tools/build/ginkgo.exe -vv -r --tags `"$remotetags`" -timeout=90m --trace --no-color --skip-package `"$skippackages`" pkg\machine"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we run all unit tests on windows not just pkg/machine? unit tests could be added anywhere.
point in case #25323

If these issue is that these do not pass then we need to do the hard work and have !windows build tags for them.

Comment on lines 36 to 38
Write-Host "`nRunning podman-machine unit tests"
Run-Command ".\winmake localunit"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not done like this.

Merging both tests will make it hard to figure out what failed. But most importantly this breaks all our CI assumptions that we have in our skip tests based on sources. machine tests will only run on machine changes, unit tests might be elsewhere, I do see that this is not the case right now but the next person will overlook this when they enable windows unit tests outside of pkg/machine.

This should really be its own top level task I would think like the existing unit test. That also means we do run them twice for WSL and hyperV, I guess right now they are fast enough so that this doesn't matter much but still.

Comment on lines +25 to +27
wslNotInstalledMessages = []string{"kernel file is not found", "The Windows Subsystem for Linux is not installed"}
vmpDisabledMessages = []string{"enable the Virtual Machine Platform Windows feature", "Enable \"Virtual Machine Platform\""}
wslDisabledMessages = []string{"enable the \"Windows Subsystem for Linux\" optional component"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not apply to non-English speaking countries. We found that when the system language is Chinese, the output of WSL is also in Chinese.

We can only judge through specific strings, so the following sentences cannot work properly in non-English language systems:

  • kernel file is not found
  • The Windows Subsystem for Linux is not installed
  • enable the Virtual Machine Platform Windows feature
  • Enable "Virtual Machine Platform"
  • enable the "Windows Subsystem for Linux" optional component

In certain WSL versions in the Chinese system, Windows Subsystem for Linux will be Linux 的 Windows 子系统, Virtual Machine Platform will be 虚拟机平台

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback @BlackHole1, you are right. I am afraid that Chinese is only the tip of the iceberg, there are 43 language packs for latest Windows. What's your suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our code, we make judgments using the following keywords:

keywords := []string{"Windows Subsystem for Linux", "BIOS", "wsl.exe", "enablevirtualization", "WSL1"}

In our scenario, we require both Windows Subsystem for Linux and Virtual Machine Platform to be enabled at the same time, as this helps us avoid dealing with some edge cases.

For more information, see: https://github.com/oomol-lab/ovm-win/blob/0bf791ef3ff41a2800c47b822b07957f1cc6ffc0/pkg/wsl/check.go#L244-L294


Init System

  1. No enable Features
  2. No Update WSL
PS C:\Users\live> wsl --help

版权所有 (c) Microsoft Corporation。保留所有权利。

用法: wsl.exe [参数]

参数:

   --install
       安装适用于 Linux 的 Windows 子系统。如果未指定任何选项,
       建议的功能将与默认分发版一起安装。

       有关安装选项的完整列表,请访问 https://aka.ms/wslinstall。

   --update
       更新到最新版本的适用于 Linux 的 Windows 子系统。

   --status
       显示适用于 Linux 的 Windows 子系统状态。

   --help
       显示使用情况信息。

PS C:\Users\live> wsl --status
未安装适用于 Linux 的 Windows 子系统。可通过运行 “wsl.exe --install” 进行安装。
有关详细信息,请访问 https://aka.ms/wslinstall

NOTE!!!!
wsl --status exit code is not 0!

Only Updated WSL

wsl --help has too much output, so ignore it.

PS C:\Users\live> wsl --status
默认版本: 2
当前计算机配置不支持 WSL1。
若要使用 WSL1,请启用“Windows Subsystem for Linux”可选组件。
当前计算机配置不支持 WSL2。
请启用“虚拟机平台”可选组件,并确保在 BIOS 中启用虚拟化。
通过运行以下命令启用“虚拟机平台”: wsl.exe --install --no-distribution
有关信息,请访问 https://aka.ms/enablevirtualization

NOTE!!!!
wsl --status exit code is 0!

Updated WSL + Only Enable Windows Subsystem for Linux

PS C:\Users\live> wsl --status
默认版本: 2
当前计算机配置不支持 WSL2。
请启用“虚拟机平台”可选组件,并确保在 BIOS 中启用虚拟化。
通过运行以下命令启用“虚拟机平台”: wsl.exe --install --no-distribution
有关信息,请访问 https://aka.ms/enablevirtualization

Updated WSL + Only Enable Virtual Machine Platform

PS C:\Users\live> wsl --status
默认版本: 2
当前计算机配置不支持 WSL1。
若要使用 WSL1,请启用“Windows Subsystem for Linux”可选组件。

Updated WSL + Enable All

PS C:\Users\live> wsl --status
默认版本: 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to differences between different WSL versions, my suggestion is to first check the user's WSL version. If it is below a certain version, upgrade it to the latest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlackHole1, thank you for your suggestions. However, I am hesitant to make the check too accurate as it augments the risk of false positives (mistakenly assuming that WSL is not installed). A false positive makes podman unusable, as podman init fails, and users can't turn off the WSL check. Given the fragility of the output parsing, we should stay on the safe side and accept a high rate of false negatives (mistakenly assuming that WSL is installed) to avoid any false positives.

Also, we should avoid preemptively updating WSL, even when it's unnecessary. Some company policies may block updating to the latest WSL version, and Podman trying to update at every machine init would be problematic. We do try to update, but only if we find WSL v1.

That said, the problem is far from solved. As you found out, when Windows is configured in a language other than English, we assume that WSL is installed. And, as you suggested, we may need to check the BIOS virtualization support. But considered the risks, a safer approach would be to accept that we may fail to find out that WSL is outdated or not present and, instead, improve the error messages shown when the VM installation fails with a clear explanation of the error and a suggestion on how to move forward.

@BlackHole1
Copy link
Contributor

BlackHole1 commented Mar 12, 2025

Not going to lie, I kind of hate string-parsing error messages, but I'm assuming there isn't another way?

@mheon In fact, there is no way without applying for administrator privileges. Docker for Windows seems to register a service in Windows to avoid the need for administrator privileges, but I'm not sure.

Under non-administrator privileges, there are only two methods:

  1. String parsing
  2. Calling Windows’s private API

is it possible to begin to migrate support to only wsl2 ?

@baude My suggestion is to not support WSL1, but to only support WSL2.

@BlackHole1
Copy link
Contributor

There is still a problem:

  • When the BIOS does not have virtualization support enabled, podman machine init will still fail without notifying the user.

Maybe this can be achieved in another PR.

Copy link

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@l0rd l0rd force-pushed the fix-wsl-check branch 3 times, most recently from 010a6b9 to 05d259d Compare March 13, 2025 10:37
Add a new target in winmake.ps1 to run unit tests and use
use it in a new cirrus task.

Fix machine_windows_test.go to make it work in CI machine.

Add the `!windows` tag on tests files that fail on Windows.

Signed-off-by: Mario Loriedo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. machine release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Command machine init fails to check if WSL is installed
5 participants