Windows Operating System Boot Issue, Windows Servers #1139

Open
celovasquesjr opened this issue Sep 9, 2024 · 13 comments
Labels: Bug

Comments

@celovasquesjr

I am facing an issue where the Windows Server VM gets stuck on the "TIANO CORE" screen after a reboot. A simple reboot trigger, such as a "Windows Update", causes the problem. It doesn't happen consistently, and it's very difficult to reproduce.

To recover and boot the OS, I have to force a power-off and then start the VM again. After that, it boots normally.

Steps to reproduce the behavior are unclear since the issue doesn't occur consistently. It seems to happen randomly after certain reboot triggers, like Windows Updates.

The VM should reboot and load the OS without getting stuck on the "TIANO CORE" screen.

[Image: problem]

Could the Windows Update be interfering with any drivers, causing the VM not to boot? Is there a driver issue I should investigate?

(I have already checked logs from inside the VM and on KVM, but nothing useful was generated.)

Has anyone experienced this type of problem before?

@YanVugenfirer
Collaborator

This looks like a UEFI firmware failure rather than a driver failure.

Can you post the QEMU command line?

@celovasquesjr
Author

Hello YanVugenfirer,

Here is the QEMU command line:

LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-280-i-96-880-VM \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-280-i-96-880-VM/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-280-i-96-880-VM/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-280-i-96-880-VM/.config \
/usr/bin/qemu-system-x86_64 \
-name guest=i-96-880-VM,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-280-i-96-880-VM/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE_4M.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/XXX-XXX-XXX-XXX-XXX.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-6.2,usb=off,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,memory-backend=pc.ram \
-accel kvm \
-cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,hv-time=on \
-m 4096 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":4294967296}' \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid XXX-XXX-XXX-XXX-XXX \
-smbios 'type=1,manufacturer=Apache Software Foundation,product=CloudStack KVM Hypervisor,uuid=XXX-XXX-XXX-XXX-XXX' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=595,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime \
-no-shutdown \
-boot strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-pci-bridge,id=pci.7,bus=pci.1,addr=0x0 \
-device pcie-root-port,port=22,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,id=usb,bus=pci.3,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \
-object '{"qom-type":"secret","id":"libvirt-2-storage-auth-secret0","data":"XXX","keyid":"XXX","iv":"XXX","format":"base64"}' \
-blockdev '{"driver":"rbd","pool":"XXX","image":"XXX","server":[{"host":"XXX.XXX.XXX.XXX","port":"0"},{"host":"XXX.XXX.XXX.XXX","port":"0"},{"host":"XXX.XXX.XXX.XXX","port":"0"},{"host":"XXX.XXX.XXX.XXX","port":"0"},{"host":"XXX.XXX.XXX.XXX","port":"0"}],"user":"XXX","auth-client-required":["cephx","none"],"key-secret":"libvirt-2-storage-auth-secret0","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-device virtio-blk-pci,bus=pci.5,addr=0x0,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=f01bb52d5ad847f99170 \
-device ide-cd,bus=ide.3,id=sata0-0-3,bootindex=1 \
-netdev tap,fd=601,id=hostnet0,vhost=on,vhostfd=793 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=XXX:XXX:XXX:XXX:XXX:XXX,bus=pci.2,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=593,server=on,wait=off \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-audiodev '{"id":"audio1","driver":"none"}' \
-object '{"qom-type":"tls-creds-x509","id":"vnc-tls-creds0","dir":"/etc/pki/libvirt-vnc","endpoint":"server","verify-peer":true}' \
-vnc XXX.XXX.XXX.XXX:XXX,password=on,tls-creds=vnc-tls-creds0,audiodev=audio1 \
-device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 \
-device i6300esb,id=watchdog0,bus=pci.7,addr=0x1 \
-watchdog-action none \
-device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/111 (label charserial0)

I replaced irrelevant information with 'XXX'.

@celovasquesjr
Author

Hi YanVugenfirer

Can you help me?

@YanVugenfirer
Collaborator

@celovasquesjr Sorry, I am travelling to a conference. Might take some time. In any case, I think the issue is not related to drivers.

@celovasquesjr
Author

@YanVugenfirer Thank you! I would appreciate your feedback when you can. Sorry, we really don't know why this is happening.

@xiagao

xiagao commented Sep 15, 2024

Hi @celovasquesjr
Could you share the guest version, host kernel, qemu-kvm version, and virtio-win driver version?
From your QEMU command line, you only use 2 CPUs. Could you increase the count and try again?

@celovasquesjr
Author

Hi @xiagao,

The versions are as follows:

Guest version: So far, the issue has occurred on Windows Server 2019 and 2022
Host kernel: 5.15.0-107-generic
QEMU-KVM version: 6.2.0
VirtIO version: Virtio-win-guest-tools 0.1.229

Regarding the suggestion to increase the number of CPUs: I'd like to emphasize that the UEFI boot issue does not appear when the machine is shut down and powered back on; I can boot the OS successfully afterward. The issue is hard to reproduce consistently. I've encountered it a few times when machines rebooted overnight because of updates and scheduled tasks, and by morning they were stuck on that screen. On other Windows machines, however, the problem has started happening more frequently, and a plain reboot is enough to trigger it. Still, as I mentioned, it's not easy to simulate: sometimes it happens, sometimes it doesn't.

@ybendito
Collaborator

@celovasquesjr I'd suggest running a test of automatic system reboots, probably with a randomized delay before each reboot, to understand whether the problem is related to the RBD disks. If the problem reproduces in such a test with RBD but not in a similar test with a local image, that would narrow down the source of the problem.
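
A minimal sketch of such a reboot-loop test, driven from the host, might look like the following. It assumes the host uses libvirt's `virsh` CLI (`reboot`, `destroy`, `start`); the domain name and the `is_up` callback are hypothetical and must be supplied by the tester, for example as a check that the guest agent or an RDP port answers again after the reboot. On a hang it applies the recovery described earlier in this thread (forced power-off, then start) so the loop can continue.

```python
import random
import subprocess
import time


def reboot_delays(count, min_s, max_s, seed=None):
    """Return `count` randomized delays in seconds, each within [min_s, max_s]."""
    rng = random.Random(seed)
    return [rng.uniform(min_s, max_s) for _ in range(count)]


def run_reboot_test(domain, delays, is_up, boot_timeout_s=300,
                    run=subprocess.run, sleep=time.sleep):
    """Reboot `domain` once per delay and return the indices of failed boots.

    `is_up(timeout_s)` is a caller-supplied (hypothetical) check that returns
    True once the guest has come back after the reboot, False on timeout.
    """
    failures = []
    for i, delay in enumerate(delays):
        sleep(delay)  # randomized wait before triggering the reboot
        run(["virsh", "reboot", domain], check=True)
        if not is_up(boot_timeout_s):
            failures.append(i)
            # Guest appears stuck: force power-off, then start it again.
            run(["virsh", "destroy", domain], check=True)
            run(["virsh", "start", domain], check=True)
    return failures
```

Running the same loop once against a guest on RBD and once against a guest on a local image, with the same seed, keeps the two runs comparable.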

@xiagao

xiagao commented Sep 18, 2024

I also hit a similar issue on Win10 64-bit, reproducible in 15 of 99 attempts. @ybendito Could you have a look at the similar issue in Jira? I mentioned you there.
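
As a rough guide to how many reboots a test needs before "could not reproduce" means much, here is a small sketch. It assumes each reboot fails independently with a fixed probability, which is a simplification, and that the 15/99 rate seen on Win10 carries over to the guest under test, which may not hold. The function name is hypothetical.

```python
import math


def reboots_needed(failure_rate, confidence):
    """Smallest n such that at least one failure shows up in n reboots
    with probability >= confidence, assuming independent reboots."""
    if not (0 < failure_rate < 1 and 0 < confidence < 1):
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    # P(no failure in n reboots) = (1 - failure_rate) ** n <= 1 - confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


# Example: the 15-in-99 rate reported above, 95% confidence.
print(reboots_needed(15 / 99, 0.95))
```

A much lower failure rate, as the overnight-only reports on Windows Server suggest, pushes the required number of reboots up sharply.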

@celovasquesjr
Author

@ybendito
I will perform the tests as requested and get back to you shortly once I have a response.

@xiagao
Did you run your tests on RBD disks?

@xiagao

xiagao commented Sep 19, 2024

> @ybendito I will perform the tests as requested and get back to you shortly once I have a response.
>
> @xiagao Did you run your tests on RBD disks?

No, I didn't. My test was on the local host with a qcow2 file as the disk.

@celovasquesjr
Author

Hi guys

I couldn't reproduce the issue on either RBD or local disks.

Is there anything else I should check?

Could it be related to one specific Windows Update? I had issues when the machine rebooted automatically overnight because of Windows Update.

However, the strange thing is that I also couldn't reproduce the issue by manually installing the updates and rebooting.

@xiagao

xiagao commented Sep 27, 2024

> Hi guys
>
> I couldn't reproduce the issue on either RBD or local disks.
>
> Is there anything else I should check?
>
> Could it be related to one specific Windows Update? I had issues when the machine rebooted automatically overnight because of Windows Update.
>
> However, the strange thing is that I also couldn't reproduce the issue by manually installing the updates and rebooting.

You could try Win10 64-bit guests, with either a local disk or RBD. The issue can be reproduced there after some repeated reboots.
On my side, it's tough to reproduce on other Windows versions.
