x86: add support to use AVX* CPU features #5314
Add extended FPU state detection and handling (via xsave and friends) to the kernel, which has to store/load more FPU state (from ~512 bytes to 2 KiB and more) during context switching of threads. Additionally, the referenced nova branch contains various optimizations for VM destruction and cross-core IPC resource caching. The FPU work is based upon upstream NOVA kernel commits and Hedron sources. Issue genodelabs#5314 Fixes genodelabs#3914
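To illustrate why the saved FPU state grows, here is a minimal sketch (not the NOVA or Hedron code) that estimates the XSAVE area size from the state components enabled in XCR0. The function name `estimated_xsave_size` is hypothetical. The component sizes are the architectural ones, but the result is only a lower bound, since the real layout may contain gaps; a kernel would query CPUID leaf 0xD for the actual size and offsets.

```cpp
#include <cassert>
#include <cstdint>

// XCR0 state-component bits (architectural bit positions).
constexpr uint64_t XCR0_X87       = 1ull << 0;
constexpr uint64_t XCR0_SSE       = 1ull << 1;
constexpr uint64_t XCR0_AVX       = 1ull << 2;  // upper halves of YMM0-15
constexpr uint64_t XCR0_OPMASK    = 1ull << 5;  // AVX-512 k0-k7
constexpr uint64_t XCR0_ZMM_HI256 = 1ull << 6;  // upper halves of ZMM0-15
constexpr uint64_t XCR0_HI16_ZMM  = 1ull << 7;  // ZMM16-31

constexpr uint32_t LEGACY_AREA  = 512;  // FXSAVE-compatible region (x87 + SSE)
constexpr uint32_t XSAVE_HEADER = 64;   // XSTATE_BV, XCOMP_BV, reserved bytes

// Hypothetical helper: lower bound of the XSAVE area size in bytes.
// Assumption: gaps between components are ignored, so the size reported
// by CPUID leaf 0xD on real hardware may be larger.
inline uint32_t estimated_xsave_size(uint64_t xcr0)
{
    assert(xcr0 & XCR0_X87);  // bit 0 of XCR0 is always set

    uint32_t size = LEGACY_AREA + XSAVE_HEADER;
    if (xcr0 & XCR0_AVX)       size += 256;   // 16 x 16 bytes
    if (xcr0 & XCR0_OPMASK)    size += 64;    //  8 x  8 bytes
    if (xcr0 & XCR0_ZMM_HI256) size += 512;   // 16 x 32 bytes
    if (xcr0 & XCR0_HI16_ZMM)  size += 1024;  // 16 x 64 bytes
    return size;
}
```

With only x87 and SSE enabled, the estimate is 576 bytes; with all AVX-512 components enabled, it exceeds 2 KiB, which matches the order of magnitude mentioned in the commit message.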
Add preliminary support. Tested on Sculpt 24.04 with nova kernel on AMD and Intel in a Debian 12 VM. genodelabs/genode#5314
I enabled the support for the NOVA kernel by porting the relevant earlier work to our version, and also got it working with the Seoul VMM on AMD and Intel machines. If all works out, Linux reports something along these lines:
During testing, I found the following tool very helpful for verifying that all variants of AVX are indeed enabled and working: https://github.com/travisdowns/avx-turbo.git. It also measures the maximum achievable operations per second. Within a VM without AVX support, the tool reports:
And with AVX enabled:
Additionally, on a U4711 notebook, I ran MotionMark 1.3.1 from browserbench.org with Firefox in a Debian 12 VM, as already used by @jschlatow in his browser performance analysis on Genodians.org. Even though the results fluctuate and are not very stable, AVX appears to have a positive effect. Best results:
Additionally, I downloaded a jellyfish video, https://repo.jellyfin.org/jellyfish/jellyfish-30-mbps-hd-h264.mkv, and used ffmpeg to transcode the file to see the impact. Both files are attached, and the diff of the output is below. Some improvements are visible.
Another test from the Phoronix Test Suite, e.g. Bosphorus, executed manually (i.e. without using the test suite itself), shows the following results. The traces and command invocations are part of the attached log files.
sse_x265_Bosphorus_1920x1080.txt
Merged f5a9d5e and genodelabs/genode-world@eee31d2 to staging.
Some more adjustments are needed for xsave support, but this port is scheduled to be removed. Just disable xsave for the time being to make the nightly tests happy. Issue genodelabs#5314
depot_autopilot/test-pthread failed last night with
The same occurred with the AVX patches from 2024-08-06 at 2024-08-07 03:51:53.
Extend Genode's vCPU FPU state and adjust all users to copy at most the FPU data they actually support. Issue genodelabs#5314
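The "copy at most" rule can be sketched as follows. This is an illustrative helper, not Genode's vCPU interface; the name `copy_fpu_state` and its signature are assumptions. The idea is that a VMM with a small (e.g. 512-byte) FPU buffer and a kernel with a larger XSAVE area can exchange state without either side overrunning its buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy only as many bytes as both sides can hold
// and return the number of bytes actually copied.
inline std::size_t copy_fpu_state(void *dst, std::size_t dst_size,
                                  void const *src, std::size_t src_size)
{
    assert(dst != nullptr && src != nullptr);

    std::size_t const n = std::min(dst_size, src_size);
    std::memcpy(dst, src, n);
    return n;
}
```

A caller that only understands the legacy 512-byte format simply passes its own buffer size; the remaining bytes of the larger state are left untouched.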
Makes the kernel robust against invalid guest FPU state provided by a VMM, e.g. our port of Vbox6. Issue genodelabs#5314
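One way to guard against invalid guest FPU state is to check the XSAVE image before restoring it. The sketch below is not the kernel's actual check; `guest_fpu_state_valid` and its parameters are hypothetical. It conservatively tests the conditions under which XRSTOR in standard format would fault: reserved MXCSR bits set, XSTATE_BV bits outside XCR0, a non-zero XCOMP_BV, or non-zero reserved header bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t MXCSR_OFFSET     = 24;   // within the legacy area
constexpr std::size_t XSTATE_BV_OFFSET = 512;  // start of the XSAVE header
constexpr std::size_t XCOMP_BV_OFFSET  = 520;
constexpr std::size_t RESERVED_OFFSET  = 528;
constexpr std::size_t XSAVE_MIN_SIZE   = 576;  // legacy area + header

// Hypothetical check of a standard-format XSAVE image.
// xcr0:       state components enabled by the kernel
// mxcsr_mask: valid MXCSR bits, e.g. 0xFFBF when the CPU reports no mask
inline bool guest_fpu_state_valid(unsigned char const *area, std::size_t size,
                                  uint64_t xcr0, uint32_t mxcsr_mask)
{
    if (area == nullptr || size < XSAVE_MIN_SIZE)
        return false;

    uint32_t mxcsr = 0;
    uint64_t xstate_bv = 0, xcomp_bv = 0;
    std::memcpy(&mxcsr,     area + MXCSR_OFFSET,     sizeof(mxcsr));
    std::memcpy(&xstate_bv, area + XSTATE_BV_OFFSET, sizeof(xstate_bv));
    std::memcpy(&xcomp_bv,  area + XCOMP_BV_OFFSET,  sizeof(xcomp_bv));

    if (mxcsr & ~mxcsr_mask) return false;  // reserved MXCSR bits set
    if (xstate_bv & ~xcr0)   return false;  // component not enabled in XCR0
    if (xcomp_bv != 0)       return false;  // sketch accepts standard format only

    for (std::size_t i = RESERVED_OFFSET; i < XSAVE_MIN_SIZE; ++i)
        if (area[i] != 0) return false;     // reserved header bytes must be zero

    return true;
}
```

If the check fails, a kernel could fall back to a known-good initial FPU state instead of faulting on the restore; which fallback the actual commit uses is not stated here.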
I added the commits to get AVX working with vbox6, tested with Debian, Ubuntu, and Windows 10 VMs on a modular Sculpt.
@alex-ab would you mind recording the remaining problems with avx-turbo in this issue? I agree that we don't have to fix them if they are specific to the use of the tool only and don't happen in real scenarios.
I found the issue with the test: its TSC frequency calculation divides by 0, which fails. I added a patch for use in vbox6. Instead of reading out the frequency (which is not provided by vbox6), the tool now measures it, and then the whole AVX test works.
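The fallback described above can be sketched roughly as follows. This is not the actual avx-turbo patch; the names `ticks_to_khz`, `measure_tsc_khz`, and `tsc_khz` are hypothetical. A reported frequency of 0 is treated as "not provided", in which case the TSC is sampled over a short wall-clock window, and the division is guarded against a zero divisor.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc
#endif

// Convert a tick delta over an elapsed time into kHz; 0 means "unknown".
inline uint64_t ticks_to_khz(uint64_t ticks, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0)
        return 0;                             // never divide by zero
    return ticks * 1000000ull / elapsed_ns;   // ticks per ns * 1e6 = kHz
}

#if defined(__x86_64__) || defined(__i386__)
// Measure the TSC rate against the steady clock over a short window.
inline uint64_t measure_tsc_khz(std::chrono::milliseconds window =
                                    std::chrono::milliseconds(50))
{
    assert(window.count() > 0);

    auto const t0 = std::chrono::steady_clock::now();
    uint64_t const c0 = __rdtsc();
    std::this_thread::sleep_for(window);
    uint64_t const c1 = __rdtsc();
    auto const t1 = std::chrono::steady_clock::now();

    auto const ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    return ticks_to_khz(c1 - c0, static_cast<uint64_t>(ns));
}
#endif

// Prefer the reported frequency; measure only if none is provided.
inline uint64_t tsc_khz(uint64_t reported_khz)
{
    if (reported_khz != 0)
        return reported_khz;
#if defined(__x86_64__) || defined(__i386__)
    return measure_tsc_khz();
#else
    return 0;  // no TSC on this architecture
#endif
}
```

A longer measurement window reduces the relative error from scheduling jitter, at the cost of a slower start; 50 ms is an arbitrary choice for this sketch.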
@chelmuth: please add the fixup and the aes commit from my staging branch to staging.
Thanks, merged to staging.
FWIW, 4903595 enables |
Can you report any positive performance (or other) impact?
@chelmuth well, I did not perform any testing, so I cannot comment either way (especially as I have not enabled them in isolation, i.e. AVX/AES was already enabled and could skew the results).
Fix regression introduced in Issue genodelabs#5314
Regression introduced in Issue genodelabs#5314 Fixes genodelabs#5391
The various AVX FPU extensions for x86 CPUs can be used for media-centered and, more generally, math-heavy workloads (besides GPUs). The feature is nowadays common across all relevant CPU vendors in several variants (AVX, AVX2, AVX-512). Especially in the context of VMs, enabling it may improve the runtime and/or CPU usage of guest applications that are capable of using these FPU extensions. Let us enable it.
Steps to work on or consider: