x86: add support to use AVX* CPU features #5314

alex-ab · 2024-08-05T10:46:16Z

The AVX FPU extensions of x86 CPUs are useful for media-centric and, more generally, math-heavy workloads (besides GPUs). Today the feature is common across all relevant CPU vendors in several variants (AVX, AVX2, AVX-512). Especially in the context of VMs, enabling it may improve the runtime and/or CPU usage of guest applications that are capable of using these FPU extensions. Let us enable it.

Steps to work on or to consider:

nova kernel support
base-hw kernel support
other kernel support
VM session adaptations, e.g. storing/loading more FPU state, size varies depending on host features
Seoul VMM support
VBox6 VMM support
extended Genode framework support, e.g. compiler switches, where appropriate store/load more FPU state
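
As background for the steps above: AVX instructions are only usable once both the CPU and the kernel (or, for a guest, the VMM) cooperate. The following is a minimal sketch of the usual check, not code from NOVA, Seoul, or VBox6. The function names are made up for this illustration. The bit positions are the architectural ones: CPUID leaf 1 ECX bit 27 (OSXSAVE) and bit 28 (AVX), and XCR0 bits 1 and 2 for SSE and AVX state.

```cpp
#include <cstdint>

// Returns true if AVX instructions may be executed, given the ECX
// output of CPUID leaf 1 and the current value of XCR0.
constexpr bool avx_usable(std::uint32_t cpuid1_ecx, std::uint64_t xcr0)
{
    return (cpuid1_ecx & (1u << 27))   // OSXSAVE: XSAVE/XGETBV enabled by the OS
        && (cpuid1_ecx & (1u << 28))   // AVX supported by the CPU
        && ((xcr0 & 0x6) == 0x6);      // SSE and AVX state enabled in XCR0
}

// AVX-512 additionally needs the three AVX-512 state components
// (XCR0 bits 5, 6, and 7) enabled on top of SSE and AVX state.
constexpr bool avx512_state_enabled(std::uint64_t xcr0)
{
    return (xcr0 & 0xe6) == 0xe6;
}
```

A feature mask of 0xe7 passes both checks, whereas a mask of 0x7 (x87, SSE, AVX) passes only the first one.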

Add extended FPU state detection and handling (via xsave and friends) to the kernel, which has to store/load more FPU state (~512 -> 2k++ bytes) during context switching of threads. Additionally, the referenced nova branch contains various optimizations for VM destruction and cross-core IPC resource caching. The FPU work is based upon upstream NOVA kernel commits and Hedron sources. Issue genodelabs#5314 Fixes genodelabs#3914

Add preliminary support. Tested on Sculpt 24.04 with nova kernel on AMD and Intel in a Debian 12 VM. genodelabs/genode#5314

alex-ab · 2024-08-05T11:28:06Z

I enabled the support for the NOVA kernel by porting relevant former work to our version, and managed to enable the support for the Seoul VMM on AMD and Intel machines. If all works out, Linux reports something along these lines:

[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point regi
[init -> seoul] VMM: # |   sters'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[5]:  832, xstate_sizes[5]:   64
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[6]:  896, xstate_sizes[6]:  512
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Enabled xstate features 0xe7, context size is 2432 bytes
[init -> seoul] VMM: # |   , using 'compacted' format.
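
The reported context size can be reproduced from the offsets and sizes in the log. The following is a minimal sketch, not kernel code. In the compacted format, the 512-byte legacy region (x87/SSE) and the 64-byte XSAVE header come first, and the enabled extended components follow back to back. The component sizes are copied from the log above. The names `component_size` and `compacted_xsave_size` are made up for this illustration, and the sketch ignores the optional 64-byte alignment of components because every offset in the log is already a multiple of 64.

```cpp
#include <cstdint>

// Size in bytes of each extended state component as reported in the
// log above (argument = XSAVE feature bit). Bits 0 and 1 (x87, SSE)
// live in the legacy region and therefore contribute 0 here.
constexpr std::uint32_t component_size(unsigned bit)
{
    switch (bit) {
    case 2:  return  256; // 'AVX registers'
    case 5:  return   64; // 'AVX-512 opmask'
    case 6:  return  512; // 'AVX-512 Hi256'
    case 7:  return 1024; // 'AVX-512 ZMM_Hi256'
    default: return    0; // not present in the log
    }
}

// Sum up the legacy region, the XSAVE header, and all enabled
// extended components of the given feature mask.
constexpr std::uint32_t compacted_xsave_size(std::uint64_t features)
{
    std::uint32_t size = 512 + 64; // legacy region + XSAVE header
    for (unsigned bit = 2; bit < 64; ++bit)
        if (features & (std::uint64_t{1} << bit))
            size += component_size(bit);
    return size;
}
```

For the feature mask 0xe7 this gives 576 + 256 + 64 + 512 + 1024 = 2432 bytes, which matches the context size in the last log line.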

During testing, I found the tool https://github.com/travisdowns/avx-turbo.git very helpful for checking that all variants of AVX are indeed enabled and working. It also measures the maximum achievable number of operations per second.

The tool output from within a VM without AVX support reports:

CPUID highest leaf    : [ dh]
Running as root       : [NO ]
MSR reads supported   : [NO ]
CPU pinning enabled   : [YES]
CPU supports zeroupper: [NO ]
CPU supports AVX2     : [NO ]
CPU supports AVX-512F : [NO ]
CPU supports AVX-512VL: [NO ]
CPU supports AVX-512BW: [NO ]
CPU supports AVX-512CD: [NO ]
CPUID doesn't support leaf 0x15, falling back to manual TSC calibration.
tsc_freq = 2995.2 MHz (from calibration loop)
CPU brand string: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
2 available CPUs: [0, 1]
Can't use cpuid leaf 0xb to filter out hyperthreads, CPU too old or AMD
2 physical cores: [0, 1]
Will test up to 2 CPUs
Cores | ID          | Description         | OVRLP3 | Mops
1     | pause_only  | pause instruction   |  1.000 | 1649
1     | scalar_iadd | Scalar integer adds |  1.000 | 4290

Cores | ID          | Description         | OVRLP3 |       Mops
2     | pause_only  | pause instruction   |  1.000 | 2829, 2840
2     | scalar_iadd | Scalar integer adds |  1.000 | 3884, 3873

And with AVX enabled:

CPUID highest leaf    : [ dh]
Running as root       : [NO ]
MSR reads supported   : [NO ]
CPU pinning enabled   : [YES]
CPU supports zeroupper: [YES]
CPU supports AVX2     : [YES]
CPU supports AVX-512F : [YES]
CPU supports AVX-512VL: [YES]
CPU supports AVX-512BW: [YES]
CPU supports AVX-512CD: [YES]
CPUID doesn't support leaf 0x15, falling back to manual TSC calibration.
tsc_freq = 2995.2 MHz (from calibration loop)
CPU brand string: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
2 available CPUs: [0, 1]
Can't use cpuid leaf 0xb to filter out hyperthreads, CPU too old or AMD
2 physical cores: [0, 1]
Will test up to 2 CPUs
Cores | ID                  | Description                       | OVRLP3 |  Mops
1     | pause_only          | pause instruction                 |  1.000 |  1649
1     | ucomis_clean        | scalar ucomis (w/ vzeroupper)     |  1.000 |  1065
1     | ucomis_dirty        | scalar ucomis (no vzeroupper)     |  1.000 |  1065
1     | scalar_iadd         | Scalar integer adds               |  1.000 |  4290
1     | avx128_iadd         | 128-bit integer serial adds       |  1.000 |  4290
1     | avx256_iadd         | 256-bit integer serial adds       |  1.000 |  4290
1     | avx512_iadd         | 512-bit integer serial adds       |  1.000 |  4290
1     | avx128_iadd16       | 128-bit integer serial adds zmm16 |  1.000 |  4290
1     | avx256_iadd16       | 256-bit integer serial adds zmm16 |  1.000 |  4291
1     | avx512_iadd16       | 512-bit integer serial adds zmm16 |  1.000 |  4290
1     | avx128_iadd_t       | 128-bit integer parallel adds     |  1.000 | 12870
1     | avx256_iadd_t       | 256-bit integer parallel adds     |  1.000 | 12870
1     | avx128_xor_zero     | 128-bit zeroing xor               |  1.000 | 21236
1     | avx256_xor_zero     | 256-bit zeroing xor               |  1.000 | 21240
1     | avx512_xor_zero     | 512-bit zeroing xord              |  1.000 | 21231
1     | avx128_mov_sparse   | 128-bit reg-reg mov               |  1.000 |  4290
1     | avx256_mov_sparse   | 256-bit reg-reg mov               |  1.000 |  4290
1     | avx512_mov_sparse   | 512-bit reg-reg mov               |  1.000 |  4291
1     | avx128_merge_sparse | 128-bit reg-reg merge mov         |  1.000 |  4290
1     | avx256_merge_sparse | 256-bit reg-reg merge mov         |  1.000 |  4290
1     | avx512_merge_sparse | 512-bit reg-reg merge mov         |  1.000 |  4290
1     | avx128_vshift       | 128-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx256_vshift       | 256-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx512_vshift       | 512-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx128_vshift_t     | 128-bit variable shift (vpsrlvd)  |  1.000 |  8580
1     | avx256_vshift_t     | 256-bit variable shift (vpsrlvd)  |  1.000 |  8579
1     | avx512_vshift_t     | 512-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx128_vlzcnt       | 128-bit lzcnt (vplzcntd)          |  1.000 |  1073
1     | avx256_vlzcnt       | 256-bit lzcnt (vplzcntd)          |  1.000 |  1073
1     | avx512_vlzcnt       | 512-bit lzcnt (vplzcntd)          |  1.000 |  1073
1     | avx128_vlzcnt_t     | 128-bit lzcnt (vplzcntd)          |  1.000 |  8581
1     | avx256_vlzcnt_t     | 256-bit lzcnt (vplzcntd)          |  1.000 |  8579
1     | avx512_vlzcnt_t     | 512-bit lzcnt (vplzcntd)          |  1.000 |  4290
1     | avx128_imul         | 128-bit integer muls (vpmuldq)    |  1.000 |   858
1     | avx256_imul         | 256-bit integer muls (vpmuldq)    |  1.000 |   858
1     | avx512_imul         | 512-bit integer muls (vpmuldq)    |  1.000 |   858
1     | avx128_fma_sparse   | 128-bit 64-bit sparse FMAs        |  1.000 |  4290
1     | avx256_fma_sparse   | 256-bit 64-bit sparse FMAs        |  1.000 |  4290
1     | avx512_fma_sparse   | 512-bit 64-bit sparse FMAs        |  1.000 |  4290
1     | avx128_fma          | 128-bit serial DP FMAs            |  1.000 |  1073
1     | avx256_fma          | 256-bit serial DP FMAs            |  1.000 |  1073
1     | avx512_fma          | 512-bit serial DP FMAs            |  1.000 |  1073
1     | avx128_fma_t        | 128-bit parallel DP FMAs          |  1.000 |  8579
1     | avx256_fma_t        | 256-bit parallel DP FMAs          |  1.000 |  8580
1     | avx512_fma_t        | 512-bit parallel DP FMAs          |  1.000 |  4290
1     | avx512_vpermw       | 512-bit serial WORD permute       |  1.000 |  1073
1     | avx512_vpermw_t     | 512-bit parallel WORD permute     |  1.000 |  4290
1     | avx512_vpermd       | 512-bit serial DWORD permute      |  1.000 |  1430
1     | avx512_vpermd_t     | 512-bit parallel DWORD permute    |  1.000 |  4290

Cores | ID                  | Description                       | OVRLP3 |         Mops
2     | pause_only          | pause instruction                 |  1.000 |   2830, 2862
2     | ucomis_clean        | scalar ucomis (w/ vzeroupper)     |  1.000 |   1047, 1047
2     | ucomis_dirty        | scalar ucomis (no vzeroupper)     |  1.000 |   1047, 1046
2     | scalar_iadd         | Scalar integer adds               |  1.000 |   3878, 3884
2     | avx128_iadd         | 128-bit integer serial adds       |  1.000 |   3737, 3742
2     | avx256_iadd         | 256-bit integer serial adds       |  1.000 |   3737, 3746
2     | avx512_iadd         | 512-bit integer serial adds       |  1.000 |   3900, 3900
2     | avx128_iadd16       | 128-bit integer serial adds zmm16 |  1.000 |   3746, 3738
2     | avx256_iadd16       | 256-bit integer serial adds zmm16 |  1.000 |   3735, 3744
2     | avx512_iadd16       | 512-bit integer serial adds zmm16 |  1.000 |   3922, 3919
2     | avx128_iadd_t       | 128-bit integer parallel adds     |  1.000 |   6433, 6434
2     | avx256_iadd_t       | 256-bit integer parallel adds     |  1.000 |   6446, 6440
2     | avx128_xor_zero     | 128-bit zeroing xor               |  1.000 | 10619, 10615
2     | avx256_xor_zero     | 256-bit zeroing xor               |  1.000 | 10608, 10619
2     | avx512_xor_zero     | 512-bit zeroing xord              |  1.000 | 10597, 10613
2     | avx128_mov_sparse   | 128-bit reg-reg mov               |  1.000 |   3873, 3878
2     | avx256_mov_sparse   | 256-bit reg-reg mov               |  1.000 |   3871, 3884
2     | avx512_mov_sparse   | 512-bit reg-reg mov               |  1.000 |   3879, 3874
2     | avx128_merge_sparse | 128-bit reg-reg merge mov         |  1.000 |   3877, 3879
2     | avx256_merge_sparse | 256-bit reg-reg merge mov         |  1.000 |   3878, 3877
2     | avx512_merge_sparse | 512-bit reg-reg merge mov         |  1.000 |   3879, 3878
2     | avx128_vshift       | 128-bit variable shift (vpsrlvd)  |  1.000 |   3914, 3915
2     | avx256_vshift       | 256-bit variable shift (vpsrlvd)  |  1.000 |   3915, 3917
2     | avx512_vshift       | 512-bit variable shift (vpsrlvd)  |  1.000 |   2095, 2095
2     | avx128_vshift_t     | 128-bit variable shift (vpsrlvd)  |  1.000 |   4292, 4293
2     | avx256_vshift_t     | 256-bit variable shift (vpsrlvd)  |  1.000 |   4284, 4291
2     | avx512_vshift_t     | 512-bit variable shift (vpsrlvd)  |  1.000 |   2090, 2091
2     | avx128_vlzcnt       | 128-bit lzcnt (vplzcntd)          |  1.000 |   1072, 1072
2     | avx256_vlzcnt       | 256-bit lzcnt (vplzcntd)          |  1.000 |   1072, 1072
2     | avx512_vlzcnt       | 512-bit lzcnt (vplzcntd)          |  1.000 |   1072, 1072
2     | avx128_vlzcnt_t     | 128-bit lzcnt (vplzcntd)          |  1.000 |   4299, 4295
2     | avx256_vlzcnt_t     | 256-bit lzcnt (vplzcntd)          |  1.000 |   4287, 4307
2     | avx512_vlzcnt_t     | 512-bit lzcnt (vplzcntd)          |  1.000 |   2089, 2092
2     | avx128_imul         | 128-bit integer muls (vpmuldq)    |  1.000 |    858,  858
2     | avx256_imul         | 256-bit integer muls (vpmuldq)    |  1.000 |    858,  858
2     | avx512_imul         | 512-bit integer muls (vpmuldq)    |  1.000 |    858,  858
2     | avx128_fma_sparse   | 128-bit 64-bit sparse FMAs        |  1.000 |   3877, 3877
2     | avx256_fma_sparse   | 256-bit 64-bit sparse FMAs        |  1.000 |   3880, 3878
2     | avx512_fma_sparse   | 512-bit 64-bit sparse FMAs        |  1.000 |   3877, 3874
2     | avx128_fma          | 128-bit serial DP FMAs            |  1.000 |   1072, 1072
2     | avx256_fma          | 256-bit serial DP FMAs            |  1.000 |   1072, 1072
2     | avx512_fma          | 512-bit serial DP FMAs            |  1.000 |   1072, 1072
2     | avx128_fma_t        | 128-bit parallel DP FMAs          |  1.000 |   4293, 4280
2     | avx256_fma_t        | 256-bit parallel DP FMAs          |  1.000 |   4285, 4294
2     | avx512_fma_t        | 512-bit parallel DP FMAs          |  1.000 |   2089, 2091
2     | avx512_vpermw       | 512-bit serial WORD permute       |  1.000 |   1069, 1069
2     | avx512_vpermw_t     | 512-bit parallel WORD permute     |  1.000 |   2145, 2146
2     | avx512_vpermd       | 512-bit serial DWORD permute      |  1.000 |   1430, 1430
2     | avx512_vpermd_t     | 512-bit parallel DWORD permute    |  1.000 |   2149, 2142

Additionally, on a U4711 notebook, I ran MotionMark 1.3.1 from browserbench.org with Firefox in a Debian 12 VM, as already used by @jschlatow during his browser performance analysis on Genodians.org. Even though the results are not very stable and fluctuate, AVX seems to have a positive effect. Best results:

w/o  AVX, but with SSE*:  23.74 @ 60fps +- 224.84 %
with AVX commits       : 167.28 @ 60fps +-  29.97 %

alex-ab · 2024-08-05T11:40:41Z

Additionally, I downloaded a jellyfish video, https://repo.jellyfin.org/jellyfish/jellyfish-30-mbps-hd-h264.mkv, and used ffmpeg to transcode the file in order to see some impact. Both output files are attached, and the diff of the output is below. The AVX run finishes in 243.94 s instead of 300.06 s.

ffmpeg -benchmark -i jellyfish-30-mbps-hd-h264.mkv -c:v libx265 -preset medium -crf 20 -c:a copy jellyfish-30-mbps-hd-h265-crf20.mkv

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
encoded 900 frames in 300.06s (3.00 fps), 11242.09 kb/s, Avg QP:24.48

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3
encoded 900 frames in 243.94s (3.69 fps), 11242.09 kb/s, Avg QP:24.48

sse_ffmpeg_30.txt
avx_ffmpeg_30.txt

alex-ab · 2024-08-05T11:46:20Z

Another test from the Phoronix test suite, e.g. Bosphorus, executed manually (i.e., not via the test suite), shows the following results. The traces and command invocations are part of the attached log files.

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
encoded 600 frames in 69.54s (8.63 fps), 1271.47 kb/s, Avg QP:33.68

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3
encoded 600 frames in 52.36s (11.46 fps), 1271.47 kb/s, Avg QP:33.68

sse_x265_Bosphorus_1920x1080.txt
avx_x265_Bosphorus_1920x1080.txt
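
To put the two x265 runs into numbers, the speedup is the SSE-only encoding time divided by the AVX encoding time. A minimal sketch using the timings quoted in this thread; the helper name `speedup` is made up for this illustration:

```cpp
// Speedup of an optimized run over a baseline run, given wall-clock
// times in seconds. A value above 1.0 means the optimized run is faster.
constexpr double speedup(double baseline_seconds, double optimized_seconds)
{
    return baseline_seconds / optimized_seconds;
}

// Timings taken from the x265 output quoted in this thread
constexpr double jellyfish = speedup(300.06, 243.94); // ffmpeg/libx265, 900 frames
constexpr double bosphorus = speedup( 69.54,  52.36); // x265 Bosphorus, 600 frames
```

This yields roughly 1.23x for the jellyfish transcode and roughly 1.33x for Bosphorus, consistent with the ratios of the reported frame rates (3.69/3.00 and 11.46/8.63).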

chelmuth · 2024-08-06T07:09:13Z

Merged f5a9d5e and genodelabs/genode-world@eee31d2 to staging.

chelmuth · 2024-08-13T06:22:55Z

depot_autopilot/test-pthread failed last night with #UD on x86_64.

[2024-08-13 03:30:11] [init -> depot_autopilot] 1.308 [init -> test-pthread] main thread: start PTHREAD_MUTEX_NORMAL stress test
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.305', cpu 2, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.310', cpu 5, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.307', cpu 6, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.311', cpu 7, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.309', cpu 3, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.308', cpu 1, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.306', cpu 4, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:15] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.303', cpu 7, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:31:40] [init -> depot_autopilot] 
[2024-08-13 03:31:40] [init -> depot_autopilot]  test-pthread                    failed    89.987  timeout 90 sec

The same failure occurred with the AVX patches from 2024-08-06, at 2024-08-07 03:51:53.

alex-ab · 2024-09-13T09:23:07Z

I added the commits to get AVX working with vbox6, tested with Debian, Ubuntu, and Windows 10 VMs on a modular Sculpt.

chelmuth · 2024-09-16T09:57:20Z

@alex-ab would you mind recording the remaining problems with avx-turbo in this issue? I agree that we don't have to fix them if they are specific to the use of the tool only and don't happen in real scenarios.

alex-ab · 2024-09-16T11:31:19Z

> @alex-ab would you mind recording the remaining problems with avx-turbo in this issue? I agree that we don't have to fix them if they are specific to the use of the tool only and don't happen in real scenarios.

I found the issue with the test: its TSC frequency calculation divides by 0, which fails. I added a patch for use with vbox6. Instead of reading out the frequency (which is not provided by vbox6), the tool now measures it, and then the whole AVX test works.

avx_turbo_tsc_calc.txt

--- a/tsc-support.cpp
+++ b/tsc-support.cpp
@@ -41,7 +41,8 @@ uint64_t get_tsc_from_cpuid_inner() {
 
 
     if (family.family == 6) {
-        if (family.model == 0x4E || family.model == 0x5E || family.model == 0x8E || family.model == 0x9E) {
+        printf("%s:%u division by %u is not good !!!\n", __func__, __LINE__, cpuid15.eax);
+        if (cpuid15.eax && (family.model == 0x4E || family.model == 0x5E || family.model == 0x8E || family.model == 0x9E)) {
             // skylake client or kabylake
             return (int64_t)24000000 * cpuid15.ebx / cpuid15.eax; // 24 MHz crystal clock
         }
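
The guard in the patch above can be sketched in isolation as follows. This is an illustration rather than the avx-turbo code: the function name `tsc_hz_from_cpuid15` and the convention of returning 0 to signal "not available, fall back to the calibration loop" are assumptions. CPUID leaf 0x15 reports the TSC/crystal ratio as EBX/EAX, and a hypervisor that does not provide the leaf returns zeros, hence the check before dividing.

```cpp
#include <cstdint>

// Derive the TSC frequency in Hz from the CPUID leaf 0x15 ratio
// (numerator in EBX, denominator in EAX) and a known crystal clock.
// Returns 0 if the leaf carries no usable ratio, so that the caller
// can fall back to measuring the frequency instead of dividing by 0.
constexpr std::uint64_t tsc_hz_from_cpuid15(std::uint32_t eax,
                                            std::uint32_t ebx,
                                            std::uint64_t crystal_hz)
{
    return (eax == 0 || ebx == 0) ? 0 : crystal_hz * ebx / eax;
}
```

With a 24 MHz crystal and a ratio of 250/2, the function returns 3000000000 (3 GHz). With all-zero registers, as seen under vbox6, it returns 0.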

alex-ab · 2024-09-17T09:01:20Z

@chelmuth: please add the fixup and the AES commit from my staging branch to staging.

chelmuth · 2024-09-17T09:55:34Z

Thanks, merged to staging.

cnuke · 2024-10-08T13:18:30Z

FWIW, 4903595 enables RDRAND and RDSEED. I've been using the commit for some time now w/o any noticeable problems.

chelmuth · 2024-10-08T13:24:17Z

Can you report any positive performance (or other) impact?

cnuke · 2024-10-08T13:49:27Z

@chelmuth well, I did not perform any testing so I cannot comment either way (especially as I have not enabled them in isolation, i.e. AVX/AES was already enabled and could skew the results).

alex-ab added the feature label Aug 5, 2024

alex-ab added a commit to alex-ab/genode-world that referenced this issue Aug 5, 2024

seoul: enable AVX support

6edc6a5

Add preliminary support. Tested on Sculpt 24.04 with nova kernel on AMD and Intel in a Debian 12 VM. genodelabs/genode#5314

chelmuth pushed a commit to genodelabs/genode-world that referenced this issue Aug 6, 2024

seoul: enable AVX support

eee31d2

Add preliminary support. Tested on Sculpt 24.04 with nova kernel on AMD and Intel in a Debian 12 VM. genodelabs/genode#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Aug 9, 2024

vbox5: disable xsave

081fbf8

some more adjustments are needed for xsave support, but this port is scheduled to be removed. Just disable xsave for the time being to make nightly test happy. Issue genodelabs#5314

chelmuth pushed a commit that referenced this issue Aug 12, 2024

vbox5: disable xsave

6e33e90

some more adjustments are needed for xsave support, but this port is scheduled to be removed. Just disable xsave for the time being to make nightly test happy. Issue #5314

chelmuth pushed a commit to genodelabs/genode-world that referenced this issue Aug 12, 2024

seoul: enable AVX support

1b8a6c6

Add preliminary support. Tested on Sculpt 24.04 with nova kernel on AMD and Intel in a Debian 12 VM. genodelabs/genode#5314

chelmuth added a commit that referenced this issue Aug 13, 2024

fixup "vbox5: disable xsave" (patch, hash)

a8bcd25

Issue #5314

alex-ab pushed a commit to alex-ab/genode that referenced this issue Aug 13, 2024

fixup "vbox5: disable xsave" (patch, hash)

fb4ab1b

Issue genodelabs#5314

chelmuth pushed a commit that referenced this issue Aug 27, 2024

vbox5: disable xsave

79506e4

some more adjustments are needed for xsave support, but this port is scheduled to be removed. Just disable xsave for the time being to make nightly test happy. Issue #5314

chelmuth pushed a commit to genodelabs/genode-world that referenced this issue Aug 27, 2024

seoul: enable AVX support

294332b

Add preliminary support. Tested on Sculpt 24.04 with nova kernel on AMD and Intel in a Debian 12 VM. genodelabs/genode#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Sep 13, 2024

vm/x86: support extended fpu state transfer

6ff2a39

Extend Genode's vCPU FPU state and adjust all users to copy at most FPU data they actually support. Issue genodelabs#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Sep 13, 2024

nova: handle invalid FPU guest state

83ff8e5

Makes the kernel robust against invalid guest FPU state provided by a VMM, e.g. our port of Vbox6. Issue genodelabs#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Sep 13, 2024

vbox6: enable AVX support

f92ee11

Issue genodelabs#5314

chelmuth pushed a commit that referenced this issue Sep 16, 2024

vm/x86: support extended fpu state transfer

64644c9

Extend Genode's vCPU FPU state and adjust all users to copy at most FPU data they actually support. Issue #5314

chelmuth pushed a commit that referenced this issue Sep 16, 2024

nova: handle invalid FPU guest state

6ff4a77

Makes the kernel robust against invalid guest FPU state provided by a VMM, e.g. our port of Vbox6. Issue #5314

chelmuth pushed a commit that referenced this issue Sep 16, 2024

vbox6: enable AVX support

ad2b1cb

Issue #5314

alex-ab mentioned this issue Sep 16, 2024

Division by zero on TSC calculation when running on Virtualbox 6 travisdowns/avx-turbo#29

Open

alex-ab added a commit to alex-ab/genode that referenced this issue Sep 17, 2024

fixup "vbox6: enable AVX support"

054b9ba

Issue genodelabs#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Sep 17, 2024

vbox6: enable AES hardware instruction support

1305814

Issue genodelabs#5314

nfeske mentioned this issue Oct 4, 2024

Sculpt OS 24.10 #5356

Closed

nfeske pushed a commit that referenced this issue Oct 7, 2024

vm/x86: support extended fpu state transfer

ff506b0

Extend Genode's vCPU FPU state and adjust all users to copy at most FPU data they actually support. Issue #5314

nfeske pushed a commit that referenced this issue Oct 7, 2024

nova: handle invalid FPU guest state

a07b593

Makes the kernel robust against invalid guest FPU state provided by a VMM, e.g. our port of Vbox6. Issue #5314

nfeske pushed a commit that referenced this issue Oct 7, 2024

vbox6: enable AVX support

75266e4

Issue #5314

nfeske pushed a commit that referenced this issue Oct 7, 2024

vbox6: enable AES hardware instruction support

e5df8da

Issue #5314

cnuke added a commit to cnuke/genode that referenced this issue Oct 8, 2024

vbox6: enable RDRAND hardware instruction support

4903595

Issue genodelabs#5314

chelmuth pushed a commit that referenced this issue Oct 15, 2024

vbox6: enable RDRAND hardware instruction support

4084df6

Issue #5314

alex-ab added a commit to alex-ab/genode that referenced this issue Nov 13, 2024

nova: support resume on AVX CPUs

1428dc2

Issue genodelabs#5314

nfeske pushed a commit that referenced this issue Nov 13, 2024

nova: support resume on AVX CPUs

b17ec8e

Issue #5314

chelmuth pushed a commit that referenced this issue Nov 20, 2024

nova: support resume on AVX CPUs

28542e6

Issue #5314

alex-ab added a commit to alex-ab/genode that referenced this issue Nov 30, 2024

nova: avoid assertion during SC cleanup

d15009b

Fix regression introduced in Issue genodelabs#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Nov 30, 2024

nova: avoid assertion during SC cleanup

a80cf35

Fix regression introduced in Issue genodelabs#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024

nova: avoid assertion during SC cleanup

60d8f3b

Fix regression introduced in Issue genodelabs#5314

alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024

nova: avoid assertion during SC cleanup

e93ca5f

Regression introduced in Issue genodelabs#5314 Fixes genodelabs#5391

alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024

nova: avoid assertion during SC cleanup

6fac462

Regression introduced in Issue genodelabs#5314 Fixes genodelabs#5391

alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024

nova: avoid assertion during SC cleanup

f617f0c

Regression introduced in Issue genodelabs#5314 Fixes genodelabs#5391

chelmuth pushed a commit to chelmuth/genode that referenced this issue Dec 2, 2024

nova: support resume on AVX CPUs

7bb3206

Issue genodelabs#5314

chelmuth pushed a commit that referenced this issue Dec 2, 2024

nova: avoid assertion during SC cleanup

d6effcd

Regression introduced in Issue #5314 Fixes #5391

chelmuth pushed a commit to chelmuth/genode that referenced this issue Dec 6, 2024

nova: avoid assertion during SC cleanup

c54635d

Regression introduced in Issue genodelabs#5314 Fixes genodelabs#5391

chelmuth pushed a commit that referenced this issue Dec 10, 2024

nova: avoid assertion during SC cleanup

e520dbb

Regression introduced in Issue #5314 Fixes #5391
