x86: add support to use AVX* CPU features #5314

Open
4 of 7 tasks
alex-ab opened this issue Aug 5, 2024 · 13 comments

@alex-ab
Member

alex-ab commented Aug 5, 2024

The AVX FPU extensions for x86 CPUs can be used for media-centered and, more generally, math-heavy workloads (besides GPUs). The feature is nowadays common across all relevant CPU vendors in several variants (AVX, AVX2, AVX-512). Especially in the context of VMs, enabling it may improve the runtime and/or CPU usage of guest applications that are capable of using these FPU extensions. Let us enable it.

Steps to work on or to consider:

  • nova kernel support
  • base-hw kernel support
  • other kernel support
  • VM session adaptations, e.g. storing/loading more FPU state, size varies depending on host features
  • Seoul VMM support
  • VBox6 VMM support
  • extended Genode framework support, e.g. compiler switches, where appropriate store/load more FPU state
@alex-ab alex-ab added the feature label Aug 5, 2024
alex-ab added a commit to alex-ab/genode that referenced this issue Aug 5, 2024
Add extended FPU state detection and handling (via xsave and friends) to the
kernel, which has to store/load more FPU state (~512 -> 2k++) during context
switching of threads. Additional the referenced nova branch contains various
optimization during VM destruction and cross core IPC resource caching.

The FPU work is based upon upstream NOVA kernel commits and Hedron sources.

Issue genodelabs#5314
Fixes genodelabs#3914
alex-ab added a commit to alex-ab/genode-world that referenced this issue Aug 5, 2024
Add preliminary support. Tested on Sculpt 24.04 with nova
kernel on AMD and Intel in a Debian 12 VM.

genodelabs/genode#5314
@alex-ab
Member Author

alex-ab commented Aug 5, 2024

I enabled the support for the NOVA kernel by porting relevant former work to our version, and managed to enable the support for the Seoul VMM on AMD and Intel machines. If all works out, Linux reports something along these lines:

[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point regi
[init -> seoul] VMM: # |   sters'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[5]:  832, xstate_sizes[5]:   64
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[6]:  896, xstate_sizes[6]:  512
[init -> seoul] VMM: #   [    0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
[init -> seoul] VMM: #   [    0.000000] x86/fpu: Enabled xstate features 0xe7, context size is 2432 bytes
[init -> seoul] VMM: # |   , using 'compacted' format.
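
The 2432-byte context size in this log can be reproduced by hand. The following is a minimal sketch, not kernel code: it assumes the component sizes printed above, the 512-byte legacy region plus 64-byte XSAVE header of the architecture, and that no component needs extra alignment padding (which holds for these sizes). The names `component_size` and `compacted_size` are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Sizes (in bytes) of the XSAVE state components, taken from the
// xstate_sizes[] values in the Linux log above. The index is the
// component number, i.e. the bit position in the feature mask.
// Components 0 (x87) and 1 (SSE) live in the legacy region.
constexpr std::array<std::uint32_t, 8> component_size = {
    0, 0, 256, 0, 0, 64, 512, 1024
};

constexpr std::uint32_t legacy_region = 512; // FXSAVE-compatible area
constexpr std::uint32_t xsave_header  = 64;  // header following the legacy area

// Size of a compacted XSAVE area for the given feature mask.
// Assumption: no component requires additional alignment padding.
constexpr std::uint32_t compacted_size(std::uint64_t features)
{
    std::uint32_t size = legacy_region + xsave_header;
    for (unsigned i = 2; i < component_size.size(); ++i)
        if (features & (1ull << i))
            size += component_size[i];
    return size;
}

// Feature mask 0xe7 from the log yields the reported context size.
static_assert(compacted_size(0xe7) == 2432, "matches the log above");
```

With only x87 and SSE enabled (mask 0x3) the same function yields 576 bytes, which illustrates why the per-thread FPU state grows once AVX and AVX-512 are enabled.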

During testing, I found the following tool very helpful for checking that all variants of AVX are indeed enabled and working: https://github.com/travisdowns/avx-turbo.git. It also measures the maximum number of operations per second that are achievable.

The tool output from within a VM without AVX support reports:

CPUID highest leaf    : [ dh]
Running as root       : [NO ]
MSR reads supported   : [NO ]
CPU pinning enabled   : [YES]
CPU supports zeroupper: [NO ]
CPU supports AVX2     : [NO ]
CPU supports AVX-512F : [NO ]
CPU supports AVX-512VL: [NO ]
CPU supports AVX-512BW: [NO ]
CPU supports AVX-512CD: [NO ]
CPUID doesn't support leaf 0x15, falling back to manual TSC calibration.
tsc_freq = 2995.2 MHz (from calibration loop)
CPU brand string: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
2 available CPUs: [0, 1]
Can't use cpuid leaf 0xb to filter out hyperthreads, CPU too old or AMD
2 physical cores: [0, 1]
Will test up to 2 CPUs
Cores | ID          | Description         | OVRLP3 | Mops
1     | pause_only  | pause instruction   |  1.000 | 1649
1     | scalar_iadd | Scalar integer adds |  1.000 | 4290

Cores | ID          | Description         | OVRLP3 |       Mops
2     | pause_only  | pause instruction   |  1.000 | 2829, 2840
2     | scalar_iadd | Scalar integer adds |  1.000 | 3884, 3873

And with AVX enabled:

CPUID highest leaf    : [ dh]
Running as root       : [NO ]
MSR reads supported   : [NO ]
CPU pinning enabled   : [YES]
CPU supports zeroupper: [YES]
CPU supports AVX2     : [YES]
CPU supports AVX-512F : [YES]
CPU supports AVX-512VL: [YES]
CPU supports AVX-512BW: [YES]
CPU supports AVX-512CD: [YES]
CPUID doesn't support leaf 0x15, falling back to manual TSC calibration.
tsc_freq = 2995.2 MHz (from calibration loop)
CPU brand string: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
2 available CPUs: [0, 1]
Can't use cpuid leaf 0xb to filter out hyperthreads, CPU too old or AMD
2 physical cores: [0, 1]
Will test up to 2 CPUs
Cores | ID                  | Description                       | OVRLP3 |  Mops
1     | pause_only          | pause instruction                 |  1.000 |  1649
1     | ucomis_clean        | scalar ucomis (w/ vzeroupper)     |  1.000 |  1065
1     | ucomis_dirty        | scalar ucomis (no vzeroupper)     |  1.000 |  1065
1     | scalar_iadd         | Scalar integer adds               |  1.000 |  4290
1     | avx128_iadd         | 128-bit integer serial adds       |  1.000 |  4290
1     | avx256_iadd         | 256-bit integer serial adds       |  1.000 |  4290
1     | avx512_iadd         | 512-bit integer serial adds       |  1.000 |  4290
1     | avx128_iadd16       | 128-bit integer serial adds zmm16 |  1.000 |  4290
1     | avx256_iadd16       | 256-bit integer serial adds zmm16 |  1.000 |  4291
1     | avx512_iadd16       | 512-bit integer serial adds zmm16 |  1.000 |  4290
1     | avx128_iadd_t       | 128-bit integer parallel adds     |  1.000 | 12870
1     | avx256_iadd_t       | 256-bit integer parallel adds     |  1.000 | 12870
1     | avx128_xor_zero     | 128-bit zeroing xor               |  1.000 | 21236
1     | avx256_xor_zero     | 256-bit zeroing xor               |  1.000 | 21240
1     | avx512_xor_zero     | 512-bit zeroing xord              |  1.000 | 21231
1     | avx128_mov_sparse   | 128-bit reg-reg mov               |  1.000 |  4290
1     | avx256_mov_sparse   | 256-bit reg-reg mov               |  1.000 |  4290
1     | avx512_mov_sparse   | 512-bit reg-reg mov               |  1.000 |  4291
1     | avx128_merge_sparse | 128-bit reg-reg merge mov         |  1.000 |  4290
1     | avx256_merge_sparse | 256-bit reg-reg merge mov         |  1.000 |  4290
1     | avx512_merge_sparse | 512-bit reg-reg merge mov         |  1.000 |  4290
1     | avx128_vshift       | 128-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx256_vshift       | 256-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx512_vshift       | 512-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx128_vshift_t     | 128-bit variable shift (vpsrlvd)  |  1.000 |  8580
1     | avx256_vshift_t     | 256-bit variable shift (vpsrlvd)  |  1.000 |  8579
1     | avx512_vshift_t     | 512-bit variable shift (vpsrlvd)  |  1.000 |  4290
1     | avx128_vlzcnt       | 128-bit lzcnt (vplzcntd)          |  1.000 |  1073
1     | avx256_vlzcnt       | 256-bit lzcnt (vplzcntd)          |  1.000 |  1073
1     | avx512_vlzcnt       | 512-bit lzcnt (vplzcntd)          |  1.000 |  1073
1     | avx128_vlzcnt_t     | 128-bit lzcnt (vplzcntd)          |  1.000 |  8581
1     | avx256_vlzcnt_t     | 256-bit lzcnt (vplzcntd)          |  1.000 |  8579
1     | avx512_vlzcnt_t     | 512-bit lzcnt (vplzcntd)          |  1.000 |  4290
1     | avx128_imul         | 128-bit integer muls (vpmuldq)    |  1.000 |   858
1     | avx256_imul         | 256-bit integer muls (vpmuldq)    |  1.000 |   858
1     | avx512_imul         | 512-bit integer muls (vpmuldq)    |  1.000 |   858
1     | avx128_fma_sparse   | 128-bit 64-bit sparse FMAs        |  1.000 |  4290
1     | avx256_fma_sparse   | 256-bit 64-bit sparse FMAs        |  1.000 |  4290
1     | avx512_fma_sparse   | 512-bit 64-bit sparse FMAs        |  1.000 |  4290
1     | avx128_fma          | 128-bit serial DP FMAs            |  1.000 |  1073
1     | avx256_fma          | 256-bit serial DP FMAs            |  1.000 |  1073
1     | avx512_fma          | 512-bit serial DP FMAs            |  1.000 |  1073
1     | avx128_fma_t        | 128-bit parallel DP FMAs          |  1.000 |  8579
1     | avx256_fma_t        | 256-bit parallel DP FMAs          |  1.000 |  8580
1     | avx512_fma_t        | 512-bit parallel DP FMAs          |  1.000 |  4290
1     | avx512_vpermw       | 512-bit serial WORD permute       |  1.000 |  1073
1     | avx512_vpermw_t     | 512-bit parallel WORD permute     |  1.000 |  4290
1     | avx512_vpermd       | 512-bit serial DWORD permute      |  1.000 |  1430
1     | avx512_vpermd_t     | 512-bit parallel DWORD permute    |  1.000 |  4290

Cores | ID                  | Description                       | OVRLP3 |         Mops
2     | pause_only          | pause instruction                 |  1.000 |   2830, 2862
2     | ucomis_clean        | scalar ucomis (w/ vzeroupper)     |  1.000 |   1047, 1047
2     | ucomis_dirty        | scalar ucomis (no vzeroupper)     |  1.000 |   1047, 1046
2     | scalar_iadd         | Scalar integer adds               |  1.000 |   3878, 3884
2     | avx128_iadd         | 128-bit integer serial adds       |  1.000 |   3737, 3742
2     | avx256_iadd         | 256-bit integer serial adds       |  1.000 |   3737, 3746
2     | avx512_iadd         | 512-bit integer serial adds       |  1.000 |   3900, 3900
2     | avx128_iadd16       | 128-bit integer serial adds zmm16 |  1.000 |   3746, 3738
2     | avx256_iadd16       | 256-bit integer serial adds zmm16 |  1.000 |   3735, 3744
2     | avx512_iadd16       | 512-bit integer serial adds zmm16 |  1.000 |   3922, 3919
2     | avx128_iadd_t       | 128-bit integer parallel adds     |  1.000 |   6433, 6434
2     | avx256_iadd_t       | 256-bit integer parallel adds     |  1.000 |   6446, 6440
2     | avx128_xor_zero     | 128-bit zeroing xor               |  1.000 | 10619, 10615
2     | avx256_xor_zero     | 256-bit zeroing xor               |  1.000 | 10608, 10619
2     | avx512_xor_zero     | 512-bit zeroing xord              |  1.000 | 10597, 10613
2     | avx128_mov_sparse   | 128-bit reg-reg mov               |  1.000 |   3873, 3878
2     | avx256_mov_sparse   | 256-bit reg-reg mov               |  1.000 |   3871, 3884
2     | avx512_mov_sparse   | 512-bit reg-reg mov               |  1.000 |   3879, 3874
2     | avx128_merge_sparse | 128-bit reg-reg merge mov         |  1.000 |   3877, 3879
2     | avx256_merge_sparse | 256-bit reg-reg merge mov         |  1.000 |   3878, 3877
2     | avx512_merge_sparse | 512-bit reg-reg merge mov         |  1.000 |   3879, 3878
2     | avx128_vshift       | 128-bit variable shift (vpsrlvd)  |  1.000 |   3914, 3915
2     | avx256_vshift       | 256-bit variable shift (vpsrlvd)  |  1.000 |   3915, 3917
2     | avx512_vshift       | 512-bit variable shift (vpsrlvd)  |  1.000 |   2095, 2095
2     | avx128_vshift_t     | 128-bit variable shift (vpsrlvd)  |  1.000 |   4292, 4293
2     | avx256_vshift_t     | 256-bit variable shift (vpsrlvd)  |  1.000 |   4284, 4291
2     | avx512_vshift_t     | 512-bit variable shift (vpsrlvd)  |  1.000 |   2090, 2091
2     | avx128_vlzcnt       | 128-bit lzcnt (vplzcntd)          |  1.000 |   1072, 1072
2     | avx256_vlzcnt       | 256-bit lzcnt (vplzcntd)          |  1.000 |   1072, 1072
2     | avx512_vlzcnt       | 512-bit lzcnt (vplzcntd)          |  1.000 |   1072, 1072
2     | avx128_vlzcnt_t     | 128-bit lzcnt (vplzcntd)          |  1.000 |   4299, 4295
2     | avx256_vlzcnt_t     | 256-bit lzcnt (vplzcntd)          |  1.000 |   4287, 4307
2     | avx512_vlzcnt_t     | 512-bit lzcnt (vplzcntd)          |  1.000 |   2089, 2092
2     | avx128_imul         | 128-bit integer muls (vpmuldq)    |  1.000 |    858,  858
2     | avx256_imul         | 256-bit integer muls (vpmuldq)    |  1.000 |    858,  858
2     | avx512_imul         | 512-bit integer muls (vpmuldq)    |  1.000 |    858,  858
2     | avx128_fma_sparse   | 128-bit 64-bit sparse FMAs        |  1.000 |   3877, 3877
2     | avx256_fma_sparse   | 256-bit 64-bit sparse FMAs        |  1.000 |   3880, 3878
2     | avx512_fma_sparse   | 512-bit 64-bit sparse FMAs        |  1.000 |   3877, 3874
2     | avx128_fma          | 128-bit serial DP FMAs            |  1.000 |   1072, 1072
2     | avx256_fma          | 256-bit serial DP FMAs            |  1.000 |   1072, 1072
2     | avx512_fma          | 512-bit serial DP FMAs            |  1.000 |   1072, 1072
2     | avx128_fma_t        | 128-bit parallel DP FMAs          |  1.000 |   4293, 4280
2     | avx256_fma_t        | 256-bit parallel DP FMAs          |  1.000 |   4285, 4294
2     | avx512_fma_t        | 512-bit parallel DP FMAs          |  1.000 |   2089, 2091
2     | avx512_vpermw       | 512-bit serial WORD permute       |  1.000 |   1069, 1069
2     | avx512_vpermw_t     | 512-bit parallel WORD permute     |  1.000 |   2145, 2146
2     | avx512_vpermd       | 512-bit serial DWORD permute      |  1.000 |   1430, 1430
2     | avx512_vpermd_t     | 512-bit parallel DWORD permute    |  1.000 |   2149, 2142
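
The feature flags printed at the top of both runs reflect what the guest sees via CPUID. As a rough sketch (not the tool's actual code), AVX counts as usable only if CPUID advertises it and the OS has enabled the matching state components in XCR0. The struct and function names below are hypothetical; the bit positions are the architectural ones.

```cpp
#include <cstdint>

// Hypothetical container for the raw values a guest would read.
struct Cpu_features
{
    std::uint32_t leaf1_ecx; // CPUID.01H:ECX
    std::uint32_t leaf7_ebx; // CPUID.(EAX=07H,ECX=0):EBX
    std::uint64_t xcr0;      // value returned by XGETBV with ECX=0
};

constexpr bool bit(std::uint64_t value, unsigned n)
{
    return (value >> n) & 1;
}

constexpr bool avx_usable(Cpu_features const &f)
{
    return bit(f.leaf1_ecx, 27)       // OSXSAVE: OS uses XSAVE/XGETBV
        && bit(f.leaf1_ecx, 28)       // AVX advertised
        && (f.xcr0 & 0x06) == 0x06;   // SSE and AVX state enabled
}

constexpr bool avx2_usable(Cpu_features const &f)
{
    return avx_usable(f) && bit(f.leaf7_ebx, 5);
}

constexpr bool avx512f_usable(Cpu_features const &f)
{
    return avx_usable(f)
        && bit(f.leaf7_ebx, 16)       // AVX-512 Foundation advertised
        && (f.xcr0 & 0xe6) == 0xe6;   // plus opmask and both ZMM components
}
```

A guest whose XCR0 cannot be extended beyond 0x3 fails all three checks even on an AVX-capable host CPU.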

Additionally, on a U4711 notebook, I ran MotionMark 1.3.1 from browserbench.org in Firefox inside a Debian 12 VM, as already used by @jschlatow during his browser performance analysis on Genodians.org. Even though the results are not very stable and fluctuate, AVX seems to have a positive effect. Best results:

w/o  AVX, but with SSE*:  23.74 @ 60fps +- 224.84 %
with AVX commits       : 167.28 @ 60fps +-  29.97 %

@alex-ab
Member Author

alex-ab commented Aug 5, 2024

Additionally, I downloaded a jellyfish video, https://repo.jellyfin.org/jellyfish/jellyfish-30-mbps-hd-h264.mkv, and used ffmpeg to transcode the file in order to see some impact. Both log files are attached, and the diff of the output is below. An improvement is visible: the encoding time drops from 300.06 s to 243.94 s (3.00 to 3.69 fps).

ffmpeg -benchmark -i jellyfish-30-mbps-hd-h264.mkv -c:v libx265 -preset medium -crf 20 -c:a copy jellyfish-30-mbps-hd-h265-crf20.mkv

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
encoded 900 frames in 300.06s (3.00 fps), 11242.09 kb/s, Avg QP:24.48

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3
encoded 900 frames in 243.94s (3.69 fps), 11242.09 kb/s, Avg QP:24.48

sse_ffmpeg_30.txt
avx_ffmpeg_30.txt

@alex-ab
Member Author

alex-ab commented Aug 5, 2024

Another test from the Phoronix test suite, Bosphorus, executed manually (i.e., not via the test suite), shows the following results. The traces and command invocation are part of the attached log files.

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2
encoded 600 frames in 69.54s (8.63 fps), 1271.47 kb/s, Avg QP:33.68

x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3
encoded 600 frames in 52.36s (11.46 fps), 1271.47 kb/s, Avg QP:33.68

sse_x265_Bosphorus_1920x1080.txt
avx_x265_Bosphorus_1920x1080.txt

chelmuth pushed a commit that referenced this issue Aug 6, 2024
Add extended FPU state detection and handling (via xsave and friends) to the
kernel, which has to store/load more FPU state (~512 -> 2k++) during context
switching of threads. Additional the referenced nova branch contains various
optimization during VM destruction and cross core IPC resource caching.

The FPU work is based upon upstream NOVA kernel commits and Hedron sources.

Issue #5314
Fixes #3914
chelmuth pushed a commit to genodelabs/genode-world that referenced this issue Aug 6, 2024
Add preliminary support. Tested on Sculpt 24.04 with nova
kernel on AMD and Intel in a Debian 12 VM.

genodelabs/genode#5314
@chelmuth
Member

chelmuth commented Aug 6, 2024

Merged f5a9d5e and genodelabs/genode-world@eee31d2 to staging.

alex-ab added a commit to alex-ab/genode that referenced this issue Aug 9, 2024
some more adjustments are needed for xsave support, but this port is scheduled
to be removed. Just disable xsave for the time being to make nightly test
happy.

Issue genodelabs#5314
alex-ab added a commit to alex-ab/genode that referenced this issue Aug 9, 2024
Add extended FPU state detection and handling (via xsave and friends) to the
kernel, which has to store/load more FPU state (~512 -> 2k++) during context
switching of threads. Additional the referenced nova branch contains various
optimization during VM destruction and cross core IPC resource caching.

This FPU work is based upon upstream NOVA kernel and Hedron commits.

Issue genodelabs#5314
Fixes genodelabs#3914
chelmuth pushed a commit that referenced this issue Aug 12, 2024
some more adjustments are needed for xsave support, but this port is scheduled
to be removed. Just disable xsave for the time being to make nightly test
happy.

Issue #5314
chelmuth pushed a commit that referenced this issue Aug 12, 2024
Add extended FPU state detection and handling (via xsave and friends) to the
kernel, which has to store/load more FPU state (~512 -> 2k++) during context
switching of threads. Additional the referenced nova branch contains various
optimization during VM destruction and cross core IPC resource caching.

This FPU work is based upon upstream NOVA kernel and Hedron commits.

Issue #5314
Fixes #3914
chelmuth pushed a commit to genodelabs/genode-world that referenced this issue Aug 12, 2024
Add preliminary support. Tested on Sculpt 24.04 with nova
kernel on AMD and Intel in a Debian 12 VM.

genodelabs/genode#5314
chelmuth added a commit that referenced this issue Aug 13, 2024
@chelmuth
Member

depot_autopilot/test-pthread failed last night with #UD on x86_64.

[2024-08-13 03:30:11] [init -> depot_autopilot] 1.308 [init -> test-pthread] main thread: start PTHREAD_MUTEX_NORMAL stress test
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.305', cpu 2, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.310', cpu 5, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.307', cpu 6, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.311', cpu 7, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.309', cpu 3, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.308', cpu 1, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:11] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.306', cpu 4, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:30:15] Warning: unresolvable exception 6, pd 'init -> dynamic -> test-pthread -> test-pthread', thread 'pthread.303', cpu 7, ip=0x78a53 sp=0x405fed80 bp=0x898e0 no signal handler
[2024-08-13 03:31:40] [init -> depot_autopilot] 
[2024-08-13 03:31:40] [init -> depot_autopilot]  test-pthread                    failed    89.987  timeout 90 sec

Same occurred with AVX patches from 2024-08-06 at 2024-08-07 03:51:53.

alex-ab pushed a commit to alex-ab/genode that referenced this issue Aug 13, 2024
alex-ab added a commit to alex-ab/genode that referenced this issue Aug 13, 2024
Add extended FPU state detection and handling (via xsave and friends) to the
kernel, which has to store/load more FPU state (~512 -> 2k++) during context
switching of threads. Additional the referenced nova branch contains various
optimization during VM destruction and cross core IPC resource caching.

This FPU work is based upon upstream NOVA kernel and Hedron commits.

Issue genodelabs#5314
Fixes genodelabs#3914
chelmuth pushed a commit that referenced this issue Aug 27, 2024
some more adjustments are needed for xsave support, but this port is scheduled
to be removed. Just disable xsave for the time being to make nightly test
happy.

Issue #5314
chelmuth pushed a commit that referenced this issue Aug 27, 2024
Add extended FPU state detection and handling (via xsave and friends) to the
kernel, which has to store/load more FPU state (~512 -> 2k++) during context
switching of threads. Additional the referenced nova branch contains various
optimization during VM destruction and cross core IPC resource caching.

This FPU work is based upon upstream NOVA kernel and Hedron commits.

Issue #5314
Fixes #3914
chelmuth pushed a commit to genodelabs/genode-world that referenced this issue Aug 27, 2024
Add preliminary support. Tested on Sculpt 24.04 with nova
kernel on AMD and Intel in a Debian 12 VM.

genodelabs/genode#5314
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 13, 2024
Extend Genode's vCPU FPU state and adjust all users to copy
at most FPU data they actually support.

Issue genodelabs#5314
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 13, 2024
Makes the kernel robust against invalid guest FPU state provided by a VMM,
e.g. our port of Vbox6.

Issue genodelabs#5314
alex-ab added a commit to alex-ab/genode that referenced this issue Sep 13, 2024
@alex-ab
Member Author

alex-ab commented Sep 13, 2024

I added the commits to get AVX working with vbox6, tested with Debian, Ubuntu, and Win10 VMs on a modular Sculpt.
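
The commit message referenced above speaks of copying "at most FPU data they actually support". A minimal sketch of such a bounded copy could look as follows. The function names are hypothetical and do not reflect Genode's actual vCPU state interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes that may be transferred when the
// source provides 'src_size' bytes of FPU state and the destination can
// hold at most 'dst_capacity' bytes.
constexpr std::size_t fpu_copy_length(std::size_t dst_capacity,
                                      std::size_t src_size)
{
    return std::min(dst_capacity, src_size);
}

// Copies no more than both sides support and returns the number of
// bytes actually copied, so the caller can tell whether state was cut.
inline std::size_t copy_fpu_state(void *dst, std::size_t dst_capacity,
                                  void const *src, std::size_t src_size)
{
    std::size_t const n = fpu_copy_length(dst_capacity, src_size);
    std::memcpy(dst, src, n);
    return n;
}
```

A VMM that only knows the 512-byte legacy layout would thus receive 512 bytes even if the kernel holds a 2432-byte area, and the reverse direction never overruns the smaller buffer.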

@chelmuth
Member

@alex-ab would you mind recording the remaining problems with avx-turbo in this issue? I agree that we don't have to fix them if they are specific to the use of the tool only and don't happen in real scenarios.

chelmuth pushed a commit that referenced this issue Sep 16, 2024
Extend Genode's vCPU FPU state and adjust all users to copy
at most FPU data they actually support.

Issue #5314
chelmuth pushed a commit that referenced this issue Sep 16, 2024
Makes the kernel robust against invalid guest FPU state provided by a VMM,
e.g. our port of Vbox6.

Issue #5314
chelmuth pushed a commit that referenced this issue Sep 16, 2024
@alex-ab
Member Author

alex-ab commented Sep 16, 2024

> @alex-ab would you mind recording the remaining problems with avx-turbo in this issue? I agree that we don't have to fix them if they are specific to the use of the tool only and don't happen in real scenarios.

I found the issue with the test. During the TSC frequency calculation, it divides by 0, which fails. I added a patch for use in vbox6: instead of reading out the frequency (which is not provided by vbox6), the tool measures it, and then the whole AVX test works.

avx_turbo_tsc_calc.txt

--- a/tsc-support.cpp
+++ b/tsc-support.cpp
@@ -41,7 +41,8 @@ uint64_t get_tsc_from_cpuid_inner() {
 
 
     if (family.family == 6) {
-        if (family.model == 0x4E || family.model == 0x5E || family.model == 0x8E || family.model == 0x9E) {
+        printf("%s:%u division by %u is not good !!!\n", __func__, __LINE__, cpuid15.eax);
+        if (cpuid15.eax && (family.model == 0x4E || family.model == 0x5E || family.model == 0x8E || family.model == 0x9E)) {
             // skylake client or kabylake
             return (int64_t)24000000 * cpuid15.ebx / cpuid15.eax; // 24 MHz crystal clock
         }
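
The guard introduced by the patch can also be expressed as a small standalone function. This is only a sketch of the idea, not the code of avx-turbo: the function name is hypothetical, the 24 MHz crystal clock is taken from the comment in the patch, and the additional check of EBX is an assumption beyond what the patch does.

```cpp
#include <cstdint>

// CPUID leaf 0x15 reports the ratio of TSC frequency to crystal clock
// as EBX / EAX. A VMM that does not provide the leaf returns zeros.
// Returns the TSC frequency in Hz, or 0 if the ratio is unusable, in
// which case the caller has to fall back to a calibration loop.
constexpr std::uint64_t tsc_hz_from_cpuid15(std::uint32_t eax,
                                            std::uint32_t ebx,
                                            std::uint64_t crystal_hz = 24000000)
{
    if (eax == 0 || ebx == 0)
        return 0; // avoid the division by zero seen under vbox6

    return crystal_hz * ebx / eax;
}
```

For example, a ratio of 250/2 with the 24 MHz crystal yields 3 GHz, whereas zeroed registers yield 0 and trigger the measurement path.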

@alex-ab
Member Author

alex-ab commented Sep 17, 2024

@chelmuth: please add the fixup and the aes commit to staging from my staging branch

@chelmuth
Member

Thanks, merged to staging.

@nfeske nfeske mentioned this issue Oct 4, 2024
nfeske pushed a commit that referenced this issue Oct 7, 2024
Extend Genode's vCPU FPU state and adjust all users to copy
at most FPU data they actually support.

Issue #5314
nfeske pushed a commit that referenced this issue Oct 7, 2024
Makes the kernel robust against invalid guest FPU state provided by a VMM,
e.g. our port of Vbox6.

Issue #5314
nfeske pushed a commit that referenced this issue Oct 7, 2024
nfeske pushed a commit that referenced this issue Oct 7, 2024
cnuke added a commit to cnuke/genode that referenced this issue Oct 8, 2024
@cnuke
Member

cnuke commented Oct 8, 2024

FWIW, 4903595 enables RDRAND and RDSEED. I've been using the commit for some time now w/o any noticeable problems.

@chelmuth
Member

chelmuth commented Oct 8, 2024

Can you report any positive performance (or other) impact?

@cnuke
Member

cnuke commented Oct 8, 2024

@chelmuth well, I did not perform any testing so I cannot comment either way (especially as I have not enabled them in isolation, i.e. AVX/AES was already enabled and could skew the results).

alex-ab added a commit to alex-ab/genode that referenced this issue Nov 13, 2024
nfeske pushed a commit that referenced this issue Nov 13, 2024
chelmuth pushed a commit that referenced this issue Nov 20, 2024
alex-ab added a commit to alex-ab/genode that referenced this issue Nov 30, 2024
Fix regression introduced in Issue genodelabs#5314
alex-ab added a commit to alex-ab/genode that referenced this issue Nov 30, 2024
Fix regression introduced in Issue genodelabs#5314
alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024
Fix regression introduced in Issue genodelabs#5314
alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024
alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024
alex-ab added a commit to alex-ab/genode that referenced this issue Dec 2, 2024
chelmuth pushed a commit to chelmuth/genode that referenced this issue Dec 2, 2024
chelmuth pushed a commit that referenced this issue Dec 2, 2024
Regression introduced in Issue #5314

Fixes #5391
chelmuth pushed a commit to chelmuth/genode that referenced this issue Dec 6, 2024
chelmuth pushed a commit that referenced this issue Dec 10, 2024
Regression introduced in Issue #5314

Fixes #5391