Add uarch support for AWS r7iz #296

Open
WilliamTambellini opened this issue Feb 7, 2023 · 5 comments

@WilliamTambellini
Contributor

WilliamTambellini commented Feb 7, 2023

[wtambellini@r7iz ~]$ cpu_features-9613390/bin/list_cpu_features 
arch            : x86
brand           : Intel(R) Xeon(R) Processor
family          :   6 (0x06)
model           : 143 (0x8F)
stepping        :   3 (0x03)
uarch           : X86_UNKNOWN
flags           : aes,avx,avx2,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,bmi1,bmi2,cx16,erms,f16c,fma3,movbe,popcnt,rdrnd,sha,sse4_1,sse4_2,ssse3,vpclmulqdq

lscpu

[wtambellini@r7iz ~]$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               143
Model name:          Intel(R) Xeon(R) Processor
Stepping:            3
CPU MHz:             3667.317
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            2048K
L3 cache:            61440K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd ida arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear serialize flush_l1d arch_capabilities

https://aws.amazon.com/ec2/instance-types/r7iz/

@toor1245
Contributor

@WilliamTambellini, could you provide a cpuid dump via cpuid -r -1?

@toor1245
Contributor

@WilliamTambellini, did you test with the latest commit? We already have Sapphire Rapids detection:
https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L618

We also added detection of movdiri and movdir64b:
https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L385-L386
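
As a rough illustration of what signature-based detection involves, the sketch below splits a CPUID leaf 1 EAX value into display family, model and stepping (using the Intel SDM rule for combining base and extended fields) and looks the result up in a small table. The table, the function names and the single entry are assumptions made for illustration; this is not the code behind the links above. The only entry is the one this issue is about: family 6, model 0x8F.

```python
from typing import NamedTuple


class Signature(NamedTuple):
    family: int
    model: int
    stepping: int


def decode_signature(eax: int) -> Signature:
    """Split CPUID leaf 1 EAX into display family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model supplies the high nibble for families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return Signature(family, model, stepping)


# Hypothetical lookup table: (family, model) -> microarchitecture name.
UARCH_BY_FAMILY_MODEL = {
    (0x06, 0x8F): "INTEL_SPR",  # Sapphire Rapids, the CPU behind r7iz
}


def lookup_uarch(sig: Signature) -> str:
    """Return the table entry, or X86_UNKNOWN when the model is not listed."""
    return UARCH_BY_FAMILY_MODEL.get((sig.family, sig.model), "X86_UNKNOWN")
```

With the leaf 1 value from the cpuid dump posted further down this thread (eax=0x000806f7), decode_signature gives family 6, model 143, stepping 7, and the table maps that to INTEL_SPR. A build whose table lacks the (6, 0x8F) entry would fall through to X86_UNKNOWN, as in the first report.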

@WilliamTambellini
Contributor Author

Hi @toor1245
Voilà, retested with today's master on r7iz.2xl:

[wtambellini@ahuang-fc37 Release]$ ./list_cpu_features 
arch            : x86
brand           : Intel(R) Xeon(R) Processor
family          :   6 (0x06)
model           : 143 (0x8F)
stepping        :   7 (0x07)
uarch           : INTEL_SPR
flags           : adx,aes,avx,avx2,avx512_bf16,avx512_second_fma,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,avx_vnni,bmi1,bmi2,clflushopt,clfsh,clwb,cx16,cx8,erms,f16c,fma3,fpu,gfni,lzcnt,mmx,movbe,movdir64b,movdiri,pclmulqdq,popcnt,rdrnd,rdseed,sha,ss,sse,sse2,sse3,sse4_1,sse4_2,ssse3,tsc,vaes,vpclmulqdq
cache_info      : {"level":1,"cache_type":"data","cache_size":49152,"ways":12,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":1,"cache_type":"instruction","cache_size":32768,"ways":8,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":2,"cache_type":"unified","cache_size":2097152,"ways":16,"line_size":64,"tlb_entries":2048,"partitioning":1},{"level":3,"cache_type":"unified","cache_size":62914560,"ways":15,"line_size":64,"tlb_entries":65536,"partitioning":1}

The uarch is now detected correctly, but there are no AMX flags.
Have you added the code to detect AMX flags?
Best
W.
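
The cache_info line above can be sanity-checked by hand. In this output every cache_size equals ways × line_size × tlb_entries × partitioning, which suggests that tlb_entries holds the number of sets for these entries. That reading is inferred from the arithmetic and is not confirmed anywhere in this thread. A small sketch, with the four entries copied from the output above (the helper name is hypothetical):

```python
import json

# cache_info as printed above, wrapped in [] so it parses as a JSON array.
CACHE_INFO = """[
{"level":1,"cache_type":"data","cache_size":49152,"ways":12,"line_size":64,"tlb_entries":64,"partitioning":1},
{"level":1,"cache_type":"instruction","cache_size":32768,"ways":8,"line_size":64,"tlb_entries":64,"partitioning":1},
{"level":2,"cache_type":"unified","cache_size":2097152,"ways":16,"line_size":64,"tlb_entries":2048,"partitioning":1},
{"level":3,"cache_type":"unified","cache_size":62914560,"ways":15,"line_size":64,"tlb_entries":65536,"partitioning":1}
]"""


def computed_size(entry: dict) -> int:
    """Recompute the cache size in bytes from the geometry fields."""
    return (
        entry["ways"]
        * entry["partitioning"]
        * entry["line_size"]
        * entry["tlb_entries"]  # assumed here to be the set count
    )


entries = json.loads(CACHE_INFO)
for e in entries:
    status = "ok" if computed_size(e) == e["cache_size"] else "MISMATCH"
    print(f'L{e["level"]} {e["cache_type"]}: {computed_size(e)} bytes ({status})')
```

The sizes also agree with the earlier lscpu output: 48K, 32K, 2048K and 61440K.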

@toor1245
Contributor

toor1245 commented Mar 17, 2023

@WilliamTambellini, yes, we have AMX flag detection:

https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L228-L232

https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L447-L449

We don't have a cpuid dump for Sapphire Rapids. Could you send one, from the cpuid -r -1 command or /proc/cpuinfo, so we can check these flags?

@skaiware

Thanks @toor1245
AWS currently disables AMX on VMs, which is why there are no AMX flags in the output.
Here is the cpuid output:

[wtambellini@r7iz2xl ~]$ cpuid -r -1
CPU:
   0x00000000 0x00: eax=0x0000001f ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x000806f7 ebx=0x00080800 ecx=0xfffab20b edx=0x1f8bfbff
   0x00000002 0x00: eax=0x00feff01 ebx=0x000000f0 ecx=0x00000000 edx=0x00000000
   0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000004 0x00: eax=0x0c004121 ebx=0x02c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x01: eax=0x0c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x02: eax=0x0c004143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
   0x00000004 0x03: eax=0x0c01c163 ebx=0x0380003f ecx=0x0000ffff edx=0x00000004
   0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x00001020
   0x00000006 0x00: eax=0x00000006 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
   0x00000007 0x00: eax=0x00000001 ebx=0xf1bf07ab ecx=0x1a407f7e edx=0xbc004400
   0x00000007 0x01: eax=0x00000030 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000a 0x00: eax=0x08300805 ebx=0x00000000 ecx=0x00000005 edx=0x00008600
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x0000000b 0x01: eax=0x00000003 ebx=0x00000008 ecx=0x00000201 edx=0x00000000
   0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000000
   0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x00: eax=0x000002e7 ebx=0x00000a88 ecx=0x00000a88 edx=0x00000000
   0x0000000d 0x01: eax=0x0000000f ebx=0x00000988 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x05: eax=0x00000040 ebx=0x00000440 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x06: eax=0x00000200 ebx=0x00000480 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x07: eax=0x00000400 ebx=0x00000680 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x09: eax=0x00000008 ebx=0x00000a80 ecx=0x00000000 edx=0x00000000
   0x0000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000010 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000011 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000012 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000012 0x01: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000012 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000013 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000014 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000015 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000016 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000017 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000018 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000019 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001b 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001b 0x01: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001d 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001f 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x0000001f 0x01: eax=0x00000003 ebx=0x00000008 ecx=0x00000201 edx=0x00000000
   0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000000
   0x20000000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x40000000 0x00: eax=0x40000010 ebx=0x4b4d564b ecx=0x564b4d56 edx=0x0000004d
   0x40000001 0x00: eax=0x0100807b ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000002 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000004 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000b 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000d 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000010 0x00: eax=0x002dc6c0 ebx=0x000f4240 ecx=0x00000000 edx=0x00000000
   0x40000100 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000121 edx=0x2c100800
   0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x6f655820 edx=0x2952286e
   0x80000003 0x00: eax=0x6f725020 ebx=0x73736563 ecx=0x0000726f edx=0x00000000
   0x80000004 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x08007040 edx=0x00000000
   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
   0x80000008 0x00: eax=0x00003030 ebx=0x00000200 ecx=0x00000000 edx=0x00000000
   0x80860000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0xc0000000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
[wtambellini@r7iz2xl ~]$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 143
model name	: Intel(R) Xeon(R) Processor
stepping	: 7
microcode	: 0x2a000080
cpu MHz		: 3835.538
cache size	: 61440 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 31
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd ida arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear serialize flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb
bogomips	: 6000.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management:
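
The dump is consistent with AMX being hidden from the guest. Per the Intel SDM, CPUID leaf 7 subleaf 0 EDX reports AMX-BF16 (bit 22), AMX-TILE (bit 24) and AMX-INT8 (bit 25), and leaf 0xD subleaf 0 EAX lists the supported XCR0 state components, where bits 17 and 18 are the AMX tile configuration and tile data. Below is a minimal sketch that parses lines in the cpuid -r format and tests those bits. The function names and dictionary keys are illustrative, and only two lines of the dump are included:

```python
import re

# Two lines copied from the dump above.
DUMP = """\
   0x00000007 0x00: eax=0x00000001 ebx=0xf1bf07ab ecx=0x1a407f7e edx=0xbc004400
   0x0000000d 0x00: eax=0x000002e7 ebx=0x00000a88 ecx=0x00000a88 edx=0x00000000
"""

LINE_RE = re.compile(
    r"^\s*0x([0-9a-f]{8}) 0x([0-9a-f]{2}): "
    r"eax=0x([0-9a-f]{8}) ebx=0x([0-9a-f]{8}) "
    r"ecx=0x([0-9a-f]{8}) edx=0x([0-9a-f]{8})\s*$",
    re.IGNORECASE,
)

# Bit positions as documented in the Intel SDM.
AMX_LEAF7_EDX_BITS = {"amx_bf16": 22, "amx_tile": 24, "amx_int8": 25}
AMX_XCR0_BITS = {"xtilecfg": 17, "xtiledata": 18}


def parse_dump(text: str) -> dict:
    """Map (leaf, subleaf) to a dict of register values."""
    leaves = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip "CPU:" headers, prompts and blank lines
        leaf, subleaf, eax, ebx, ecx, edx = (int(g, 16) for g in m.groups())
        leaves[(leaf, subleaf)] = {"eax": eax, "ebx": ebx, "ecx": ecx, "edx": edx}
    return leaves


def amx_report(leaves: dict) -> dict:
    """Report which AMX-related bits are set in leaf 7 EDX and leaf 0xD EAX."""
    edx7 = leaves[(0x7, 0)]["edx"]
    xcr0_low = leaves[(0xD, 0)]["eax"]
    report = {name: bool((edx7 >> bit) & 1) for name, bit in AMX_LEAF7_EDX_BITS.items()}
    report.update(
        {name: bool((xcr0_low >> bit) & 1) for name, bit in AMX_XCR0_BITS.items()}
    )
    return report


report = amx_report(parse_dump(DUMP))
for name, present in report.items():
    print(f"{name}: {'yes' if present else 'no'}")
```

For this dump every entry comes back as absent, which matches the /proc/cpuinfo flags line above having no amx_* entries.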
