Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS #5222
martin-frbg merged 9 commits into OpenMathLib:develop
|
Hi @martin-frbg, for a non-Apple CPU, the check should enter this part of get_coretype() (verified on QEMU). Here, when TARGET is set to ARMV8, gotoblas_ARMV9SME is NULL, whereas when TARGET is set to ARMV9SME, gotoblas_ARMV9SME is not NULL and the architecture initialization therefore succeeds. Please note that for compilation I am using the following command: Also, although the test is on QEMU, the SME sgemmdirect kernel will eventually have to run on a Qualcomm device as well. So I think we need to add a support_sme1() check for implementer ID 0x51 here, similar to the one you added for Apple M4. |
|
The way this is supposed to work is that for Linux, it checks a variety of implementer and cpu IDs, and if none of them matches, it runs support_sme1() to see if it should return ARMV9SME. |
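The fallback order described above can be sketched roughly as follows. This is a simplified illustration, not the actual get_coretype() code: the enum, the `detect_core` helper, the table contents, and the probe callbacks are all hypothetical stand-ins, with the callback playing the role of support_sme1().

```c
#include <stddef.h>

/* Hypothetical core types; the real list in the library is much longer. */
typedef enum { CORE_UNKNOWN, CORE_KNOWN_EXAMPLE, CORE_ARMV9SME } core_t;

/* One entry of a hypothetical (implementer ID, part ID) lookup table. */
typedef struct {
    unsigned implementer;
    unsigned part;
    core_t core;
} id_entry;

/* Stand-in for support_sme1(): returns nonzero if SME is usable. */
typedef int (*sme_probe_fn)(void);

/* Illustrative table with a single made-up entry. */
const id_entry example_table[] = {
    { 0x41, 0xd40, CORE_KNOWN_EXAMPLE },
};

/* Stub probes that simulate a machine with and without SME. */
int probe_yes(void) { return 1; }
int probe_no(void) { return 0; }

core_t detect_core(unsigned implementer, unsigned part,
                   const id_entry *table, size_t n,
                   sme_probe_fn sme_probe)
{
    /* Step 1: try the explicit ID matches first. */
    for (size_t i = 0; i < n; i++) {
        if (table[i].implementer == implementer && table[i].part == part)
            return table[i].core;
    }
    /* Step 2: nothing matched, so fall back to the capability probe. */
    if (sme_probe != NULL && sme_probe())
        return CORE_ARMV9SME;
    return CORE_UNKNOWN;
}
```

With this shape, an unlisted CPU (for example implementer 0x51 with an arbitrary part ID) is reported as ARMV9SME only when the probe succeeds, while a listed CPU never reaches the probe.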
|
On QEMU, support_sme1() returns true, which I verified using debug prints. I think the issue is somewhere in gotoblas->init returning null. Moreover, the check for (gotoblas && gotoblas->init) is true when the library is compiled with TARGET=ARMV9SME DYNAMIC_ARCH=1. It fails with TARGET=ARMV8 or TARGET=ARMV8SVE and DYNAMIC_ARCH=1. I believe the init function maps to init_parameter() taken from the generated file setparam-ARMV9SME.c. This object (setparam-ARMV9SME.o) is generated in both cases (ARMV8 and ARMV9SME). Not sure if I am missing something here... :( |
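To make the failure mode concrete, here is a minimal sketch of a guarded init call of the `(gotoblas && gotoblas->init)` kind. The struct, the table variables, and `init_selected_core` are hypothetical and heavily reduced; they only illustrate why a NULL table pointer or a NULL init hook makes initialization fail, and say nothing about why the pointer ended up NULL in this build.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-in for a per-core function table. */
typedef struct {
    const char *name;
    void (*init)(void);
} core_table;

/* Counts how often the example init hook ran. */
static int init_calls = 0;

/* Stand-in for an init_parameter()-style hook. */
void init_parameter_example(void) { init_calls++; }

/* A table that is present and fully populated. */
core_table table_armv8 = { "ARMV8", init_parameter_example };

/* Simulates a build in which the ARMV9SME table is not available. */
core_table *table_armv9sme_missing = NULL;

/* Returns 0 on success, -1 if the table or its init hook is missing. */
int init_selected_core(core_table *selected)
{
    if (selected && selected->init) {
        selected->init();
        return 0;
    }
    fprintf(stderr, "architecture initialization failed\n");
    return -1;
}
```

Both conditions matter: a table that exists but carries no init hook fails in the same way as a table that is missing altogether.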
|
Hi @martin-frbg, were you able to check on this issue? I tried to fix it, but without any luck. Please let me know if you figure out a solution. |
|
Unfortunately I'm still at the stage of building a kernel with SME support in a Debian VM under qemu (which is a lot slower than anticipated even on a fast x86_64). Wanted to try Arm FVP instead but did not quite figure out how to make that work |
|
Think I got it sorted now (after spending too much time in vain trying to get qemu with SME working on x86_64). |
|
Hi @martin-frbg, the issue of architecture init failure is fixed, but when we compile with |
|
Hmm. This appears to be due to a general flaw in the current implementation of the "direct" SGEMM code - USE_SGEMM_KERNEL_DIRECT is not actually available as a cpu-specific datum at runtime, as the gemm interface is only compiled once for the TARGET cpu. This was probably meant to be an "OR" conditional, instead of tying DYNAMIC_ARCH to it. (Another oddity is that the direct codepath is only available to CBLAS, and only when using C-style row-major order) |
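The distinction between a build-time flag and a per-CPU runtime datum can be illustrated with a small sketch. Everything here is hypothetical: `EXAMPLE_USE_DIRECT_KERNEL` merely stands in for a macro like USE_SGEMM_KERNEL_DIRECT, and `core_info` is an invented record, not a structure from the library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical macro standing in for a per-target build flag.
 * Uncomment to simulate a TARGET that provides a direct kernel. */
/* #define EXAMPLE_USE_DIRECT_KERNEL 1 */

/* Compile-time choice: the answer is fixed when this file is compiled,
 * regardless of which CPU the binary later runs on. */
bool direct_path_compile_time(void)
{
#ifdef EXAMPLE_USE_DIRECT_KERNEL
    return true;
#else
    return false;
#endif
}

/* Hypothetical per-core record that carries the capability at runtime. */
typedef struct {
    const char *name;
    bool has_direct_kernel;
} core_info;

/* Runtime choice: depends on the core that was actually detected. */
bool direct_path_runtime(const core_info *core)
{
    return core != NULL && core->has_direct_kernel;
}
```

If the interface is compiled once with the macro undefined, the first function can never enable the direct path, even on a CPU whose detected core would support it; only the second form can follow the runtime detection.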
This is sufficient to enable the SME version of the "small matrix SGEMM" kernel on Apple M4.
Also added is commented-out code for recognizing the M4 as ARMV9SME - this is not yet useful except for testing, as
none of the ARMV8SVE kernels that the V9SME target builds upon support streaming SVE.
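For readers unfamiliar with feature queries on macOS, one possible shape of an SME check is sketched below. It assumes the `hw.optional.arm.FEAT_SME` sysctl key and uses a hypothetical function name, `query_sme_example`; whether the merged code queries SME this way is not stated here, so treat it as an illustration only. On non-Apple platforms the sketch simply reports no SME.

```c
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns 1 if the OS reports SME support, 0 otherwise.
 * Hypothetical helper; the sysctl key is an assumption of this sketch. */
int query_sme_example(void)
{
#ifdef __APPLE__
    int value = 0;
    size_t len = sizeof(value);
    /* A failing lookup (for example an unknown key on an older OS)
     * is treated as "feature not present". */
    if (sysctlbyname("hw.optional.arm.FEAT_SME", &value, &len, NULL, 0) != 0)
        return 0;
    return value != 0;
#else
    /* No macOS sysctl interface here; report no SME. */
    return 0;
#endif
}
```

Treating a failed lookup as "not supported" keeps the check safe on systems that predate the key.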