This document contains information related to matrix-multiplication (matmul) micro kernels.
File names follow the convention below. Unless explicitly marked as optional, every field is mandatory.
`kai_matmul_<fused_ops>_<dst_info>_<lhs_info>_<rhs_info>_<mr x nr x kacc>_<technology>_<feature>_<instruction>.c`
Syntax | Description | Example |
---|---|---|
fused_ops | Optional info on applied fused operation like a clamping activation function. | clamp |
dst_info | Destination matrix info. Similar to lhs_info | |
lhs_info | LHS matrix data type and packing info.<br>f32 : Floating-point 32-bit<br>q : Quantized<br>s : Symmetric<br>a : Asymmetric<br>i : Signed integer<br>u : Unsigned integer<br>4 : 4-bit quantized<br>8 : 8-bit quantized<br>dx : Per-dimension quantization<br>cx : Per-channel quantization<br>c32 : Per-block quantization, with block length a multiple of 32<br>scalef16 : Scale factors are stored as 16-bit floating-point values<br>p : Matrix is packed | Example 1: qsi4cxp<br>qs - Quantized symmetric<br>i4 - Signed integer 4-bit<br>cx - Per-channel quantized<br>p - Packed<br>Some other examples:<br>s16s0 - Packing order of data is interleaved<br>s1s0 - Packing order of data is sequential<br>fp16 - Floating-point 16-bit data type |
rhs_info | Similar to lhs_info | |
mr x nr x kacc | Each outer-loop iteration computes mr rows and nr columns of the destination. kacc is the number of k-accumulations done per inner-loop iteration. | |
technology | Underlying technology. Arm® Neon™ | neon |
feature | Arm architecture feature used | dotprod, i8mm, sme2 |
instruction | Instruction used. Optional. | mla |
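
To make the convention concrete, the sketch below assembles a file name from the fields in the table. It is illustrative only: `matmul_name_fields` and `build_matmul_file_name` are hypothetical names, not part of any library, and the sketch assumes that `mr`, `nr` and `kacc` are joined with `x` and no spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of the naming convention.
 * Optional fields (fused_ops, instruction) may be NULL or "". */
struct matmul_name_fields {
    const char *fused_ops;   /* optional, e.g. "clamp" */
    const char *dst_info;    /* mandatory, e.g. "f32" */
    const char *lhs_info;    /* mandatory, e.g. "qsi4cxp" */
    const char *rhs_info;    /* mandatory */
    unsigned mr, nr, kacc;   /* mandatory block sizes */
    const char *technology;  /* mandatory, e.g. "neon" */
    const char *feature;     /* mandatory, e.g. "dotprod" */
    const char *instruction; /* optional, e.g. "mla" */
};

/* Returns non-zero when the string is present and non-empty. */
static int is_set(const char *s) { return s != NULL && s[0] != '\0'; }

/* Writes the file name into `out` (at most `out_size` bytes, always
 * NUL-terminated when out_size > 0). Returns the length the full name
 * needs (snprintf semantics), or -1 if a mandatory field is missing.
 * Assumption: mr, nr and kacc are joined as "<mr>x<nr>x<kacc>". */
int build_matmul_file_name(const struct matmul_name_fields *f,
                           char *out, size_t out_size)
{
    assert(f != NULL && out != NULL);

    if (!is_set(f->dst_info) || !is_set(f->lhs_info) ||
        !is_set(f->rhs_info) || !is_set(f->technology) ||
        !is_set(f->feature)) {
        return -1;
    }

    /* Optional fields contribute their own '_' separator only when set. */
    return snprintf(out, out_size,
                    "kai_matmul_%s%s%s_%s_%s_%ux%ux%u_%s_%s%s%s.c",
                    is_set(f->fused_ops) ? f->fused_ops : "",
                    is_set(f->fused_ops) ? "_" : "",
                    f->dst_info, f->lhs_info, f->rhs_info,
                    f->mr, f->nr, f->kacc,
                    f->technology, f->feature,
                    is_set(f->instruction) ? "_" : "",
                    is_set(f->instruction) ? f->instruction : "");
}
```

With the illustrative values `fused_ops = "clamp"`, `dst_info = "f32"`, `lhs_info = rhs_info = "qsi4cxp"`, `mr = 4`, `nr = 4`, `kacc = 32`, `technology = "neon"`, `feature = "dotprod"` and no instruction, the helper yields `kai_matmul_clamp_f32_qsi4cxp_qsi4cxp_4x4x32_neon_dotprod.c`. These values are chosen only to exercise the pattern and do not refer to an actual file.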