About

This document contains information related to matrix-multiplication (matmul) micro kernels.

Naming

File names follow the convention below. Unless explicitly marked as optional, every field is mandatory.

kai_matmul_<fused_ops>_<dst_info>_<lhs_info>_<rhs_info>_<mr x nr x kacc>_<technology>_<feature>_<instruction>.c
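As a rough illustration of the field order, the sketch below assembles a file name from its parts. The helper `kernel_file_name` is hypothetical and not part of the library. It assumes that the `mr x nr x kacc` field is written without spaces (for example `1x4x32`) and that an omitted optional field simply drops out of the name together with its underscore. The argument values are illustrative only.

```python
def kernel_file_name(dst_info, lhs_info, rhs_info, mr, nr, kacc,
                     technology, feature, fused_ops=None, instruction=None):
    """Join the naming fields in the documented order (hypothetical helper)."""
    parts = ["kai", "matmul"]
    if fused_ops:                      # optional field
        parts.append(fused_ops)
    parts += [dst_info, lhs_info, rhs_info]
    # Assumption: the three block sizes are joined with "x" and no spaces.
    parts.append(f"{mr}x{nr}x{kacc}")
    parts += [technology, feature]
    if instruction:                    # optional field
        parts.append(instruction)
    return "_".join(parts) + ".c"


# Illustrative values only; they do not necessarily name an existing kernel.
name = kernel_file_name("f32", "qai8dxp", "qsi4cxp", 1, 4, 32,
                        "neon", "dotprod", fused_ops="clamp")
print(name)
```

Keeping the optional fields as keyword arguments mirrors the convention: only `fused_ops` and `instruction` may be left out.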

Syntax | Description | Example
fused_ops | Optional. Information on the applied fused operation, such as a clamping activation function. | clamp
dst_info | Destination matrix info. Uses the same notation as lhs_info. |
lhs_info | LHS matrix data type and packing info. See the notation listed after this table. |
rhs_info | RHS matrix info. Uses the same notation as lhs_info. |
mr x nr x kacc | The outer loop calculates mr rows and nr columns. kacc is the number of k-accumulations done per inner loop. |
technology | Underlying technology, for example Arm® Neon™. | neon
feature | Arm architecture feature used. | dotprod, i8mm, sme2
instruction | Optional. Instruction used. | mla

lhs_info notation:
  f32 : Floating-point 32-bit
  q : Quantized
  s : Symmetric
  a : Asymmetric
  i : Signed integer
  u : Unsigned integer
  4 : 4-bit quantized
  8 : 8-bit quantized
  dx : Per dimension quantization
  cx : Per channel quantization
  c32 : Per block quantization, with a block length that is a multiple of 32
  scalef16 : Scale factors are stored as floating-point 16-bit
  p : Matrix is packed

Example 1
  qsi4cxp :
    qs : Quantized symmetric
    i4 : Signed integer 4-bit
    cx : Per channel quantized
    p : Packed

Some other examples:
  s16s0 : Packing order of data is interleaved
  s1s0 : Packing order of data is sequential
  fp16 : Floating-point 16-bit data type
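
The data type notation can be read mechanically, as Example 1 suggests. The sketch below decodes a token such as `qsi4cxp` into its parts. The function `describe_info` and its regular expression are hypothetical. They cover only `f32` and quantized tokens that follow the same order as Example 1 (q, symmetry, signedness, bit width, granularity, optional p), which is an assumption. Other tokens, such as those using `scalef16` or the `s16s0` packing order, are not handled.

```python
import re

# Assumed token order, taken from Example 1: q, s|a, i|u, 4|8, dx|cx|c32, optional p.
_QUANT = re.compile(
    r"q(?P<sym>[sa])(?P<sign>[iu])(?P<bits>[48])(?P<gran>dx|cx|c32)(?P<packed>p?)"
)

_SYM = {"s": "Quantized symmetric", "a": "Quantized asymmetric"}
_SIGN = {"i": "Signed integer", "u": "Unsigned integer"}
_GRAN = {
    "dx": "Per dimension quantized",
    "cx": "Per channel quantized",
    "c32": "Per block quantized, block length multiple of 32",
}


def describe_info(token):
    """Return a list of descriptions for one data type token (hypothetical helper)."""
    if token == "f32":
        return ["Floating-point 32-bit"]
    m = _QUANT.fullmatch(token)
    if m is None:
        raise ValueError(f"token not covered by this sketch: {token!r}")
    parts = [
        _SYM[m["sym"]],
        f"{_SIGN[m['sign']]} {m['bits']}-bit",
        _GRAN[m["gran"]],
    ]
    if m["packed"]:
        parts.append("Packed")
    return parts


for line in describe_info("qsi4cxp"):
    print(line)
```

Rejecting unknown tokens with an error, rather than guessing, keeps the sketch within what the notation actually defines.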