matmul

Change type of variable in rhs packing kernels.

2c7c0c8 · Feb 27, 2025

Name	Last commit message	Last commit date
matmul_clamp_f16_bf16p_bf16p	Disable overlength strings compiler diagnostic in-code	Jan 21, 2025
matmul_clamp_f16_f16_f16p	Disable overlength strings compiler diagnostic in-code	Jan 21, 2025
matmul_clamp_f16_f16p_f16p	Resolve missing C function prototypes	Jan 22, 2025
matmul_clamp_f32_bf16p_bf16p	Disable overlength strings compiler diagnostic in-code	Jan 21, 2025
matmul_clamp_f32_f32_f32p	Fix the lhs offset name in the interface of F32 <- F32 x F32p	Feb 6, 2025
matmul_clamp_f32_f32p_f32p	Disable overlength strings compiler diagnostic in-code	Jan 21, 2025
matmul_clamp_f32_qai8dxp_qsi4c32p	Optimize F32 <- QAI8DXP (LHS) x QSI4C32P (RHS) for 4x8 i8mm	Feb 20, 2025
matmul_clamp_f32_qai8dxp_qsi4cxp	Fix for Int4 per-channel SME GEMM kernel failing with n > 64	Jan 27, 2025
matmul_clamp_f32_qai8dxp_qsi8cxp	Disable overlength strings compiler diagnostic in-code	Jan 21, 2025
matmul_clamp_f32_qsi8d32p_qsi4c32p	Disable overlength strings compiler diagnostic in-code	Jan 21, 2025
matmul_clamp_fp32_bf16p_bf16p	Resolve missing C function prototypes	Jan 22, 2025
matmul_clamp_qai8_qai8_qsi8cxp	Remove unused lhs_stride argument	Feb 18, 2025
matmul_clamp_qai8_qai8p_qsi8cxp	Disable overlength strings compiler diagnostic in-code	Jan 21, 2025
pack	Change type of variable in rhs packing kernels.	Feb 27, 2025
BUILD.bazel	Optimize F32 <- QAI8DXP (LHS) x QSI4C32P (RHS) for 4x8 i8mm	Feb 20, 2025
README.md	Add documentation for packing and file names	Sep 26, 2024

README.md

About

This document describes the matrix-multiplication (matmul) micro-kernels in this directory.

Naming

File names follow the convention below. Every field is mandatory unless it is marked optional.

kai_matmul_<fused_ops>_<dst_info>_<lhs_info>_<rhs_info>_<mr x nr x kacc>_<technology>_<feature>_<instruction>.c

Syntax	Description	Example
fused_ops	Optional info on applied fused operation like a clamping activation function.	clamp
dst_info	Destination matrix info. Similar to lhs_info
lhs_info	LHS matrix data type and packing info. f32: 32-bit floating point; q: quantized; s: symmetric; a: asymmetric; i: signed integer; u: unsigned integer; 4: 4-bit quantized; 8: 8-bit quantized; dx: per-dimension quantization; cx: per-channel quantization; c32: per-block quantization, with a block length that is a multiple of 32; scalef16: scale factors are stored as 16-bit floating point; p: matrix is packed	Example 1: qsi4cxp, where qs = quantized symmetric, i4 = signed 4-bit integer, cx = per-channel quantized, p = packed. Other examples: s16s0 = packing order of data is interleaved; s1s0 = packing order of data is sequential; fp16 = 16-bit floating-point data type
rhs_info	Similar to lhs_info
mr x nr x kacc	The outer loop computes mr rows and nr columns. kacc is the number of k-accumulations performed per inner loop.
technology	Underlying technology, for example Arm® Neon™.	neon
feature	Arm architecture feature used	dotprod, i8mm, sme2
instruction	Instruction used. Optional.	mla
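
The markers in the table can be read mechanically. The following is a minimal Python sketch, not part of the library, that expands a type token such as qsi4cxp into the descriptions given above and splits a kernel directory name such as matmul_clamp_f32_qai8dxp_qsi4cxp into its fields. The function names decode_type and split_kernel_dir are hypothetical. Reading bf16 as 16-bit bfloat is an assumption, since the table does not list it, and markers the table does not describe (for example d32) are rejected rather than guessed. The sketch also assumes that directory names follow the leading part of the file-name pattern, which is inferred from the directory names and not stated in the README.

```python
import re

# Marker descriptions copied from the naming table above.
QUANT_MARKERS = {
    "scalef16": "scale factors stored as 16-bit floating point",
    "s16s0": "interleaved packing order",
    "s1s0": "sequential packing order",
    "c32": "per-block quantization (block length a multiple of 32)",
    "dx": "per-dimension quantization",
    "cx": "per-channel quantization",
    "s": "symmetric",
    "a": "asymmetric",
    "i": "signed integer",
    "u": "unsigned integer",
    "4": "4-bit",
    "8": "8-bit",
    "p": "packed",
}
# Try longer markers first so that "s16s0" is not misread as "s" + "16s0".
_ORDERED = sorted(QUANT_MARKERS, key=len, reverse=True)

# f32 / fp16 / bf16, optionally followed by "p" for packed.
# Treating "bf" as bfloat is an assumption; the table only lists f32 and fp16.
_FLOAT = re.compile(r"(bf|fp|f)(\d+)(p?)")


def decode_type(token):
    """Expand a type token such as 'qsi4cxp' into a list of descriptions."""
    m = _FLOAT.fullmatch(token)
    if m:
        kind = "bfloat" if m.group(1) == "bf" else "floating point"
        parts = [f"{m.group(2)}-bit {kind}"]
        if m.group(3):
            parts.append("packed")
        return parts
    if not token.startswith("q"):
        raise ValueError(f"unrecognised type token: {token!r}")
    parts = ["quantized"]
    rest = token[1:]
    while rest:
        for marker in _ORDERED:
            if rest.startswith(marker):
                parts.append(QUANT_MARKERS[marker])
                rest = rest[len(marker):]
                break
        else:
            # Marker not described in the table: refuse to guess.
            raise ValueError(f"unknown marker in {token!r} at {rest!r}")
    return parts


def split_kernel_dir(name):
    """Split 'matmul_<fused_ops>_<dst>_<lhs>_<rhs>' into its fields."""
    fields = name.split("_")
    if fields[0] != "matmul" or len(fields) < 4:
        raise ValueError(f"not a matmul kernel directory: {name!r}")
    # fused_ops is optional, so the last three fields are dst, lhs and rhs.
    return {
        "fused_ops": fields[1:-3],
        "dst": fields[-3],
        "lhs": fields[-2],
        "rhs": fields[-1],
    }


if __name__ == "__main__":
    info = split_kernel_dir("matmul_clamp_f32_qai8dxp_qsi4cxp")
    print(info)
    for role in ("dst", "lhs", "rhs"):
        print(role, decode_type(info[role]))
```

Matching the longest marker first is a deliberate choice: several markers share a leading letter (s, s1s0, s16s0, scalef16), and the table does not fix the order in which markers appear, so the sketch accepts them in any order.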
