Background
Input mlir
I have a simple MLIR model for an LLM. It keeps the KV cache in a mutable global:
util.global private mutable @_global_k_caches.layer_idx.0 = #util.uninitialized : tensor<1x2047x4x64xf32>
In the run_forward function, it is loaded with:
%_global_k_caches.layer_idx.0 = util.global.load @_global_k_caches.layer_idx.0 : tensor<1x2047x4x64xf32>
The actuall
After applying FoldUnitExtentDimsPass:
The KV cache global is rewritten with the leading unit dimension folded away:
util.global private mutable @_global_k_caches.layer_idx.0 = #util.uninitialized : tensor<2047x4x64xf32>
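Dropping the leading unit dimension should not, on its own, require moving any data. The following is a minimal Python sketch of that reasoning, not IREE code: it assumes a dense row-major layout, and the helper names (`row_major_strides`, `linear_offset`) are hypothetical and exist only for this illustration.

```python
from math import prod

def row_major_strides(shape):
    """Return element strides for a dense row-major layout (assumed layout)."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))

def linear_offset(index, shape):
    """Return the linear element offset of `index` within `shape`."""
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))

ORIGINAL = (1, 2047, 4, 64)  # shape before FoldUnitExtentDimsPass
FOLDED = (2047, 4, 64)       # shape after FoldUnitExtentDimsPass

# Both shapes hold the same number of elements.
assert prod(ORIGINAL) == prod(FOLDED)

# Every element keeps the same linear offset once the unit index 0 is dropped.
for idx in [(0, 0, 0), (1, 2, 3), (2046, 3, 63)]:
    assert linear_offset((0,) + idx, ORIGINAL) == linear_offset(idx, FOLDED)
```

Under that layout assumption the two tensor types describe the same bytes in the same order, which is why a copy between them looks avoidable.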
In the run_forward function, it becomes:
After --compile-to=stream
Command buffer copy operations are created:
Each of these copies moves 2096128 bytes (2047 * 4 * 64 f32 elements * 4 bytes per element). They are unnecessary and cause a significant performance drop.
The cache could be handled zero-copy instead.
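The 2096128 figure is a byte count: the product of the dimensions gives the element count, and each f32 element takes 4 bytes. A short Python check of that arithmetic (the `tensor_bytes` helper is hypothetical and only for illustration):

```python
from math import prod

F32_BYTES = 4  # size of one f32 element in bytes

def tensor_bytes(shape, elem_bytes=F32_BYTES):
    """Return the byte size of a dense tensor with the given static shape."""
    return prod(shape) * elem_bytes

folded = (2047, 4, 64)        # tensor<2047x4x64xf32>
original = (1, 2047, 4, 64)   # tensor<1x2047x4x64xf32>

print(prod(folded))           # 524032 elements
print(tensor_bytes(folded))   # 2096128 bytes
print(tensor_bytes(original)) # 2096128 bytes, the unit dimension adds nothing
```

So each copy moves roughly 2 MiB per KV cache global, even though the folded and unfolded types have identical sizes.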
IREE version
02d145e
Date: Wed Jan 8 19:39:49 2025 -0800
Command
iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-triple=riscv64-pc-linux-gnu --iree-llvmcpu-target-cpu=generic-rv64 --iree-llvmcpu-target-cpu-features="+m,+a,+f,+d,+zba,+zbb,+zfh,+zvl1024b,+v,+zvfh" --iree-llvmcpu-target-abi=lp64d llm-forward.mlir
MLIR file and log:
llm-sample.tar.gz