Undesired iree_hal_command_buffer_copy_buffer after applying FoldUnitExtentDimsPass #19878

Open
bhbruce opened this issue Feb 3, 2025 · 0 comments

Background

Input MLIR

I have a simple MLIR model for an LLM. It contains a KV cache, declared as a global:
util.global private mutable @_global_k_caches.layer_idx.0 = #util.uninitialized : tensor<1x2047x4x64xf32>

In the run_forward function, it is loaded with:
%_global_k_caches.layer_idx.0 = util.global.load @_global_k_caches.layer_idx.0 : tensor<1x2047x4x64xf32>

After applying FoldUnitExtentDimsPass:

The KV-cache global is rewritten with the leading unit dimension folded away:
util.global private mutable @_global_k_caches.layer_idx.0 = #util.uninitialized : tensor<2047x4x64xf32>

In the run_forward function, the load becomes:

%_global_k_caches.layer_idx.0 = util.global.load @_global_k_caches.layer_idx.0 : tensor<2047x4x64xf32>
%expanded = tensor.expand_shape %_global_k_caches.layer_idx.0 [[0, 1], [2], [3]] output_shape [1, 2047, 4, 64] : tensor<2047x4x64xf32> into tensor<1x2047x4x64xf32>

After --compile-to=stream

The following stream.cmd.copy operations are created:

stream.cmd.concurrent {
     stream.cmd.copy %arg3[%c0], %arg12[%c0], %c2096128 : !stream.resource<variable>{%c4192256} -> !stream.resource<transient>{%34}
     stream.cmd.copy %arg3[%c2096128], %arg12[%c2096128], %c2096128 : !stream.resource<variable>{%c4192256} -> !stream.resource<transient>{%34}
}

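These command buffer copy operations (each copying 2096128 bytes = 2047 * 4 * 64 f32 elements * 4 bytes per element) are unnecessary and cause a significant performance drop.
This should be zero-copy, since the tensor.expand_shape only adds a unit dimension and does not change the data layout.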
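
As a quick sanity check on the sizes above, here is a minimal Python sketch. The shapes and byte counts come from the IR in this issue; the helper `byte_size` and the variable names are made up for illustration and are not part of IREE. It shows that the folded and expanded tensor types occupy the same number of bytes, and that the two copies together span the whole 4192256-byte variable resource.

```python
# Sizes taken from the IR in this issue; names here are illustrative only.
F32_BYTES = 4
folded_shape = (2047, 4, 64)       # type after FoldUnitExtentDimsPass
expanded_shape = (1, 2047, 4, 64)  # type after tensor.expand_shape


def byte_size(shape, elem_bytes=F32_BYTES):
    """Return the dense storage size in bytes for a tensor of this shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * elem_bytes


copy_len = byte_size(folded_shape)

# Adding a unit dimension does not change the number of bytes.
assert copy_len == byte_size(expanded_shape)

print(copy_len)      # length of each stream.cmd.copy
print(2 * copy_len)  # the two copies cover offsets [0, 2 * copy_len)
```

The first value matches %c2096128 and the second matches %c4192256, so the copies move the full contents of the variable resource into the transient one without changing any bytes.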

IREE version

02d145e
Date: Wed Jan 8 19:39:49 2025 -0800

Command

iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-triple=riscv64-pc-linux-gnu --iree-llvmcpu-target-cpu=generic-rv64 --iree-llvmcpu-target-cpu-features="+m,+a,+f,+d,+zba,+zbb,+zfh,+zvl1024b,+v,+zvfh" --iree-llvmcpu-target-abi=lp64d llm-forward.mlir

MLIR file and log:
llm-sample.tar.gz
