
[Codegen] llama 8b fp8 with attention vector distribute fail #19991

Open

AmosLewis opened this issue Feb 14, 2025 · 3 comments
Labels: bug 🐞 Something isn't working

@AmosLewis (Contributor)

What happened?

Follow up of [ROCm][Codegen] llama 8b fp8 with attention segfault #19921

A new codegen issue appeared (log: llama_f8_attn_bug_log_0213.txt) after I rebased IREE to

commit 0ff26a7bef803edf3e22588f3e69a51c9335a79b (HEAD -> main, upstream/main)
Author: Prashant Kumar <[email protected]>
Date:   Thu Feb 13 23:26:59 2025 +0530
    [Codegen] Add support to emulate unsupported float type (#19943)
f8_attn_chi_castf32_roctorch.mlir:45778:10: error: 'func.func' op failed to distribute
    %1 = iree_linalg_ext.attention {indexing_maps = [#map, #map1, #map2, #map3, #map4, #map5]} ins(%collapsed, %collapsed_1, %collapsed_2, %extracted, %arg4 : tensor<32x?x128xf8E4M3FNUZ>, tensor<32x?x128xf8E4M3FNUZ>, tensor<32x?x128xf8E4M3FNUZ>, f32, tensor<?x?xf8E4M3FNUZ>) outs(%cast : tensor<32x?x128xf32>) {
         ^
f8_attn_chi_castf32_roctorch.mlir:2706:12: note: called from
    %914 = util.call @sharktank_masked_flash_attention_1_32_128_128_f8E4M3FNUZ_f32_f32(%909, %910, %911, %913, %912) : (tensor<1x32x?x128xf8E4M3FNUZ>, tensor<1x32x?x128xf8E4M3FNUZ>, tensor<1x32x?x128xf8E4M3FNUZ>, tensor<f32>, tensor<?x?xf8E4M3FNUZ>) -> tensor<1x32x?x128xf32>
           ^
f8_attn_chi_castf32_roctorch.mlir:45778:10: note: see current operation:
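
For context, the failing op in isolation looks roughly like the sketch below. The error output does not include the definitions of #map through #map5, so the affine maps here are assumptions based on the usual Q/K/V/scale/mask/output operand order of iree_linalg_ext.attention; shapes are taken from the error message.

    // Hypothetical standalone reproducer; the indexing maps are assumed,
    // since the error output does not show the #map definitions.
    #map  = affine_map<(b, m, n, k1, k2) -> (b, m, k1)>   // Q
    #map1 = affine_map<(b, m, n, k1, k2) -> (b, k2, k1)>  // K
    #map2 = affine_map<(b, m, n, k1, k2) -> (b, k2, n)>   // V
    #map3 = affine_map<(b, m, n, k1, k2) -> ()>           // scale
    #map4 = affine_map<(b, m, n, k1, k2) -> (m, k2)>      // mask
    #map5 = affine_map<(b, m, n, k1, k2) -> (b, m, n)>    // output
    func.func @attn_repro(%q: tensor<32x?x128xf8E4M3FNUZ>,
                          %k: tensor<32x?x128xf8E4M3FNUZ>,
                          %v: tensor<32x?x128xf8E4M3FNUZ>,
                          %scale: f32, %mask: tensor<?x?xf8E4M3FNUZ>,
                          %init: tensor<32x?x128xf32>) -> tensor<32x?x128xf32> {
      %0 = iree_linalg_ext.attention
             {indexing_maps = [#map, #map1, #map2, #map3, #map4, #map5]}
             ins(%q, %k, %v, %scale, %mask
                 : tensor<32x?x128xf8E4M3FNUZ>, tensor<32x?x128xf8E4M3FNUZ>,
                   tensor<32x?x128xf8E4M3FNUZ>, f32, tensor<?x?xf8E4M3FNUZ>)
             outs(%init : tensor<32x?x128xf32>) {
      ^bb0(%score: f32):
        iree_linalg_ext.yield %score : f32
      } -> tensor<32x?x128xf32>
      return %0 : tensor<32x?x128xf32>
    }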

Steps to reproduce your issue

  1. Compile IREE:
cmake -G Ninja -B ../iree-build  -S . \
    -DCMAKE_BUILD_TYPE=Debug \
    -DIREE_ENABLE_ASSERTIONS=ON \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DIREE_ENABLE_RUNTIME_TRACING=ON \
    -DIREE_BUILD_TRACY=OFF \
    -DIREE_ENABLE_LLD=ON \
    -DIREE_BUILD_PYTHON_BINDINGS=ON \
    -DPython3_EXECUTABLE="$(which python3)" \
    -DIREE_TARGET_BACKEND_CUDA=OFF \
    -DIREE_HAL_DRIVER_HIP=ON \
    -DIREE_TARGET_BACKEND_ROCM=ON .
cmake --build ../iree-build
  2. Download the input MLIR: f8_attn_chi_castf32_roctorch.mlir

Optional: export f8_attn_chi_castf32_roctorch.mlir manually with nod-ai/shark-ai#907.

  3. Run the following command:

 /home/chi/src/iree-build/tools/iree-compile f8_attn_chi_castf32_roctorch.mlir \
  --iree-hip-target=gfx942 \
  -o=f8_attn_chi_castf32_roctorch.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

What component(s) does this issue relate to?

Compiler

Version information

commit 0ff26a7 (HEAD -> main, upstream/main)
Author: Prashant Kumar [email protected]
Date: Thu Feb 13 23:26:59 2025 +0530
[Codegen] Add support to emulate unsupported float type (#19943)

Additional context

No response

@pashu123 (Contributor)

Arising from this dispatch: https://gist.github.com/pashu123/e21bc74fafbc4ce3ae23b0adf3ac75b5

@pashu123 (Contributor)

The failure reproduces on the extracted dispatch with:

 iree-opt --pass-pipeline="builtin.module(func.func(iree-llvmgpu-vector-distribute,canonicalize,cse))" --split-input-file before_vector_distribute.mlir
https://gist.github.com/pashu123/e22dba342a9bf78c0ee5ccb0522d3855

@IanWood1 (Contributor) commented Feb 14, 2025

I think the problem is the tensor.expand_shape that gets pulled into the dispatch because it produces the attention mask; this is similar to the problem fixed by #19838. A hypothetical sketch of the pattern is shown below.
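
To illustrate the shape of the problem (a made-up sketch, not lifted from the actual dispatch): the mask would reach the attention op through a dynamic tensor.expand_shape fused into the dispatch region, which the distribution patterns then have to handle, e.g.

    // Hypothetical illustration of a dynamic reshape on the mask operand;
    // %mask, %d0, %d1, and the shapes are invented for this sketch.
    %expanded = tensor.expand_shape %mask [[0, 1], [2]] output_shape [1, %d0, %d1]
        : tensor<?x?xf8E4M3FNUZ> into tensor<1x?x?xf8E4M3FNUZ>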

cc @MaheshRavishankar
