
[Codegen] Add support to emulate unsupported float type #19943

Merged
merged 3 commits into iree-org:main on Feb 13, 2025

Conversation

pashu123 (Contributor) commented Feb 10, 2025

This change enables the conversion of types such as f8E4M3FNUZ and f8E5M2FNUZ into f32 operations, emulated via the existing upstream APIs. The conversion logic is tied to the executable target attribute, so it is applied only for the gfx942 target. This removes the need to manually configure the pass with source types and aligns the behaviour with the target's capabilities. To add a new conversion, populate the conversion target with the desired source and target types (see the sketch below).

FIX: #19921 (comment)
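For illustration only, here is a minimal sketch of the per-target type selection described above. The helper name populateSourceAndTargetTypes is hypothetical and the chipset check is an assumption; it is not the exact code in this PR.

// Hypothetical helper: choose which float types get emulated for a given
// target architecture. Only gfx942 is wired up for now.
static void populateSourceAndTargetTypes(MLIRContext *ctx, StringRef arch,
                                         SmallVectorImpl<Type> &sourceTypes,
                                         Type &targetType) {
  targetType = Float32Type::get(ctx);
  if (arch == "gfx942") {
    sourceTypes.push_back(Float8E4M3FNUZType::get(ctx));
    sourceTypes.push_back(Float8E5M2FNUZType::get(ctx));
  }
  // Supporting another target amounts to appending its unsupported source
  // types (and, if needed, a different target type) here.
}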

hanhanW (Contributor) left a comment:

I think it does not fix anything because you do not add the pass to any pipeline. Can you add an e2e test to https://github.com/iree-org/iree/tree/main/tests/e2e/linalg?

Also, we need better documentation for the pass and a better PR description.

E.g., people who don't have context might ask "what is an unsupported float type?"

pashu123 (Contributor Author) replied, quoting the comment above:

You're right! After further thought, I'll wrap those emulate APIs in a function and call it here:

arith::populateExpandBFloat16Patterns(patterns);

rather than creating a pass.

krzysz00 (Contributor) commented:

@pashu123 Because you need to run --arith-to-amdgpu to get the FP8 conversion instructions, you shouldn't just add the rewrites to ConvertToROCDL like that.

Given how things are architected today, this sort of thing likely needs to be a pass.

See upstream's --arith-emulate-unsupported-floats.

pashu123 (Contributor Author) commented Feb 10, 2025, quoting the comment above:

If you look at the current pass, it is already using the APIs from the upstream --arith-emulate-unsupported-floats pass. Would you like me to tailor the downstream pass to a subset of arith operations?
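For context, here is a minimal sketch of how a downstream pass can reuse those upstream emulate-unsupported-floats helpers. It illustrates the upstream arith APIs under stated assumptions (pass class name, type list) and is not the exact code in this PR.

void ConvertUnsupportedFloatArithPass::runOnOperation() {
  MLIRContext *context = &getContext();
  Operation *op = getOperation();

  // Types to emulate and the wider type to emulate them with.
  SmallVector<Type> sourceTypes = {Float8E4M3FNUZType::get(context),
                                   Float8E5M2FNUZType::get(context)};
  Type targetType = Float32Type::get(context);

  // Reuse the machinery behind --arith-emulate-unsupported-floats: operands
  // of arith ops on a source type are extf'd to the target type and results
  // are truncf'd back.
  TypeConverter converter;
  arith::populateEmulateUnsupportedFloatsConversions(converter, sourceTypes,
                                                     targetType);
  RewritePatternSet patterns(context);
  arith::populateEmulateUnsupportedFloatsPatterns(patterns, converter);
  ConversionTarget target(*context);
  arith::populateEmulateUnsupportedFloatsLegality(target, converter);

  if (failed(applyPartialConversion(op, target, std::move(patterns))))
    signalPassFailure();
}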

Review comment on this part of the new pass:

Operation *op = getOperation();

// Add the source types to be converted to the target type.
SmallVector<Type> sourceTypes = {Float8E4M3FNUZType::get(context)};
A contributor commented:

As a thought, could we merge the BF16 -> F32 arithmetic pass into this one?

Also, you'll want the E5M2 type here as well.

Converts arith operations on unsupported float types to f32.
Comment on lines 1129 to 1136
funcPassManager.addPass([&]() {
ConvertUnsupportedFloatArithPassOptions options;
// Convert arith operation with the given `source types` to `target`
// type.
options.sourceTypeStrs = {"f8E4M3FNUZ", "f8E5M2FNUZ"};
options.targetTypeStr = "f32";
return createConvertUnsupportedFloatArithPass(options);
});
A contributor commented:

The notion of which type is "supported" depends on the combination of (target architecture, specific operation, specific operand for that operation). For example, on CDNA3, we have matrix multiplication instructions that handle f8E4M3FNUZ and f8E5M2FNUZ for the LHS and RHS operands, but not for the accumulator operand. How is that nuance reflected in the logic being introduced in this PR?

pashu123 (Contributor Author) replied:

The upstream pass (https://github.com/llvm/llvm-project/blob/b04a980b5597c61a8df2b489c4894bc0240b8e13/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp#L122) doesn't touch those operations for now, but in the future we should add downstream patterns that decide what to look for based on the target. I should add the pass under a flag that checks the target.

A contributor replied:

The target should be looked up from the target attribute on the op that the pass is running on, not from a flag.

In the pass's runOnOperation() method, do something like auto target = IREE::HAL::ExecutableTargetAttr::lookup(getOperation());

That means that the list of unsupported types shouldn't be a pass option. The pass should be option-less and act on the executable target. Similar to what this PR did (also see its test specifying targets): #19922
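To illustrate the suggestion, here is a sketch of an option-less runOnOperation() that derives everything from the executable target. The "iree.gpu.target" key and the arch field mirror the test attribute shown later in this thread; the exact accessor names are assumptions, not the code this PR landed.

void ConvertUnsupportedFloatArithPass::runOnOperation() {
  Operation *op = getOperation();

  // Look up the executable target attribute from the op the pass runs on.
  auto execTarget = IREE::HAL::ExecutableTargetAttr::lookup(op);
  if (!execTarget)
    return;

  // Read the GPU target (e.g. arch = "gfx942") from the target configuration
  // and bail out on targets that need no emulation.
  DictionaryAttr config = execTarget.getConfiguration();
  auto gpuTarget =
      config ? config.getAs<IREE::GPU::TargetAttr>("iree.gpu.target")
             : IREE::GPU::TargetAttr();
  if (!gpuTarget || !gpuTarget.getArch().starts_with("gfx94"))
    return;

  // From here, populate the source/target types for this architecture and
  // run the upstream emulation patterns as sketched earlier in the thread.
}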

pashu123 (Contributor Author) replied:

Thanks for the info, that was helpful. I've updated the pass to key off the target architecture. For now it only populates the gfx94{*} source and target types, and it can be extended as needed.

Comment on the new lit test:

// CHECK: %[[NEG:.*]] = arith.negf %[[EXT]] : f32
// CHECK: %[[TRUNC:.*]] = arith.truncf %[[NEG]] {{.*}} : f32 to f8E4M3FNUZ
// CHECK: return %[[TRUNC]] : f8E4M3FNUZ
#executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb", {abi = "hip", iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F64_16x16x4_F64>, <MFMA_F32_16x16x16_BF16>, <MFMA_F32_32x32x8_BF16>, <MFMA_F32_16x16x32_F8E5M2FNUZ>, <MFMA_F32_16x16x32_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647], max_load_instruction_bits = 128, simds_per_wgp = 4, vgpr_space_bits = 16384>>, ukernels = "none"}>
A contributor commented:

If possible, find ways to elide some irrelevant parts of this target attribute. For instance, the mma array could be empty. (Side note: we should make wgp optional for things like this.)

pashu123 (Contributor Author) replied:

Sure.

hanhanW (Contributor) left a comment:

Please update the PR description with a detailed description of the pass and of what this PR supports, e.g. the source and target conversion types for the gfx94{*} series.

pashu123 merged commit 0ff26a7 into iree-org:main on Feb 13, 2025
40 checks passed
Successfully merging this pull request may close these issues.

[ROCm][Codegen] llama 8b fp8 with attention segfault