[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops #5673

lezcano · 2025-01-22T17:52:39Z

We generalise HoistLayoutConversion to lift a given convert_layout to a dot_operand
layout above any chain of operations that do not require data movement. This
could be generalised further in the future to lift the conversion over other ops. We
stop here as a first step, to keep the code reasonably similar to the previous
implementation.
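
The hoisting condition described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the code in the pass: `Kind`, `Op`, `canHoistAbove` and the two example helpers are hypothetical names, and `noDataMovement` is reduced here to "the op is elementwise", which may not match the set of ops the real predicate accepts.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an MLIR operation; this is not the Triton or MLIR API.
enum class Kind { Load, Constant, Elementwise, Reduce };

struct Op {
  Kind kind;
  std::vector<const Op *> operands;
};

// Hypothetical predicate: in this sketch only elementwise ops count as
// "no data movement"; the real pass may accept a different set of ops.
bool noDataMovement(const Op &op) { return op.kind == Kind::Elementwise; }

// Walk backwards from the value feeding the convert_layout. The conversion
// can be lifted to the leaves only if every op on the way is a load, a
// constant, or an op that needs no data movement.
bool canHoistAbove(const Op &root) {
  std::vector<const Op *> worklist{&root};
  while (!worklist.empty()) {
    const Op *op = worklist.back();
    worklist.pop_back();
    assert(op != nullptr && "operands must point at live ops");
    if (op->kind == Kind::Load || op->kind == Kind::Constant)
      continue; // a leaf: the walk stops here
    if (!noDataMovement(*op))
      return false; // e.g. a reduction moves data between threads
    for (const Op *operand : op->operands)
      worklist.push_back(operand);
  }
  return true;
}

// load -> mul(load, const) -> trunc: every op on the chain is acceptable.
bool elementwiseChainIsHoistable() {
  Op load{Kind::Load, {}};
  Op scale{Kind::Constant, {}};
  Op mul{Kind::Elementwise, {&load, &scale}};
  Op trunc{Kind::Elementwise, {&mul}};
  return canHoistAbove(trunc);
}

// load -> reduce -> trunc: the reduction blocks the hoist.
bool chainWithReduceIsHoistable() {
  Op load{Kind::Load, {}};
  Op reduce{Kind::Reduce, {&load}};
  Op trunc{Kind::Elementwise, {&reduce}};
  return canHoistAbove(trunc);
}
```

In this model the first helper returns true and the second returns false, since a reduction is not in the toy "no data movement" set.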

Regarding the previous limitations of canHoistDotOpEncV2, I did a bit of archaeology:

The "don't hoist past select" restriction was added for the issue Floating Point Exception in F16xF16 Matmul with Pipelining on A100 #2857. I ran the repro and, with the recent layout fixes, it now passes.
The skipping of TruncOps comes from [BACKEND] Refactor RemoveLayoutConversion pass #2181. I think this is related to the hack that was removed in [BACKEND] Get rid of unpack/pack I32 #5044, so it should work now.
The same goes for UIToFpOp: it is now supported after [BACKEND] Get rid of unpack/pack I32 #5044.
The mixed dtype hack is not necessary either, as everything now works as expected with the convert_layout rework.

We also add proper isPure support for elementwise_inline_asm ops.

On the location of the code: we leave it in RemoveLayoutConversions.cpp to
take advantage of the rather generic implementation of rewriteSlice. We could
move this pass outside of remove-layout-conversion, as it is probably enough to run
it once. This code will go through further changes in the near future, so we will
reassess then.

lezcano · 2025-01-27T10:01:21Z

test/TritonGPU/combine.mlir

+#blocked = #ttg.blocked<{sizePerThread = [1, 2], threadsPerWarp = [4, 8], warpsPerCTA = [4, 1], order = [1, 0]}>
+#mma = #ttg.nvidia_mma<{versionMajor = 2, versionMinor = 0, warpsPerCTA = [1, 4]}>
+module attributes {"ttg.num-warps" = 4 : i32, "ttg.target" = "cuda:80"} {
+  tt.func @dot_op_hoisted_to_load_with_unsupported_op_and_initializer_above_slice(


This is all code movement, plus this test, which was proposed but not merged in #5349 (comment),
as we now hoist everything as expected.

ThomasRaoux

LGTM. One comment: I'm not sure we need the speculatively part, but that's kind of a detail.

Mogball · 2025-01-28T16:43:50Z

lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp

+    // This could be generalised if necessary
+    if (!loadOp) {
+      auto op = v.getDefiningOp();
+      if (isa<arith::ConstantOp>(op) || noDataMovement(op)) {


Should ConstantOp just be put inside noDataMovement?

lezcano · 2025-01-29

Probably, yes, but I'm leaving it as-is for now: it currently works, and we are probably going to refactor this pass in the near future.
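
The suggestion in this thread can be illustrated with a small toy comparison. It is a sketch only: `Kind`, `acceptedAtCallSite` and `noDataMovementWithConstants` are hypothetical names, and the toy `noDataMovement` treats only elementwise ops as free of data movement, which is an assumption rather than the pass's actual definition.

```cpp
#include <cassert>
#include <initializer_list>

// Toy op kinds; not the MLIR or Triton API.
enum class Kind { Load, Constant, Elementwise, Reduce };

// Shape used in the diff (toy version): constants are special-cased
// at the call site, next to the noDataMovement check.
bool noDataMovement(Kind k) { return k == Kind::Elementwise; }
bool acceptedAtCallSite(Kind k) {
  return k == Kind::Constant || noDataMovement(k);
}

// Suggested shape: the predicate itself accepts constants, so the call
// site only needs a single check.
bool noDataMovementWithConstants(Kind k) {
  return k == Kind::Constant || k == Kind::Elementwise;
}

// Check that both formulations accept exactly the same kinds.
bool bothFormsAgree() {
  for (Kind k : {Kind::Load, Kind::Constant, Kind::Elementwise, Kind::Reduce})
    if (acceptedAtCallSite(k) != noDataMovementWithConstants(k))
      return false;
  return true;
}
```

Both forms accept the same ops in this model, so the choice is about where the special case lives, which is consistent with deferring it to a later refactor.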

We now support all layouts as linear layouts (LL), and reductions support any layout as input. As such, at least in theory, we should be able to propagate layouts freely, even DotOperands, as we do with other layouts. This PR is a bit tentative. Let's see if anything interesting breaks.

lezcano requested a review from ptillet as a code owner January 22, 2025 17:52

lezcano marked this pull request as draft January 22, 2025 23:30

lezcano force-pushed the remat_dot branch 2 times, most recently from 1eed10c to a77a54a Compare January 24, 2025 15:00

lezcano changed the title ~~[WIP][LAYOUTS] Remove HoistLayoutConversion in favour of backwardsRemat~~ [LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops Jan 27, 2025

lezcano force-pushed the remat_dot branch from e92bead to 8fbbc33 Compare January 27, 2025 09:58

lezcano marked this pull request as ready for review January 27, 2025 10:23

lezcano requested a review from ThomasRaoux January 27, 2025 10:32

ThomasRaoux approved these changes Jan 28, 2025

Mogball reviewed Jan 28, 2025

lezcano force-pushed the remat_dot branch from ed43393 to ae0da3a Compare January 29, 2025 10:04

lezcano enabled auto-merge (squash) January 29, 2025 10:07
