-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops #5673
base: main
Are you sure you want to change the base?
Conversation
1eed10c
to
a77a54a
Compare
#blocked = #ttg.blocked<{sizePerThread = [1, 2], threadsPerWarp = [4, 8], warpsPerCTA = [4, 1], order = [1, 0]}> | ||
#mma = #ttg.nvidia_mma<{versionMajor = 2, versionMinor = 0, warpsPerCTA = [1, 4]}> | ||
module attributes {"ttg.num-warps" = 4 : i32, "ttg.target" = "cuda:80"} { | ||
tt.func @dot_op_hoisted_to_load_with_unsupported_op_and_initializer_above_slice( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all codemovement + adding this test that was proposed but not merged in #5349 (comment)
as we now hoist everything as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One comment is I'm not sure if we need the speculatively part but that's kind of a detail
// This could be generalised if necessary | ||
if (!loadOp) { | ||
auto op = v.getDefiningOp(); | ||
if (isa<arith::ConstantOp>(op) || noDataMovement(op)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ConstantOp just be put inside noDataMovement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yeah, but leaving it as-is for now because it currently works and we are probably going to refactor this pass in the near future so whatever.
We now support all layouts as LL, and reductions support any layout as input. As such, at least in theory, we should be able to propagate layouts freely, even DotOperands, similar to what we do with other layouts. This PR is a bit tentative. Let's see if anything interesting breaks
We generalise
HoistLayoutConversion
to lift a givenconvert_layout dot_operand
above any chain of operations that do not require data movement. We
could totally generalise this in the future to lift it over other ops. We do
this as a first step to keep the code somewhat similar to the previous
one.
Regarding the previous limitations of
canHoistDotOpEncV2
I did a bit of archeology:UIToFpOp
, this is now supported after [BACKEND] Get rid of unpack/pack I32 #5044convert_layout
rework.We also add proper support for
isPure
forelementwise_inline_asm
opsOn the location of the code, we just leave it in
RemoveLayoutConversion.cpp
totake advantage of the rather generic implementation of
rewriteSlice
. We could totallymove this pass outside of
remove-layout-conversion
, as it's probably enough to runit once. This code will go through further changes in the near future, so we'll assess this
then.