-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reapply "[Layouts] Propagate layouts into conditionals (#5610)" #5725
base: main
Are you sure you want to change the base?
Conversation
if (auto blocked = mlir::dyn_cast<BlockedEncodingAttr>(parent)) { | ||
auto shapePerCTA = | ||
expandMatrixShapeWithBatch(ArrayRef(getShapePerCTA(*this, shape))); | ||
auto shapePerCTATile = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, just create the associated LinearLayout
, with it create a LinearLayoutEncoding
and call getElemsPerThread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems legit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine to me as the LinearLayout object can be cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be causing a unit test to fail. Digging in...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes the unit test itself can be wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the failure. It seems that going through LinearEncodingAttr here is causing extra 0 bases to be added to the register input dimension (this test is Slice(Dot) to LL; incidentally, it's Slice(Dot(Blocked)) that's actually crashing prior to this PR):
/Users/jeffniu/code/triton/unittest/Dialect/TritonGPU/LinearLayoutConversionsTest.cpp:824: Failure
Expected equality of these values:
toLinearLayout({16}, sliceV2)
Which is:
- register=1 -> (8)
register=2 -> (0)
register=4 -> (0)
register=8 -> (0)
register=16 -> (0)
- lane=1 -> (0)
lane=2 -> (0)
lane=4 -> (1)
lane=8 -> (2)
lane=16 -> (4)
- warp is a size 1 dimension
- block is a size 1 dimension
where out dims are: [dim0 (size 16)]
LinearLayout( { {S("register"), {{8}}}, {S("lane"), {{0}, {0}, {1}, {2}, {4}}}, {S("warp"), {}}, {S("block"), {}}, }, {S("dim0")})
Which is:
- register=1 -> (8)
- lane=1 -> (0)
lane=2 -> (0)
lane=4 -> (1)
lane=8 -> (2)
lane=16 -> (4)
- warp is a size 1 dimension
- block is a size 1 dimension
where out dims are: [dim0 (size 16)]
/Users/jeffniu/code/triton/unittest/Dialect/TritonGPU/LinearLayoutConversionsTest.cpp:839: Failure
Expected equality of these values:
toLinearLayout({16}, sliceV3)
Which is:
- register=1 -> (1)
register=2 -> (8)
register=4 -> (0)
- lane=1 -> (2)
lane=2 -> (4)
lane=4 -> (0)
lane=8 -> (0)
lane=16 -> (0)
- warp=1 -> (0)
warp=2 -> (0)
- block is a size 1 dimension
where out dims are: [dim0 (size 16)]
LinearLayout( { {S("register"), {{1}, {8}}}, {S("lane"), {{2}, {4}, {0}, {0}, {0}}}, {S("warp"), {{0}, {0}}}, {S("block"), {}}, }, {S("dim0")})
Which is:
- register=1 -> (1)
register=2 -> (8)
- lane=1 -> (2)
lane=2 -> (4)
lane=4 -> (0)
lane=8 -> (0)
lane=16 -> (0)
- warp=1 -> (0)
warp=2 -> (0)
- block is a size 1 dimension
where out dims are: [dim0 (size 16)]
@lezcano Is this expected? Should I update the unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it seems there is some hack in the SliceEncodingAttr conversion to LinearLayout about these registers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time digging into this and it does not appear to be trivial to untangle. This requires removing the hack that was adding for SliceEncodingAttr::toLinearLayout
, since there is effectively a circular dependency between these functions, but removing that hack is definitely a rabbithole. I will try to stick with this independent logic for now, but given how strange this hack is, it's probably something we should spend time looking at in the near future.
This reverts commit b3bf2b2.
This is a re-land of #5610 with a fix to DotOperandEncodingAttr to resolve a crash exposed in internal tests.
At the same time, this slightly alters the implementation of the algorithm to ensure that layouts don't get pushed back out of conditionals by the forward propagation pass. Originally, the goal of hoisting across conditionals was to ensure that cvts in fused inner loops are placed inside the prologue before pipelining peels the prologue and introduces a chain of dependencies.
My guess is that @lezcano's changes to strengthen layout propagation enabled the forward propagation pass to push the conversion back out of the prologue even after pipelining (which is good), but I explicitly disabled hoisting into chains of conditionals, so these didn't balance out. This PR alters the algorithm to consider hoisting cvts across chains of conditionals only for subslices inside for loops.