Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply "[Layouts] Propagate layouts into conditionals (#5610)" #5725

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Mogball
Copy link
Collaborator

@Mogball Mogball commented Jan 28, 2025

This is a re-land of #5610 with a fix to DotOperandEncodingAttr to resolve a crash exposed in internal tests.

At the same time, this slightly alters the implementation of the algorithm to ensure that layouts don't get pushed back out of conditionals by the forward propagation pass. Originally, the goal of hoisting across conditionals was to ensure that cvts in fused inner loops are placed inside the prologue before pipelining peels the prologue and introduces a chain of dependencies.

My guess is that @lezcano's changes to strengthen layout propagation enabled the forward propagation pass to push the conversion back out of the prologue even after pipelining (which is good), but I explicitly disabled hoisting into chains of conditionals, so these didn't balance out. This PR alters the algorithm to consider hoisting cvts across chains of conditionals only for subslices inside for loops.

@Mogball Mogball requested a review from ptillet as a code owner January 28, 2025 06:35
if (auto blocked = mlir::dyn_cast<BlockedEncodingAttr>(parent)) {
auto shapePerCTA =
expandMatrixShapeWithBatch(ArrayRef(getShapePerCTA(*this, shape)));
auto shapePerCTATile =
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lezcano @Jokeren I have no idea if this logic is correct. Would appreciate if one of you two could check this for me :)

Copy link
Contributor

@lezcano lezcano Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better, just create the associated LinearLayout, with it create a LinearLayoutEncoding and call getElemsPerThread

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me as the LinearLayout object can be cached.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be causing a unit test to fail. Digging in...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the unit test itself can be wrong

Copy link
Collaborator Author

@Mogball Mogball Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the failure. It seems that going through LinearEncodingAttr here is causing extra 0 bases to be added to the register input dimension (this test is Slice(Dot) to LL; incidentally, it's Slice(Dot(Blocked)) that's actually crashing prior to this PR):

/Users/jeffniu/code/triton/unittest/Dialect/TritonGPU/LinearLayoutConversionsTest.cpp:824: Failure
Expected equality of these values:
  toLinearLayout({16}, sliceV2)
    Which is: 
 - register=1 -> (8)
   register=2 -> (0)
   register=4 -> (0)
   register=8 -> (0)
   register=16 -> (0)
 - lane=1 -> (0)
   lane=2 -> (0)
   lane=4 -> (1)
   lane=8 -> (2)
   lane=16 -> (4)
 - warp is a size 1 dimension
 - block is a size 1 dimension
where out dims are: [dim0 (size 16)]
  LinearLayout( { {S("register"), {{8}}}, {S("lane"), {{0}, {0}, {1}, {2}, {4}}}, {S("warp"), {}}, {S("block"), {}}, }, {S("dim0")})
    Which is: 
 - register=1 -> (8)
 - lane=1 -> (0)
   lane=2 -> (0)
   lane=4 -> (1)
   lane=8 -> (2)
   lane=16 -> (4)
 - warp is a size 1 dimension
 - block is a size 1 dimension
where out dims are: [dim0 (size 16)]
/Users/jeffniu/code/triton/unittest/Dialect/TritonGPU/LinearLayoutConversionsTest.cpp:839: Failure
Expected equality of these values:
  toLinearLayout({16}, sliceV3)
    Which is: 
 - register=1 -> (1)
   register=2 -> (8)
   register=4 -> (0)   
 - lane=1 -> (2)
   lane=2 -> (4)
   lane=4 -> (0)
   lane=8 -> (0)
   lane=16 -> (0)
 - warp=1 -> (0)
   warp=2 -> (0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 16)]
  LinearLayout( { {S("register"), {{1}, {8}}}, {S("lane"), {{2}, {4}, {0}, {0}, {0}}}, {S("warp"), {{0}, {0}}}, {S("block"), {}}, }, {S("dim0")})
    Which is: 
 - register=1 -> (1)
   register=2 -> (8)
 - lane=1 -> (2)
   lane=2 -> (4)
   lane=4 -> (0)
   lane=8 -> (0)
   lane=16 -> (0)
 - warp=1 -> (0)
   warp=2 -> (0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 16)]

@lezcano Is this expected? Should I update the unit tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it seems there is some hack in the SliceEncodingAttr conversion to LinearLayout about these registers...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time digging into this and it does not appear to be trivial to untangle. This requires removing the hack that was adding for SliceEncodingAttr::toLinearLayout, since there is effectively a circular dependency between these functions, but removing that hack is definitely a rabbithole. I will try to stick with this independent logic for now, but given how strange this hack is, it's probably something we should spend time looking at in the near future.

@Mogball Mogball requested a review from lezcano January 28, 2025 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants