
[WarpSpec] add support for multiple channels sharing the same smem #9

Open · wants to merge 3 commits into base: ws

Conversation

manman-ren (Contributor)


Summary: We already have channelsGroupedByProducers and channelsGroupedByConsumers. For one-producer-multi-consumer mode, a single buffer is used; channelsGroupedByProducers captures this. channelsGroupedByConsumers exists to minimize the insertion of sync primitives: a single set of communication ops is inserted per consumer group.

For this patch, we want multiple channels that are live in different loop nests to share the same smem location. We add an allocation.shareGroup attribute to the local_allocs corresponding to the channels that reuse the same smem location.

In order to reuse the same smem location, we update bufferIdx and phase through all the loop nests that share smem locations. We handle the following cases:
for # persistent loop
  for # can be nested under if
  for # can be nested under if
Or
for # can be nested under if
for # can be nested under if
Or
for # persistent loop
  for # can be nested under if

The generated code will look like:
for(accumLoopCount)
  t1 = IfOp
    forOp # loop A
    tmpIdx = accumLoopCount + numStepsA
    yield tmpIdx
    else yield accumLoopCount
  t2 = IfOp
    forOp # loop B
    tmpIdx = t1 + numStepsB
    yield tmpIdx
    else yield t1
  yield t2 for accumLoopCount

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 14, 2024
@manman-ren (Contributor, Author)

The implementation changes appendBufferIdxArgs/createNewLoop to add an argument to the outer loop for accumLoopCount, or to add a constant as a placeholder when there is no outer loop. It also changes specializeIfOp to create a result on the if that propagates accumLoopCount.
We then use a helper function, updateAccumLoopCount, to correctly link up the values.

Phase 1:
ForOp with accumLoopCount as an argument
  If
    use accumLoopCount to set initialBufferIdx
    ForOp
      generate numSteps and create an add op for accumLoopCount + numSteps
  Yield for ForOp with accumLoopCount (this will be updated later in updateAccumLoopCount)

@htyu (Contributor)

htyu commented Dec 18, 2024

This is great work, thanks!

BTW, can you include a lit test to help understand what this PR does exactly?

if (kv.second.size() <= 1)
  continue;
bufferMap[kv.first].getDefiningOp()->setAttr(
    "allocation.shareGroup",
Contributor
A dumb question: why is this needed if the same buffer is already used in the IR?
