[AMDGPU] Use shared memory in multi_mma ukernel (#19786)
This achieves about 210 Top/s on CPX-mode MI300X, about 64% of the 327 Top/s peak. That is roughly at parity with the non-ukernel codegen path, which also uses shared memory.

An earlier revision of this PR opted out of DistributeMmaToLanes. That was the more natural choice, since a kernel that uses shared memory has to perform workgroup-relative indexing in the copies from global to shared memory. However, it required fine ordering of the pass pipeline and ended up performing worse: 180 Top/s vs 210 Top/s. So this PR instead stays on DistributeMmaToLanes and then adds negative thread-relative offsets to compensate.

This relies on interpreting bitcode to determine exactly how much shared memory to allocate, which takes 2 ms. To avoid doing it redundantly, the result is cached with the `DataTiledMMAAttr` value as the key, so this should only run a few times per iree-compile invocation.

When it is determined that no shared memory should be allocated, a new `iree_codegen.null_pointer` type is passed in lieu of an actual tensor, to avoid creating 0-sized tensors. It lowers to a null pointer (and offset). It is intended to be used with ukernels taking a nullable tensor/memref/pointer argument, such as the shared memory argument here.

---------

Signed-off-by: Benoit Jacob <[email protected]>
24 changed files with 608 additions and 101 deletions.