[Proton][Dialect] Add Proton Device Memory Buffer Init and Allocate Pass #5606

Open: wants to merge 39 commits into base branch proton-dev


CRobeck (Contributor) commented Jan 14, 2025

Adds initialization and allocation of the Proton dialect device buffer, which can be used in place of the shared memory buffer. The device buffer is a module-local, zero-initialized stack buffer in address space 1.
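As a host-side illustration only (not the actual GPU-side lowering in this PR), the sketch below models the two properties the description names: the buffer lives at module scope and starts out zeroed. It also follows the write-then-read-back pattern the reviewers discuss for end-to-end testing. `BUFFER_WORDS`, `record`, and `read_back` are hypothetical names, and the buffer size is an assumption, since the PR does not state one.

```python
import array

# Hypothetical buffer size in 32-bit words; the PR does not specify one.
BUFFER_WORDS = 8

# Module-scope and zero-initialized, loosely mirroring the described device buffer.
# The leading underscore marks it as module-private by Python convention.
# Typecode "L" (unsigned long, at least 4 bytes) can hold any 32-bit value.
_device_buffer = array.array("L", [0] * BUFFER_WORDS)


def record(index: int, value: int) -> None:
    """Store one 32-bit word at `index` (hypothetical helper)."""
    _device_buffer[index] = value & 0xFFFFFFFF


def read_back() -> list:
    """Return a copy of the buffer, analogous to writing it back to global memory."""
    return list(_device_buffer)
```

For example, after `record(0, 42)` on a fresh buffer, `read_back()` returns `[42, 0, 0, 0, 0, 0, 0, 0]`; every slot that was never written still reads as zero.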

CRobeck changed the base branch from main to proton-dev on January 23, 2025, 03:55
CRobeck changed the title from "[WIP][Proton][Dialect] Add Initial Infrastructure For Proton Shared Memory Buffer" to "[Proton][Dialect] Add Infrastructure For Proton Device Memory Buffer" on Jan 23, 2025
CRobeck marked this pull request as ready for review on January 23, 2025, 22:26
CRobeck requested a review from ptillet as a code owner on January 23, 2025, 22:26
CRobeck changed the title from "[Proton][Dialect] Add Infrastructure For Proton Device Memory Buffer" to "[Proton][Dialect] Add Infrastructure For Proton Device Memory Buffer Pass" on Jan 24, 2025
CRobeck changed the title from "[Proton][Dialect] Add Infrastructure For Proton Device Memory Buffer Pass" to "[Proton][Dialect] Add Proton Device Memory Buffer Pass" on Jan 24, 2025
CRobeck changed the title from "[Proton][Dialect] Add Proton Device Memory Buffer Pass" to "[Proton][Dialect] Add Proton Device Memory Buffer Init and Allocate Pass" on Jan 25, 2025
fywkevin self-assigned this on Jan 25, 2025
third_party/amd/backend/compiler.py (outdated, resolved)
lib/Conversion/TritonToTritonGPU/TritonToTritonGPUPass.cpp (outdated, resolved)
third_party/amd/lib/TritonAMDGPUToLLVM/TritonGPUToLLVM.cpp (outdated, resolved)
third_party/nvidia/backend/compiler.py (outdated, resolved)
@@ -10,6 +10,11 @@ void populateRecordOpToLLVMPattern(LLVMTypeConverter &typeConverter,
RewritePatternSet &patterns,
const TargetInfoBase &targetInfo,
PatternBenefit benefit);
void populateInitDeviceBufferOpToLLVMPattern(LLVMTypeConverter &typeConverter,
Contributor

Once we have our own LLVM lowering conversion pass, let's move this to proton/dialect/lib/...



@triton.jit
def softmax_kernel(output_ptr, input_ptr, input_row_stride, output_row_stride, n_rows, n_cols, BLOCK_SIZE: tl.constexpr,
Contributor

For end-to-end testing, you could manually construct a TTGIR module that uses the buffer_alloc_op, read from and write to the buffer, and finally write it back to gmem so its value can be checked in Python.

Contributor Author

Right. I think we'll want to add all the end-to-end tests at once in a follow-up PR, to make sure we get the code coverage we want.

CRobeck force-pushed the proton_buffer branch 3 times, most recently from 1c9d9ca to 7be389b, on January 29, 2025, 14:23
CRobeck requested a review from fywkevin on January 29, 2025, 16:39