Add TMA example for Hopper H100 #214

Open: 3 commits to be merged into base branch master

ahendriksen

This sample code shows how to create a TMA descriptor using the driver API and how to initiate a TMA transfer using inline PTX.

I have not yet had the chance to copy over the Makefile from the other directories. What is the preferred solution here, and what is the preferred way to create the Visual Studio solution files?

This example can only be compiled with -arch sm_90. Previous architectures are not supported.

@bathmatt

bathmatt commented Aug 7, 2023

Maybe modify the printf to state that the following code will fail. You currently have:

Care must be taken to ensure that the coordinates result in a memory offset
that is aligned to 16 bytes. With 32 bit integer elements, x coordinates
that are not a multiple of 4 result in a non-recoverable error:

Maybe add something like "the following functions should fail due to ....".

@ahendriksen
Author

Thanks! I agree that seeing the end of the example fail is confusing. I have taken your suggestion and also added "(as expected)" to each line that reports a failure. This should make it more obvious that the final lines document kernels that are supposed to fail.

**NOTE**: The following code will fail.
 
Care must be taken to ensure that the coordinates result in a memory offset
that is aligned to 16 bytes. With 32 bit integer elements, x coordinates
that are not a multiple of 4 result in a non-recoverable error:

CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 346
CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 348
CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 350
CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 352
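
The alignment rule in the note above can be checked on the host before a kernel is launched. The following is a minimal sketch, not code from this pull request: the helper `x_coord_is_aligned` and the constant `kTmaOffsetAlignment` are hypothetical names, and the check only models the x coordinate of a tensor whose elements are contiguous along x.

```cpp
#include <cstddef>
#include <cstdint>

// Required alignment (in bytes) of the memory offset, as stated in the note.
constexpr std::int64_t kTmaOffsetAlignment = 16;

// Hypothetical helper: returns true if an x coordinate (counted in elements)
// corresponds to a byte offset that is a multiple of 16 bytes.
constexpr bool x_coord_is_aligned(std::int64_t x, std::size_t element_size) {
    return (x * static_cast<std::int64_t>(element_size)) % kTmaOffsetAlignment == 0;
}

// With 32-bit integer elements (4 bytes), only multiples of 4 are aligned.
static_assert(x_coord_is_aligned(0, sizeof(std::int32_t)), "x = 0 is aligned");
static_assert(x_coord_is_aligned(4, sizeof(std::int32_t)), "x = 4 is aligned");
static_assert(!x_coord_is_aligned(1, sizeof(std::int32_t)), "x = 1 is not aligned");
```

With 4-byte elements, x = 0, 4 and 8 pass this check, while x = 1, 2 and 3 do not, which matches the "multiple of 4" rule quoted above. Coordinates that fail the check are the kind that produce the expected errors at the end of the sample.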

@bathmatt

bathmatt commented Aug 7, 2023

Thanks, run first, ask questions later, that's my motto... when I see errors I tend to look for problems instead of thinking, learning exercise :)

@ahendriksen
Author

Haha no problem. As they say "you don't have to prepare to win the lottery, but the lottery has to prepare for someone to win". There will always be somebody running the samples in a hurry, even if many people won't. Better to have them covered from the start :)

@kuozhang

> Thanks! I agree that seeing the end of the example fail is confusing. I have taken your suggestion and also added (as expected) to each line with a failure. This should make it more obvious that the final lines document that the kernels should fail.
>
> **NOTE**: The following code will fail.
>
> Care must be taken to ensure that the coordinates result in a memory offset
> that is aligned to 16 bytes. With 32 bit integer elements, x coordinates
> that are not a multiple of 4 result in a non-recoverable error:
>
> CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 346
> CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 348
> CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 350
> CUDA error (as expected): an illegal instruction was encountered globalToShmemTMACopy.cu 352

So what to do with this?

@kuozhang

There is no Makefile in this folder. Copying one from the globalToShmemAsyncCopy folder and changing globalToShmemAsyncCopy to globalToShmemTMACopy in it works!
