1752 / 16 = 109.5, so 1752 is not a multiple of 16 and the results are incorrect. 1760 / 16 = 110 and 1744 / 16 = 109, so both are multiples of 16 and run correctly.
This is probably an issue with the (16, 16) stmatrix store, given its multiple-of-16 requirement.
To Reproduce: NVFUSER_ENABLE=fuse_matmul NVFUSER_DISABLE=matmul_expr_eval python profile_matmul.py 1752 4720 584 NN --verbose --validate
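The divisibility arithmetic above can be checked with a short script. This is only a sanity check of the numbers, not part of nvFuser; the names `TILE` and `is_multiple` are hypothetical, and the tile edge of 16 is taken from the (16, 16) stmatrix store mentioned above.

```python
# Sanity check of the divisibility arithmetic for the three sizes above.
# TILE and is_multiple are illustrative names, not nvFuser identifiers.
TILE = 16  # assumed edge of the (16, 16) stmatrix store


def is_multiple(m, tile=TILE):
    """Return True when m is an exact multiple of tile."""
    return m % tile == 0


for m in (1744, 1752, 1760):
    quotient, remainder = divmod(m, TILE)
    print(f"{m} / {TILE} = {m / TILE} (remainder {remainder}), "
          f"multiple of {TILE}: {is_multiple(m)}")
```

Only 1752 leaves a nonzero remainder (8), which matches the one size reported as failing.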
rdspring1 changed the title from "Incorrect results when problem size M is not divisible by 32." to "Incorrect results when problem size M is not divisible by 16." on Feb 25, 2025
Why does the problem size affect the stmatrix? Our TMA stores should be the M,N size of the CTA tile, right? So in this case 128x256 (which will be chopped into 64x64 subtiles for the TMA). Still, I would think that stmatrix, which is just filling in a 128x256 smem buffer, would not cause this. Then again, the TMA load should fill OOB with zeros (or NaN if misconfigured), so I would not expect TMA to cause the problem either.
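To make the tiling question concrete, here is a rough sketch of where the M boundary falls inside the last CTA tile for the three sizes from the report. It assumes the 128-row CTA tile from the comment above, tiles aligned at row 0, and 16-row stmatrix granularity; the function names are hypothetical and this does not model what nvFuser actually generates. It does not identify the cause, it only shows which sizes have a 16-row block that crosses the edge of the problem.

```python
# Where does the M boundary fall inside the last CTA tile?
# Assumptions (not taken from nvFuser source): CTA tiles start at row 0,
# CTA tile M is 128, and stmatrix writes in 16-row blocks.
CTA_M = 128
STMATRIX_M = 16


def num_cta_tiles(m, cta_m=CTA_M):
    """Number of CTA tiles needed to cover m rows (ceiling division)."""
    return -(-m // cta_m)


def valid_rows_in_last_tile(m, cta_m=CTA_M):
    """Rows of the last CTA tile that lie inside the problem."""
    remainder = m % cta_m
    return cta_m if remainder == 0 else remainder


def boundary_splits_stmatrix_block(m, cta_m=CTA_M, block=STMATRIX_M):
    """True when a 16-row block in the last CTA tile is only partly in bounds."""
    return valid_rows_in_last_tile(m, cta_m) % block != 0


for m in (1744, 1752, 1760):
    print(m, num_cta_tiles(m), valid_rows_in_last_tile(m),
          boundary_splits_stmatrix_block(m))
```

Under these assumptions all three sizes need 14 CTA tiles with a partially filled last tile, but only 1752 (88 valid rows) has a 16-row block that is half in bounds and half out of bounds.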