Fix assertion in ScanLowering for num_ctas>1 #5680

Open
wants to merge 10 commits into
base: main

safelix commented Jan 23, 2025

Enable within-CTA scans if num_ctas>1 and cluster_dims[axis]==1 on Hopper or later.

Assume the clusterCTAId along the scan axis (cctaIdAxis) is 0, and raise a runtime assertion otherwise. Combine the clusterCTAId across the scan axis (cctaIdParallel) into flatIdParallel, and compute numParallelLane per CGA instead of per CTA.
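The indexing idea described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the code from this PR: the function names `flat_parallel_id` and `num_parallel_lanes_per_cga`, their parameters, and the restriction to a 2D cluster are all assumptions made for the example.

```python
def flat_parallel_id(ccta_id, scan_axis, parallel_id_in_cta, num_parallel_per_cta):
    """Hypothetical model of folding the cluster CTA id into the parallel id.

    ccta_id: (i, j) position of this CTA inside its cluster (CGA), 2D only.
    scan_axis: axis (0 or 1) along which the scan runs.
    parallel_id_in_cta: flat index of the parallel lane within the CTA.
    num_parallel_per_cta: number of parallel lanes handled by one CTA.
    """
    ccta_id_axis = ccta_id[scan_axis]          # CTA id along the scan axis
    ccta_id_parallel = ccta_id[1 - scan_axis]  # CTA id along the other axis
    # Within-CTA scans only: the scan axis must not be split across CTAs.
    assert ccta_id_axis == 0, "scan axis is split across CTAs"
    # Each CTA owns a contiguous range of parallel lanes within the CGA.
    return ccta_id_parallel * num_parallel_per_cta + parallel_id_in_cta


def num_parallel_lanes_per_cga(ctas_per_cga, scan_axis, num_parallel_per_cta):
    """Count parallel lanes per CGA rather than per CTA (hypothetical helper)."""
    return ctas_per_cga[1 - scan_axis] * num_parallel_per_cta
```

In this model, a cluster of shape (1, 4) scanned along axis 0 has four times as many parallel lanes as a single CTA, while any CTA with a nonzero id along the scan axis trips the assertion.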

This fixes assertions when num_ctas > 1:

assert(numScanBlocks * numParallelBlocks * parallelElementsPerThread *

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because see above.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

@safelix safelix requested a review from Jokeren as a code owner January 23, 2025 16:09
Jokeren (Contributor) commented Jan 23, 2025

num_ctas > 1 is supported, so a test case should be added.

@safelix safelix requested a review from ptillet as a code owner January 23, 2025 22:56
safelix (Author) commented Jan 23, 2025

I added the test cases. They pass if the scan is within a CTA and fail if it spans multiple CTAs.

I didn't find any logic in the code that performs accumulation across CTAs. I could look into the logic and test coverage for cross-CTA scans in a different PR, but I think fixing the assertion for within-CTA scans can be done independently of that.

How should I proceed?

Jokeren (Contributor) commented Jan 24, 2025

> Added the test cases, they pass if the scan is within a CTA and fail if it is across multiple CTAs.

For now, the scan should happen only within a CTA. Please do not work on anything across CTAs.

@safelix safelix marked this pull request as draft January 24, 2025 22:59
Assume the clusterCTAId along the scan axis (cctaIdAxis) is ==0, raise runtime assertion otherwise. Combine the clusterCTAId across the scan axis (cctaIdParallel) into the flatIdParallel and compute numParallelLane per CGA instead of per CTA.
Test BlockedLayout for
    - thread_size=4, num_warps=4, num_ctas=1
    - thread_size=4, num_warps=1, num_ctas=4: CTASplitNum=[1,1]
    - thread_size=4, num_warps=1, num_ctas=4: CTASplitNum=CTAsPerCGA
    - thread_size=1, num_warps=4, num_ctas=4: CTASplitNum=[1,1]
    - thread_size=1, num_warps=4, num_ctas=4: CTASplitNum=CTAsPerCGA
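The five configurations in the commit message above can be enumerated programmatically. The sketch below is hypothetical and does not reproduce the PR's actual test code: the function name `scan_layout_cases`, the dictionary keys, and the default cluster shape `(1, 4)` are assumptions, and the `CTASplitNum` of the single-CTA case is left as `None` because the commit message does not state it.

```python
from itertools import product


def scan_layout_cases(ctas_per_cga=(1, 4)):
    """Enumerate the BlockedLayout cases listed in the commit message.

    ctas_per_cga is an assumed cluster shape whose product gives num_ctas.
    """
    num_ctas = ctas_per_cga[0] * ctas_per_cga[1]
    # Baseline: a single CTA (CTASplitNum not specified in the source).
    cases = [{"thread_size": 4, "num_warps": 4, "num_ctas": 1, "cta_split_num": None}]
    # (thread_size, num_warps) pairs used for the multi-CTA cases.
    shapes = [(4, 1), (1, 4)]
    # CTASplitNum is either [1, 1] or equal to CTAsPerCGA.
    splits = [(1, 1), tuple(ctas_per_cga)]
    for (thread_size, num_warps), split in product(shapes, splits):
        cases.append({
            "thread_size": thread_size,
            "num_warps": num_warps,
            "num_ctas": num_ctas,
            "cta_split_num": split,
        })
    return cases
```

A list like this could be passed to a parametrized test so that each combination is reported as its own test case.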
@safelix safelix changed the title from "Fix ScanLoweringHelper for num_ctas > 1" to "Fix assertion in ScanLowering for num_ctas>1" Jan 28, 2025
safelix (Author) commented Jan 28, 2025

The initial fix was only superficial, so I implemented the within-CTA scan from scratch. I switched from end-to-end tests to layout tests to make sure all edge cases are caught. Pytest output for test_scan_layouts:
400 passed, 480 skipped, 19203 deselected in 147.85s (0:02:27)

I'm not sure whether the tests are too extensive now. Maybe we could test only the combinations with thread_size==1?

Is there anything else I can do to get this merged?

@safelix safelix marked this pull request as ready for review January 28, 2025 23:13