soundOfDestiny
Contributor

In the original code, the producer could commit when no copy is done.

We can observe incorrect results when setting static const int StagesPV = 4; instead of static const int StagesPV = StagesQK;, i.e. when using a smaller StagesPV value.

@thakkarV
Collaborator

@richardmcai

@cameronshinn

@soundOfDestiny can you provide a list of steps to reproduce? I cannot reproduce the incorrect results on 77_blackwell_mla_2sm_fp16.


This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@soundOfDestiny
Contributor Author

@richardmcai

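StagesPV - 1 copies are issued before cutlass::arch::cp_async_wait<StagesPV - 1>(); on line 996, so that call does not wait for any copy: cp_async_wait<N> returns as soon as at most N committed copy groups are still pending, and here exactly StagesPV - 1 are pending.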
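
The counting argument above can be checked with a small host-side model. This is only a sketch, not the kernel code from this PR: CpAsyncModel and groups_waited_for are hypothetical names introduced for illustration, and the model assumes each issued copy is committed as its own group and that cp_async_wait<N> blocks only until at most N groups remain pending.

```cpp
// Host-side model of cp.async group accounting. NOT CUTLASS code:
// CpAsyncModel and groups_waited_for are hypothetical illustration names.
#include <cassert>
#include <deque>

struct CpAsyncModel {
    std::deque<int> pending;  // committed groups still in flight, oldest first
    int next_id = 0;

    // Models committing one group of async copies.
    void commit_group() { pending.push_back(next_id++); }

    // Models cp_async_wait<N>(): block until at most N groups remain pending.
    // Returns how many groups the call actually had to wait for.
    template <int N>
    int wait() {
        int completed = 0;
        while (static_cast<int>(pending.size()) > N) {
            pending.pop_front();  // the oldest group completes first
            ++completed;
        }
        return completed;
    }
};

// Commit `issued` groups, then call wait<N>; report how many groups were waited for.
template <int N>
int groups_waited_for(int issued) {
    CpAsyncModel m;
    for (int i = 0; i < issued; ++i) {
        m.commit_group();
    }
    return m.wait<N>();
}
```

Under these assumptions, with StagesPV = 4 the call groups_waited_for<3>(3) models the situation described above (3 groups issued, then a wait with N = 3) and reports that no group was waited for, while groups_waited_for<3>(4) reports one.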

@cameronshinn

@soundOfDestiny I was looking into this issue, but I didn't see any correctness problems when I set StagesPV = 4. Do you have a configuration you tested that I can use to reproduce?
