Refactor the pack scheduling for scheduleIterAlg = 3. #1358

Open: wants to merge 2 commits into develop

@vin-huang (Collaborator) commented Nov 18, 2024

Sparse MM has three tracks of pack instructions (A, B, and Metadata), so we need to make sure that the pack items each iteration requires are popped in that iteration.
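This PR was originally titled around popping the pack items by round robin. A minimal sketch of that idea, assuming each track is a FIFO of pack instructions; the names `TRACKS` and `pop_round_robin` are hypothetical and not taken from the PR's code:

```python
from collections import deque

# Hypothetical track names; one FIFO pool per track (A, B, Metadata).
TRACKS = ("A", "B", "Metadata")

def pop_round_robin(pools, n):
    """Pop up to n pack instructions, taking one from each non-empty track in turn."""
    out = []
    while len(out) < n and any(pools[t] for t in TRACKS):
        for track in TRACKS:
            if pools[track] and len(out) < n:
                out.append(pools[track].popleft())
    return out

pools = {"A": deque(["a0", "a1", "a2"]), "B": deque(["b0"]), "Metadata": deque(["m0"])}
print(pop_round_robin(pools, 4))  # ['a0', 'b0', 'm0', 'a1']
```

Round robin only balances the tracks against each other; by itself it does not guarantee a specific per-iteration count from each track.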

[idea]

  • Use three separate pack pools to store the pack instructions of A, B, and Metadata.
  • Step 1: put only the required packs into the code (the number of required packs may differ for each mfma iteration).
    Check whether the inserted packs fulfill instPerPack; if not, insert the next pack instructions until it is satisfied.
    Step 2: if there is still room before the mfma, insert the next pack instructions (the number of instructions inserted equals instPerPack).
    Step 3: put another pack or an SNop before the mfma instruction, according to the required latency. The inserted combination may be 2 packs, 1 pack + SNop 0, or SNop 1.
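The three steps can be sketched for a single mfma iteration as below. This is a hypothetical model, not the PR's implementation: each pool item is one pack instruction, "next" packs are drawn from the tracks in the fixed order A, B, Metadata, `room` is an assumed count of slots available to Steps 1 and 2, and Step 3 is assumed to fill two slots (which matches the listed combinations: 2 packs, 1 pack + s_nop 0, or s_nop 1, since s_nop N provides N+1 wait states).

```python
from collections import deque

TRACKS = ("A", "B", "Metadata")  # assumed fill order for "next" packs

def pop_next(pools, n):
    """Pop up to n pack instructions, draining tracks in TRACKS order (assumption)."""
    out = []
    for track in TRACKS:
        while pools[track] and len(out) < n:
            out.append(pools[track].popleft())
    return out

def schedule_iter(pools, required, inst_per_pack, room, latency_slots=2):
    """Return the instructions placed before (and including) one mfma.

    required: track -> packs this iteration must pop (may differ per iteration)
    room: slots available to Steps 1-2 before the mfma (hypothetical parameter)
    latency_slots: slots Step 3 must fill before the mfma (assumed to be 2)
    """
    code = []
    # Step 1: insert only the packs this iteration requires, track by track.
    for track in TRACKS:
        for _ in range(required.get(track, 0)):
            code.append(pools[track].popleft())
    # ...then top up with the next packs until instPerPack is satisfied.
    if len(code) < inst_per_pack:
        code += pop_next(pools, inst_per_pack - len(code))
    # Step 2: if a whole group of inst_per_pack still fits, insert the next group.
    if room - len(code) >= inst_per_pack:
        code += pop_next(pools, inst_per_pack)
    # Step 3: cover the latency with packs first, then one s_nop for the rest.
    filler = pop_next(pools, latency_slots)
    code += filler
    missing = latency_slots - len(filler)
    if missing > 0:
        code.append(f"s_nop {missing - 1}")  # s_nop N gives N+1 wait states
    code.append("mfma")
    return code

pools = {"A": deque(["a0", "a1"]), "B": deque(["b0"]), "Metadata": deque(["m0"])}
print(schedule_iter(pools, {"A": 1, "B": 1, "Metadata": 1}, inst_per_pack=2, room=4))
# ['a0', 'b0', 'm0', 'a1', 's_nop 0', 'mfma']
print(schedule_iter(pools, {}, inst_per_pack=2, room=4))
# ['s_nop 1', 'mfma']
```

Keeping one pool per track is what lets Step 1 pop an exact per-track count; a single shared queue could not guarantee that the Metadata pack an iteration depends on has been popped.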

@vin-huang added the gfx94x (Run CI on gfx94x) label on Nov 18, 2024
@vin-huang changed the title from "[Sparse] Pop the pack items by round robin to make sure the remaining items will be popped in the same round of mfma." to "Refactor the pack scheduling for scheduleIterAlg = 3." on Nov 21, 2024
@vin-huang force-pushed the pop_pack_in_rr branch 2 times, most recently from 0450d7f to 2036a35 on November 24, 2024 14:03