fix(cell): Relax Global-Shared loader/storer to support all possible tile shape.#47

Closed
lcy-seso wants to merge 19 commits into microsoft:master from lcy-seso:row_major

@lcy-seso lcy-seso commented Jan 23, 2025

resolve #43 resolve #37 resolve #18 resolve #40

There is no need to restrict the tile shapes used by the loader/storer between global and shared memory. Some tile shapes may not make optimal use of hardware features when accessing global memory, but choosing the most appropriate tile shape is a separate concern that can be addressed independently.

I would like to emphasize that swizzling is strictly applied within a BaseTile and will not be applied across BaseTiles.

  • This pull request relaxes the restrictions on tile shapes used by the loader/storer, allowing all possible tile shapes to be utilized.

  • Additionally, it aims to improve the implementation by ensuring consistency of all related concepts throughout the codebase.

Tasks:

  • Implement half-typed RowMajor layout.
  • Implement float-typed RowMajor layout.
  • Implement half-typed ColumnMajor layout.
  • Implement float-typed ColumnMajor layout.
  • Verify the performance. (No bank conflicts are observed under ncu.)
  • Clean up all CuTe-related dependencies for global-to-shared data tile transfer.
  • An experimental unified layout for shared memory.

@lcy-seso lcy-seso marked this pull request as draft January 23, 2025 12:56
@lcy-seso lcy-seso changed the title from "relax swizzle." to "fix(cell): Relax Global-Shared loader storer to support all possible tile shape." Jan 24, 2025
@lcy-seso lcy-seso changed the title from "fix(cell): Relax Global-Shared loader storer to support all possible tile shape." to "fix(cell): Relax Global-Shared loader/storer to support all possible tile shape." Jan 24, 2025
@lcy-seso lcy-seso marked this pull request as ready for review January 24, 2025 06:39
@lcy-seso lcy-seso requested a review from KuangjuX January 24, 2025 07:00

@KuangjuX KuangjuX left a comment

I'm a bit curious: when Swizzle is applied only within a single BaseTile, the whole swapping process is completed inside that one block. However, this might render part of the Swizzle ineffective. For example, take $4 \times 64$ as the basic block (since it allows for maximum memory coalescing in global memory). In this case, Swizzle<3, 3, 3> effectively performs the logic of Swizzle<2, 3, 3>. The same logic is then executed in the next basic block as well, which can lead to a 4-way bank conflict.

I'd like to hear how you understand this issue, @lcy-seso.
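
The index arithmetic behind this observation can be checked with a minimal sketch. It assumes the CuTe convention for `Swizzle<B, M, S>` (XOR bits `[M, M + B)` of an element offset with bits `[M + S, M + S + B)`) and a row-major tile whose element offset is `row * cols + col`; the helper names `swizzle` and `same_on_tile` are hypothetical and not taken from this codebase, whose own swizzle implementation may differ.

```python
def swizzle(offset: int, b: int, m: int, s: int) -> int:
    """Swizzle<b, m, s> in the CuTe convention (assumed here):
    XOR bits [m, m + b) of `offset` with bits [m + s, m + s + b)."""
    mask = ((1 << b) - 1) << (m + s)
    return offset ^ ((offset & mask) >> s)


def same_on_tile(rows: int, cols: int) -> bool:
    """True if Swizzle<3, 3, 3> and Swizzle<2, 3, 3> agree on every
    element offset of a row-major rows x cols tile."""
    return all(
        swizzle(o, 3, 3, 3) == swizzle(o, 2, 3, 3)
        for o in range(rows * cols)
    )


# 4 x 64 tile: offsets stay below 256, so bit 8 is never set and the
# third swizzle bit has nothing to XOR in.
print(same_on_tile(4, 64))
# 8 x 64 tile: offsets reach 511, so bit 8 participates in the XOR.
print(same_on_tile(8, 64))
```

The sketch only covers the offset mapping. Whether a 4-way bank conflict actually results depends on the element width and the access pattern, which it does not model.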

@lcy-seso lcy-seso marked this pull request as draft January 25, 2025 13:28
@lcy-seso lcy-seso force-pushed the row_major branch 2 times, most recently from 04bd051 to ac15cc1 January 27, 2025 11:42
@lcy-seso lcy-seso closed this Mar 11, 2025
@lcy-seso lcy-seso deleted the row_major branch March 11, 2025 13:55