Fix some nits for layout #9
Merged
This pull request simplifies the implementation of the `compute_attn_1rowblock` function in the `flash_attention_fwd_kernel.h` file by removing support for the causal mask feature. The changes streamline the code and improve maintainability by eliminating unused variables, memory allocations, and operations related to the causal mask.

Removal of Causal Mask Support:
- Removed all references to the causal mask, including memory allocation (`sCausalMask`), global memory tile (`gCausalMask`), and thread slice partitioning (`gmem_thr_copy_CausalMask`).
- Deleted logic for handling the causal mask in shared memory, including initialization, copying, and synchronization steps.

Code Simplification:
- Adjusted shared memory layout and memory copy operations to focus solely on the remaining features, such as `ZeroHold`.
- Updated predicates and identity tensors to remove any references to the causal mask, ensuring the code reflects the simplified functionality.
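
To illustrate why dropping a shared memory buffer also means adjusting the layout of whatever follows it, here is a minimal host-side sketch. It is not the kernel's actual code: the tile sizes, the element width, the buffer order, and the names `SmemLayout` and `make_layout` are all assumptions made for illustration. Only the buffer names `sCausalMask` and `ZeroHold` come from this pull request.

```cpp
#include <cstddef>

// Hypothetical tile sizes and element width; the real kernel traits are not
// shown in this pull request.
constexpr std::size_t kBlockM = 64;
constexpr std::size_t kBlockN = 64;
constexpr std::size_t kHeadDim = 64;
constexpr std::size_t kElemBytes = 2;  // e.g. a 16-bit floating point type

// Byte offsets of each buffer inside one contiguous shared memory region.
struct SmemLayout {
    std::size_t q_offset;
    std::size_t k_offset;
    std::size_t v_offset;
    std::size_t causal_mask_offset;  // only meaningful when the mask is present
    std::size_t zero_hold_offset;
    std::size_t total_bytes;
};

// Packs the buffers back to back. The order (Q, K, V, mask, ZeroHold) is an
// assumption; it is chosen so that removing the mask shifts a later buffer.
constexpr SmemLayout make_layout(bool with_causal_mask) {
    SmemLayout layout{};
    std::size_t cursor = 0;

    layout.q_offset = cursor;
    cursor += kBlockM * kHeadDim * kElemBytes;

    layout.k_offset = cursor;
    cursor += kBlockN * kHeadDim * kElemBytes;

    layout.v_offset = cursor;
    cursor += kBlockN * kHeadDim * kElemBytes;

    if (with_causal_mask) {
        layout.causal_mask_offset = cursor;
        cursor += kBlockM * kBlockN * kElemBytes;
    }

    layout.zero_hold_offset = cursor;
    cursor += kBlockM * kBlockN * kElemBytes;

    layout.total_bytes = cursor;
    return layout;
}

// Without the mask, the region is smaller and ZeroHold starts earlier.
static_assert(make_layout(false).total_bytes < make_layout(true).total_bytes,
              "removing the mask should shrink the shared memory footprint");
static_assert(make_layout(false).zero_hold_offset < make_layout(true).zero_hold_offset,
              "buffers placed after the mask move up once it is removed");
```

Under these assumed sizes, each tile occupies 8192 bytes, so the region shrinks from 40960 to 32768 bytes and the `ZeroHold` offset moves from 32768 to 24576. Any code that indexed into shared memory with the old offsets would have to be updated in the same change.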