Add flash params to csrc #5

LoserCheems · 2025-05-14T06:30:01Z

This pull request introduces a new header file, flash.h, which defines the core structures and functions for a CUDA-based multi-head attention mechanism. The changes include the addition of parameter structures for forward and backward passes, utility constants, and function templates for executing the attention mechanism.

Core functionality for multi-head attention:

Definition of parameter structures:
- Added QKV_params to encapsulate query, key, and value tensor pointers, strides, and head-related dimensions.
- Added ZeroHold_params to manage zero-hold states, causal masks, and associated strides for attention mechanisms.
- Introduced Flash_fwd_params and Flash_bwd_params to extend the above structures for forward and backward pass parameters, including dropout, scaling factors, and random state handling.

Function templates for execution:
- Added templates run_mha_fwd_, run_mha_fwd_splitkv_dispatch, and run_mha_bwd_ for executing forward and backward multi-head attention operations with CUDA streams.

Namespace organization:
- Encapsulated all additions within FLASH_NAMESPACE for modularity and clarity.
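
To make the layering described above concrete, here is a minimal host-side sketch of how such parameter structs might be arranged. It is not the contents of flash.h: apart from the struct and template names taken from the description, every field name, the `flash_sketch` namespace, the `stream_t` alias, the `make_fwd_params` helper, the assumed contiguous (batch, seqlen, heads, headdim) layout, and the stub body of `run_mha_fwd_` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Stand-in so the sketch builds without the CUDA toolkit (assumption);
// a real header would use cudaStream_t from <cuda_runtime.h>.
using stream_t = void*;

namespace flash_sketch {  // hypothetical; the PR wraps its code in FLASH_NAMESPACE

// Q/K/V tensor pointers, strides (in elements), and head counts.
struct QKV_params {
    void* q_ptr = nullptr;
    void* k_ptr = nullptr;
    void* v_ptr = nullptr;
    std::int64_t q_batch_stride = 0, q_row_stride = 0, q_head_stride = 0;
    std::int64_t k_batch_stride = 0, k_row_stride = 0, k_head_stride = 0;
    std::int64_t v_batch_stride = 0, v_row_stride = 0, v_head_stride = 0;
    int h = 0;            // number of query heads
    int h_k = 0;          // number of key/value heads
    int h_h_k_ratio = 1;  // query heads served by each key/value head
};

// Zero-hold state tensor, its strides, and the causal-mask flag.
struct ZeroHold_params {
    void* zero_hold_ptr = nullptr;
    std::int64_t zero_hold_batch_stride = 0;
    std::int64_t zero_hold_head_stride = 0;
    std::int64_t zero_hold_row_stride = 0;
    bool is_causal = false;
};

// Forward-pass parameters: both base structs plus output, sizes,
// softmax scale, dropout, and random state.
struct Flash_fwd_params : QKV_params, ZeroHold_params {
    void* o_ptr = nullptr;
    int b = 0, seqlen_q = 0, seqlen_k = 0, d = 0;
    float scale_softmax = 1.0f;
    float p_dropout = 1.0f;  // probability of keeping an element
    std::uint64_t rng_seed = 0, rng_offset = 0;
};

// Backward-pass parameters: forward parameters plus gradient pointers.
struct Flash_bwd_params : Flash_fwd_params {
    void* do_ptr = nullptr;
    void* dq_ptr = nullptr;
    void* dk_ptr = nullptr;
    void* dv_ptr = nullptr;
};

// Hypothetical helper: fill sizes and strides for contiguous
// (batch, seqlen, heads, headdim) tensors. Pointers are left null.
inline Flash_fwd_params make_fwd_params(int b, int seqlen_q, int seqlen_k,
                                        int h, int h_k, int d) {
    assert(h_k > 0 && h % h_k == 0);
    Flash_fwd_params p;
    p.b = b; p.seqlen_q = seqlen_q; p.seqlen_k = seqlen_k; p.d = d;
    p.h = h; p.h_k = h_k; p.h_h_k_ratio = h / h_k;
    p.q_head_stride = d;
    p.q_row_stride = static_cast<std::int64_t>(h) * d;
    p.q_batch_stride = p.q_row_stride * seqlen_q;
    p.k_head_stride = p.v_head_stride = d;
    p.k_row_stride = p.v_row_stride = static_cast<std::int64_t>(h_k) * d;
    p.k_batch_stride = p.v_batch_stride = p.k_row_stride * seqlen_k;
    p.scale_softmax = 1.0f / std::sqrt(static_cast<float>(d));
    return p;
}

// Placeholder launcher. A real specialization would launch a kernel
// compiled for element type T and head dimension Headdim on `stream`;
// this stub only checks that the runtime head dimension fits.
template <typename T, int Headdim>
void run_mha_fwd_(Flash_fwd_params& params, stream_t stream) {
    (void)stream;
    assert(params.d <= Headdim);
}

}  // namespace flash_sketch
```

With this sketch, `make_fwd_params(2, 128, 128, 8, 2, 64)` describes a batch of 2 with 8 query heads sharing 2 key/value heads, and `run_mha_fwd_<float, 64>(params, nullptr)` stands in for the point where a caller would pick a head-dimension specialization. Inheriting the forward struct into the backward one lets the backward pass reuse every forward field without copying.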

LoserCheems added 2 commits May 14, 2025 14:28

Add flash params to csrc

9aebc3f

Update zero-hold strides

9689d9b

LoserCheems merged commit 8890fe7 into main May 14, 2025
