I have read the description in the 3.2.x README. Does the warp specialization currently still not support writing kernels similar to FlashAttention v3, due to the lack of a multi-level async task implementation (only one producer and one consumer; it cannot do task0 -> task1 -> task2)? I look forward to and appreciate your response.
The warp specialization support that comes with 3.2.x is underpinned by automatic task partition heuristics, which, for flash attention, will enable a cooperative partition scheme. This means either a one-producer-one-consumer mode or a one-producer-dual-consumer mode is supported. In the latter, the two consumer groups run exactly the same code but on different parts of the kernel input. This is similar to what FA3 has adopted.
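To make the cooperative scheme concrete, here is a minimal host-side toy model in plain Python (threads and queues standing in for warp groups and data channels — none of these names are Triton APIs): one producer streams input tiles, and two consumers run identical code on disjoint parts of the input.

```python
import queue
import threading

def run_cooperative(data, num_consumers=2):
    """Toy one-producer-dual-consumer cooperative partition.

    The producer models the load warp group; each consumer models an
    identical compute warp group working on its own slice of the input.
    Illustration only, not the actual Triton-generated structure.
    """
    channels = [queue.Queue(maxsize=2) for _ in range(num_consumers)]
    results = [0] * num_consumers

    def producer():
        # Round-robin stands in for splitting the kernel input
        # across the two consumer groups.
        for i, tile in enumerate(data):
            channels[i % num_consumers].put(tile)
        for ch in channels:
            ch.put(None)  # end-of-stream marker

    def consumer(rank):
        # Both consumers execute exactly the same code; only the
        # portion of input they receive differs.
        acc = 0
        while (tile := channels[rank].get()) is not None:
            acc += tile * tile  # placeholder for the real compute (e.g. MMA)
        results[rank] = acc

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer, args=(r,))
                for r in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Running `run_cooperative(list(range(8)))` yields the same sum of squares a single consumer would produce, which is the point of the cooperative scheme: identical code, partitioned data.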
The task0 -> task1 -> task2 partition mode is not supported by the current automatic partition heuristics, though it is supported by the underlying code generation machinery (known as an arbitrary data channel, as opposed to the cooperative load-mma channel). We will be improving the automatic partition heuristics to include that partition mode, likely based on some latency modeling and analysis.
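For contrast, here is the same kind of toy model for the chained task0 -> task1 -> task2 mode: each stage has its own channel to the next, rather than the consumers being identical peers. Again, a hedged sketch in plain Python, not the Triton codegen API.

```python
import queue
import threading

def run_pipeline(data):
    """Toy three-stage chained pipeline (task0 -> task1 -> task2).

    Each queue models an arbitrary data channel between consecutive
    tasks; the stage bodies are placeholders for real kernel work.
    """
    q01 = queue.Queue(maxsize=2)  # channel from task0 to task1
    q12 = queue.Queue(maxsize=2)  # channel from task1 to task2
    out = []

    def task0():
        # e.g. async loads feeding the first compute stage
        for x in data:
            q01.put(x)
        q01.put(None)

    def task1():
        # e.g. first compute stage, forwarding to the next stage
        while (x := q01.get()) is not None:
            q12.put(x + 1)
        q12.put(None)

    def task2():
        # e.g. second compute stage / epilogue
        while (x := q12.get()) is not None:
            out.append(x * 2)

    threads = [threading.Thread(target=t) for t in (task0, task1, task2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The structural difference from the cooperative example is that the stages run different code and are connected by distinct channels, which is why a separate partition heuristic is needed to generate it automatically.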