
Assistance on implementing Flash Attention 2 for Turing #1342

Open
samuelzxu opened this issue Nov 17, 2024 · 0 comments
Hi All,

I've noticed commented-out lines in flash-attention/csrc/flash_attn/flash_api.cpp suggesting there have been past attempts at getting Flash Attention 2 working on Turing, and that the effort has run into some significant barriers.
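For context, the check I'm referring to looks roughly like this (paraphrased from flash_api.cpp; exact names and surrounding code may differ):

```cpp
#include <ATen/cuda/CUDAContext.h>  // at::cuda::getCurrentDeviceProperties
#include <c10/util/Exception.h>     // TORCH_CHECK

// Paraphrased architecture guard from the forward entry point.
// The commented-out is_sm75 line is what suggests a Turing path was
// considered at some point but never enabled.
void check_arch() {
    auto dprops = at::cuda::getCurrentDeviceProperties();
    // bool is_sm75 = dprops->major == 7 && dprops->minor == 5;
    bool is_sm8x = dprops->major == 8 && dprops->minor >= 0;
    bool is_sm90 = dprops->major == 9 && dprops->minor == 0;
    TORCH_CHECK(is_sm90 || is_sm8x,
                "FlashAttention only supports Ampere GPUs or newer.");
}
```

My rough understanding is that enabling a Turing path would mean re-enabling that check and providing sm75 kernel instantiations that don't rely on cp.async (which is sm80+) and that fit within Turing's smaller shared memory, but I may be missing other constraints.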

I'd like to try my hand at this task, but would really appreciate any insights the authors or other readers might have. What are the most significant obstacles? Are they the architecture-specific optimizations, dev time, tiling, or something else?
