
Assistance on implementing Flash Attention 2 for Turing #1342

Open
samuelzxu opened this issue Nov 17, 2024 · 0 comments
Hi All,

I've noticed commented-out lines in flash-attention/csrc/flash_attn/flash_api.cpp suggesting there have been past attempts at getting Flash Attention 2 working on Turing, and that the effort has run into some significant barriers.
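For context, the check I'm referring to looks roughly like this (paraphrased from flash_api.cpp; exact names and surrounding code may differ):

```cpp
#include <ATen/cuda/CUDAContext.h>  // at::cuda::getCurrentDeviceProperties
#include <c10/util/Exception.h>     // TORCH_CHECK

// Paraphrased architecture guard from the forward entry point.
// The commented-out is_sm75 line is what suggests a Turing path was
// considered at some point but never enabled.
void check_arch() {
    auto dprops = at::cuda::getCurrentDeviceProperties();
    // bool is_sm75 = dprops->major == 7 && dprops->minor == 5;
    bool is_sm8x = dprops->major == 8 && dprops->minor >= 0;
    bool is_sm90 = dprops->major == 9 && dprops->minor == 0;
    TORCH_CHECK(is_sm90 || is_sm8x,
                "FlashAttention only supports Ampere GPUs or newer.");
}
```

My rough understanding is that enabling a Turing path would mean re-enabling that check and providing sm75 kernel instantiations that don't rely on cp.async (which is sm80+) and that fit within Turing's smaller shared memory, but I may be missing other constraints.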

I'd like to try my hand at this task, but would really appreciate any insights the authors or other readers might have. What are the most significant obstacles? Are they the architecture-specific optimizations, dev time, tiling, or something else?
