-
I don't think there's much rush? It looks like Flash Attention will take at least some time to properly support it too.
-
I am indeed already on it. There's been a lot of confusion around Gemma2, even more than with the first Gemma, and it's not even clear at this moment what a correct implementation looks like. The HF Transformers support is incomplete at best, perhaps even outright broken beyond 4k context length. It's somewhere near the top of my list due to popular demand, but it's going to take some time just to settle on a correct reference implementation to adhere to. Also, as pointed out, flash-attn will need to implement soft-capping. That will probably happen soon, but I obviously can't say for sure.
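For anyone wondering what soft-capping refers to: below is a minimal, hypothetical sketch of the tanh-based logit capping described for Gemma 2, not exllamav2's or flash-attn's actual code. The shapes are illustrative, and the cap values are the ones reported for Gemma 2.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly squash logits into (-cap, cap) instead of letting them grow unbounded
    return cap * torch.tanh(logits / cap)

# Gemma 2 reportedly caps attention logits at 50.0 and final output logits at 30.0
attn_scores = torch.randn(1, 8, 128, 128)   # (batch, heads, q_len, k_len), illustrative only
attn_scores = soft_cap(attn_scores, 50.0)   # applied to the scores before softmax
```

The reason this matters for flash-attn is that the capping happens inside the attention computation (between the QK product and the softmax), so it can't simply be bolted on outside a fused kernel.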
-
Hi, just curious how complicated supporting the Gemma-2-27B and 9B architectures in exl2 is likely to be? I'm sure @turboderp is already on it :)