-
I don't think there's much rush? It looks like Flash Attention will take at least some time to properly support it too.
-
I am indeed already on it. There's been a lot of confusion around Gemma2, even more than with the first Gemma, and it's not even clear at this moment what a correct implementation looks like. The HF Transformers support is incomplete at best, perhaps even outright broken beyond 4k context length. It's somewhere near the top of my list due to popular demand, but it's going to take some time just to settle on a correct reference implementation to adhere to. Also, as pointed out, flash-attn will need to implement soft-capping. That will probably happen soon, but I obviously can't say for sure.
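For anyone wondering what soft-capping refers to: below is a minimal, hypothetical sketch of the tanh-based logit capping described for Gemma 2, not exllamav2's or flash-attn's actual code. The shapes are illustrative, and the cap values are the ones reported for Gemma 2.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly squash logits into (-cap, cap) instead of letting them grow unbounded
    return cap * torch.tanh(logits / cap)

# Gemma 2 reportedly caps attention logits at 50.0 and final output logits at 30.0
attn_scores = torch.randn(1, 8, 128, 128)   # (batch, heads, q_len, k_len), illustrative only
attn_scores = soft_cap(attn_scores, 50.0)   # applied to the scores before softmax
```

The reason this matters for flash-attn is that the capping happens inside the attention computation (between the QK product and the softmax), so it can't simply be bolted on outside a fused kernel.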
-
Hi, just curious how complicated supporting the Gemma-2-27B and 9B architectures in exl2 is likely to be? I'm sure @turboderp is already on it :)