
Gemma capping #34282

Open · wants to merge 49 commits into base: main
Conversation

@ArthurZucker (Collaborator) commented on Oct 21, 2024:

What does this PR do?

Adds logit soft-capping support for Gemma 2 (via FlexAttention), fixes #32877
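
For context, Gemma 2 caps its attention logits with a tanh ("soft-capping"), which the fused SDPA kernel cannot express. A minimal sketch of the operation as it appears in eager attention (illustrative only; `attn_logit_softcapping` mirrors the Gemma 2 config attribute, the commented-out lines are placeholders):

```python
import torch

def soft_cap(scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Tanh soft-capping: smoothly squashes raw attention scores into
    # (-softcap, softcap) instead of hard clipping them.
    return softcap * torch.tanh(scores / softcap)

# Illustrative use inside an eager attention forward pass:
# scores = (query @ key.transpose(-1, -2)) * scaling
# scores = soft_cap(scores, config.attn_logit_softcapping)  # e.g. 50.0 for Gemma 2
# probs = torch.softmax(scores + causal_mask, dim=-1)
# attn_output = probs @ value
```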

@Cyrilvallez (Member) left a comment:

There are a lot of edge cases in imports that are very hard to handle with the proposed approach. I think a simpler and more general approach is to do it the other way around:

  • dump all imports from the modular_xxx.py as is
  • dump all imports from the dependency files as is (this is currently the case)
  • then, in the PostModularConverterCleaner, clean up the imports (perhaps only the protected imports, letting ruff remove the other unused, non-protected ones; see the sketch after this list)

This approach is much simpler and more versatile because in the Cleaner we have access to the final source code, which is not the case when visiting the modular_xxx.py file (there we only see the modular file plus the dependencies, and it is hard to check imports against only the part of the dependency files that actually gets copied into the final file). It would thus guarantee that all needed imports are present (we would never hit a weird edge case when trying to match imports as we do currently), and we could correctly remove imports that were wrongly added from the dependency files (e.g. the duplicate import in Glm caused by the Phi3 dependency).
In my opinion this would also greatly reduce the code complexity.
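
A minimal sketch of what that cleanup step could look like, assuming the cleaner simply shells out to ruff and uses its F401 rule ("imported but unused") to prune imports copied over from dependency files; the function name is a placeholder, not the converter's actual API:

```python
import subprocess

def clean_unused_imports(generated_file: str) -> None:
    # Let ruff delete unused imports in the generated modeling file.
    # Protected imports (e.g. those behind `if is_torch_available():`)
    # would still need dedicated handling in the cleaner itself.
    subprocess.run(
        ["ruff", "check", generated_file, "--fix", "--select", "F401"],
        check=True,
    )
```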

utils/modular_model_converter.py: several review threads (outdated, resolved)

Diff under discussion:
- attn_output = torch.nn.functional.scaled_dot_product_attention(
+ attn_output = flex_attention(
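
For reference, a hedged sketch of how flex_attention can express the soft-capping through its score_mod hook (requires a recent PyTorch that ships torch.nn.attention.flex_attention; the softcap value and tensor shapes are illustrative):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def make_softcap_score_mod(softcap: float):
    # score_mod runs on each raw attention score before softmax, which is
    # exactly where Gemma 2's tanh soft-capping needs to be applied.
    def score_mod(score, batch, head, q_idx, kv_idx):
        return softcap * torch.tanh(score / softcap)

    return score_mod

# Illustrative call, mirroring the diff above (50.0 matches Gemma 2's default
# attn_logit_softcapping):
# attn_output = flex_attention(query, key, value, score_mod=make_softcap_score_mod(50.0))
```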
A contributor left a comment:
Isn't it a bit misleading to use flex attention when we have attn_implementation="sdpa"? My concerns would be:

  • People who previously used SDPA (forced or not) will suddenly have different torch requirements.
  • SDPA != FlexAttention IMO; it's a different API, a different name, and potentially slightly different behaviour.
  • Are the slow tests still passing? We should ensure it still behaves roughly the same as eager.

What do you think about adding a separate attn_implementation option for flex attention specifically? Not sure if this goes beyond the goal of the PR, but control over the specific implementation is always appreciated.

Overall excited to see this, great work!

@ArthurZucker (Collaborator, Author) replied:

The SDPA version of Gemma never really "worked", TBH!
I'll probably add a new class for flex attention; this was simpler for testing.

@ArthurZucker (Collaborator, Author) commented:

Okay @Cyrilvallez, good point regarding cleaning! It makes more sense indeed; will update to fix 😉

@Cyrilvallez (Member) left a comment:

Very nice approach! Much simpler IMO 🤗 Just added some nits for clarity.

utils/modular_model_converter.py: several review threads (outdated, resolved)
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez (Member) left a comment:

LGTM, I actually love it! I think it's much better to use different attention functions instead of different attention classes: it's clearer, there is less duplicated code, and we can easily switch between implementations even after the model has been instantiated (see the sketch below).
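
A minimal sketch of the "attention functions instead of attention classes" idea; the registry name, decorator, and signatures below are illustrative placeholders, not the actual transformers API:

```python
from typing import Callable, Dict

import torch

# Hypothetical registry mapping an attn_implementation string to a function.
ATTENTION_FUNCTIONS: Dict[str, Callable] = {}

def register_attention(name: str):
    def decorator(fn: Callable) -> Callable:
        ATTENTION_FUNCTIONS[name] = fn
        return fn
    return decorator

@register_attention("eager")
def eager_attention(query, key, value, scaling, softcap=None):
    # Plain eager attention with optional tanh soft-capping
    # (masking and dropout omitted for brevity).
    scores = (query @ key.transpose(-1, -2)) * scaling
    if softcap is not None:
        scores = softcap * torch.tanh(scores / softcap)
    probs = torch.softmax(scores, dim=-1)
    return probs @ value

# A module would pick the function at call time, so the implementation can be
# swapped even after the model has been instantiated, e.g.:
# attn_fn = ATTENTION_FUNCTIONS[config._attn_implementation]
# attn_output = attn_fn(query, key, value, scaling, softcap=config.attn_logit_softcapping)
```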

src/transformers/models/gemma2/modular_gemma2.py: several review threads (outdated, resolved)
Successfully merging this pull request may close these issues.

Add logit scaling sdpa using FlexAttention for Gemma2
5 participants