
Default value for mean_resizing in resize_token_embeddings should be False #35357

Open · cyr0930 opened this issue Dec 20, 2024 · 2 comments

@cyr0930 commented Dec 20, 2024

System Info

transformers>=4.46

Who can help?

@Arthur

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Add new tokens to the tokenizer, then run resize_token_embeddings on the model.

Expected behavior

Running resize_token_embeddings with mean_resizing=True is much slower than with mean_resizing=False.
This is probably because token embedding matrices have become much larger than they used to be (e.g. Gemma 2, Qwen2, ...).

So I think the default value for mean_resizing in resize_token_embeddings should now be False,
or the implementation has to be fixed so that resizing stays as fast as before.

Note: this might only happen because I'm using DeepSpeed stage 3, but I haven't thoroughly investigated that.
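For context, mean-based resizing has to compute statistics over the entire old embedding matrix before sampling the new rows, whereas the old default just draws small random values. A rough NumPy sketch of the idea (my own simplification: per-dimension std instead of the full covariance that transformers fits) shows where the extra work comes from:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 1000, 64  # toy sizes; modern LLMs use V ~ 150k+
old_emb = rng.normal(size=(vocab_size, hidden))

def resize_mean(emb, n_new, rng):
    # Fit statistics over the whole V x d matrix, then sample new rows from
    # the fitted distribution. This pass over all existing rows is the part
    # that grows with vocabulary size.
    mu = emb.mean(axis=0)
    sigma = emb.std(axis=0)
    new_rows = rng.normal(loc=mu, scale=sigma, size=(n_new, emb.shape[1]))
    return np.vstack([emb, new_rows])

def resize_plain(emb, n_new, rng):
    # Old-style init: small random values, independent of the existing rows.
    new_rows = rng.normal(scale=0.02, size=(n_new, emb.shape[1]))
    return np.vstack([emb, new_rows])

resized = resize_mean(old_emb, 8, rng)
print(resized.shape)  # (1008, 64)
```

Under DeepSpeed ZeRO stage 3 the embedding weights are sharded, so gathering them to compute these statistics plausibly adds further overhead (as the note above speculates).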

@cyr0930 cyr0930 added the bug label Dec 20, 2024
@Rocketknight1 (Member) commented Dec 20, 2024

Hi @cyr0930, we think the default value of True gives better downstream performance, even if it's slower. If you find that the slowdown is especially bad with DeepSpeed specifically, though, it might be possible to improve that!

@cyr0930 (Author) commented Dec 20, 2024

Okay, I'll check whether it depends on the DeepSpeed settings. Thanks for replying :)
