System Info
transformers>=4.46

Who can help?
@Arthur

Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
Run resize_token_embeddings after adding new tokens to the tokenizer (a minimal sketch is below).
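A minimal sketch of the reproduction. The checkpoint name and the number of added tokens are placeholders, not from the report; any recent model with a large vocabulary should show the same effect:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B"  # placeholder; any large-vocab model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add some new tokens, then resize the embedding matrix to match.
tokenizer.add_tokens([f"<extra_token_{i}>" for i in range(100)])

start = time.perf_counter()
model.resize_token_embeddings(len(tokenizer))  # mean_resizing=True is the default since v4.46
print(f"resize took {time.perf_counter() - start:.1f}s")
```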
Expected behavior
It is much slower to run resize_token_embeddings with the new default mean_resizing=True.
Maybe it's because token embedding matrices are much larger nowadays than they used to be (e.g. Gemma 2, Qwen2, ...), so the mean-based initialization presumably does a lot more work per resize (a rough sketch of that cost is below).
So I think the default value for mean_resizing in resize_token_embeddings should now be False, or the implementation has to be fixed so that resizing stays as fast as before.
Note: it may be that this only shows up because I'm using DeepSpeed ZeRO Stage 3, but I didn't investigate that thoroughly.
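For context on where the time likely goes, here is a rough, self-contained sketch of what a mean-based resize has to do. This is not the transformers implementation; the function name and the small diagonal jitter are illustrative. The idea is that it fits a mean and a full hidden_size x hidden_size covariance to the existing embedding matrix and samples the new rows from that multivariate normal, which is far more work than the plain random init used when mean_resizing=False:

```python
import torch

def mean_resizing_sketch(old_embeddings: torch.Tensor, num_new_tokens: int) -> torch.Tensor:
    """Sample new token embeddings from a Gaussian fitted to the old ones (rough sketch)."""
    vocab_size, hidden_size = old_embeddings.shape
    old = old_embeddings.float()
    mean = old.mean(dim=0)                              # (hidden,)
    centered = old - mean
    cov = centered.T @ centered / vocab_size            # (hidden, hidden), O(vocab * hidden^2)
    cov = cov + 1e-5 * torch.eye(hidden_size)           # jitter to keep the matrix positive definite
    # Building the distribution factorizes the hidden x hidden covariance (O(hidden^3)),
    # which is where large hidden sizes get expensive.
    dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
    return dist.sample((num_new_tokens,))               # (num_new_tokens, hidden)
```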
Hi @cyr0930, we think the default value of True gives better downstream performance, even if it's slower. If you find that the slowdown is especially bad under deepspeed specifically, though, it might be possible to improve that!
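As a stopgap, the mean-based initialization can be skipped explicitly to keep the old, fast behavior (continuing the reproduction sketch above):

```python
# Skip the mean/covariance-based init; new rows get the model's default
# random initialization instead (pre-4.46 behavior, much faster).
model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
```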