System Info
transformers>=4.46

Who can help?
@Arthur

Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
Run resize_token_embeddings after adding new tokens to the tokenizer (a minimal sketch is below).
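A minimal sketch of the reproduction. The checkpoint name and the number of added tokens are placeholders, not from the report; any recent model with a large vocabulary should show the same effect:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B"  # placeholder; any large-vocab model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add some new tokens, then resize the embedding matrix to match.
tokenizer.add_tokens([f"<extra_token_{i}>" for i in range(100)])

start = time.perf_counter()
model.resize_token_embeddings(len(tokenizer))  # mean_resizing=True is the default since v4.46
print(f"resize took {time.perf_counter() - start:.1f}s")
```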
Expected behavior
It is much slower to run resize_token_embeddings with the new default mean_resizing=True.
Maybe it's because token embedding matrices are much larger nowadays than they used to be (e.g. Gemma 2, Qwen2, ...), so the mean-based initialization presumably does a lot more work per resize (a rough sketch of that cost is below).
So I think the default value for mean_resizing in resize_token_embeddings should now be False, or the implementation has to be fixed so that resizing stays as fast as before.
Note: it may be that this only shows up because I'm using DeepSpeed ZeRO Stage 3, but I didn't investigate that thoroughly.
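For context on where the time likely goes, here is a rough, self-contained sketch of what a mean-based resize has to do. This is not the transformers implementation; the function name and the small diagonal jitter are illustrative. The idea is that it fits a mean and a full hidden_size x hidden_size covariance to the existing embedding matrix and samples the new rows from that multivariate normal, which is far more work than the plain random init used when mean_resizing=False:

```python
import torch

def mean_resizing_sketch(old_embeddings: torch.Tensor, num_new_tokens: int) -> torch.Tensor:
    """Sample new token embeddings from a Gaussian fitted to the old ones (rough sketch)."""
    vocab_size, hidden_size = old_embeddings.shape
    old = old_embeddings.float()
    mean = old.mean(dim=0)                              # (hidden,)
    centered = old - mean
    cov = centered.T @ centered / vocab_size            # (hidden, hidden), O(vocab * hidden^2)
    cov = cov + 1e-5 * torch.eye(hidden_size)           # jitter to keep the matrix positive definite
    # Building the distribution factorizes the hidden x hidden covariance (O(hidden^3)),
    # which is where large hidden sizes get expensive.
    dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
    return dist.sample((num_new_tokens,))               # (num_new_tokens, hidden)
```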
Hi @cyr0930, we think the default value of True gives better downstream performance, even if it's slower. If you find that the slowdown is especially bad under deepspeed specifically, though, it might be possible to improve that!
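As a stopgap, the mean-based initialization can be skipped explicitly to keep the old, fast behavior (continuing the reproduction sketch above):

```python
# Skip the mean/covariance-based init; new rows get the model's default
# random initialization instead (pre-4.46 behavior, much faster).
model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
```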