Default arguments in DebertaConfig disable relative attention, contrary to the docs and deberta-base #35335
Comments
Hi @bauwenst, I would prefer not to update the default here, because it might cause an unexpected change in behaviour for users. Instead, I think updating the documentation to explain that relative attention is not enabled by default in the config class is probably the right course of action.
I understand your point @Rocketknight1, but consider this:
I know at least two academic researchers who have had to throw out all their DeBERTa experiments, and I accidentally avoided this because I initialised …
Hi @bauwenst, you're right that this is a key feature of DeBERTa, but I think we prefer to keep the default as-is, both for backward compatibility reasons and because this matches the code in the original DeBERTa repo. However, you were totally right to point out that setting this to …
To reiterate my previous comment: Keeping the current defaults means you are assuming that less harm is done by tricking users who want disentangled attention than tricking the tiny number of users above. I don't see how this trade-off makes sense.
I fail to see how this is a case of backwards compatibility.
In other words: by saying you keep it False for backwards compatibility with the DeBERTa repo, you are saying that you are keeping it False so that DeBERTa's default config is a BERT config. This makes no sense, because …

In even other words: the default value of False, which is now apparently sacred technical debt in …
Regarding backward compatibility, we're just keeping it … As a result, I really do think the problem is just that one misleading line in the documentation, as I mentioned! You're welcome to submit a PR to fix it, but I don't want to keep going back and forth over the default value …
The current default DeBERTa model in `transformers` is, architecturally, a BERT model. That's not a matter of perfectionism. That's just incorrect. You are offering a BERT model and calling it DeBERTa. It makes very little sense to change the documentation to reflect that you have a bug rather than just fixing the bug. Here's a better idea: let's figure out a system where an additional field is added to the config.

This would give you your supposed "backward compatibility" while actually instantiating DeBERTa from the default `DebertaConfig`.
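One hypothetical shape for such a scheme, sketched under assumptions (the `config_version` field name and its threshold are invented here for illustration; they are not proposed verbatim in the thread):

```python
from transformers import PretrainedConfig

class VersionedDebertaConfig(PretrainedConfig):
    """Sketch: gate the relative_attention default on a config version field."""

    def __init__(self, relative_attention=None, config_version=2, **kwargs):
        super().__init__(**kwargs)
        self.config_version = config_version
        if relative_attention is None:
            # Configs serialized before the change carry an older version
            # number and keep the legacy default of False; freshly constructed
            # configs default to True, matching microsoft/deberta-base.
            relative_attention = config_version >= 2
        self.relative_attention = relative_attention

print(VersionedDebertaConfig().relative_attention)                  # True
print(VersionedDebertaConfig(config_version=1).relative_attention)  # False
```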
System Info
transformers 4.47.0
Who can help?
@ArthurZucker
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
The documentation for `DebertaConfig` says that instantiating a configuration with the defaults will yield a configuration similar to that of the microsoft/deberta-base architecture. Yet the most important part of DeBERTa, namely the relative attention, is disabled by default in the model and in the config:
transformers/src/transformers/models/deberta/modeling_deberta.py, line 191 (commit 9613933)
transformers/src/transformers/models/deberta/configuration_deberta.py, lines 71 to 75 (commit 9613933)
transformers/src/transformers/models/deberta/configuration_deberta.py, line 120 (commit 9613933)
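What the referenced configuration lines amount to, paraphrased as a runnable check rather than quoted verbatim:

```python
from transformers import DebertaConfig

config = DebertaConfig()
assert config.relative_attention is False   # relative attention is off by default
assert config.max_relative_positions == -1  # sentinel: fall back to max_position_embeddings
```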
Even when users request a given amount of `max_relative_positions`, relative attention stays disabled as long as `relative_attention` is set to False:

transformers/src/transformers/models/deberta/modeling_deberta.py, lines 201 to 210 (commit 9613933)
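Paraphrasing the referenced block as a sketch (not the verbatim source): `max_relative_positions` is only consulted once `relative_attention` is True, so setting it alone changes nothing.

```python
from transformers import DebertaConfig

# Request a positive max_relative_positions but leave relative_attention at its default.
config = DebertaConfig(max_relative_positions=512)

# Sketch of the gating in the referenced lines: this branch is never entered.
if config.relative_attention:
    max_relative_positions = config.max_relative_positions
    if max_relative_positions < 1:
        max_relative_positions = config.max_position_embeddings
    # ... only here would the relative-position embeddings be set up ...
```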
And indeed:
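A minimal check along these lines (a sketch; the exact snippet may have differed):

```python
from transformers import DebertaConfig

print(DebertaConfig().relative_attention)
```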
This prints False. And when you instantiate a new DeBERTa model from the default config, e.g. like the following sketch
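(assuming, for illustration, that enabled relative attention would surface as `rel_embeddings` parameters):

```python
from transformers import DebertaConfig, DebertaModel

model = DebertaModel(DebertaConfig())

# With the defaults, no relative-position parameters are created.
print(any("rel_embeddings" in name for name, _ in model.named_parameters()))  # False
```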
...there are no relative positional embeddings in the model, only absolute positional embeddings. This model will also not do any disentangled attention.
Expected behavior
Conform to the documentation by setting `relative_attention=True` in the `DebertaConfig` by default.

I would also add a warning when `relative_attention` is False, so that users know very clearly that despite using a DeBERTa model, they are not getting the core feature offered by DeBERTa, namely the relative attention.
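One possible shape for that warning, as an illustrative sketch (the helper below is hypothetical, not code from the repo):

```python
import warnings

from transformers import DebertaConfig

def warn_if_relative_attention_disabled(config: DebertaConfig) -> None:
    # Hypothetical helper: make the silent feature downgrade loud.
    if not config.relative_attention:
        warnings.warn(
            "relative_attention=False: this DeBERTa model uses only absolute "
            "position embeddings and performs no disentangled attention, "
            "unlike microsoft/deberta-base."
        )

warn_if_relative_attention_disabled(DebertaConfig())  # warns under the current defaults
```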