
modernbert logits do not have gradient #35386

Open · 3 of 4 tasks
andersonbcdefg opened this issue Dec 21, 2024 · 1 comment

@andersonbcdefg
System Info

Latest transformers version (installed from source), Python 3.10

Who can help?

@ArthurZ

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).to("cuda")

# Create a simple input
inputs = {
    "input_ids": torch.randint(0, 1000, (1, 10)).cuda(),
    "attention_mask": torch.ones(1, 10).cuda(),
}

# Set to train mode and check all parameters
model.train()
for name, param in model.named_parameters():
    print(f"{name}: requires_grad = {param.requires_grad}")

# Do forward pass
outputs = model(**inputs)
print("\nOutput logits requires_grad:", outputs.logits.requires_grad)
print("Output logits grad_fn:", outputs.logits.grad_fn)
```

Expected behavior

When I do this, the output is:

```
Output logits requires_grad: False
Output logits grad_fn: None
```

This is despite all the parameters being explicitly set to requires_grad = True! When printing the params, they are all correctly reported as requires_grad = True.

Just to sanity check, I ran the same code with model_id = "bert-base-uncased" and got:

```
Output logits requires_grad: True
Output logits grad_fn: <ViewBackward0 object at 0x7f0ca6abf370>
```

So it's definitely a ModernBERT-specific problem!
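
To narrow down where gradient tracking is dropped, one option is to register forward hooks on every submodule and print whether each output tensor still requires grad; the first module that reports False is where the graph is detached. This is a diagnostic sketch, not part of the original report: the `make_hook` helper is my own, and the module names it prints depend on the ModernBERT implementation.

```python
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base").to("cuda")
model.train()

def make_hook(name):
    # Forward hook: report whether the module's output still tracks gradients.
    def hook(module, args, output):
        out = output[0] if isinstance(output, tuple) else output
        if torch.is_tensor(out) and out.is_floating_point():
            print(f"{name}: output requires_grad = {out.requires_grad}")
    return hook

# Hook every named submodule (skip the root module itself).
handles = [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules() if n]

inputs = {
    "input_ids": torch.randint(0, 1000, (1, 10)).cuda(),
    "attention_mask": torch.ones(1, 10).cuda(),
}
model(**inputs)

for h in handles:
    h.remove()
```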

@NielsRogge
Contributor

That's expected, since the from_pretrained method puts a model in evaluation mode by default. Try calling model.train() to put it in training mode.
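
For reference, a quick way to confirm the mode from_pretrained leaves the model in (a sketch; note the reproduction above already calls model.train() before the forward pass):

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
print(model.training)  # False: from_pretrained returns the model in eval mode

model.train()
print(model.training)  # True after switching to training mode
```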
