
modernbert logits do not have gradient #35386

Open · 3 of 4 tasks
andersonbcdefg opened this issue Dec 21, 2024 · 1 comment

@andersonbcdefg
System Info

Latest transformers version (installed from source), Python 3.10

Who can help?

@ArthurZ

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).to("cuda")

# Create a simple input
inputs = {
    "input_ids": torch.randint(0, 1000, (1, 10)).cuda(),
    "attention_mask": torch.ones(1, 10).cuda(),
}

# Set to train mode and check all parameters
model.train()
for name, param in model.named_parameters():
    print(f"{name}: requires_grad = {param.requires_grad}")

# Do forward pass
outputs = model(**inputs)
print("\nOutput logits requires_grad:", outputs.logits.requires_grad)
print("Output logits grad_fn:", outputs.logits.grad_fn)
```

Expected behavior

When I do this, the output is:

```
Output logits requires_grad: False
Output logits grad_fn: None
```

This is despite all the parameters being explicitly set to requires_grad = True! When printing the params, they are all correctly reported as requires_grad = True.

Just to sanity check, I ran the same code with model_id = "bert-base-uncased" and got:

```
Output logits requires_grad: True
Output logits grad_fn: <ViewBackward0 object at 0x7f0ca6abf370>
```

So it's definitely a ModernBERT-specific problem!
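
To narrow down where gradient tracking is dropped, one option is to register forward hooks on every submodule and print whether each output tensor still requires grad; the first module that reports False is where the graph is detached. This is a diagnostic sketch, not part of the original report: the `make_hook` helper is my own, and the module names it prints depend on the ModernBERT implementation.

```python
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base").to("cuda")
model.train()

def make_hook(name):
    # Forward hook: report whether the module's output still tracks gradients.
    def hook(module, args, output):
        out = output[0] if isinstance(output, tuple) else output
        if torch.is_tensor(out) and out.is_floating_point():
            print(f"{name}: output requires_grad = {out.requires_grad}")
    return hook

# Hook every named submodule (skip the root module itself).
handles = [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules() if n]

inputs = {
    "input_ids": torch.randint(0, 1000, (1, 10)).cuda(),
    "attention_mask": torch.ones(1, 10).cuda(),
}
model(**inputs)

for h in handles:
    h.remove()
```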

@NielsRogge
Contributor

That's expected, since the from_pretrained method puts a model in evaluation mode by default. Try calling model.train() to put it in training mode.
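
For reference, a quick way to confirm the mode from_pretrained leaves the model in (a sketch; note the reproduction above already calls model.train() before the forward pass):

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
print(model.training)  # False: from_pretrained returns the model in eval mode

model.train()
print(model.training)  # True after switching to training mode
```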
