Reusing transformer weights for custom NER leads to bad performance #12396
-
Hi, I'm trying to train my NER model on top of the frozen transformer weights in `en_core_web_trf`. I've been able to train my NER model on its own with 75~80% F1 using the training config generated by https://spacy.io/usage/. If I understand correctly, a pipeline trained that way does not share its transformer weights with `en_core_web_trf`, so I end up with a separate (large) set of transformer weights if I also keep using `en_core_web_trf`. But what I learned in school is that with transformer (or specifically BERT) models, you only fine-tune the last classification layers for different tasks, so it seems reasonable to reuse the transformer weights from `en_core_web_trf`, along the lines of the config sketch below.
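Here's a rough sketch of the config I have in mind (this is my own assumption about how it would be set up: the transformer component is sourced from `en_core_web_trf` and frozen during training; the NER settings are just the quickstart defaults):

```ini
[nlp]
lang = "en"
pipeline = ["transformer","ner"]

[components.transformer]
# Reuse the already-trained transformer component from en_core_web_trf
source = "en_core_web_trf"

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false

[components.ner.model.tok2vec]
# Listen to the (frozen) transformer component above
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[training]
# Keep the sourced transformer weights fixed; only the NER head is updated
frozen_components = ["transformer"]
# Still run the transformer during training so the listener gets its outputs
annotating_components = ["transformer"]
```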
-
Hey, thanks for your post!
Unfortunately, a custom NER model with a frozen transformer isn't going to train well. One suggestion could be to try the custom NER with `use_upper = true` (docs), but the performance is probably still not going to improve.
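For reference, `use_upper` is a parameter of the NER model architecture, so it goes in the NER component's model block; a minimal sketch, assuming the default spacy.TransitionBasedParser.v2 architecture with the other values taken from the quickstart config:

```ini
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
# Add an extra hidden layer before the output scores (the suggestion above)
use_upper = true
```

The idea is to give the NER head a bit more trainable capacity on top of the frozen transformer features, but as noted above, this likely won't close the gap compared to updating the transformer.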