Reusing transformer weights for custom NER leads to bad performance #12396
-
Hi, I'm trying to train my NER model on top of the frozen transformer weights in `en_core_web_trf`. I've been able to train my NER model on its own with 75~80% F1 using the training config generated by https://spacy.io/usage/. If I understand correctly, a pipeline trained that way does not share its transformer weights with `en_core_web_trf`, so I end up with a separate (large) set of transformer weights if I also keep using `en_core_web_trf`. But what I learned in school is that with transformer (or specifically BERT) models, you only fine-tune the last classification layers for different tasks, so it seems reasonable to reuse the transformer weights from `en_core_web_trf`, along the lines of the config sketch below.
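Here's a rough sketch of the config I have in mind (this is my own assumption about how it would be set up: the transformer component is sourced from `en_core_web_trf` and frozen during training; the NER settings are just the quickstart defaults):

```ini
[nlp]
lang = "en"
pipeline = ["transformer","ner"]

[components.transformer]
# Reuse the already-trained transformer component from en_core_web_trf
source = "en_core_web_trf"

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false

[components.ner.model.tok2vec]
# Listen to the (frozen) transformer component above
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[training]
# Keep the sourced transformer weights fixed; only the NER head is updated
frozen_components = ["transformer"]
# Still run the transformer during training so the listener gets its outputs
annotating_components = ["transformer"]
```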
-
Hey, thanks for your post!
Unfortunately, a custom NER model with a frozen transformer isn't going to train well. One suggestion could be to try the custom NER with `use_upper = true` (docs), but the performance is probably still not going to improve.
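For reference, `use_upper` is a parameter of the NER model architecture, so it goes in the NER component's model block; a minimal sketch, assuming the default spacy.TransitionBasedParser.v2 architecture with the other values taken from the quickstart config:

```ini
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
# Add an extra hidden layer before the output scores (the suggestion above)
use_upper = true
```

The idea is to give the NER head a bit more trainable capacity on top of the frozen transformer features, but as noted above, this likely won't close the gap compared to updating the transformer.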