Do spaCy's NER models generalize? How can I make an NER model correctly detect new words? #6594
-
I am having the same problem as above with spaCy 3rc02 as well. I used this project as a template: https://github.com/explosion/projects/tree/v3/pipelines/ner_wikiner Training ended with 97% accuracy. If the text contains entities from the training dataset, it works; otherwise I get no result for cases similar to the ones above. I have also tried adding word vectors, but nothing changed.
-
The models should in fact already do that. How big is the training set you're training on? You want to inspect specifically the number of training instances per class, and the variety in them. The bigger and more varied your training data, the more general your model will be.
To determine whether or not you're overfitting, it would be a good idea to get a dev dataset that is independent of your training dataset, and measure the performance (F-score, accuracy, ...) on that dev dataset while you're training. When your training loss keeps going down but your dev performance gets worse, that's the point where you're overfitting.
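To make the "loss goes down, dev score gets worse" check concrete, here is a minimal sketch in plain Python. It assumes you have already logged one training loss and one dev score per evaluation step; the function name, the `patience` parameter, and the numbers below are all made up for illustration and are not part of spaCy.

```python
from typing import List, Optional


def first_overfitting_step(
    train_losses: List[float],
    dev_scores: List[float],
    patience: int = 2,
) -> Optional[int]:
    """Return the index of the first step in a run of `patience` consecutive
    steps where the training loss decreased but the dev score got worse.
    Return None if no such run exists."""
    streak = 0
    for i in range(1, min(len(train_losses), len(dev_scores))):
        loss_down = train_losses[i] < train_losses[i - 1]
        dev_worse = dev_scores[i] < dev_scores[i - 1]
        # Count consecutive steps that show the overfitting pattern.
        streak = streak + 1 if (loss_down and dev_worse) else 0
        if streak >= patience:
            return i - patience + 1
    return None


# Hypothetical logged values, one entry per evaluation step.
losses = [10.0, 8.0, 6.0, 5.0, 4.0, 3.0]
dev_f = [0.60, 0.70, 0.75, 0.74, 0.72, 0.70]
print(first_overfitting_step(losses, dev_f))
```

With these invented numbers the dev score peaks at step 2 and falls afterwards while the loss keeps dropping, so the model saved around step 2 would be the one to keep.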
-
@echatzikyriakidis I think the best approach is to try external word embeddings trained on a very large corpus.
-
As someone said further upthread, the NER model should absolutely be learning …
That said, in general you can try to use data augmentation to improve this. For …
When understanding why a particular entity is found or not, it's good to keep …
You can look at each of these for a given example and ask yourself which might …
Is the context OK? If you replaced the entity with a blank in the sentence, …
This would not be a GPE, and would probably be an ORG. But it could also be a …
Is the literal token OK? This is usually easy to check. Unknown tokens are …
Maybe John was hired by a company called "The", but the model will have a hard …
Is the shape what you would expect? To avoid just memorizing tokens, spaCy …
The takeaway is that if your text is not formatted like the …
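The comment above is cut off where it discusses token shape, so as a rough illustration only: a word-shape feature maps uppercase letters to `X`, lowercase letters to `x`, and digits to `d`, and caps long runs of the same symbol. The sketch below is a simplified approximation written for this thread, not spaCy's own function; the name `approx_word_shape` and the run limit of 4 are assumptions.

```python
def approx_word_shape(text: str, max_run: int = 4) -> str:
    """Simplified word-shape feature (an approximation, not spaCy's code):
    uppercase -> 'X', lowercase -> 'x', digit -> 'd', other chars unchanged.
    Runs of the same symbol are truncated to `max_run` symbols."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            sym = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        # Track how many times in a row the same symbol has appeared.
        run = run + 1 if sym == last else 0
        last = sym
        if run < max_run:
            shape.append(sym)
    return "".join(shape)


for word in ["Google", "Yahoo", "google", "GOOGLE"]:
    print(word, approx_word_shape(word))
```

Under this approximation "Google" and "Yahoo" share the same shape, while "google" and "GOOGLE" do not, which is one reason casing and formatting that differ from the training data can hurt recognition.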
-
I have created a German model and I test it with:
"Boris Johnson wurde in Google gearbeitet"
The model detects two entities, Boris Johnson and Google, and both are correct.
Both Boris Johnson and Google were in my dataset.
However, when I test the model and replace Google with something else, e.g., Yahoo, it doesn't work.
Why is that? I read that spaCy models generalize and learn from local features and surrounding context.
How can I make a model generalize and detect new pieces of text (new names, new companies, etc.), not only the ones that exist in the training dataset?
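For reference, the data augmentation suggested in the replies could look roughly like the sketch below: each annotated entity is swapped for another name with the same label, and the character offsets are recomputed so the annotations stay aligned. This is plain Python, not a spaCy API; the function name `swap_entities`, the `(start, end, label)` span format, the `PER`/`ORG` labels, and the replacement names are all assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def swap_entities(
    text: str,
    spans: List[Span],
    replacements: Dict[str, List[str]],
    rng: random.Random,
) -> Tuple[str, List[Span]]:
    """Replace every annotated span with a random name of the same label
    and return the new text together with the shifted spans."""
    pieces: List[str] = []
    new_spans: List[Span] = []
    cursor = 0   # position in the original text
    length = 0   # length of the new text built so far
    for start, end, label in sorted(spans):
        # Copy the unchanged text between the previous entity and this one.
        pieces.append(text[cursor:start])
        length += start - cursor
        candidates = replacements.get(label)
        new_name = rng.choice(candidates) if candidates else text[start:end]
        pieces.append(new_name)
        new_spans.append((length, length + len(new_name), label))
        length += len(new_name)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), new_spans


# The test sentence from the question, with hypothetical annotations.
text = "Boris Johnson wurde in Google gearbeitet"
spans = [(0, 13, "PER"), (23, 29, "ORG")]
names = {"PER": ["Angela Merkel", "Olaf Scholz"], "ORG": ["Yahoo", "Siemens"]}

new_text, new_spans = swap_entities(text, spans, names, random.Random(0))
print(new_text, new_spans)
```

Generating several variants per training sentence this way exposes the model to the same contexts with different names, which should push it to rely on context and shape rather than on memorized tokens.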