NER differences in spaCy v2 and v3. #8804
My understanding is that the NER architecture didn't change much between v2 and v3. However, some data augmentation (case modification) was accidentally left out of the 3.0 models (see #8380). This has been resolved in the 3.1 models, so I would suggest trying them. It's not obvious to me that your issues have anything to do with case augmentation, though I would note that some of your entities include titles ("Rev Dr Hkalam Samson"), and titles are sometimes annotated inconsistently (annotators may be unclear about whether to include the title in a PERSON entity). I think this is resolved in the version of OntoNotes we're using, but it's still worth keeping in mind.
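If you want to check whether casing plays any role for your entities, one rough way is to run the same sentence through the pipeline in several case variants and compare the predictions. This is only a sketch: `case_variants`, `entities_by_variant` and `check_with_spacy` are hypothetical helper names, and the sentence you pass in is whatever text you are testing. Only `spacy.load`, `doc.ents`, `ent.text` and `ent.label_` are actual spaCy API.

```python
def case_variants(text):
    """Return the text in a few casings (hypothetical helper)."""
    return {
        "original": text,
        "lower": text.lower(),
        "upper": text.upper(),
        "title": text.title(),
    }


def entities_by_variant(nlp, text):
    """Run `nlp` on each case variant and collect (text, label) pairs.

    `nlp` is any callable that returns an object with `.ents`,
    for example a loaded spaCy pipeline.
    """
    results = {}
    for name, variant in case_variants(text).items():
        doc = nlp(variant)
        results[name] = [(ent.text, ent.label_) for ent in doc.ents]
    return results


def check_with_spacy(model_name, text):
    """Load a spaCy model and print its entities per case variant.

    Assumes the model package (e.g. "en_core_web_md") is installed.
    """
    import spacy  # imported lazily so the helpers above work without spaCy

    nlp = spacy.load(model_name)
    for name, ents in entities_by_variant(nlp, text).items():
        print(name, ents)
```

If the PERSON span only changes or disappears in the lowercased or uppercased variants, case augmentation is a plausible factor. If it already differs on the original casing, the cause is more likely something else.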
I've noticed recently that there are some differences in spaCy NER performance when recognizing person names with 3 tokens. One example would be this snippet. The entity of interest here is `Min Aung Hlaing`:
spaCy NER v2 (2.3.7):
spaCy NER v3 (3.0.6):
I think v2 is doing a better job compared to v3 in general.
My main question is: what are the main differences (if any) between v2 and v3 NER? Is this documented somewhere?
FYI: The outputs of the `en_core_web_lg` models are more "consistent"/"equivalent" across spaCy v2 and v3. I'm not sure why there is so much difference in the `en_core_web_md` models.
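For anyone who wants to reproduce this kind of comparison: spaCy v2 and v3 can't be installed in the same environment, so one option is to dump the entities to JSON in each environment and diff the files afterwards. This is a minimal sketch under that assumption. The function names (`collect_entities`, `save_entities`, `diff_entities`) and the output layout are made up for illustration; only `spacy.load`, `spacy.__version__`, `doc.ents` and the `Span` attributes are actual spaCy API.

```python
import json


def collect_entities(nlp, texts):
    """Map each text to a list of [text, label, start_char, end_char]."""
    results = {}
    for text in texts:
        doc = nlp(text)
        results[text] = [
            [ent.text, ent.label_, ent.start_char, ent.end_char]
            for ent in doc.ents
        ]
    return results


def save_entities(model_name, texts, path):
    """Run once per environment (v2, v3) and write the result to `path`."""
    import spacy  # imported lazily; the installed version decides v2 vs v3

    nlp = spacy.load(model_name)
    payload = {
        "spacy_version": spacy.__version__,
        "model": model_name,
        "entities": collect_entities(nlp, texts),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)


def diff_entities(old, new):
    """Return the texts on which two `collect_entities` results disagree."""
    diffs = {}
    for text, old_ents in old.items():
        a = {tuple(e) for e in old_ents}
        b = {tuple(e) for e in new.get(text, [])}
        if a != b:
            diffs[text] = {"only_old": sorted(a - b), "only_new": sorted(b - a)}
    return diffs
```

Running `save_entities` with `en_core_web_md` and `en_core_web_lg` in both environments, then passing the loaded `"entities"` dicts to `diff_entities`, should show whether the md/lg gap holds across a larger sample than a single snippet.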