NER differences in spaCy v2 and v3. #8804

narayanacharya6 · 2021-07-25T07:56:51Z


I've recently noticed differences in spaCy's NER performance when recognizing three-token person names. The snippet below is one example; the entity of interest is Min Aung Hlaing:

import spacy
nlp = spacy.load("en_core_web_md")
target_entity_types=["person", "norp", "fac", "org", "gpe", "loc", "product", "event", "law", "language", "work_of_art"]

content = "KBC Chairman Discusses Returning IDPs With Burma Army Chief Kachin Baptist Convention (KBC) Rev Dr Hkalam Samson and six other KBC representatives met with Snr-Gen Min Aung Hlaing, commander-in-chief of the Myanmar Armed Forces, in Kachin State’s capital, Myitkyina, where they discussed returning displaced Kachin, the peace process and the Myitsone dam project, among other things."

doc = nlp(content)
entities = [(str(ent), ent.label_) for ent in doc.ents if ent.label_.lower() in target_entity_types]
entities

spaCy NER v2 (2.3.7):

[('KBC', 'ORG'), ('Discusses Returning', 'PERSON'), ('Burma Army', 'ORG'), ('Kachin Baptist Convention', 'PERSON'), ('Rev Dr Hkalam Samson', 'PERSON'), ('KBC', 'ORG'), ('Snr-Gen Min Aung Hlaing', 'PERSON'),  ('the Myanmar Armed Forces', 'ORG'),  ('Kachin State’s', 'GPE'),  ('Myitkyina', 'GPE'),  ('Kachin', 'GPE'),  ('Myitsone', 'GPE')]

spaCy NER v3 (3.0.6):

[('KBC', 'ORG'), ('Kachin Baptist Convention', 'ORG'), ('Rev Dr Hkalam Samson', 'PERSON'), ('KBC', 'ORG'), ('Snr-Gen Min', 'ORG'), ('Hlaing', 'GPE'), ('the Myanmar Armed Forces', 'ORG'), ('Kachin State’s', 'GPE'), ('Myitkyina', 'PERSON'), ('Kachin', 'GPE')]

I think v2 is doing a better job compared to v3 in general.
My main question is: what are the main differences (if any) between v2 and v3 NER? Is this documented somewhere?

FYI: The outputs of the en_core_web_lg models are more "consistent"/"equivalent" across spaCy v2 and v3. I'm not sure why the en_core_web_md models differ so much.

polm · 2021-07-26T02:58:42Z


My understanding is that the NER architecture didn't change much between v2 and v3. However, some data augmentations (case modification) were accidentally left out for the 3.0 models (see #8380). This has been resolved in the 3.1 models, so I would suggest you try them.

It's not obvious to me that your issues have anything to do with case augmentation, though I would note that some of your entities have titles ("Rev Dr Hkalam Samson"), which sometimes have inconsistent annotations (annotators may be unclear about whether to include the titles in a PERSON entity or not). I think this is resolved in the version of OntoNotes we're using but it's still something worth keeping in mind.
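One rough way to check whether casing plays any role is to run the same text through the pipeline in a few case variants and compare the entities. This is only a sketch: `case_variants` and `entities_by_variant` are hypothetical helper names, and `nlp` is assumed to be an already loaded spaCy pipeline (or anything callable that returns an object with `.ents`).

```python
def case_variants(text):
    """Return the text in a few different casings, keyed by a short name."""
    return {
        "original": text,
        "lower": text.lower(),
        "upper": text.upper(),
        "title": text.title(),
    }


def entities_by_variant(nlp, text):
    """Run each case variant through `nlp` and collect (text, label) pairs.

    `nlp` is assumed to be a loaded pipeline, e.g. spacy.load("en_core_web_md").
    """
    results = {}
    for name, variant in case_variants(text).items():
        doc = nlp(variant)
        results[name] = [(ent.text, ent.label_) for ent in doc.ents]
    return results
```

If the entities change a lot between variants, casing is probably a factor; if they stay roughly the same, the cause is more likely elsewhere.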

5 replies

narayanacharya6 Jul 26, 2021
Author

Thanks for your response @polm, I saw that discussion but I thought my issue was probably not related.

My follow-up question is why I see such differences with the md model but not the lg model. I am guessing the data augmentations were missed for both the md and lg models and not just the md model, right? In any case, I'll try 3.1 and see if I continue to see the same issue.
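When switching versions it is worth confirming that the model package was upgraded too, since the models are installed as separate packages from spaCy itself. A minimal sketch using only the standard library follows; `installed_version` is a hypothetical helper, and it assumes the models were installed as regular packages under their usual names.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names below are the ones used in this thread.
for name in ("spacy", "en_core_web_md", "en_core_web_lg"):
    print(name, installed_version(name) or "not installed")

# With a loaded pipeline, nlp.meta["version"] reports the model version as well.
```

Running `python -m spacy validate` is another way to check that the installed models are compatible with the installed spaCy version.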

narayanacharya6 Jul 26, 2021
Author

Well, spaCy 3.1.0 did not help, unfortunately :( I was expecting Snr-Gen Min Aung Hlaing as a single PERSON span.

This is the whole output for the same code snippet from above:

[('KBC', 'ORG'), ('Burma Army', 'ORG'), ('Rev Dr Hkalam Samson', 'PERSON'), ('KBC', 'ORG'), ('Snr-Gen Min', 'PERSON'), ('Aung Hlaing', 'PERSON'), ('the Myanmar Armed Forces', 'ORG'), ('Kachin State’s', 'ORG'), ('Kachin', 'GPE')]

Possible next steps for me would be:

Correct these using the entity ruler, OR
Annotate these samples and fine-tune NER. I am not sure how much a few examples would help.

Anything else you would suggest to address such cases?
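For the entity ruler option, a sketch along these lines might work in spaCy v3 (the v2 API for adding components is different). `build_patterns` and `add_name_ruler` are hypothetical helper names, and the list of names is an assumption for illustration. The ruler is placed before `ner` so that the statistical model should respect the spans the ruler has already set.

```python
def build_patterns(names, label="PERSON"):
    """Turn a list of names into phrase patterns for the entity ruler."""
    return [{"label": label, "pattern": name} for name in names]


def add_name_ruler(nlp, names, label="PERSON"):
    """Add an entity ruler with the given names to a spaCy v3 pipeline.

    If the pipeline has an "ner" component, the ruler goes in front of it;
    otherwise it is appended at the end.
    """
    if "ner" in nlp.pipe_names:
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    else:
        ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(build_patterns(names, label))
    return ruler


# Usage (needs spaCy v3 and the model installed; `content` is the text
# from the original post):
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   add_name_ruler(nlp, ["Snr-Gen Min Aung Hlaing"])
#   doc = nlp(content)
#   print([(ent.text, ent.label_) for ent in doc.ents])
```

This only helps for names known in advance, so it is more of a patch for specific recurring entities than a general fix.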

polm Jul 27, 2021

I think those are both reasonable approaches, though as you say, if you only have a handful of examples, fine-tuning won't help much (and you'll have to deal with catastrophic forgetting).
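If you do go the fine-tuning route, the annotations are typically given as character offsets into the text. Here is a small sketch of building one such annotation; `annotate` is a hypothetical helper and the example sentence is shortened from the original post.

```python
def annotate(text, phrase, label):
    """Build a (text, annotations) pair with character offsets for one entity.

    Only the first occurrence of `phrase` is annotated.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    end = start + len(phrase)
    return (text, {"entities": [(start, end, label)]})


text = "They met with Snr-Gen Min Aung Hlaing in Myitkyina."
example = annotate(text, "Snr-Gen Min Aung Hlaing", "PERSON")
print(example)

# In spaCy v3 such a pair can be turned into a training example with
#   from spacy.training import Example
#   Example.from_dict(nlp.make_doc(text), example[1])
```

To limit forgetting, it probably helps to mix in examples the model already gets right alongside the corrected ones, rather than training on the new annotations alone.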

Unfortunately I think you may have just gotten unlucky here with the model changes. As a corollary to issues with accuracy mentioned in #3052, there's no guarantee errors will be consistent between model changes.
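Since errors can shift between model versions, it may be useful to keep a few reference texts and diff the entities whenever you upgrade. A minimal sketch, where `diff_entities` is a hypothetical helper and the sample lists are a subset of the outputs posted above:

```python
from collections import Counter


def diff_entities(old, new):
    """Return (only_in_old, only_in_new) as sorted lists of (text, label) pairs.

    Counters are used so that repeated entities are compared by count.
    """
    old_counts, new_counts = Counter(old), Counter(new)
    only_old = sorted((old_counts - new_counts).elements())
    only_new = sorted((new_counts - old_counts).elements())
    return only_old, only_new


# Subset of the v2 (2.3.7) and v3 (3.0.6) outputs from the original post.
v2 = [("KBC", "ORG"), ("Snr-Gen Min Aung Hlaing", "PERSON"), ("Myitkyina", "GPE")]
v3 = [("KBC", "ORG"), ("Snr-Gen Min", "ORG"), ("Hlaing", "GPE"), ("Myitkyina", "PERSON")]

lost, gained = diff_entities(v2, v3)
print("only in v2:", lost)
print("only in v3:", gained)
```

This compares exact (text, label) pairs, so a changed span boundary or a changed label both show up as a difference.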

polm Jul 27, 2021

About medium vs. large: augmentations were consistent between medium and large. I would expect larger models to be more stable in their predictions in general.

narayanacharya6 Jul 27, 2021
Author

Understood. Thank you!
