
Italian NER clarification #6489


Hi, the English and Italian models are trained on unrelated datasets (OntoNotes vs. WikiNER) that don't use the same label schemes.
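To make the mismatch concrete, here is a small sketch that compares the two label inventories. The label lists are hardcoded as an assumption: they are the entity types that spaCy's English (OntoNotes) and Italian (WikiNER) pipelines report, which you can check on a loaded pipeline with `nlp.get_pipe("ner").labels`. Even the names the schemes share don't mean the same thing. OntoNotes tags countries and cities as `GPE` and keeps `LOC` for other locations, while WikiNER's `LOC` covers both.

```python
# Label inventories assumed from spaCy's English (OntoNotes) and Italian (WikiNER) pipelines.
ONTONOTES_LABELS = {
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC",
    "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT",
    "QUANTITY", "TIME", "WORK_OF_ART",
}
WIKINER_LABELS = {"LOC", "MISC", "ORG", "PER"}


def compare_schemes(a, b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of label names."""
    return sorted(a & b), sorted(a - b), sorted(b - a)


shared, only_onto, only_wiki = compare_schemes(ONTONOTES_LABELS, WIKINER_LABELS)
print("shared:", shared)                   # → shared: ['LOC', 'ORG']
print("OntoNotes only:", len(only_onto))   # → OntoNotes only: 16
print("WikiNER only:", only_wiki)          # → WikiNER only: ['MISC', 'PER']
```

Only two label names coincide, and `PERSON` vs. `PER` already shows the schemes weren't designed to be compatible.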

If you have data with entity annotation, you can train a new model or extend an existing model with new types. Here are the basics for how to get started: https://spacy.io/usage/training#ner
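Entity training examples pair each text with character offsets for its entities, in the `(text, {"entities": [(start, end, label)]})` form used in spaCy's training docs. The stdlib-only sketch below checks that each span is in range, non-overlapping and free of surrounding whitespace, and it collects the label set, because every new type has to be registered on the entity recognizer (for example with `ner.add_label(...)`) before training. The helper `collect_labels`, the sample sentences and the `PRODUCT` type are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)
TrainExample = Tuple[str, Dict[str, List[Span]]]

# Hypothetical examples mixing existing WikiNER types (PER, ORG, LOC) with a new PRODUCT type.
TRAIN_DATA: List[TrainExample] = [
    ("Mario Rossi lavora alla FIAT di Torino.",
     {"entities": [(0, 11, "PER"), (24, 28, "ORG"), (32, 38, "LOC")]}),
    ("La Panda è prodotta dalla FIAT.",
     {"entities": [(3, 8, "PRODUCT"), (26, 30, "ORG")]}),
]


def collect_labels(data: List[TrainExample]) -> Set[str]:
    """Validate character offsets and return every label used in the data."""
    labels: Set[str] = set()
    for text, annotations in data:
        prev_end = 0
        # Sort by start offset so overlaps can be detected in a single pass.
        for start, end, label in sorted(annotations["entities"]):
            if not (0 <= start < end <= len(text)):
                raise ValueError(f"span ({start}, {end}) out of range in {text!r}")
            if start < prev_end:
                raise ValueError(f"overlapping span at {start} in {text!r}")
            if text[start:end] != text[start:end].strip():
                raise ValueError(f"span {text[start:end]!r} has surrounding whitespace")
            prev_end = end
            labels.add(label)
    return labels


print(sorted(collect_labels(TRAIN_DATA)))  # → ['LOC', 'ORG', 'PER', 'PRODUCT']
```

In spaCy v3, each tuple can then be turned into a training `Example` with `Example.from_dict(nlp.make_doc(text), annotations)`.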

The WikiNER corpus is available under a CC BY 4.0 license. If it suits your task (that is, if Wikipedia-style texts are similar to the texts you want to process), you could annotate your own additional entity types on this data and then train or extend a model using examples that contain both the old and new types. Otherwise I'm not familiar with what's available for I…
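Training on examples that contain both the old and new types matters because a model updated only on new-type annotations tends to forget the types it already knew. One way to build such examples is to combine the existing spans on a text (for instance the WikiNER gold annotations, or the current model's predictions) with your hand-annotated new-type spans. The helper `merge_spans` below is hypothetical, and its conflict policy (a new span replaces any old span it overlaps) is an assumption rather than something prescribed here.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def merge_spans(old: List[Span], new: List[Span]) -> List[Span]:
    """Keep all new spans, plus every old span that doesn't overlap a new one.

    Assumed policy: hand-annotated new-type spans take priority over
    existing old-type spans when the two conflict.
    """
    def overlaps(a: Span, b: Span) -> bool:
        # Half-open intervals [start, end): touching spans do not overlap.
        return a[0] < b[1] and b[0] < a[1]

    kept_old = [s for s in old if not any(overlaps(s, n) for n in new)]
    return sorted(kept_old + list(new))


text = "La Panda è prodotta dalla FIAT di Torino."
old_spans = [(3, 8, "MISC"), (26, 30, "ORG"), (34, 40, "LOC")]  # existing WikiNER-style spans
new_spans = [(3, 8, "PRODUCT")]                                 # hypothetical new type
merged = merge_spans(old_spans, new_spans)
print([(text[s:e], label) for s, e, label in merged])
# → [('Panda', 'PRODUCT'), ('FIAT', 'ORG'), ('Torino', 'LOC')]
```

The merged spans keep the `ORG` and `LOC` annotations alongside the new `PRODUCT` one, so an update on this example reinforces the old types instead of teaching the model to ignore them.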

Answer selected by ines

This discussion was converted from issue #6489 on December 10, 2020 13:12.