Italian NER clarification #6489
-
Hello everyone! First of all, I'm kind of a noob with spaCy and NLP in general, so please be gentle if I'm asking trivial questions or not using the proper terms :) I've coded a little NER in English, and now I'm starting to use spaCy for NER on Italian medical records. At the moment I'm trying it on very simple sentences, but I noticed that "simple" entities, such as dates, are not recognized. Thanks a lot for your time and patience!
Replies: 1 comment
-
Hi, the English and Italian models are trained on unrelated datasets (OntoNotes vs. WikiNER) that don't use the same label schemes. The Italian models only predict the WikiNER types (PER, LOC, ORG, MISC), so DATE, which comes from the OntoNotes scheme, is not among them.

If you have data with entity annotation, you can train a new model or extend an existing model with new types. Here are the basics for how to get started: https://spacy.io/usage/training#ner

The WikiNER corpus is available under a CC BY 4.0 license, so if it made sense for your task (if Wikipedia-style texts are similar to the texts you want to process), you could annotate your own additional entity types on this data and then train or extend a model using examples that contain both the old and new types. Otherwise I'm not familiar with what's available for Italian; you may want to see if there are other existing datasets that could be useful.
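
If you don't have annotated data yet, one possible way to bootstrap some for a simple type like dates is to pre-label text with a regular expression and then correct the result by hand. The sketch below is only an illustration, not something taken from the spaCy docs: the regex, the `DATE` label name, the `annotate_dates` helper, and the sample sentences are all made up for this example. It produces entities as character offsets, `(start, end, label)`, which is the form spaCy's training examples use for entity annotations.

```python
import re

# Italian month names (lowercase); matching below is case-insensitive.
MONTHS = (
    "gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno",
    "luglio", "agosto", "settembre", "ottobre", "novembre", "dicembre",
)

# Hypothetical pattern: numeric dates such as 03/04/2021 or 3-4-21,
# and written dates such as "12 marzo 2020". It will miss other formats.
DATE_RE = re.compile(
    r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b"
    r"|\b\d{1,2}\s+(?:" + "|".join(MONTHS) + r")\s+\d{4}\b",
    re.IGNORECASE,
)


def annotate_dates(text, label="DATE"):
    """Return (text, {"entities": [(start, end, label), ...]}) for one text."""
    entities = [(m.start(), m.end(), label) for m in DATE_RE.finditer(text)]
    return text, {"entities": entities}


# Invented sample sentences standing in for real records.
records = [
    "Paziente ricoverato il 12 marzo 2020 per dolore toracico.",
    "Visita di controllo del 03/04/2021.",
]

train_data = [annotate_dates(record) for record in records]

for text, annotations in train_data:
    for start, end, label in annotations["entities"]:
        print(label, repr(text[start:end]))
```

Pre-labelled spans like these still need manual review before training, and if you extend an existing model, the examples should also carry the types it already knows, as mentioned above.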