Integrating Custom Entity Extraction with POS Tagging and Parsing in spaCy: Seeking Advice and Clarifications #13483
Unanswered
ANoubani asked this question in Help: Installation
I'm not exactly sure what question to ask, because I'm not certain I'm approaching this correctly, so I'll explain my case first and then pose my questions. I want to train a spaCy NER model to automatically extract custom UML entities: ACTOR, USECASE, and RELATION. From my understanding, I need to prepare annotated data with these new labels to train the NER component. However, I also want the trained pipeline to perform additional processing such as POS tagging, dependency parsing, and lemmatization. I believe this will improve the accuracy of the newly trained model's predictions; for instance, I want it to be more likely to recognize names as actors and verbs as use cases. Is this assumption correct?
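A minimal sketch of the data-preparation step, assuming the annotations are stored as character offsets: it turns each annotated text into a `Doc` with entity spans and packs the docs into a `DocBin`, the binary format that `spacy train` reads. The two example sentences, their offsets, the helper name `make_docbin`, and the `./train.spacy` path are all hypothetical, and only ACTOR and USECASE are shown. Using `alignment_mode="strict"` makes offsets that don't line up with token boundaries fail loudly instead of being silently skipped.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical annotated examples: (text, [(start_char, end_char, label), ...])
TRAIN_DATA = [
    ("The customer places an order.", [(4, 12, "ACTOR"), (13, 28, "USECASE")]),
    ("The admin manages user accounts.", [(4, 9, "ACTOR"), (10, 31, "USECASE")]),
]


def make_docbin(nlp, examples):
    """Convert (text, offsets) pairs into a DocBin with entity annotations (hypothetical helper)."""
    db = DocBin()
    for text, annotations in examples:
        doc = nlp.make_doc(text)  # tokenize only; no trained components needed
        spans = []
        for start, end, label in annotations:
            span = doc.char_span(start, end, label=label, alignment_mode="strict")
            if span is None:
                # Offsets that cut through a token cannot become an entity span
                raise ValueError(f"{text[start:end]!r} does not align with token boundaries")
            spans.append(span)
        doc.ents = spans
        db.add(doc)
    return db


nlp = spacy.blank("en")
db = make_docbin(nlp, TRAIN_DATA)
db.to_disk("./train.spacy")
```

The resulting file can then be passed to training with something like `python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy`, where `dev.spacy` would be built the same way from held-out examples.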
I've read the spaCy documentation and understand that if I want to use components without updating their weights, I need to freeze them. In that case, should I list those components in both the main pipeline and the frozen components, or only in the frozen components? Whenever I add them to both, the training command fails with this error:
ValueError: [E143] Labels for component 'tagger' not initialized. This can be fixed by calling add_label, or by providing a representative batch of examples to the component's initialize method.
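As far as I understand spaCy v3's `spacy train`, components listed in `[training] frozen_components` are skipped during initialization, so a frozen tagger created fresh from its factory never gets a label set, which matches the E143 message. In config-based training, frozen components are therefore usually sourced from an already trained pipeline (e.g. `source = "en_core_web_lg"` in the component's `[components.tagger]` block). The sketch below shows the two underlying ideas with the Python API on a blank pipeline, so it runs without downloading a model: `nlp.initialize` receives representative examples so the tagger and NER can infer their labels, and `nlp.update(..., exclude=["tagger"])` updates only the NER weights. The toy sentences and tags are hypothetical; real training needs far more data and would normally go through `spacy train`.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
nlp.add_pipe("tagger")  # freshly created: no labels until initialized
nlp.add_pipe("ner")

# Hypothetical toy data with both POS tags and entity offsets
TRAIN_DATA = [
    ("The customer places an order.",
     {"tags": ["DT", "NN", "VBZ", "DT", "NN", "."],
      "entities": [(4, 12, "ACTOR"), (13, 28, "USECASE")]}),
    ("The admin manages user accounts.",
     {"tags": ["DT", "NN", "VBZ", "NN", "NNS", "."],
      "entities": [(4, 9, "ACTOR"), (10, 31, "USECASE")]}),
]
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# Representative examples let each component infer its label set;
# initializing a fresh tagger without any tag data is what leads to E143.
optimizer = nlp.initialize(get_examples=lambda: examples)

# Snapshot the tagger weights so we can check they stay unchanged
tagger_weights_before = nlp.get_pipe("tagger").model.to_bytes()

# Only "ner" is updated; "tagger" is excluded, so its weights are not touched
losses = {}
for _ in range(5):
    nlp.update(examples, sgd=optimizer, losses=losses, exclude=["tagger"])
print(losses)
```

The `exclude` route is the scripted counterpart of freezing; note that excluded components are not run during `nlp.update` either, so if the NER should see their predictions during training, the config's `annotating_components` setting is the place to look.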
Questions:
1- If I only include ["tok2vec", "ner"] in the pipeline, will the other components be trained as well? If not, how do I set them up for both scenarios: with their weights updated during training, and with their weights frozen?
2- How do I initialize a component properly?
3- For my purposes, should I use en_core_web_trf or en_core_web_lg?
4- Which component do I need to train to extract relations between entities? For example, I want my application to identify that a particular ACTOR performs a particular USECASE.
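As far as I know, spaCy v3 does not ship a built-in relation-extraction component; Explosion's `projects` repository has a `rel_component` tutorial that builds a custom trainable one. A lighter alternative for "ACTOR performs USECASE" is a rule-based heuristic over the dependency parse. The sketch below uses a hypothetical helper, `link_actors_to_usecases`, that pairs each USECASE entity with an ACTOR entity acting as the nominal subject (`nsubj`) of the use case's root verb. It builds the `Doc` by hand (heads, dependency labels, and IOB entity tags are supplied directly) so it runs on `spacy.blank("en")`; in a real pipeline those annotations would come from a trained parser and the custom NER.

```python
import spacy
from spacy.tokens import Doc


def link_actors_to_usecases(doc):
    """Return (actor, usecase) text pairs where an ACTOR is the subject of the USECASE's root verb."""
    pairs = []
    for usecase in doc.ents:
        if usecase.label_ != "USECASE":
            continue
        verb = usecase.root  # syntactic head of the use-case span
        for child in verb.children:
            if child.dep_ == "nsubj" and child.ent_type_ == "ACTOR":
                # Expand the subject token to the full ACTOR entity containing it
                actor = next(ent for ent in doc.ents if ent.start <= child.i < ent.end)
                pairs.append((actor.text, usecase.text))
    return pairs


nlp = spacy.blank("en")
# Hand-built parse and entities standing in for a trained parser + NER
doc = Doc(
    nlp.vocab,
    words=["The", "customer", "places", "an", "order", "."],
    heads=[1, 2, 2, 4, 2, 2],  # absolute head indices; the root points to itself
    deps=["det", "nsubj", "ROOT", "det", "dobj", "punct"],
    ents=["O", "B-ACTOR", "B-USECASE", "I-USECASE", "I-USECASE", "O"],
)
print(link_actors_to_usecases(doc))
```

This only handles simple active-voice sentences; passives, coordinated subjects ("the customer and the admin ..."), or use cases not headed by a verb would need extra patterns (spaCy's `DependencyMatcher` is one option) or a trained relation component.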
spaCy version 3.7.4
Platform Windows-11-10.0.22631-SP0
Python version 3.12.3
Pipelines en_core_web_trf (3.7.3)
Thank you!