[French morphologizer] Mislabelisation of Mood=Imp|Number=Sing|Tense=Present #8147
-
Yes, this is due to a mismatch between the training data and your task. We've thought about doing data augmentation for things like imperatives, questions, and informal forms, which are often missing from the training corpora, but we don't have anything published at this point.
-
Hello,
First of all, thank you very much for the quick answer! I've already looked at several other tools (including Stanza), and this seems to be a regular blind spot of the datasets used to train NLP models. If I were to annotate my own examples for this specific use case, would there be a simple way to retrain the model on them?
Thanks again,
-
Let me move this to the discussion board. This issue will be locked, but you can follow the link to the new discussion thread.
-
Mislabelisation of Mood=Imp|Number=Sing|Tense=Present
Hello,
I am using spaCy as part of an ongoing project on French textbooks, and I have noticed that verbs in the imperative mood, singular number, and present tense (e.g. "Mange ton repas", "Fais tes devoirs") are almost systematically misclassified as NOUN.
Am I the first one to notice this, or is this a well-known bias of the French pre-trained models?
After investigating the Sequoia dataset, I think this is because every example of an imperative verb I found there is in the plural (e.g. "Mangez votre repas", "Faites vos devoirs").
Moreover, singular present imperative verbs are often homographs of nouns, which makes the task even more difficult (e.g. "Forme", "Danse", "Place").
Anyway, I know this is probably specific to the kind of texts I'm working with, which rely heavily on singular present imperative verbs to express exercise instructions, and is therefore difficult to address in a pre-trained model, but I thought it was worth mentioning.
Idea: one way to address this would be data augmentation, deriving singular examples from the existing plural present imperative instances.
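To make the idea concrete, here is a minimal sketch of what such an augmentation step could look like. It is pure Python and not part of spaCy: the `Token` tuple, the two lookup tables, and the `augment` function are all hypothetical names invented for illustration, and the tables only cover the examples mentioned above. A real version would need a proper conjugation resource and would have to handle pronouns, other determiners, and agreement more carefully.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str
    lemma: str
    upos: str
    feats: str  # UD-style string, e.g. "Mood=Imp|Number=Plur|Person=2|Tense=Pres"


# Hypothetical, hand-written tables covering only the examples in this post.
PLURAL_TO_SINGULAR_IMPERATIVE = {"mangez": "mange", "faites": "fais"}
VOWELS = "aeiouyâàéèêëîïôûù"


def parse_feats(feats: str) -> dict:
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' or '' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict) -> str:
    """Serialise features back to a string, sorted by name as in CoNLL-U."""
    ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{k}={v}" for k, v in ordered) or "_"


def match_case(new: str, old: str) -> str:
    """Copy the capitalisation of the first letter from the old form."""
    return new.capitalize() if old[:1].isupper() else new


def singular_possessive(det: str, sentence: list, index: int) -> str:
    """Map 'votre'/'vos' to 'ton'/'ta'/'tes' using the next noun's gender."""
    if det == "vos":
        return "tes"
    noun = next((t for t in sentence[index + 1:] if t.upos == "NOUN"), None)
    if noun is None:
        return "ton"
    gender = parse_feats(noun.feats).get("Gender")
    # 'ton' is also used before feminine nouns starting with a vowel
    # (mute h is ignored here for simplicity).
    if gender == "Fem" and noun.form[:1].lower() not in VOWELS:
        return "ta"
    return "ton"


def augment(sentence: list) -> Optional[list]:
    """Return a singular-imperative copy of the sentence, or None if the
    sentence has no plural imperative or its verb is not in the table."""
    out, changed = [], False
    for i, tok in enumerate(sentence):
        feats = parse_feats(tok.feats)
        is_plural_imperative = (
            tok.upos == "VERB"
            and feats.get("Mood") == "Imp"
            and feats.get("Number") == "Plur"
            and feats.get("Person") == "2"
        )
        if is_plural_imperative:
            singular = PLURAL_TO_SINGULAR_IMPERATIVE.get(tok.form.lower())
            if singular is None:
                return None  # skip rather than guess a conjugation
            feats["Number"] = "Sing"
            tok = tok._replace(form=match_case(singular, tok.form),
                               feats=format_feats(feats))
            changed = True
        elif tok.upos == "DET" and tok.form.lower() in ("votre", "vos"):
            new_form = singular_possessive(tok.form.lower(), sentence, i)
            if "Number[psor]" in feats:
                feats["Number[psor]"] = "Sing"
            tok = tok._replace(form=match_case(new_form, tok.form),
                               feats=format_feats(feats))
        out.append(tok)
    return out if changed else None


sentence = [
    Token("Mangez", "manger", "VERB", "Mood=Imp|Number=Plur|Person=2|Tense=Pres"),
    Token("votre", "votre", "DET", "Number=Sing|Poss=Yes"),
    Token("repas", "repas", "NOUN", "Gender=Masc|Number=Sing"),
]
result = augment(sentence)
print(" ".join(t.form for t in result))
```

The sketch returns `None` for any sentence whose verb is missing from the table, so that unconverted sentences are dropped instead of producing wrongly conjugated training examples. The feature strings use the UD spelling `Tense=Pres`; the exact features carried by each token in Sequoia may differ from the simplified ones shown here.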
How to reproduce the behaviour
Info about spaCy