French morphologizer mislabeling future, conditional, imperative #13717
KennethBaclawskiJrDialpad
started this conversation in
Language Support
Replies: 0 comments
Hello,
I'm using spaCy to model French conversations, and I see that the morphologizer is not performing as well as I'd expect for unambiguous irrealis forms (specifically future tense, conditional mood, imperatives). I understand the underlying reason is probably that these forms aren't frequent in the training data, but are there any potential updates or recommended workarounds?
For example:
An imperative is given an incorrect part of speech (POS/TAG). This is similar to "[French morphologizer] Mislabelisation of Mood=Imp|Number=Sing|Tense=Present" (#8147), but broader in that "Remplacez" is unambiguously a verb.
A future form is labeled as a present imperative (MORPH contains Mood=Imp, Tense=Pres), even though it is unambiguously the future.
A conditional form is labeled as a future indicative (MORPH contains Mood=Ind, Tense=Fut), even though it is unambiguously the conditional.
I'm working with a set of ~160 common French verbs and tested their whole paradigms in this way. 98% of infinitives are recognized correctly, but only 13% of second person plural imperatives (34% even had an incorrect POS, as in the first example above), 37% of future tense forms, and 7% of conditional mood forms. Sure enough, I see that these three categories are uncommon in the UD French Sequoia data.
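A paradigm test like this could be scored roughly as follows. This is only a sketch under stated assumptions, not the exact script behind the numbers above: `CASES`, `accuracy_by_category`, and `make_spacy_analyzer` are hypothetical names, the sentences are illustrative (only "Remplacez" comes from this post), and the expected features follow the UD French convention (`Mood=Imp`, `Mood=Ind` with `Tense=Fut`, `Mood=Cnd`).

```python
from collections import defaultdict

# Hypothetical test items: (category, sentence, target form, expected UD features).
# Only "Remplacez" appears in the post; the sentences and other forms are illustrative.
CASES = [
    ("imperative", "Remplacez la pile.", "Remplacez", {"Mood": "Imp"}),
    ("future", "Nous remplacerons la pile.", "remplacerons", {"Mood": "Ind", "Tense": "Fut"}),
    ("conditional", "Nous remplacerions la pile.", "remplacerions", {"Mood": "Cnd"}),
]


def accuracy_by_category(cases, analyze):
    """Share of target forms per category tagged as a verb with the expected features.

    `analyze(sentence)` must return a list of (token_text, pos, feature_dict) tuples.
    """
    totals, correct = defaultdict(int), defaultdict(int)
    for category, sentence, target, expected in cases:
        totals[category] += 1
        for text, pos, feats in analyze(sentence):
            if text == target:
                # Assumption: AUX also counts as a verb, so that forms of
                # "être" and "avoir" are not penalized.
                is_verb = pos in {"VERB", "AUX"}
                if is_verb and all(feats.get(k) == v for k, v in expected.items()):
                    correct[category] += 1
                break  # only score the first occurrence of the target form
    return {category: correct[category] / totals[category] for category in totals}


def make_spacy_analyzer(model_name="fr_core_news_sm"):
    """Wrap a spaCy pipeline in the (text, pos, feats) interface used above."""
    import spacy  # imported lazily so the scoring function has no spaCy dependency

    nlp = spacy.load(model_name)
    return lambda sentence: [(t.text, t.pos_, t.morph.to_dict()) for t in nlp(sentence)]
```

With spaCy and the model installed, `accuracy_by_category(CASES, make_spacy_analyzer())` returns a dict mapping each category to a fraction between 0 and 1. A form that is missing from the tokenized sentence counts as incorrect.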
How to reproduce the behaviour
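A minimal sketch for inspecting the predictions, assuming the `fr_core_news_sm` pipeline listed below is installed. The helper names `token_rows`, `load_pipeline`, and `main` are hypothetical, and the example sentence is illustrative: only the form "Remplacez" comes from the report above.

```python
def token_rows(nlp, text):
    """Return (text, POS, TAG, MORPH) for every token the pipeline produces."""
    return [(t.text, t.pos_, t.tag_, str(t.morph)) for t in nlp(text)]


def load_pipeline(name="fr_core_news_sm"):
    """Load the spaCy pipeline; requires spaCy and the model to be installed."""
    import spacy  # imported lazily so token_rows works with any callable pipeline

    return spacy.load(name)


def main():
    nlp = load_pipeline()
    # Hypothetical sentence built around the imperative form from the report.
    for row in token_rows(nlp, "Remplacez la pile."):
        print(*row, sep="\t")
```

Calling `main()` prints one tab-separated line per token, so the POS, TAG, and MORPH values can be compared against the expected imperative, future, or conditional analysis.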
Info about spaCy
spaCy version: 3.8.2
Python version: 3.11.9
Pipelines: fr_core_news_sm (3.8.0)