spaCy Version Used: v3.5 (displaCy demo), also reproduced in v3.7
Environment Information:
Semi-related: Is there any guidance on how to modify the tokenizer so that a double space is placed into whitespace_ (i.e. "  ") and does not lead to a SPACE token? I did take note of #1707, though putting the additional spaces into whitespace_ seems more logical to me.
How to reproduce the behaviour
Notice the double space in front of "sourire" in the first case vs. the single space in the second case.
Les publics avec un  sourire chaleureux  et
https://demos.explosion.ai/displacy?text=Les%20publics%20avec%20un%20%20sourire%20chaleureux%20%20et&model=fr_core_news_sm
vs.
Les publics avec un sourire chaleureux  et
https://demos.explosion.ai/displacy?text=Les%20publics%20avec%20un%20sourire%20chaleureux%20%20et&model=fr_core_news_sm
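Since the double spaces are easy to lose when the text is rendered, here is a small sketch that spells out the two inputs and rebuilds the demo links from them. The helper name `demo_url` and the constants `DEMO` and `MODEL` are made up for illustration; only the URLs themselves come from the report above.

```python
from urllib.parse import quote

# Hypothetical constants for illustration; the values are taken from the links above.
DEMO = "https://demos.explosion.ai/displacy"
MODEL = "fr_core_news_sm"


def demo_url(text: str) -> str:
    """Build a displaCy demo link for `text`; each space is encoded as %20."""
    return f"{DEMO}?text={quote(text)}&model={MODEL}"


# Case 1: double space before "sourire" and before "et".
text_double = "Les publics avec un  sourire chaleureux  et"
# Case 2: single space before "sourire", double space before "et".
text_single = "Les publics avec un sourire chaleureux  et"

for text in (text_double, text_single):
    # repr() keeps the consecutive spaces visible in the output.
    print(repr(text))
    print(demo_url(text))
```

The only difference between the two inputs is the extra space before "sourire", which shows up as `%20%20` instead of `%20` in the first link.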
Research
a) Maybe related #621
b) Semi-related https://stephantul.github.io/spacy/2019/05/01/tokenizationspacy/
c) Semi-related #9978