em-dash receives empty POS tag with 'en' models #1700
Comments
hmm - it's actually a little more complicated. If we take the 2.0.3 example above and then do:
So
This problem doesn't seem to be English-specific. I've also tried several other languages (German, Spanish, Portuguese) and the tags for the em dash are all empty.
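A check like the one described in these reports can be sketched as follows. This is a minimal illustration, not code from the thread: the helper name `tokens_with_empty_tags` is hypothetical, and the tokens are plain stand-in objects with hand-written values so the snippet runs without any model installed. It relies only on the `text`, `pos_` and `tag_` attributes that spaCy tokens expose, so a real `Doc` could presumably be passed in the same way.

```python
from types import SimpleNamespace


def tokens_with_empty_tags(doc):
    """Return the text of every token whose coarse or fine POS tag is empty."""
    return [tok.text for tok in doc if not tok.pos_ or not tok.tag_]


# Stand-in tokens with hypothetical tag values; with spaCy installed,
# `doc` would instead come from something like nlp("Hello \u2014 world").
doc = [
    SimpleNamespace(text="Hello", pos_="INTJ", tag_="UH"),
    SimpleNamespace(text="\u2014", pos_="", tag_=""),  # em dash, untagged
    SimpleNamespace(text="world", pos_="NOUN", tag_="NN"),
]

print(tokens_with_empty_tags(doc))
```

Running the same helper over documents from several language models would show whether the empty tags are limited to the em dash or affect other tokens too.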
I have the same problem with empty tags for German:
results in:
... which means that only token.shape_, token.is_alpha and token.is_stop have an actual return value; all the others seem to be empty strings. Do you have any suggestions on how to fix this issue, or is there another workaround? Thank you so much.
Info about spaCy
spaCy version: 1.9.0
That's strange and basically means that the model isn't predicting anything... 🤔 It makes sense that the lexical attributes all exist, because those aren't predicted by the model. Could you run
I just upgraded to spacy version 2.0.12 in order to be able to run the validate option. That's what I get: $ spacy --info
$ python -m spacy validate
It looks like spaCy 2.0.12 is incompatible with the German model; the command to update the German model packages is empty (see above). If I try to download the model for German, I get a compatibility error:
$ python -m spacy download de-core-news-md
What would you suggest in order to use the German model for POS tagging? Can you provide an updated version of "de-core-news-md" for the most recent version of spaCy (v2.0.12)?
@IsabelMeraner Thanks for the update! I was confused for a second where the So, in summary: for spaCy |
@ines Thanks a lot for the detailed explanation. The linear models from the older spaCy versions must have been the reason for this behaviour: After using spaCy One last question regarding the usage of the
However, the larger
Do you have any suggestions here? Thank you very much.
@IsabelMeraner Glad it's working now! And it looks like you don't actually have the |
@ines Of course, it was only installed in the other environment. Now it's working fine with the bigger model. Thanks again for your help!
I'm using spaCy 2.0.13 and have the compatible model en_core_web_lg 2.0.0 installed, but I still get an empty tag for "—" in the sentence: "11.00am — Tony has been given an appointment at the local hospital." I'm running it on Python 3.6.2 with IPython 6.1.0. And I get the same behaviour from the _sm model (also 2.0.0).
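Nobody in the thread proposes a fix for this case, but one possible stopgap is to post-process the tagged tokens and give untagged dashes a fallback coarse tag. The sketch below is an assumption-laden illustration: `fill_empty_dash_pos` and `DASHES` are hypothetical names, the tokens are stand-in objects rather than real spaCy tokens, and `PUNCT` is simply the Universal POS label for punctuation. Whether assigning to `pos_` on real tokens behaves the same way in your spaCy version should be verified before relying on it.

```python
from types import SimpleNamespace

DASHES = {"\u2013", "\u2014"}  # en dash, em dash


def fill_empty_dash_pos(doc, fallback_pos="PUNCT"):
    """Give dash tokens with an empty coarse tag a fallback tag.

    Returns the number of tokens that were patched.
    """
    patched = 0
    for tok in doc:
        if tok.text in DASHES and not tok.pos_:
            tok.pos_ = fallback_pos
            patched += 1
    return patched


# Stand-in tokens with hypothetical values; a real spaCy Doc would be
# iterated in the same way.
doc = [
    SimpleNamespace(text="11.00am", pos_="NUM"),
    SimpleNamespace(text="\u2014", pos_=""),
    SimpleNamespace(text="Tony", pos_="PROPN"),
]

print(fill_empty_dash_pos(doc), [tok.pos_ for tok in doc])
```

Tokens that already carry a tag are left untouched, so the fallback only papers over the empty prediction and does not override the model.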
Merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
The EM DASH character receives empty POS tags, both coarse and fine, in the 'en' models for spaCy 2.0.3:
In 2.0.3:
But in 1.9.0:
Info about spaCy