Description
The EM DASH character (U+2014) receives empty POS tags, both coarse and fine, from the 'en' model in spaCy 2.0.3:
In 2.0.3:
>>> import spacy
>>> nlp = spacy.load('en')
>>> x = nlp('hello — world')
>>> x.print_tree()
[{'word': 'hello', 'lemma': 'hello', 'NE': '', 'POS_fine': 'UH', 'POS_coarse': 'INTJ', 'arc': 'ROOT', 'modifiers': [{'word': '—', 'lemma': '—', 'NE': '', 'POS_fine': '', 'POS_coarse': '', 'arc': 'punct', 'modifiers': []}, {'word': 'world', 'lemma': 'world', 'NE': '', 'POS_fine': 'NN', 'POS_coarse': 'NOUN', 'arc': 'npadvmod', 'modifiers': []}]}]
But in 1.9.0:
>>> import spacy
>>> nlp = spacy.load('en')
>>> x = nlp('hello — world')
>>> x.print_tree()
[{'word': 'hello', 'lemma': 'hello', 'NE': '', 'POS_fine': 'UH', 'POS_coarse': 'INTJ', 'arc': 'ROOT', 'modifiers': [{'word': '—', 'lemma': '—', 'NE': '', 'POS_fine': ':', 'POS_coarse': 'PUNCT', 'arc': 'punct', 'modifiers': []}, {'word': 'world', 'lemma': 'world', 'NE': '', 'POS_fine': 'NN', 'POS_coarse': 'NOUN', 'arc': 'appos', 'modifiers': []}]}]
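The difference is easy to miss in the nested output, so here is a minimal sketch of a helper that walks the structure returned by print_tree() and lists every word with an empty coarse or fine tag. The name find_untagged is hypothetical and not part of spaCy. The two trees are copied from the outputs above and trimmed to the keys the helper reads.

```python
def find_untagged(nodes):
    """Return words whose coarse or fine POS tag is empty, depth-first."""
    untagged = []
    for node in nodes:
        if not node['POS_fine'] or not node['POS_coarse']:
            untagged.append(node['word'])
        # Each node carries its children under 'modifiers'.
        untagged.extend(find_untagged(node['modifiers']))
    return untagged


# Trimmed copy of the 2.0.3 output shown above.
tree_2_0_3 = [{'word': 'hello', 'POS_fine': 'UH', 'POS_coarse': 'INTJ', 'modifiers': [
    {'word': '—', 'POS_fine': '', 'POS_coarse': '', 'modifiers': []},
    {'word': 'world', 'POS_fine': 'NN', 'POS_coarse': 'NOUN', 'modifiers': []},
]}]

# Trimmed copy of the 1.9.0 output shown above.
tree_1_9_0 = [{'word': 'hello', 'POS_fine': 'UH', 'POS_coarse': 'INTJ', 'modifiers': [
    {'word': '—', 'POS_fine': ':', 'POS_coarse': 'PUNCT', 'modifiers': []},
    {'word': 'world', 'POS_fine': 'NN', 'POS_coarse': 'NOUN', 'modifiers': []},
]}]

print(find_untagged(tree_2_0_3))  # ['—']
print(find_untagged(tree_1_9_0))  # []
```

In a live session, the same helper should accept the result of x.print_tree() directly, since it only reads the 'word', 'POS_fine', 'POS_coarse' and 'modifiers' keys shown in the output above.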
Info about spaCy
- spaCy version: 2.0.3
- Platform: Darwin-17.2.0-x86_64-i386-64bit
- Python version: 3.6.3
- Models: en