em-dash receives empty POS tag with 'en' models #1700

Closed
@gabbard

The EM DASH character receives empty POS tags, both coarse and fine, with the 'en' models in spaCy 2.0.3:

In 2.0.3:

>>> import spacy
>>> nlp = spacy.load('en')
>>> x = nlp('hello — world')
>>> x.print_tree()
[{'word': 'hello', 'lemma': 'hello', 'NE': '', 'POS_fine': 'UH', 'POS_coarse': 'INTJ', 'arc': 'ROOT', 'modifiers': [{'word': '—', 'lemma': '—', 'NE': '', 'POS_fine': '', 'POS_coarse': '', 'arc': 'punct', 'modifiers': []}, {'word': 'world', 'lemma': 'world', 'NE': '', 'POS_fine': 'NN', 'POS_coarse': 'NOUN', 'arc': 'npadvmod', 'modifiers': []}]}]

But in 1.9.0:

>>> import spacy
>>> nlp = spacy.load('en')
>>> x = nlp('hello — world')
>>> x.print_tree()
[{'word': 'hello', 'lemma': 'hello', 'NE': '', 'POS_fine': 'UH', 'POS_coarse': 'INTJ', 'arc': 'ROOT', 'modifiers': [{'word': '—', 'lemma': '—', 'NE': '', 'POS_fine': ':', 'POS_coarse': 'PUNCT', 'arc': 'punct', 'modifiers': []}, {'word': 'world', 'lemma': 'world', 'NE': '', 'POS_fine': 'NN', 'POS_coarse': 'NOUN', 'arc': 'appos', 'modifiers': []}]}]
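
To spot this kind of regression without reading the tree by eye, here is a minimal sketch that walks `print_tree()`-style output and lists every word whose fine or coarse POS tag is empty. `find_untagged` is a hypothetical helper written for this report, not part of spaCy, and it assumes the nested `modifiers` layout shown in the sessions above.

```python
def find_untagged(tree):
    """Return the words whose fine or coarse POS tag is empty.

    `tree` is a list of nested dicts in the shape printed above:
    each node has 'word', 'POS_fine', 'POS_coarse' and 'modifiers'.
    """
    untagged = []
    stack = list(tree)
    while stack:
        node = stack.pop()
        # An empty string (or a missing key) counts as "no tag assigned".
        if not node.get('POS_fine') or not node.get('POS_coarse'):
            untagged.append(node['word'])
        stack.extend(node.get('modifiers', []))
    return untagged


# Output copied from the 2.0.3 session in this report.
tree_2_0_3 = [{'word': 'hello', 'lemma': 'hello', 'NE': '',
               'POS_fine': 'UH', 'POS_coarse': 'INTJ', 'arc': 'ROOT',
               'modifiers': [
                   {'word': '—', 'lemma': '—', 'NE': '',
                    'POS_fine': '', 'POS_coarse': '', 'arc': 'punct',
                    'modifiers': []},
                   {'word': 'world', 'lemma': 'world', 'NE': '',
                    'POS_fine': 'NN', 'POS_coarse': 'NOUN', 'arc': 'npadvmod',
                    'modifiers': []}]}]

print(find_untagged(tree_2_0_3))  # ['—']
```

Run against the 1.9.0 output, the same helper should return an empty list, since the dash is tagged `:` / `PUNCT` there.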

Info about spaCy

  • spaCy version: 2.0.3
  • Platform: Darwin-17.2.0-x86_64-i386-64bit
  • Python version: 3.6.3
  • Models: en

Metadata

Labels

  • feat / tagger (Feature: Part-of-speech tagger)
  • lang / en (English language data and models)
  • models (Issues related to the statistical models)
  • perf / accuracy (Performance: accuracy)
