spacy-huggingface-pipelines: Use pretrained Transformer models for text and token classification
#12591
adrianeboyd announced in News & Announcements
The new spacy-huggingface-pipelines package provides wrappers for Hugging Face Transformers pipelines for text and token classification, for inference only. As of Transformers v4.28, pipelines provide all the functionality needed for simple spaCy wrappers.

Installation
Usage
Text classification with hf_text_pipe:
Token classification with hf_token_pipe:
See more config settings and examples in the package README.
Search for text classification and token classification models on the Hugging Face Hub.
Notes
hf_text_pipe and hf_token_pipe only support inference, not training or fine-tuning.

For texts longer than the model max length, see the package README for details on how long texts are handled.
The transformer models are always loaded from the Transformers cache directory or downloaded from the Hugging Face Hub, not from the directory or package saved with nlp.to_disk or spacy package. The model data is not included in the spaCy model directory.

This means that you need to set up your Transformers cache for offline use if you have limited internet access, but it has the advantage that you can use the same models in different pipelines and different Python environments without having to duplicate the data on disk.
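For the offline case, one option is the standard offline switches that huggingface_hub and Transformers honor: once a model has been downloaded into the cache, these environment variables tell the libraries to use only the local cache and never hit the network:

```shell
# Force huggingface_hub and transformers to load models from the
# local cache only, without attempting any network access:
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
```

Models have to be downloaded (or copied into the cache directory) at least once while online for this to work.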