Support for SpanMarker #1237

Open
Hveemos opened this issue Dec 21, 2023 · 7 comments
Labels
analyzer enhancement New feature or request good first issue Good for newcomers

Comments

@Hveemos

Hveemos commented Dec 21, 2023

I have found SpanMarker models such as tomaarsen/span-marker-mbert-base-multinerd to be very useful for NER. However, Presidio does not seem to support the class 'span_marker.configuration.SpanMarkerConfig'.

Can I resolve this myself or might this be added as an additional feature?

Regards
Joakim

@ogencoglu

+1 for this

@ogencoglu

Maybe @tomaarsen can offer some tips.

@omri374
Contributor

omri374 commented Dec 24, 2023

Thanks! Great suggestion. Something along those lines? https://github.com/tomaarsen/SpanMarkerNER?tab=readme-ov-file#using-pretrained-spanmarker-models-with-spacy

In the transformers case, we used spacy-huggingface-pipelines to integrate a huggingface/transformers model into a spaCy pipeline, because Presidio requires the other spaCy components (tokenization, lemmatization, etc.) in order to run. See more here: https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/#how-ner-results-flow-within-presidio

@omri374
Contributor

omri374 commented Jan 2, 2024

@Hveemos would you be interested in adding this capability?

@Hveemos
Author

Hveemos commented Jan 8, 2024

Yes, something along those lines (https://github.com/tomaarsen/SpanMarkerNER?tab=readme-ov-file#using-pretrained-spanmarker-models-with-spacy). But I'm sorry to say that I don't have the time or competency to contribute to this project. I solved this with duct tape instead, i.e. running SpanMarker on each line and using regex to redact (it takes forever, though).
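For readers trying the same workaround, here is a minimal sketch of redacting by character offsets rather than by regex. It assumes each prediction is a dict with `label`, `char_start_index` and `char_end_index` keys (the shape SpanMarker's `predict` is documented to return) and that spans do not overlap. The `redact` helper and the placeholder format are hypothetical, not part of SpanMarker or Presidio.

```python
def redact(text, predictions, placeholder="<{label}>"):
    """Replace each predicted span in `text` with a placeholder.

    `predictions` is assumed to be a list of dicts carrying `label`,
    `char_start_index` and `char_end_index`; spans must not overlap.
    """
    out = text
    # Work from the last span to the first so that earlier offsets
    # remain valid after each replacement.
    ordered = sorted(predictions, key=lambda p: p["char_start_index"], reverse=True)
    for pred in ordered:
        start, end = pred["char_start_index"], pred["char_end_index"]
        out = out[:start] + placeholder.format(label=pred["label"]) + out[end:]
    return out


# Hand-written predictions standing in for real model output.
example_text = "Joakim lives in Oslo."
example_predictions = [
    {"span": "Joakim", "label": "PER", "score": 0.99, "char_start_index": 0, "char_end_index": 6},
    {"span": "Oslo", "label": "LOC", "score": 0.97, "char_start_index": 16, "char_end_index": 20},
]
redacted = redact(example_text, example_predictions)
```

Using the offsets directly avoids searching the text again for every entity string and keeps repeated names from being matched in the wrong place.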

@omri374
Contributor

omri374 commented Jan 9, 2024

@Hveemos the easiest solution would be to create a recognizer class and glue the SpanMarker output to a RecognizerResult object. See something similar that we did for the flair package:

self.model.predict(sentences)
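The glue step could look roughly like the sketch below. It is not the Presidio implementation, only an illustration under stated assumptions: SpanMarker predictions are taken to be dicts with `label`, `score`, `char_start_index` and `char_end_index` keys, and the label mapping and the `to_result_kwargs` helper are hypothetical. The function only builds the four arguments that `RecognizerResult` takes (`entity_type`, `start`, `end`, `score`), so it runs without Presidio installed.

```python
# Hypothetical mapping from MultiNERD-style labels to Presidio entity names;
# adjust it to the label set of the model actually used.
LABEL_MAP = {"PER": "PERSON", "ORG": "ORGANIZATION", "LOC": "LOCATION"}


def to_result_kwargs(predictions, label_map=LABEL_MAP, min_score=0.0):
    """Convert SpanMarker-style predictions into RecognizerResult arguments.

    Predictions with an unmapped label or a score below `min_score`
    are dropped.
    """
    results = []
    for pred in predictions:
        entity_type = label_map.get(pred["label"])
        if entity_type is None or pred["score"] < min_score:
            continue
        results.append(
            {
                "entity_type": entity_type,
                "start": pred["char_start_index"],
                "end": pred["char_end_index"],
                "score": float(pred["score"]),
            }
        )
    return results


# Hand-written prediction for the text "My name is Joakim".
sample = [
    {"span": "Joakim", "label": "PER", "score": 0.98,
     "char_start_index": 11, "char_end_index": 17},
]
kwargs_list = to_result_kwargs(sample)

# Inside the analyze() method of an EntityRecognizer subclass, one would
# then return something like:
#     [RecognizerResult(**kw) for kw in kwargs_list]
```

Keeping the conversion in a plain function makes it easy to unit test separately from model loading, which would live in the recognizer's `load()` method.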

@ogencoglu

UniversalNER is another interesting generative candidate: https://universal-ner.github.io/

@omri374 omri374 added enhancement New feature or request good first issue Good for newcomers analyzer labels Jan 11, 2024