Implementing GliNER in Haystack v2 Pipeline #8198
-
I wanted to reach out to see if you could provide any guidance or recommendations on the following:
Existing Support: Is there any existing support or upcoming feature within Haystack v2 that facilitates the integration of GliNER or similar NER models?
Best Practices: Could you advise on best practices for integrating custom NER models like GliNER into a Haystack pipeline? Specifically, any tips on performance optimization and maintaining compatibility with other Haystack components would be greatly appreciated.
Documentation or Examples: If there is any documentation, or any examples or community discussions, that you can point me to regarding custom model integration, that would be very helpful.
Thank you for your time and assistance. I look forward to your guidance on this matter.
Replies: 2 comments
-
Hi @chaalia, out of the box Haystack supports NER only through the NamedEntityExtractor component, which has two possible backends: Hugging Face and spaCy. Both backends work with any HF or spaCy model that supports token classification or NER. https://docs.haystack.deepset.ai/docs/namedentityextractor

You can use the implementation of https://github.com/deepset-ai/haystack/blob/main/haystack/components/extractors/named_entity_extractor.py or https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/hugging_face_local.py as an example. Here is some draft code to get you started with your custom component:

    from typing import Any, Dict, List, Optional

    from gliner import GLiNER
    from haystack import Document, component


    @component
    class GliNERNamedEntityExtractor:
        """
        A component extracting entities with GliNER
        """

        # Meta key under which the entities are stored; the name is a choice, pick what suits your pipeline.
        _METADATA_KEY = "named_entities"

        def __init__(self, model_name: str = "urchade/gliner_mediumv2.1"):
            self.model_name = model_name
            self.model: Optional[GLiNER] = None  # loaded lazily in warm_up()

        def warm_up(self):
            """
            Initializes the component.
            """
            if self.model is None:
                self.model = GLiNER.from_pretrained(self.model_name)

        @component.output_types(documents=List[Document])
        def run(self, documents: List[Document]) -> Dict[str, Any]:
            if self.model is None:
                raise RuntimeError("The model was not loaded. Call warm_up() before run().")
            labels = ["Person", "Award", "Date", "Competitions", "Teams"]  # you'll want to add a parameter for that
            for doc in documents:
                if doc.content is None:
                    continue  # nothing to extract from documents without text
                entities = self.model.predict_entities(doc.content, labels, threshold=0.5)
                doc.meta[self._METADATA_KEY] = entities  # you'll need to make sure entities are in a format that can be stored in meta
            return {"documents": documents}
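On the comment about storing entities in meta: one possible approach is to normalize each prediction into a dict of plain Python types before assigning it. The sketch below is only an illustration. The helper name `to_meta_entities` is hypothetical, and it assumes each prediction is a dict with "start", "end", "text", "label" and "score" keys, so check what your model actually returns. The sample input is hand-written, not real model output.

```python
import json
from typing import Any, Dict, List


def to_meta_entities(raw_entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert raw predictions into plain, JSON-serializable dicts.

    Assumes each prediction is a mapping with "start", "end", "text",
    "label" and "score" keys; adjust this if your model returns another shape.
    """
    cleaned = []
    for ent in raw_entities:
        cleaned.append(
            {
                "label": str(ent["label"]),
                "text": str(ent["text"]),
                "start": int(ent["start"]),
                "end": int(ent["end"]),
                # float() guards against scores arriving as framework scalar types
                "score": float(ent["score"]),
            }
        )
    return cleaned


# Hand-written sample in the assumed shape (not real model output).
sample = [
    {"start": 0, "end": 17, "text": "Cristiano Ronaldo", "label": "Person", "score": 0.98}
]
meta_value = to_meta_entities(sample)

# json.dumps raises TypeError if any value is not serializable.
serialized = json.dumps(meta_value)
```

Inside `run`, the assignment would then become `doc.meta[self._METADATA_KEY] = to_meta_entities(entities)`, which should keep the documents safe to serialize or write to a document store.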
-
Hello @julian-risch, thanks a lot for this detailed answer. I really appreciate it.
If you want to use GliNER, I suggest creating a custom component. We have instructions for how to do that here:
https://docs.haystack.deepset.ai/docs/custom-components