
Sentence Transformers components do not support ONNX or OpenVINO formats #8802

Open
lbux opened this issue Feb 4, 2025 · 3 comments · May be fixed by #8813
Labels
type:feature New feature or request

Comments

@lbux
Contributor

lbux commented Feb 4, 2025

Describe the bug
Passing a backend (onnx or openvino) via model_kwargs in the SentenceTransformers components causes Sentence Transformers' Transformer class to receive the backend argument twice. The call self._load_model(model_name_or_path, config, cache_dir, backend, is_peft_model, **model_args) then fails with a TypeError that is neither caught nor handled.
Error message

Traceback (most recent call last):
  File "/home/ulises/quant_test/hello.py", line 38, in <module>
    embedder_onnx.warm_up()
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/sentence_transformers_document_embedder.py", line 186, in warm_up
    self.embedding_backend = _SentenceTransformersEmbeddingBackendFactory.get_embedding_backend(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 36, in get_embedding_backend
    embedding_backend = _SentenceTransformersEmbeddingBackend(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 72, in __init__
    self.model = SentenceTransformer(
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 1739, in _load_sbert_model
    module = module_class(model_name_or_path, cache_dir=cache_folder, backend=self.backend, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 87, in __init__
    self._load_model(model_name_or_path, config, cache_dir, backend, is_peft_model, **model_args)
TypeError: Transformer._load_model() got multiple values for argument 'backend'

Expected behavior
backend should be passed to SentenceTransformer exactly once, via its built-in backend parameter. Otherwise the library infers torch as the backend even though model_kwargs contains onnx or openvino.

Additional context
About a month ago, I looked into using quantized models with the Sentence Transformers components because I knew it was technically possible, but I wasn't sure whether Haystack's implementation could handle it properly. I ended up working on a different issue until I noticed someone else had the same question, so I decided to pick this up again. I will open a PR to support the onnx and openvino formats. PyTorch quantization (using dtype float16 or bfloat16) already works.
To Reproduce

from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.dataclasses import Document
from haystack.utils import ComponentDevice

documents = [
    Document(content="Transformers, the movie, was released in 2007"),
    Document(content="This is an irrelevant document"),
    Document(
        content="Transformers, the Machine Learning architecture, was released in 2017"
    ),
]
query = "When was the movie released?"
embedder_onnx = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"backend": "onnx"},
)
embedder_onnx.warm_up()
onnx_embedded_documents = embedder_onnx.run(documents=documents)

FAQ Check

System:

  • OS: Ubuntu
  • GPU/CPU: N/A
  • Haystack version (commit or version number): 2.9.0
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A
@anakin87 anakin87 added the type:feature New feature or request label Feb 4, 2025
@anakin87
Member

anakin87 commented Feb 4, 2025

Hey, @lbux...

This feature looks interesting to me and I have been thinking about it for a while.

To make it explicit and clean, I would propose to expose the backend argument in the __init__ of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder. Makes sense?

@lbux
Contributor Author

lbux commented Feb 4, 2025

> Hey, @lbux...
>
> This feature looks interesting to me and I have been thinking about it for a while.
>
> To make it explicit and clean, I would propose to expose the backend argument in the __init__ of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder. Makes sense?

This is initially what I had in mind, but I didn't want to modify too many files to implement the changes. The way I ended up doing it is in this commit in my local branch: lbux@c6c8330

If adding backend to the inits is acceptable, I can modify my implementation to expose them and use it for the call to SentenceTransformers.

There are some other nuances that I think should be discussed in a proper PR, but that can be done after your final thoughts on how to expose the parameters.

@anakin87
Member

anakin87 commented Feb 4, 2025

Yes, I would say that exposing backend is better. This way we can stay close to the original meaning of the parameters in Sentence Transformers (model_kwargs included).
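A simplified sketch of that direction, for illustration only: the class and parameter names mirror the discussion, but this is not the actual Haystack implementation. The key point is that backend becomes an explicit __init__ argument, so model_kwargs no longer needs to carry it and the duplicate-argument TypeError cannot occur.

```python
# Hypothetical, simplified sketch of exposing `backend` in __init__
# instead of passing it through model_kwargs.
class SentenceTransformersDocumentEmbedder:
    def __init__(self, model, backend="torch", model_kwargs=None):
        self.model = model
        self.backend = backend  # "torch", "onnx", or "openvino"
        # model_kwargs no longer carries "backend", so no duplicate argument
        self.model_kwargs = dict(model_kwargs or {})

    def warm_up(self):
        # The real component would construct a SentenceTransformer here,
        # passing backend=self.backend explicitly alongside model_kwargs.
        return {
            "model_name_or_path": self.model,
            "backend": self.backend,
            "model_kwargs": self.model_kwargs,
        }


embedder = SentenceTransformersDocumentEmbedder(
    "sentence-transformers/all-MiniLM-L6-v2", backend="onnx"
)
print(embedder.warm_up()["backend"])  # onnx
```

This also keeps model_kwargs aligned with its original meaning in Sentence Transformers, as suggested above.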
