
Sentence Transformers components do not support ONNX or OpenVINO formats #8802

Open
lbux opened this issue Feb 4, 2025 · 3 comments · May be fixed by #8813
Labels
type:feature New feature or request

Comments

@lbux
Contributor

lbux commented Feb 4, 2025

Describe the bug
Passing a backend (onnx or openvino) via model_kwargs in the SentenceTransformers components causes Sentence Transformers' Transformer class to receive the backend argument twice. The call self._load_model(model_name_or_path, config, cache_dir, backend, is_peft_model, **model_args) then fails with a TypeError that is neither caught nor handled.
Error message

Traceback (most recent call last):
  File "/home/ulises/quant_test/hello.py", line 38, in <module>
    embedder_onnx.warm_up()
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/sentence_transformers_document_embedder.py", line 186, in warm_up
    self.embedding_backend = _SentenceTransformersEmbeddingBackendFactory.get_embedding_backend(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 36, in get_embedding_backend
    embedding_backend = _SentenceTransformersEmbeddingBackend(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 72, in __init__
    self.model = SentenceTransformer(
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 1739, in _load_sbert_model
    module = module_class(model_name_or_path, cache_dir=cache_folder, backend=self.backend, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 87, in __init__
    self._load_model(model_name_or_path, config, cache_dir, backend, is_peft_model, **model_args)
TypeError: Transformer._load_model() got multiple values for argument 'backend'

Expected behavior
backend should be passed to SentenceTransformer exactly once, via its built-in backend parameter. Otherwise the library infers torch as the backend even though model_kwargs contains onnx or openvino.

Additional context
About a month ago, I looked into using quantized models with the Sentence Transformers components because I knew it was technically possible, but I wasn't sure whether Haystack's implementation could handle it properly. I ended up working on a different issue until I noticed someone else had the same question, so I decided to pick this up again. I will open a PR to support the onnx and openvino formats. PyTorch quantization (using dtype float16 or bfloat16) already works.
To Reproduce

from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.dataclasses import Document
from haystack.utils import ComponentDevice

documents = [
    Document(content="Transformers, the movie, was released in 2007"),
    Document(content="This is an irrelevant document"),
    Document(
        content="Transformers, the Machine Learning architecture, was released in 2017"
    ),
]
query = "When was the movie released?"
embedder_onnx = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"backend": "onnx"},
)
embedder_onnx.warm_up()
onnx_embedded_documents = embedder_onnx.run(documents=documents)

FAQ Check

System:

  • OS: Ubuntu
  • GPU/CPU: N/A
  • Haystack version (commit or version number): 2.9.0
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A
@anakin87 anakin87 added the type:feature New feature or request label Feb 4, 2025
@anakin87
Member

anakin87 commented Feb 4, 2025

Hey, @lbux...

This feature looks interesting to me and I have been thinking about it for a while.

To make it explicit and clean, I would propose to expose the backend argument in the __init__ of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder. Makes sense?

@lbux
Contributor Author

lbux commented Feb 4, 2025

> Hey, @lbux...
>
> This feature looks interesting to me and I have been thinking about it for a while.
>
> To make it explicit and clean, I would propose to expose the backend argument in the __init__ of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder. Makes sense?

This is initially what I had in mind, but I didn't want to modify too many files to implement the changes. The way I ended up doing it is in this commit in my local branch: lbux@c6c8330

If adding backend to the inits is acceptable, I can modify my implementation to expose them and use it for the call to SentenceTransformers.

There are some other nuances that I think should be discussed in a proper PR, but that can be done after your final thoughts on how to expose the parameters.

@anakin87
Member

anakin87 commented Feb 4, 2025

Yes, I would say that exposing backend is better. This way we can stay close to the original meaning of the parameters in Sentence Transformers (model_kwargs included).
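A simplified sketch of that direction, for illustration only: the class and parameter names mirror the discussion, but this is not the actual Haystack implementation. The key point is that backend becomes an explicit __init__ argument, so model_kwargs no longer needs to carry it and the duplicate-argument TypeError cannot occur.

```python
# Hypothetical, simplified sketch of exposing `backend` in __init__
# instead of passing it through model_kwargs.
class SentenceTransformersDocumentEmbedder:
    def __init__(self, model, backend="torch", model_kwargs=None):
        self.model = model
        self.backend = backend  # "torch", "onnx", or "openvino"
        # model_kwargs no longer carries "backend", so no duplicate argument
        self.model_kwargs = dict(model_kwargs or {})

    def warm_up(self):
        # The real component would construct a SentenceTransformer here,
        # passing backend=self.backend explicitly alongside model_kwargs.
        return {
            "model_name_or_path": self.model,
            "backend": self.backend,
            "model_kwargs": self.model_kwargs,
        }


embedder = SentenceTransformersDocumentEmbedder(
    "sentence-transformers/all-MiniLM-L6-v2", backend="onnx"
)
print(embedder.warm_up()["backend"])  # onnx
```

This also keeps model_kwargs aligned with its original meaning in Sentence Transformers, as suggested above.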
