Describe the bug
Passing a backend (onnx or openvino) in model_kwargs for SentenceTransformers components causes a duplicate value for the backend argument expected by Sentence Transformers' Transformer class. The call self._load_model(model_name_or_path, config, cache_dir, backend, is_peft_model, **model_args) fails, and the error is neither caught nor handled.
Error message
Error that was thrown (if available)
Traceback (most recent call last):
File "/home/ulises/quant_test/hello.py", line 38, in <module>
embedder_onnx.warm_up()
File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/sentence_transformers_document_embedder.py", line 186, in warm_up
self.embedding_backend = _SentenceTransformersEmbeddingBackendFactory.get_embedding_backend(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 36, in get_embedding_backend
embedding_backend = _SentenceTransformersEmbeddingBackend(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 72, in __init__
self.model = SentenceTransformer(
File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 1739, in _load_sbert_model
module = module_class(model_name_or_path, cache_dir=cache_folder, backend=self.backend, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ulises/quant_test/.venv/lib/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 87, in __init__
self._load_model(model_name_or_path, config, cache_dir, backend, is_peft_model, **model_args)
TypeError: Transformer._load_model() got multiple values for argument 'backend'
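This TypeError is Python's standard complaint when an argument arrives both positionally and via **kwargs. A minimal, library-free sketch of the same collision (load_model here is a stand-in for Transformer._load_model, not the real Sentence Transformers code):

```python
def load_model(model_name_or_path, config, cache_dir, backend, **model_args):
    # Stand-in for Transformer._load_model: `backend` is a named parameter.
    return backend

model_kwargs = {"backend": "onnx"}  # what the user passed via Haystack

# Sentence Transformers already supplies `backend` positionally ("torch" by
# default), so forwarding model_kwargs unchanged duplicates the argument:
try:
    load_model("all-MiniLM-L6-v2", None, None, "torch", **model_kwargs)
except TypeError as exc:
    print(exc)  # load_model() got multiple values for argument 'backend'
```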
Expected behavior
backend should be passed to SentenceTransformer once, using its built-in backend parameter; otherwise the library infers torch as the backend even though model_kwargs contains onnx or openvino.
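One way to get that single-pass behavior (a hedged sketch, not Haystack's actual code): pull "backend" out of model_kwargs before forwarding, so it reaches SentenceTransformer exactly once via the dedicated parameter. The helper name split_backend is hypothetical:

```python
def split_backend(model_kwargs, default="torch"):
    # Hypothetical helper: extract "backend" from model_kwargs so an
    # embedder could pass it once, as SentenceTransformer(backend=...),
    # instead of letting it collide with the positional argument.
    kwargs = dict(model_kwargs or {})
    backend = kwargs.pop("backend", default)
    return backend, kwargs

backend, kwargs = split_backend({"backend": "onnx", "trust_remote_code": True})
# backend is "onnx"; kwargs no longer carries a duplicate "backend" key
```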
Additional context
About a month ago I looked into using quantized models with Sentence Transformers because I knew it was technically possible, but I wasn't sure whether Haystack's implementation could handle it properly. I ended up working on a different issue until I noticed someone else had the same question, so I decided to pick this up again. I will make a PR to support the onnx and openvino formats. PyTorch quantization (using dtype float16 or bfloat16) already works.
To Reproduce
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.dataclasses import Document
from haystack.utils import ComponentDevice

documents = [
    Document(content="Transformers, the movie, was released in 2007"),
    Document(content="This is an irrelevant document"),
    Document(
        content="Transformers, the Machine Learning architecture, was released in 2017"
    ),
]
query = "When was the movie released?"

embedder_onnx = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"backend": "onnx"},
)
embedder_onnx.warm_up()
onnx_embedded_documents = embedder_onnx.run(documents=documents)
This feature looks interesting to me, and I have been thinking about it for a while.
To make it explicit and clean, I would propose exposing the backend argument in the __init__ of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder. Does that make sense?
This is initially what I had in mind, but I didn't want to modify too many files to implement the changes. The way I ended up doing it is in this commit on my local branch: lbux@c6c8330
If adding backend to the __init__ methods is acceptable, I can modify my implementation to expose it and use it in the call to SentenceTransformer.
There are some other nuances I think should be discussed in a proper PR, but that can wait until your final thoughts on how to expose the parameters.
Yes, I would say that exposing backend is better. This way we can stay close to the original meaning of the parameters in Sentence Transformers (model_kwargs included).
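A rough sketch of what the exposed parameter could look like (simplified and illustrative only; the real component takes many more arguments, and the validation shown here is an assumption, not the agreed design):

```python
class SentenceTransformersDocumentEmbedder:
    # Simplified sketch of the proposed signature: `backend` becomes a
    # first-class __init__ parameter instead of hiding in model_kwargs.
    def __init__(
        self,
        model="sentence-transformers/all-MiniLM-L6-v2",
        backend="torch",
        model_kwargs=None,
    ):
        if backend not in ("torch", "onnx", "openvino"):
            raise ValueError(f"Unknown backend: {backend}")
        self.model = model
        self.backend = backend  # later forwarded as SentenceTransformer(backend=...)
        self.model_kwargs = model_kwargs or {}

embedder = SentenceTransformersDocumentEmbedder(backend="onnx")
```

This keeps model_kwargs meaning exactly what it means in Sentence Transformers, with backend carried alongside it rather than inside it.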
lbux linked a pull request on Feb 5, 2025 that will close this issue.