feat: SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder can accept and pass any arguments to SentenceTransformer.encode #8806

Merged

Conversation

Contributor

@oroszgy oroszgy commented Feb 4, 2025

Enhanced SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder to accept an additional parameter, which is passed directly to the underlying SentenceTransformer.encode method for greater flexibility in embedding customization.

Related Issues

No related issues

Proposed Changes:

Recent embedding methods require extra parameters when used with SentenceTransformer (e.g. task, prompt, prompt_name). One specific example is jina-embeddings-v3, which supports task-specific embeddings through the task parameter.
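The forwarding pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FakeModel` stands in for `SentenceTransformer` so the example runs without downloading a model, `SketchTextEmbedder` is a hypothetical class, and `encode_kwargs` is an assumed name for the new parameter. The `"retrieval.query"` task value is only an example.

```python
from typing import Any, Dict, List, Optional


class FakeModel:
    """Stand-in for SentenceTransformer; records the kwargs passed to encode."""

    def __init__(self) -> None:
        self.last_kwargs: Dict[str, Any] = {}

    def encode(self, sentences: List[str], batch_size: int = 32, **kwargs: Any) -> List[List[float]]:
        # Remember what the caller passed so the forwarding can be inspected.
        self.last_kwargs = {"batch_size": batch_size, **kwargs}
        # Return a dummy one-dimensional "embedding" per sentence.
        return [[float(len(s))] for s in sentences]


class SketchTextEmbedder:
    """Hypothetical embedder that forwards extra keyword arguments to encode."""

    def __init__(
        self,
        model: FakeModel,
        batch_size: int = 32,
        encode_kwargs: Optional[Dict[str, Any]] = None,  # assumed parameter name
    ) -> None:
        self.model = model
        self.batch_size = batch_size
        self.encode_kwargs = encode_kwargs or {}

    def run(self, text: str) -> Dict[str, List[float]]:
        # Extra arguments are unpacked straight into the underlying encode call.
        embedding = self.model.encode(
            [text], batch_size=self.batch_size, **self.encode_kwargs
        )[0]
        return {"embedding": embedding}


model = FakeModel()
embedder = SketchTextEmbedder(model, encode_kwargs={"task": "retrieval.query"})
result = embedder.run("hello")
```

After `run`, `model.last_kwargs` contains both the embedder's own `batch_size` and the forwarded `task` argument.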

How did you test it?

Extended the corresponding unit tests and made them pass.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

…xtEmbedder can accept and pass any arguments to SentenceTransformer.encode
@oroszgy oroszgy requested review from a team as code owners February 4, 2025 12:56
@oroszgy oroszgy requested review from dfokina and anakin87 and removed request for a team February 4, 2025 12:56
@CLAassistant

CLAassistant commented Feb 4, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Feb 4, 2025
@coveralls
Collaborator

coveralls commented Feb 4, 2025

Pull Request Test Coverage Report for Build 13161311307

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.001%) to 91.3%

Files with Coverage Reduction | New Missed Lines | %
components/embedders/sentence_transformers_document_embedder.py | 2 | 96.77%
components/embedders/sentence_transformers_text_embedder.py | 2 | 96.3%

Totals
Change from base Build 13158724398: 0.001%
Covered Lines: 8952
Relevant Lines: 9805

💛 - Coveralls

@anakin87
Member

anakin87 commented Feb 4, 2025

Hello and thanks for the PR!

While I understand the need to pass task to jina-embeddings-v3, I see that most of the parameters for encode can already be set in the __init__ of the Embedders, while some of them cannot be accepted if these components are to run correctly (e.g., output_value, convert_to_numpy, ...).

So I need to think a bit about covering the use case you proposed... I'll let you know!
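One way to address the concern about arguments that must not be overridden is to reject them up front. The sketch below is purely illustrative and is not taken from the PR: `validate_encode_kwargs` and `RESERVED_ENCODE_KEYS` are hypothetical names, and the reserved set only contains the two arguments mentioned in the comment above.

```python
from typing import Any, Dict, Optional

# Arguments named in the review comment as unsafe to override (illustrative set).
RESERVED_ENCODE_KEYS = frozenset({"output_value", "convert_to_numpy"})


def validate_encode_kwargs(encode_kwargs: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the extra arguments, rejecting reserved keys."""
    kwargs = dict(encode_kwargs or {})
    clashes = sorted(RESERVED_ENCODE_KEYS.intersection(kwargs))
    if clashes:
        raise ValueError(f"These encode arguments cannot be overridden: {clashes}")
    return kwargs
```

Whether to raise, warn, or silently drop such keys is a design choice; raising makes the misuse visible at construction time.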

@oroszgy
Copy link
Contributor Author

oroszgy commented Feb 4, 2025

Thanks for the feedback! Looking forward to hearing back from you.

Member

@anakin87 anakin87 left a comment

In general, this approach is OK.

I've left some comments.

Plus,

…dder and SentenceTransformersTextEmbedder made to be the last positional parameter for backward compatibility reasons
…Embedder and SentenceTransformersDocumentEmbedder
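The backward-compatibility point in the commit titles above can be illustrated with a small sketch. The function names and defaults here are hypothetical and much shorter than the real constructor signatures; `encode_kwargs` is an assumed name for the new parameter. Appending the new parameter at the end means calls that pass earlier arguments positionally keep their meaning.

```python
from typing import Any, Dict, Optional


def init_before(model: str = "some-model", batch_size: int = 32) -> Dict[str, Any]:
    """Hypothetical signature before the change."""
    return {"model": model, "batch_size": batch_size}


def init_after(
    model: str = "some-model",
    batch_size: int = 32,
    encode_kwargs: Optional[Dict[str, Any]] = None,  # new parameter, added last
) -> Dict[str, Any]:
    """Hypothetical signature after the change."""
    return {"model": model, "batch_size": batch_size, "encode_kwargs": encode_kwargs or {}}


# An existing positional call binds the same values under both signatures.
old_style_before = init_before("m", 8)
old_style_after = init_after("m", 8)
```

Had the new parameter been inserted before `batch_size`, the positional `8` would have been bound to it instead.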
@oroszgy
Contributor Author

oroszgy commented Feb 5, 2025

@anakin87 Thanks for the feedback! Went through all of them, and fixed my PR accordingly.

Member

@anakin87 anakin87 left a comment

some final things to fix

…mbedder and SentenceTransformersDocumentEmbedder
…dder and SentenceTransformersTextEmbedder made to be the last positional parameter for backward compatibility (part II.)
@oroszgy
Contributor Author

oroszgy commented Feb 5, 2025

Sorry for my oversight; it should be good now.

Member

@anakin87 anakin87 left a comment

Nice.

Thanks for your contribution! 💙

@anakin87 anakin87 enabled auto-merge (squash) February 5, 2025 15:56
@anakin87 anakin87 merged commit d2348ad into deepset-ai:main Feb 5, 2025
18 checks passed
Labels
topic:tests type:documentation Improvements on the docs
4 participants