
Add embedding models configurable, from both transformers.js and TEI #646

Merged
32 commits merged into huggingface:main on Jan 9, 2024

Conversation

@mikelfried (Contributor) commented Dec 19, 2023

Embedding models configurable, from both Xenova and TEI

Enables configuring the embedding model used for web search. It allows running the model both locally and on hosted machines with GPUs (a config sketch follows the list below).

  • There can be multiple embedding models; the first is the default, and each text generation model can be given its own specific embedding model (which could be language dependent).
  • Support for both Xenova (currently supported in chat-ui) and TEI (Text Embeddings Inference).
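A minimal sketch of what such a configuration might look like in `.env.local` (field names are illustrative and may differ from the merged implementation):

```env
TEXT_EMBEDDING_MODELS=`[
  {
    "name": "Xenova/gte-small",
    "description": "Default embedding model, running locally via transformers.js",
    "chunkCharLength": 512,
    "endpoints": [{ "type": "transformersjs" }]
  },
  {
    "name": "intfloat/multilingual-e5-large",
    "description": "Larger model served by a GPU-hosted TEI instance",
    "chunkCharLength": 512,
    "endpoints": [{ "type": "tei", "url": "http://127.0.0.1:8080" }]
  }
]`
```

The first entry acts as the default; a model in `MODELS` can then opt into a specific one by name.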

Work in progress

PR #641 by @mishig25 changes the current embedding model code, so I will need to tweak this code according to his changes. This feature will be helpful for getting faster and larger embedding models for the PDF chunks (when TEI is hosted on a GPU), and can be language specific.

current limitations

  • When using TEI, the environment variables MAX_CLIENT_BATCH_SIZE and MAX_BATCH_TOKENS limit the batch size, which I need to take into account (by making an /info request to the endpoint first and then sending batches accordingly; see the sketch after this list). Update: MAX_CLIENT_BATCH_SIZE and MAX_BATCH_TOKENS of TEI are now taken into account.
  • Create a README section about it.
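A minimal sketch of that batching, assuming a TEI endpoint whose /info route reports max_client_batch_size (respecting max_batch_tokens would additionally need a tokenizer and is omitted here). This is illustrative, not the PR's actual code:

```ts
interface TeiInfo {
	max_client_batch_size: number;
	max_batch_tokens: number;
}

// Embed all inputs against a TEI server, splitting them into batches that
// respect the server's reported client batch size limit.
async function embedWithTei(url: string, inputs: string[]): Promise<number[][]> {
	const info: TeiInfo = await (await fetch(`${url}/info`)).json();
	const embeddings: number[][] = [];

	for (let i = 0; i < inputs.length; i += info.max_client_batch_size) {
		const batch = inputs.slice(i, i + info.max_client_batch_size);
		const response = await fetch(`${url}/embed`, {
			method: "POST",
			headers: { "Content-Type": "application/json" },
			body: JSON.stringify({ inputs: batch }),
		});
		embeddings.push(...((await response.json()) as number[][]));
	}
	return embeddings;
}
```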

Status

Currently works; I will fix the limitations soon.

Please share your opinion 🙂

@nsarrazin (Collaborator) left a comment

Hi, thanks for the contribution! Looks like a great PR, I like the way you handled the embeddings coming from different sources; it should be quite easy to extend in the future.

I was wondering why you chose to specify the embedding at the model level?

For most use cases, I imagine that specifying a single embedding config for the entire app should be enough, no? Not 100% sure on this, maybe there are use cases where you need to support multiple embeddings, let me know!

@mikelfried (Contributor, Author) commented Dec 20, 2023

> Hi, thanks for the contribution! Looks like a great PR, I like the way you handled the embeddings coming from different sources; it should be quite easy to extend in the future.
>
> I was wondering why you chose to specify the embedding at the model level?
>
> For most use cases, I imagine that specifying a single embedding config for the entire app should be enough, no? Not 100% sure on this, maybe there are use cases where you need to support multiple embeddings, let me know!

Hi, thanks for the quick response,
If embeddingModelName is not explicitly defined, the first embedding model is used. Setting embeddingModelName is useful when a user has one English LLM and another Korean one (or any other language) and wants a different embedding model for the non-English content, or when there is a code LLM paired with an embedding model that is especially good at code. That is my reasoning behind the optional embeddingModelName parameter.
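A hypothetical illustration of that override (the embeddingModelName key comes from this discussion; the model names are made up):

```env
MODELS=`[
  { "name": "english-llm" },
  { "name": "korean-llm", "embeddingModelName": "intfloat/multilingual-e5-large" }
]`
```

Here english-llm falls back to the first configured embedding model, while korean-llm uses the multilingual one.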

@nsarrazin (Collaborator)

Of course, then it would make a lot of sense to have different embeddings, I didn't think about those use cases 😁

@mikelfried (Contributor, Author)

Updated with automatic batching for TEI endpoints, using the /info route to calculate the max batch size.

Also fixed a bug: when using web search, once the LLM finished outputting, the web search box and the sources disappeared and only became visible again when refreshing the page.

@mikelfried (Contributor, Author)

Now supports models that require prefixes, such as https://huggingface.co/intfloat/multilingual-e5-large#faq, via the optional preQuery and prePassage parameters.
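For example, E5-family models expect queries to be prefixed with "query: " and passages with "passage: ". A hypothetical entry using the new parameters (the preQuery/prePassage keys come from this comment; other fields are illustrative):

```env
TEXT_EMBEDDING_MODELS=`[
  {
    "name": "intfloat/multilingual-e5-large",
    "preQuery": "query: ",
    "prePassage": "passage: ",
    "endpoints": [{ "type": "tei", "url": "http://127.0.0.1:8080" }]
  }
]`
```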

@mishig25 (Collaborator) commented Dec 21, 2023

Another small comment: instead of calling that type of model xenova, we should call it transformersjs.

@mishig25 (Collaborator)

> PR #641 by @mishig25 changes the current embedding model code, so I will need to tweak this code according to his changes. This feature will be helpful for getting faster and larger embedding models for the PDF chunks (when TEI is hosted on a GPU), and can be language specific.

Indeed. In #641, I've refactored the embeddings functionality: specifically, I refactored findSimilarSentences into two functions, createEmbeddings and findSimilarSentences.

We plan to merge your PR first and then merge #641 afterwards. Therefore, please feel free to update the embeddings functionality in this PR to match that of #641.
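For reference, a minimal sketch of that split (illustrative only, not the actual #641 code; the embed callback stands in for whatever endpoint is configured):

```ts
type Embedding = number[];

// Dot product; for L2-normalized embeddings this equals cosine similarity.
function dot(a: Embedding, b: Embedding): number {
	let sum = 0;
	for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
	return sum;
}

// createEmbeddings turns texts into vectors by calling the configured
// embedding endpoint (transformers.js or TEI).
async function createEmbeddings(
	embed: (inputs: string[]) => Promise<Embedding[]>,
	inputs: string[]
): Promise<Embedding[]> {
	return embed(inputs);
}

// findSimilarSentences then ranks precomputed embeddings against a query
// and returns the indices of the topK closest sentences.
function findSimilarSentences(
	query: Embedding,
	sentences: Embedding[],
	topK = 5
): number[] {
	return sentences
		.map((embedding, index) => ({ index, score: dot(query, embedding) }))
		.sort((a, b) => b.score - a.score)
		.slice(0, topK)
		.map(({ index }) => index);
}
```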

@mishig25 (Collaborator)

Overall, great work! Looking forward to merging it 🚀

@mikelfried (Contributor, Author)

Hi @mishig25, thanks for the review, fixed it.

@mishig25 (Collaborator) commented Jan 8, 2024

LGTM (great job)! Requesting a review from @nsarrazin.

@mishig25 (Collaborator) commented Jan 8, 2024

@nsarrazin maybe you can pay closer attention to the embedding endpoint files (types/EmbeddingEndpoints.ts, transformersjs/embeddingEndpoints.ts, tei/embeddingEndpoints.ts), as their structure closely resembles that of #541.

Review threads on .env.template and README.md were marked resolved.
@mishig25 changed the title from "Add embedding models configurable, from both Xenova and TEI" to "Add embedding models configurable, from both transformers.js and TEI" on Jan 8, 2024.
@mishig25 (Collaborator) commented Jan 9, 2024

@mikelfried @nsarrazin actually I think we need one more config setting: the embedding endpoint per task.
For example, we would very likely use:

  1. transformersjs for web search
  2. tei for the "Generalize RAG + PDF Chat feature" (#641)

Therefore, we would need a setting to specify the endpoint per task as well. Wdyt?

@nsarrazin (Collaborator)

I can see the use case, but I'm not sure what would be the best way to structure it in the config file. Maybe we could have three vars inside the model:

embeddingModel, which is the default, plus searchEmbeddingModel and pdfEmbeddingModel to (optionally) override the base embeddingModel? Not sure about this, open to alternatives.
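If that scheme were adopted, a model entry might look like this (purely illustrative; this was deferred to a follow-up PR):

```env
MODELS=`[
  {
    "name": "some-llm",
    "embeddingModel": "Xenova/gte-small",
    "searchEmbeddingModel": "Xenova/gte-small",
    "pdfEmbeddingModel": "intfloat/multilingual-e5-large"
  }
]`
```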

@mishig25 (Collaborator) commented Jan 9, 2024

> embeddingModel, which is the default, plus searchEmbeddingModel and pdfEmbeddingModel to (optionally) override the base embeddingModel? Not sure about this, open to alternatives.

This option sounds good to me. I can do it in a subsequent PR.

@mishig25 (Collaborator) left a comment

From my side, this PR LGTM!

@nsarrazin (Collaborator) left a comment

This is working great, quite happy with it. I tested locally with the huggingchat config and it works out of the box. Thanks for the great contribution @mikelfried and I think we can merge this now!

We can add the extra vars in the PDF PR @mishig25, thanks for the detailed review as well!

@mishig25 (Collaborator) commented Jan 9, 2024

one sec, don't merge yet

@mishig25 (Collaborator) commented Jan 9, 2024

Ready to merge! @nsarrazin

@mikelfried great job again!

@nsarrazin (Collaborator)

Testing one last time that it doesn't break huggingchat and will merge 😄

@nsarrazin (Collaborator)

Works well, merging it! Thanks again for an awesome contribution @mikelfried.

@nsarrazin merged commit 3a01622 into huggingface:main on Jan 9, 2024. 3 checks passed.
Labels: enhancement (New feature or request), websearch
3 participants