
Add embedding models configurable, from both transformers.js and TEI #646

Merged
32 commits merged into huggingface:main on Jan 9, 2024

Conversation

@mikelfried (Contributor) commented Dec 19, 2023

Embedding models configurable, from both Xenova and TEI

Enables configuring the embedding model used for web search. It allows running the model both locally and on hosted machines with GPUs (a config sketch follows the list below).

  • There can be multiple embedding models; the first is the default, and each text generation model can be given its own specific embedding model (which could be language dependent).
  • Support for both Xenova (currently supported in chat-ui) and TEI (Text Embeddings Inference).
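A minimal sketch of what such a configuration might look like in `.env.local` (field names are illustrative and may differ from the merged implementation):

```env
TEXT_EMBEDDING_MODELS=`[
  {
    "name": "Xenova/gte-small",
    "description": "Default embedding model, running locally via transformers.js",
    "chunkCharLength": 512,
    "endpoints": [{ "type": "transformersjs" }]
  },
  {
    "name": "intfloat/multilingual-e5-large",
    "description": "Larger model served by a GPU-hosted TEI instance",
    "chunkCharLength": 512,
    "endpoints": [{ "type": "tei", "url": "http://127.0.0.1:8080" }]
  }
]`
```

The first entry acts as the default; a model in `MODELS` can then opt into a specific one by name.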

Work in progress

PR #641 by @mishig25 changes the current embedding model code, so I will need to tweak this code according to his changes. This feature will be helpful for getting faster and larger embedding models for the PDF chunks (when TEI is hosted on a GPU), and can be language specific.

current limitations

  • When using TEI, the environment variables MAX_CLIENT_BATCH_SIZE and MAX_BATCH_TOKENS limit the batch size, which I need to take into account (by making an /info request to the endpoint first and then sending batches accordingly; see the sketch after this list). Update: MAX_CLIENT_BATCH_SIZE and MAX_BATCH_TOKENS of TEI are now taken into account.
  • Create a README section about it.
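A minimal sketch of that batching, assuming a TEI endpoint whose /info route reports max_client_batch_size (respecting max_batch_tokens would additionally need a tokenizer and is omitted here). This is illustrative, not the PR's actual code:

```ts
interface TeiInfo {
	max_client_batch_size: number;
	max_batch_tokens: number;
}

// Embed all inputs against a TEI server, splitting them into batches that
// respect the server's reported client batch size limit.
async function embedWithTei(url: string, inputs: string[]): Promise<number[][]> {
	const info: TeiInfo = await (await fetch(`${url}/info`)).json();
	const embeddings: number[][] = [];

	for (let i = 0; i < inputs.length; i += info.max_client_batch_size) {
		const batch = inputs.slice(i, i + info.max_client_batch_size);
		const response = await fetch(`${url}/embed`, {
			method: "POST",
			headers: { "Content-Type": "application/json" },
			body: JSON.stringify({ inputs: batch }),
		});
		embeddings.push(...((await response.json()) as number[][]));
	}
	return embeddings;
}
```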

Status

Currently works; I will fix the limitations soon.

Please share your opinion 🙂

@nsarrazin (Collaborator) left a comment

Hi, thanks for the contribution! Looks like a great PR, I like the way you handled the embeddings coming from different sources; it should be quite easy to extend in the future.

I was wondering why you chose to specify the embedding at the model level?

For most use cases, I imagine that specifying a single embedding config for the entire app should be enough, no? Not 100% sure on this, maybe there are use cases where you need to support multiple embeddings, let me know!

@mikelfried (Contributor, Author) commented Dec 20, 2023

> Hi, thanks for the contribution! Looks like a great PR, I like the way you handled the embeddings coming from different sources; it should be quite easy to extend in the future.
>
> I was wondering why you chose to specify the embedding at the model level?
>
> For most use cases, I imagine that specifying a single embedding config for the entire app should be enough, no? Not 100% sure on this, maybe there are use cases where you need to support multiple embeddings, let me know!

Hi, thanks for the quick response,
If embeddingModelName is not explicitly defined, the first embedding model is used. Setting embeddingModelName is useful when a user has one English LLM and another Korean one (or any other language) and wants a different embedding model for the non-English content, or when there is a code LLM paired with an embedding model that is especially good at code. That is my reasoning behind the optional embeddingModelName parameter.
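A hypothetical illustration of that override (the embeddingModelName key comes from this discussion; the model names are made up):

```env
MODELS=`[
  { "name": "english-llm" },
  { "name": "korean-llm", "embeddingModelName": "intfloat/multilingual-e5-large" }
]`
```

Here english-llm falls back to the first configured embedding model, while korean-llm uses the multilingual one.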

@nsarrazin (Collaborator)

Of course, then it would make a lot of sense to have different embeddings, I didn't think about those use cases 😁

@mikelfried (Contributor, Author)

Updated with automatic batching for TEI endpoints, using the /info route to calculate the max batch size.

Also fixed a bug: when using web search, once the LLM finished outputting, the web search box and the sources disappeared and only became visible again when refreshing the page.

@mikelfried (Contributor, Author)

Now supports models that require prefixes, such as https://huggingface.co/intfloat/multilingual-e5-large#faq, via the optional preQuery and prePassage parameters.
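For example, E5-family models expect queries to be prefixed with "query: " and passages with "passage: ". A hypothetical entry using the new parameters (the preQuery/prePassage keys come from this comment; other fields are illustrative):

```env
TEXT_EMBEDDING_MODELS=`[
  {
    "name": "intfloat/multilingual-e5-large",
    "preQuery": "query: ",
    "prePassage": "passage: ",
    "endpoints": [{ "type": "tei", "url": "http://127.0.0.1:8080" }]
  }
]`
```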

@mishig25 (Collaborator) commented Dec 21, 2023

Another small comment: instead of calling that type of model xenova, we should call it transformersjs.

@mishig25 (Collaborator)

> PR #641 by @mishig25 changes the current embedding model code, so I will need to tweak this code according to his changes. This feature will be helpful for getting faster and larger embedding models for the PDF chunks (when TEI is hosted on a GPU), and can be language specific.

Indeed. In #641, I've refactored the embeddings functionality: specifically, I refactored findSimilarSentences into two functions, createEmbeddings and findSimilarSentences.

We plan to merge your PR first and then merge #641 afterwards. Therefore, please feel free to update the embeddings functionality in this PR to match that of #641.
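For reference, a minimal sketch of that split (illustrative only, not the actual #641 code; the embed callback stands in for whatever endpoint is configured):

```ts
type Embedding = number[];

// Dot product; for L2-normalized embeddings this equals cosine similarity.
function dot(a: Embedding, b: Embedding): number {
	let sum = 0;
	for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
	return sum;
}

// createEmbeddings turns texts into vectors by calling the configured
// embedding endpoint (transformers.js or TEI).
async function createEmbeddings(
	embed: (inputs: string[]) => Promise<Embedding[]>,
	inputs: string[]
): Promise<Embedding[]> {
	return embed(inputs);
}

// findSimilarSentences then ranks precomputed embeddings against a query
// and returns the indices of the topK closest sentences.
function findSimilarSentences(
	query: Embedding,
	sentences: Embedding[],
	topK = 5
): number[] {
	return sentences
		.map((embedding, index) => ({ index, score: dot(query, embedding) }))
		.sort((a, b) => b.score - a.score)
		.slice(0, topK)
		.map(({ index }) => index);
}
```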

@mishig25 (Collaborator)

Overall, great work! Looking forward to merging it 🚀

@mikelfried (Contributor, Author)

Hi @mishig25, thanks for the review, fixed it.

@mishig25 (Collaborator) commented Jan 8, 2024

LGTM (great job)! Requesting a review from @nsarrazin.

@mishig25 (Collaborator) commented Jan 8, 2024

@nsarrazin maybe you can pay closer attention to the embedding endpoint files (types/EmbeddingEndpoints.ts, transformersjs/embeddingEndpoints.ts, tei/embeddingEndpoints.ts), as their structure closely resembles that of #541.

Review threads on .env.template and README.md were marked resolved.
@mishig25 changed the title from "Add embedding models configurable, from both Xenova and TEI" to "Add embedding models configurable, from both transformers.js and TEI" on Jan 8, 2024.
@mishig25 (Collaborator) commented Jan 9, 2024

@mikelfried @nsarrazin actually I think we need one more config setting: the embedding endpoint per task.
For example, we would very likely use:

  1. transformersjs for web search
  2. tei for the "Generalize RAG + PDF Chat feature" (#641)

Therefore, we would need a setting to specify the endpoint per task as well. Wdyt?

@nsarrazin (Collaborator)

I can see the use case, but I'm not sure what would be the best way to structure it in the config file. Maybe we could have three vars inside the model:

embeddingModel, which is the default, plus searchEmbeddingModel and pdfEmbeddingModel to (optionally) override the base embeddingModel? Not sure about this, open to alternatives.
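If that scheme were adopted, a model entry might look like this (purely illustrative; this was deferred to a follow-up PR):

```env
MODELS=`[
  {
    "name": "some-llm",
    "embeddingModel": "Xenova/gte-small",
    "searchEmbeddingModel": "Xenova/gte-small",
    "pdfEmbeddingModel": "intfloat/multilingual-e5-large"
  }
]`
```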

@mishig25 (Collaborator) commented Jan 9, 2024

> embeddingModel, which is the default, plus searchEmbeddingModel and pdfEmbeddingModel to (optionally) override the base embeddingModel? Not sure about this, open to alternatives.

This option sounds good to me. I can do it in a subsequent PR.

@mishig25 (Collaborator) left a comment

From my side, this PR LGTM!

@nsarrazin (Collaborator) left a comment

This is working great, quite happy with it. I tested locally with the huggingchat config and it works out of the box. Thanks for the great contribution @mikelfried and I think we can merge this now!

We can add the extra vars in the PDF PR @mishig25, thanks for the detailed review as well!

@mishig25 (Collaborator) commented Jan 9, 2024

one sec, don't merge yet

@mishig25 (Collaborator) commented Jan 9, 2024

Ready to merge! @nsarrazin

@mikelfried great job again!

@nsarrazin (Collaborator)

Testing one last time that it doesn't break huggingchat and will merge 😄

@nsarrazin (Collaborator)

Works well, merging it! Thanks again for an awesome contribution @mikelfried.

@nsarrazin merged commit 3a01622 into huggingface:main on Jan 9, 2024. 3 checks passed.
Labels: enhancement (New feature or request), websearch
3 participants