
FastEmbed Docs Pt. 2 - Supported Models #1074

Open · wants to merge 1 commit into master


davidmyriel (Contributor)

I added four sections, one for each model group.
Each section shows code for retrieving that group's model list.

The problem is that models change often, so we should just pick one or two of the best ones per embedding type.

@mrscoopers can you pick 2 from text embedding, 2 from sparse, 1 from late interaction (ColBERT) and 1 from image embedding? Perhaps you could use a table instead of the bullet points I used.

We should make a list that we don't have to change often.
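For illustration, a minimal sketch of pulling a model list and rendering it as a table without pandas. It assumes FastEmbed's `TextEmbedding.list_supported_models()` classmethod returns a list of dicts that include `model` and `dim` keys; `to_markdown_table` is a hypothetical helper, and the fallback rows are placeholders used only when FastEmbed isn't installed.

```python
from typing import Any, Iterable, Mapping, Sequence


def to_markdown_table(models: Iterable[Mapping[str, Any]], columns: Sequence[str]) -> str:
    """Render model metadata dicts as a Markdown table using only the stdlib (hypothetical helper)."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        # Missing keys become empty cells; literal pipes are escaped so they don't split a cell.
        "| " + " | ".join(str(m.get(col, "")).replace("|", "\\|") for col in columns) + " |"
        for m in models
    ]
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    try:
        # Assumption: fastembed is installed and list_supported_models() returns a list of dicts.
        from fastembed import TextEmbedding

        models = TextEmbedding.list_supported_models()
    except ImportError:
        # Placeholder rows so the sketch still runs without fastembed.
        models = [
            {"model": "BAAI/bge-small-en-v1.5", "dim": 384},
            {"model": "sentence-transformers/all-MiniLM-L6-v2", "dim": 384},
        ]
    print(to_markdown_table(models, ["model", "dim"]))
```

Generating the table once and pasting the output into the docs would keep a manually curated list while avoiding a pandas dependency.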

davidmyriel requested a review from mrscoopers on August 8, 2024 02:17

netlify bot commented Aug 8, 2024

Deploy Preview for condescending-goldwasser-91acf0 ready!

🔨 Latest commit: e0cbef2
🔍 Latest deploy log: https://app.netlify.com/sites/condescending-goldwasser-91acf0/deploys/66b42ac1106aee00085567da
😎 Deploy Preview: https://deploy-preview-1074--condescending-goldwasser-91acf0.netlify.app

joein (Member) commented Aug 8, 2024

Having code to list the supported models, and installing pandas just for that, looks like a bit of overkill :(
I think a manually updated table is better, although it isn't perfect either :(

mrscoopers (Contributor)

@davidmyriel We can list one model per provider; there are only 11 text models right now if we don't count different sizes/versions/languages (CLIP, e5, MPNet, BAAI/bge, paraphrase-multilingual-MiniLM-L12-v2, gte-large, mxbai-embed-large-v1, snowflake-arctic-embed, nomic-embed-text-v1.5, all-MiniLM-L6-v2 & Jina).
AFAIK we don't add new models very often, so text descriptions of the models and their capabilities seem feasible-ish. People can then choose based on the information provided: we can add retrieval benchmark results, the prompts required to make retrieval work, etc.
