
Missing models on leaderboards [WIP] #1848

Open
KennethEnevoldsen opened this issue Jan 21, 2025 · 2 comments

Comments

@KennethEnevoldsen
Contributor

I checked the top 10 models for each leaderboard. They seem to be missing the following scores:

MTEB(eng, classic):

  • NV-Embed-v2
    • CQADupstackRetrieval
    • STS22
  • NV-Embed-v1
    • MSMARCO
    • CQADupstackRetrieval
    • STS22
  • Missing models:
    • BAAI/bge-en-icl (no meta, no results)
    • yibinlei/LENS-d8000 (no meta, no results)
    • yibinlei/LENS-d4000 (no meta, no results)

MTEB(chinese):

@Samoed, can I ask you to add the missing models (results to the results repo + model meta)? Feel free to add a filler class for "modelnotimplemented" in the loader (otherwise we will never catch up with model releases).
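As a rough sketch of what such a filler could look like (all names here, including `ModelMeta` and the `load_model` hook, are hypothetical and not mteb's actual API):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a "model not implemented" placeholder.
# ModelMeta and its fields are illustrative, not mteb's actual classes.
@dataclass
class ModelMeta:
    name: str
    revision: Optional[str] = None
    release_date: Optional[str] = None

@dataclass
class NotImplementedModelMeta(ModelMeta):
    """Placeholder for models with leaderboard results but no loader yet.

    Lets the leaderboard display the model while any attempt to run it
    fails loudly instead of silently.
    """

    def load_model(self):
        raise NotImplementedError(
            f"{self.name} has results on the leaderboard but no "
            "implementation in the loader yet."
        )
```

With something along these lines, a new release could be registered immediately (meta + results) and implemented later.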

@x-tabdeveloping
Collaborator

I went through the first 200 models, since these are the ones that have a mean score on the old leaderboard.
I wasn't 100% rigorous, so I might be wrong about some of these, but I think this should give us a fairly solid idea of what the set of missing models consists of.

models_missing_from_eng_classic = [
    "BAAI/bge-en-icl",
    "yibinlei/LENS-d8000",
    "yibinlei/LENS-d4000",
    "voyageai/voyage-3-m-exp",
    "Alibaba-NLP/gme-Qwen2-VL-7B-Instruct",
    "llmrails/ember-v1",
    "amazon/Titan-text-embeddings-v2",
    "hkunlp/instructor-large",
    "hkunlp/instructor-xl",
    "hkunlp/instructor-base",
    "sentence-transformers/sentence-t5-xxl",  # all sentence-t5s are missing really
    "elser-v2",  # from Elasticsearch
    "Hum-Works/lodestone-base-4096-v1",
    # LASER and SONAR from Facebook
    # Loads of sentence-transformers models we should probably add all of these
    # cde models
]

# This might be useful to have since it's the same model with fewer layers
distillations = [
    "TaylorAI/bge-micro-v2",
]

# Something was off about all of these: stalled technical reports or data
# releases, or incomplete READMEs filled with TODO tags
shady = [
    "raghavlight/TDTE",
    "tsirif/BinGSE-Meta-Llama-3-8B-Instruct",
    "tsirif/BinGSE-Sheared-LLaMA",
    "w601sxs/b1ade-embed",
    "sam-babayev/sf_model_e5",
]

quant = [
    "yoeven/multilingual-e5-large-instruct-Q5_K_M-GGUF",
    "yoeven/multilingual-e5-large-instruct-Q5_0-GGUF",
    "yoeven/multilingual-e5-large-instruct-Q3_K_S-GGUF",
    "JHJHJHJHJ/multilingual-e5-large-instruct-Q5_K_M-GGUF",
    "parasail-ai/GritLM-7B-vllm",
    "Maxthemacaque/onnx-gte-multilingual-base",
    "BookingCare/multilingual-e5-base-similarity-v1-onnx-quantized",
]

empty_readme = [
    "Labib11/MUG-B-1.6",
    "andersonbcdefg/bge-small-4096",
    "princeton-nlp/sup-simcse-bert-base-uncased",
]
no_model = [
    "twadada/gte_wl",
    "twadada/GTE_wl_mv",
    "twadada/GTE512_sw",
    "twadada/GTE256_sw",
    "twadada/l3_wl",
    "twadada/wl_sw_256",
    "twadada/mv_sw",
    "benayad7/concat-e5-small-bge-small-01",
    "lixsh6/XLM-3B5-embedding",
    "lixsh6/XLM-0B6-embedding",
    "lixsh6/MegatronBert-1B3-embedding",
]

# I might be wrong here, and I'm probably missing a lot, just a few examples
outdated = [
    "text-embedding-004-256",
    "text-embedding-004",
    "jinaai/jina-embedding-b-en-v1",
    "jinaai/jina-embedding-s-en-v1",  # There are probably more of these
    "text-similarity-ada-001",
]

# These are cases where an original model exists; most of these are just duplicate entries
copies = [
    "BASF-AI/nomic-embed-text-v1",
    "BASF-AI/nomic-embed-text-v1.5",
    "fdehlinger/english-4U-bge-small",
    "aliakseilabanau/bge-small-en",
    "lightonai/modernbert-embed-large",
    "lightonai/modernbert-embed-large-unsupervised",
    "nomic-ai/nomic-embed-text-v1.5-128", # I know these are not duplicate entries, but do we really need to have all sizes for variable size embeddings?
    "nomic-ai/nomic-embed-text-v1.5-256",
    "nomic-ai/nomic-embed-text-v1.5-512",
    "jncraton/multilingual-e5-small-ct2-int8",
]
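One maintenance pitfall with lists like the ones above: Python silently concatenates adjacent string literals, so a single missing comma merges two model IDs into one bogus entry. A minimal illustration (the model names here are made up):

```python
models = [
    "org/model-a"
    "org/model-b",  # missing comma: becomes the single entry "org/model-aorg/model-b"
    "org/model-c",
]

assert len(models) == 2  # not 3!
assert models[0] == "org/model-aorg/model-b"
```

A linter rule such as flake8's implicit-string-concatenation check can catch this automatically.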

@x-tabdeveloping
Collaborator

Here is my personal take on this:

  1. I would not keep quantizations on the leaderboard. They introduce clutter, and if people want a quantized model, it has never been easier to find one: the model page shows a quantizations tab on the right. There is no innovation in quantizing a model, so I honestly don't think they belong there. I'm a bit unsure about adaptations.
  2. I have written to many of the authors of the models where something was missing. I will give them the benefit of the doubt, but frankly I don't think they actually intend to fix anything. In most repos where something was missing, it has been missing for, in some cases, more than a year, with a disclaimer saying "TODO: Add this and that" or "paper coming". My opinion is that we should heavily discourage this behaviour. I find it a bit ridiculous that someone would sit at the top of the leaderboard for at least two weeks, promise to deliver a technical report, and then do nothing for a year.
  3. I'm open to discussions on whether we should keep or remove old models that might no longer be relevant. I think there are potentially good points on both sides of this argument.
  4. People who litter the leaderboard with nonexistent models should probably be heavily scrutinized from now on. twadada, for instance, is responsible for 7 models on the leaderboard that have no model files and no information on the model card besides MTEB scores.
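A cheap way to flag such entries would be to check whether a repo's file listing contains any weight files at all. A sketch (the suffix list is my guess at common weight formats, not an official one; the huggingface_hub call is shown commented out so the check itself stays offline):

```python
# Common weight-file suffixes on the Hugging Face Hub (an assumption,
# not an exhaustive or official list).
WEIGHT_SUFFIXES = (".safetensors", ".bin", ".pt", ".onnx", ".gguf", ".h5")

def has_weight_files(repo_files):
    """Return True if a repo file listing contains any model weight files."""
    return any(f.endswith(WEIGHT_SUFFIXES) for f in repo_files)

# Against the Hub (requires network):
#   from huggingface_hub import HfApi
#   has_weight_files(HfApi().list_repo_files("twadada/gte_wl"))
```

Running this over every listed model would turn the `no_model` list above into an automated check rather than a manual audit.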
