
Missing models on leaderboards [WIP] #1848

Open
KennethEnevoldsen opened this issue Jan 21, 2025 · 2 comments

Comments

@KennethEnevoldsen
Contributor

I checked the top 10 models for each leaderboard. They seem to be missing the following scores:

MTEB(eng, classic):

  • NV-Embed-v2
    • CQADupstackRetrieval
    • STS22
  • NV-Embed-v1
    • MSMARCO
    • CQADupstackRetrieval
    • STS22
  • Missing models:
    • BAAI/bge-en-icl (no meta, no results)
    • yibinlei/LENS-d8000 (no meta, no results)
    • yibinlei/LENS-d4000 (no meta, no results)

MTEB(chinese):

@Samoed, can I ask you to add the missing models (results to the results repo + model meta)? Feel free to add a filler class for "modelnotimplemented" in the loader (otherwise we will never catch up with model releases).
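As a rough sketch of what such a filler could look like (all names here, including `ModelMeta` and the `load_model` hook, are hypothetical and not mteb's actual API):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a "model not implemented" placeholder.
# ModelMeta and its fields are illustrative, not mteb's actual classes.
@dataclass
class ModelMeta:
    name: str
    revision: Optional[str] = None
    release_date: Optional[str] = None

@dataclass
class NotImplementedModelMeta(ModelMeta):
    """Placeholder for models with leaderboard results but no loader yet.

    Lets the leaderboard display the model while any attempt to run it
    fails loudly instead of silently.
    """

    def load_model(self):
        raise NotImplementedError(
            f"{self.name} has results on the leaderboard but no "
            "implementation in the loader yet."
        )
```

With something along these lines, a new release could be registered immediately (meta + results) and implemented later.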

@x-tabdeveloping
Collaborator

I went through the first 200 models, since these are the ones that have a mean score on the old leaderboard.
I wasn't 100% rigorous, so I might be wrong about some of these, but I think this should give us a fairly solid idea of what the set of missing models consists of.

models_missing_from_eng_classic = [
    "BAAI/bge-en-icl",
    "yibinlei/LENS-d8000",
    "yibinlei/LENS-d4000",
    "voyageai/voyage-3-m-exp",
    "Alibaba-NLP/gme-Qwen2-VL-7B-Instruct",
    "llmrails/ember-v1",
    "amazon/Titan-text-embeddings-v2",
    "hkunlp/instructor-large",
    "hkunlp/instructor-xl",
    "hkunlp/instructor-base",
    "sentence-transformers/sentence-t5-xxl",  # all sentence-t5s are missing really
    "elser-v2",  # from Elasticsearch
    "Hum-Works/lodestone-base-4096-v1",
    # LASER and SONAR from Facebook
    # Loads of sentence-transformers models we should probably add all of these
    # cde models
]

# This might be useful to have since it's the same model with fewer layers
distillations = [
    "TaylorAI/bge-micro-v2",
]

# Something was off about all of these: stalled technical reports or data
# releases, or incomplete READMEs filled with TODO tags
shady = [
    "raghavlight/TDTE",
    "tsirif/BinGSE-Meta-Llama-3-8B-Instruct",
    "tsirif/BinGSE-Sheared-LLaMA",
    "w601sxs/b1ade-embed",
    "sam-babayev/sf_model_e5",
]

quant = [
    "yoeven/multilingual-e5-large-instruct-Q5_K_M-GGUF",
    "yoeven/multilingual-e5-large-instruct-Q5_0-GGUF",
    "yoeven/multilingual-e5-large-instruct-Q3_K_S-GGUF",
    "JHJHJHJHJ/multilingual-e5-large-instruct-Q5_K_M-GGUF",
    "parasail-ai/GritLM-7B-vllm",
    "Maxthemacaque/onnx-gte-multilingual-base",
    "BookingCare/multilingual-e5-base-similarity-v1-onnx-quantized",
]

empty_readme = [
    "Labib11/MUG-B-1.6",
    "andersonbcdefg/bge-small-4096",
    "princeton-nlp/sup-simcse-bert-base-uncased",
]
no_model = [
    "twadada/gte_wl",
    "twadada/GTE_wl_mv",
    "twadada/GTE512_sw",
    "twadada/GTE256_sw",
    "twadada/l3_wl",
    "twadada/wl_sw_256",
    "twadada/mv_sw",
    "benayad7/concat-e5-small-bge-small-01",
    "lixsh6/XLM-3B5-embedding",
    "lixsh6/XLM-0B6-embedding",
    "lixsh6/MegatronBert-1B3-embedding",
]

# I might be wrong here, and I'm probably missing a lot, just a few examples
outdated = [
    "text-embedding-004-256",
    "text-embedding-004",
    "jinaai/jina-embedding-b-en-v1",
    "jinaai/jina-embedding-s-en-v1",  # There are probably more of these
    "text-similarity-ada-001",
]

# These are cases where an original model exists; most of these are just duplicate entries
copies = [
    "BASF-AI/nomic-embed-text-v1",
    "BASF-AI/nomic-embed-text-v1.5",
    "fdehlinger/english-4U-bge-small",
    "aliakseilabanau/bge-small-en",
    "lightonai/modernbert-embed-large",
    "lightonai/modernbert-embed-large-unsupervised",
    "nomic-ai/nomic-embed-text-v1.5-128", # I know these are not duplicate entries, but do we really need to have all sizes for variable size embeddings?
    "nomic-ai/nomic-embed-text-v1.5-256",
    "nomic-ai/nomic-embed-text-v1.5-512",
    "jncraton/multilingual-e5-small-ct2-int8",
]
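One maintenance pitfall with lists like the ones above: Python silently concatenates adjacent string literals, so a single missing comma merges two model IDs into one bogus entry. A minimal illustration (the model names here are made up):

```python
models = [
    "org/model-a"
    "org/model-b",  # missing comma: becomes the single entry "org/model-aorg/model-b"
    "org/model-c",
]

assert len(models) == 2  # not 3!
assert models[0] == "org/model-aorg/model-b"
```

A linter rule such as flake8's implicit-string-concatenation check can catch this automatically.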

@x-tabdeveloping
Collaborator

Here is my personal take on this:

  1. I would not keep quantizations on the leaderboard. They introduce clutter, and if people want a quantized model, it has never been easier to find one: the model page shows a quantizations tab on the right. There is no innovation in quantizing a model, so I honestly don't think they belong there. I'm a bit unsure about adaptations.
  2. I have written to many of the authors of the models where something was missing. I will give them the benefit of the doubt, but frankly I don't think they actually intend to fix anything. In most repos where something was missing, it has been missing for, in some cases, more than a year, with a disclaimer saying "TODO: Add this and that" or "paper coming". My opinion is that we should heavily discourage this behaviour. I find it a bit ridiculous that someone would sit at the top of the leaderboard for at least two weeks, promise to deliver a technical report, and then do nothing for a year.
  3. I'm open to discussions on whether we should keep or remove old models that might no longer be relevant. I think there are potentially good points on both sides of this argument.
  4. People who litter the leaderboard with nonexistent models should probably be heavily scrutinized from now on. twadada, for instance, is responsible for 7 models on the leaderboard that have no model files and no information on the model card besides MTEB scores.
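A cheap way to flag such entries would be to check whether a repo's file listing contains any weight files at all. A sketch (the suffix list is my guess at common weight formats, not an official one; the huggingface_hub call is shown commented out so the check itself stays offline):

```python
# Common weight-file suffixes on the Hugging Face Hub (an assumption,
# not an exhaustive or official list).
WEIGHT_SUFFIXES = (".safetensors", ".bin", ".pt", ".onnx", ".gguf", ".h5")

def has_weight_files(repo_files):
    """Return True if a repo file listing contains any model weight files."""
    return any(f.endswith(WEIGHT_SUFFIXES) for f in repo_files)

# Against the Hub (requires network):
#   from huggingface_hub import HfApi
#   has_weight_files(HfApi().list_repo_files("twadada/gte_wl"))
```

Running this over every listed model would turn the `no_model` list above into an automated check rather than a manual audit.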
