Add model jxm/cde-small-v1 #1521

YashDThapliyal · 2024-11-28T04:50:15Z

Checklist

Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the pr). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision) and
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.

YashDThapliyal · 2024-11-28T06:27:24Z

Once this is approved, I will clone the results repo within MTEB, add the generated results folder for this model, and submit a PR.

x-tabdeveloping

I might be misunderstanding something, but it doesn't seem like you added a correct implementation or metadata on the model. These should be done before we merge the PR.

x-tabdeveloping · 2024-11-28T07:30:45Z

mteb/leaderboard/table.py

@@ -101,7 +101,7 @@ def get_means_per_types(df: pd.DataFrame) -> pd.DataFrame:
 def failsafe_get_model_meta(model_name):
    try:
        return get_model_meta(model_name)
-    except Exception as e:
+    except Exception:


Since your PR is not concerned with the leaderboard, you probably shouldn't put changes in it related to that.

YashDThapliyal · 2024-11-28

Yes, I believe that was a result of running make lint; however, I can leave that out.
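For context, the one-line change in the diff above is the kind of fix a linter autofix typically makes when an exception is bound to a name that is never used. A minimal sketch of the same pattern follows; `lookup_meta`, its sample data, and the `None` fallback are hypothetical stand-ins, since the body of the real `except` branch is not shown in the diff.

```python
from typing import Optional


def lookup_meta(model_name: str) -> dict:
    """Hypothetical stand-in for a metadata lookup that may raise."""
    known = {"demo/model": {"name": "demo/model", "revision": "abc123"}}
    return known[model_name]  # raises KeyError for unknown names


def failsafe_lookup(model_name: str) -> Optional[dict]:
    """Return metadata if it can be found, otherwise a fallback value."""
    try:
        return lookup_meta(model_name)
    except Exception:  # no "as e": the exception object is never used
        return None  # assumed fallback; the real handler may differ
```

Dropping the unused binding does not change behaviour, which is why the change is safe to leave out of an unrelated PR.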

x-tabdeveloping · 2024-11-28T07:35:57Z

mteb/models/cde-small-v1_model.py

+
+import mteb
+
+model = mteb.get_model(


I'm not sure if I understand this correctly, but it seems like you did not add a model implementation or model metadata for CDEs. I'm also unsure whether this would work or not. I believe their official guide on how to use CDE is a bit more complicated than this, since they have a first and a second stage in all of their guides, where they first produce a corpus embedding and then pass it along to the model when embedding new documents.

Samoed · 2024-11-28

They have an evaluation script, but it is a bit complicated: https://github.com/jxmorris12/cde/blob/0de4e6c116c8e8223075a2b56277d69e04a2ab7c/evaluate_mteb.py#L26

x-tabdeveloping · 2024-11-28

I see, but I guess it's still a better choice not to implement the model incorrectly here, and maybe just add metadata on it, then ask the CDE team to upload their results to the results repository.
I don't see much value in adding a script here that does not use CDEs as they are supposed to be used.
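The two-stage flow discussed in this thread (first embed a sample of the corpus, then pass that result along when embedding documents) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `toy_embed`, `first_stage`, `second_stage`, and the hashing encoder are hypothetical and are not the CDE model or its API. The sketch only shows why an implementation that skips the first stage is not using the model as intended.

```python
import hashlib
import math
from typing import List

DIM = 8  # toy embedding size, chosen arbitrarily


def toy_embed(text: str) -> List[float]:
    """Hash each token into a fixed-size bag-of-words vector (toy encoder)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec


def first_stage(context_docs: List[str]) -> List[float]:
    """Stage 1: summarise a sample of the corpus as one context vector."""
    vectors = [toy_embed(doc) for doc in context_docs]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def second_stage(text: str, context: List[float]) -> List[float]:
    """Stage 2: embed a text relative to the corpus context, then normalise."""
    centred = [x - c for x, c in zip(toy_embed(text), context)]
    norm = math.sqrt(sum(x * x for x in centred)) or 1.0
    return [x / norm for x in centred]


# The same text gets a different embedding under a different corpus context.
context_a = first_stage(["a b", "c d"])
context_b = first_stage(["a a a a"])
emb_a = second_stage("a b", context_a)
emb_b = second_stage("a b", context_b)
```

In this toy version the context is just a mean vector; the point is only that the second stage takes the first stage's output as an input.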

Samoed · 2024-11-28

I agree with you. I added the evaluation script just for information, to show the authors' implementation.

YashDThapliyal · 2024-11-28

@x-tabdeveloping

I didn't explicitly define the model metadata because when I ran the mteb.get_model_meta command, the output seemed correct. However, I may have misunderstood and overlooked the need to explicitly define the model metadata.

I also have the results repository from when I ran the script. Should I disregard that?

I'm a bit unsure about the next steps I should take. I would appreciate your guidance—thank you!

YashDThapliyal · 2024-12-26

@jxmorris12 Awesome, I look forward to working with you in the new year!

YashDThapliyal · 2025-01-09

Hi @jxmorris12,

Happy New Year! I hope you’re doing well. I wanted to follow up and see if you’ve had a chance to take a deeper look at my implementation of your model. I’d greatly appreciate any feedback or suggestions for improvement to ensure we can properly integrate CDE into MTEB.

Looking forward to hearing your thoughts—thank you in advance!

jxmorris12 · 2025-01-13

Hi. I'm currently in the process of uploading cde-small-v2, which should happen this week. Once that is finished we can update this PR since it should be easier to use. Should be available within a few days.

jxmorris12 · 2025-01-17

Hi @YashDThapliyal, can you (1) update your code to use cde-small-v2 (https://huggingface.co/jxm/cde-small-v2) and (2) update the code to actually grab contextual documents from each corpus?

I've actually done the work for you of figuring out how to get documents from each dataset type, so you should essentially be able to copy the approach in the CDE repo: https://github.com/jxmorris12/cde/blob/main/evaluate_mteb.py

Let us know once you've done that and we can all look over the results.
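As a rough illustration of what "grabbing contextual documents from each corpus" could involve, here is a minimal sketch that draws a fixed number of documents from a task corpus in a reproducible way. The function name `sample_context_docs`, the `size` and `seed` parameters, and the choice to sample with replacement for small corpora are assumptions for illustration only; the linked evaluate_mteb.py is the reference for how each dataset type is actually handled.

```python
import random
from typing import List


def sample_context_docs(corpus: List[str], size: int, seed: int = 0) -> List[str]:
    """Draw `size` context documents from a task corpus, reproducibly."""
    if not corpus:
        raise ValueError("corpus must contain at least one document")
    rng = random.Random(seed)  # local RNG so global random state is untouched
    if len(corpus) >= size:
        return rng.sample(corpus, size)  # enough documents: no repeats
    return rng.choices(corpus, k=size)  # small corpus: sample with replacement
```

For example, `sample_context_docs(task_corpus, size=4)` would return four documents from a hypothetical `task_corpus` list; the appropriate size for a real model should come from that model's own configuration rather than from this sketch.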

YashDThapliyal · 2025-01-19

@jxmorris12 sure, I will get on that

YashDThapliyal added 11 commits November 4, 2024 10:47

Fix verbosity handling in MTEB.py for consistent logging

d894bbc

updates

0dc2a8a

update docstrings

00032aa

linting code

7ae0583

Create cde-small-v1_model.py

838253d

update code for cde-small-v1 model

eb04a87

Merge branch 'main' of https://github.com/embeddings-benchmark/mteb

be6790e

Merge branch 'embeddings-benchmark:main' into main

5dc100f

make lint and make test

c95b672

Merge branch 'main' of https://github.com/YashDThapliyal/mteb

3a692ec

Update cde-small-v1_model.py

6ab8ffb

YashDThapliyal closed this Nov 28, 2024

YashDThapliyal reopened this Nov 28, 2024

Update cde-small-v1_model.py

36d6702

x-tabdeveloping requested changes Nov 28, 2024

x-tabdeveloping mentioned this pull request Dec 2, 2024

leaderboard 2.0: Add missing models #1515

Closed

YashDThapliyal and others added 13 commits December 23, 2024 13:33

create a test

200029d

Merge branch 'embeddings-benchmark:main' into main

5f48218

add model meta data card

64b1c41

Merge branch 'main' of https://github.com/YashDThapliyal/mteb

93b5794

remove zero_shot_benchmark as discussed on PR

9a25f97

clean up comments/add liscense

06f5f01

begin implementing cde

3a2d2e3

add corpus for the model to use

9222601

add model implementation via following HF refrence

d5413fd

syntax error fix (delete ';' )

e269da6

Update cde-small-v1_model.py

8af60b6

updated model.prompts for consistency with mteb

update implementation code

919c2b1

Update cde-small-v1_model.py

5be4ff5

YashDThapliyal and others added 4 commits December 25, 2024 01:39

add results folder

5a05876

has results of evaluating CDE on tasks

Delete mteb/models/results directory

449111e

results directory

9815f36

results of running mteb tasks on cde

Merge remote-tracking branch 'upstream/main'

d2ab89d

x-tabdeveloping mentioned this pull request Jan 21, 2025

CDE models missing #1851

Open
