[EVAL REQUEST] jina-embeddings-v3 #77
Comments
Note: I found on X (Twitter) that one of the authors (@bwanglzu) has already completed the evaluations 😳
Thank you for the information!
Great, looking forward to the official results 😊
I tried the model with fast datasets, but I found that tasks except The results most similar to https://x.com/bo_wangbo/status/1838919204377911477 are with no prefixes, no LoRA except My results are as below:
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.7949948725329687
},
"massive_intent_classification": {
"macro_f1": 0.7766347542682803
},
"massive_scenario_classification": {
"macro_f1": 0.8982075621284786
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.7449944044307708
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.9941946751679634
},
"nlp_journal_title_abs": {
"ndcg@10": 0.9717376985433034
},
"nlp_journal_title_intro": {
"ndcg@10": 0.9609029386920315
}
},
"STS": {
"jsick": {
"spearman": 0.8146985042196159
},
"jsts": {
"spearman": 0.8068520872331155
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.5036707354224619
},
"mewsc16": {
"v_measure_score": 0.474391205388421
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.623716814159292
}
}
}
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.7949948725329687
},
"massive_intent_classification": {
"macro_f1": 0.7766347542682803
},
"massive_scenario_classification": {
"macro_f1": 0.8982075621284786
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.7255870901661032
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.9829431790599418
},
"nlp_journal_title_abs": {
"ndcg@10": 0.9552122947731903
},
"nlp_journal_title_intro": {
"ndcg@10": 0.9324205002364649
}
},
"STS": {
"jsick": {
"spearman": 0.7816133481804449
},
"jsts": {
"spearman": 0.8193021839272429
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.5387525923415666
},
"mewsc16": {
"v_measure_score": 0.43532523021586217
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.623716814159292
}
}
}
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.7949948725329687
},
"massive_intent_classification": {
"macro_f1": 0.7766347542682803
},
"massive_scenario_classification": {
"macro_f1": 0.8982075621284786
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.7157443309160252
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.9849100129100982
},
"nlp_journal_title_abs": {
"ndcg@10": 0.9560377251324601
},
"nlp_journal_title_intro": {
"ndcg@10": 0.9372937234643258
}
},
"STS": {
"jsick": {
"spearman": 0.7816133481804449
},
"jsts": {
"spearman": 0.8193021839272429
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.5313213726075848
},
"mewsc16": {
"v_measure_score": 0.43532523021586217
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.623716814159292
}
}
}
LoRA settings (if w/):
Prefix settings (if w/):
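The three result blocks above differ in only a few scores, which is easy to miss by eye. A minimal sketch of a diff helper, assuming the nested task → dataset → metric layout shown above; `flatten`, `diff_runs`, `run_a` and `run_b` are hypothetical names, and the sample values are copied from the STS and PairClassification entries of the first two blocks:

```python
def flatten(results):
    """Map {task: {dataset: {metric: value}}} to {(task, dataset, metric): value}."""
    return {
        (task, dataset, metric): value
        for task, datasets in results.items()
        for dataset, metrics in datasets.items()
        for metric, value in metrics.items()
    }


def diff_runs(run_a, run_b, tol=1e-12):
    """Return {key: b - a} for every shared score that differs by more than tol."""
    flat_a, flat_b = flatten(run_a), flatten(run_b)
    return {
        key: flat_b[key] - flat_a[key]
        for key in sorted(flat_a.keys() & flat_b.keys())
        if abs(flat_b[key] - flat_a[key]) > tol
    }


# Hypothetical labels; values copied from the first and second blocks above.
run_a = {
    "STS": {
        "jsick": {"spearman": 0.8146985042196159},
        "jsts": {"spearman": 0.8068520872331155},
    },
    "PairClassification": {"paws_x_ja": {"binary_f1": 0.623716814159292}},
}
run_b = {
    "STS": {
        "jsick": {"spearman": 0.7816133481804449},
        "jsts": {"spearman": 0.8193021839272429},
    },
    "PairClassification": {"paws_x_ja": {"binary_f1": 0.623716814159292}},
}

changed = diff_runs(run_a, run_b)
for (task, dataset, metric), delta in changed.items():
    print(f"{task}/{dataset} {metric}: {delta:+.4f}")
```

In this excerpt only the two STS scores change between the runs (JSICK goes down, JSTS goes up), while the PairClassification score is identical.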
Thank you so much for your hard work!
Hi @lsz05 @courage, I hacked the code a bit to make it work. The things I changed:
in the TextEmbedder class, I added a
in the
I agree my code is a bit "dirty", as I only wanted to quickly check the results :) Hopefully you understand. If I missed anything in your code base that results in a different eval result, please let me know :)
But I'm also quite surprised (in a good way) that your score is better than what I reported lol :) Maybe there is something wrong in my code, but at least it is not worse, for
I think I'm doing the same thing as you in #80
I think I'll have to fix some randomness problems (e.g., fix the random seed in training to make sure everything can be exactly reproduced) in
My result is as follows:
{
"metric_name": "v_measure_score",
"metric_value": 0.474391205388421,
"details": {
"optimal_clustering_model_name": "Birch",
"val_scores": {
"MiniBatchKMeans": {
"v_measure_score": 0.45751218122353327,
"homogeneity_score": 0.5000149261766943,
"completeness_score": 0.42166906571540486
},
"AgglomerativeClustering": {
"v_measure_score": 0.4884748969401506,
"homogeneity_score": 0.5211802377702618,
"completeness_score": 0.45963186760591423
},
"BisectingKMeans": {
"v_measure_score": 0.4051884446721869,
"homogeneity_score": 0.4429226569148086,
"completeness_score": 0.3733789195189944
},
"Birch": {
"v_measure_score": 0.48868192903235214,
"homogeneity_score": 0.529365428957467,
"completeness_score": 0.45380546454681364
}
},
"test_scores": {
"Birch": {
"v_measure_score": 0.474391205388421,
"homogeneity_score": 0.5112647214750645,
"completeness_score": 0.44247868671235824
}
}
}
}
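The JSON above suggests that the clustering model with the best validation v-measure is selected and then scored on the test set. A minimal sketch that re-checks this from the reported numbers, under that assumption; `v_measure` is a hypothetical helper using the standard definition (harmonic mean of homogeneity and completeness), not code taken from JMTEB:

```python
def v_measure(homogeneity, completeness):
    """Harmonic mean of homogeneity and completeness (v-measure with beta = 1)."""
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Validation v-measure per clustering model, copied from "val_scores" above.
val_scores = {
    "MiniBatchKMeans": 0.45751218122353327,
    "AgglomerativeClustering": 0.4884748969401506,
    "BisectingKMeans": 0.4051884446721869,
    "Birch": 0.48868192903235214,
}

# Assumed selection rule: pick the model with the highest validation score.
best_model = max(val_scores, key=val_scores.get)

# Test-set homogeneity and completeness for Birch, copied from "test_scores".
recomputed = v_measure(0.5112647214750645, 0.44247868671235824)

print(best_model)
print(f"{recomputed:.6f}")
```

Under these assumptions the selected model is Birch, which edges out AgglomerativeClustering on validation by about 0.0002, and the recomputed test value agrees with the reported `metric_value`. A margin that small is the kind of case where unfixed randomness could flip the selected model between runs.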
I think your PR looks good, maybe two things:
I'm not sure why using LoRA makes the performance a bit worse than without LoRA (for example, on STS). Using LoRA is always my default choice :) One small thing to notice is
Btw, have you considered moving JMTEB to the official MTEB leaderboard? This would greatly simplify your work.
I didn't use I examined how I applied your prefixes to Retrieval in full evaluation, as you write in your Hugging Face repo.
We are considering it, but we are also concerned about some differences (e.g., usage of the dev set). Someone has started on it, but it is not fully finished: embeddings-benchmark/mteb#749
@lsz05 I'm finishing adding the rest of the datasets in embeddings-benchmark/mteb#1262
Let me close this issue with #81. Feel free to reopen if anything remains to be done.
Basic model information
name: jina-embeddings-v3
type: XLMRoBERTa (+ LoRA Adapter)
size: 559M (572M with the LoRA adapter)
lang: multilingual
Model details
https://arxiv.org/abs/2409.10173
https://huggingface.co/jinaai/jina-embeddings-v3
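Seen/unseen declaration
Among the JMTEB evaluation datasets, please check the names of those whose training split was used for model training, or which were used as a validation set for hyperparameter tuning or early stopping.
Evaluation script
Other information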