Merge pull request #63 from sbintuitions/dev
[dev to main] v1.3.1
Showing 19 changed files with 440 additions and 5 deletions.
@@ -0,0 +1,18 @@
---
name: Bug Report
about: Report a bug
title: "[BUG] "
labels: bug
assignees: ''

---

## Bug description

## Steps to reproduce

## Proposed solution

## Expected behavior

## Other information
@@ -0,0 +1,55 @@
---
name: Evaluation Request
about: Request a model evaluation or leaderboard addition
title: "[EVAL REQUEST] "
labels: "eval request"
assignees: ''

---

## Basic model information
**name**:
**type**: <!-- Backbone model, e.g. BERT, LLaMA... -->
**size**:
**lang**: ja / multilingual

## Model details
<!--
Describe the model in detail, e.g. its training method and training data.
-->


## Seen/unseen declaration
Among the JMTEB evaluation datasets, check each dataset whose training split was used to train the model, or which was used as a validation set for hyperparameter tuning or early stopping.
* Classification
* [ ] Amazon Review Classification
* [ ] Amazon Counterfactual Classification
* [ ] Massive Intent Classification
* [ ] Massive Scenario Classification
* Clustering
* [ ] Livedoor News
* [ ] MewsC-16-ja
* STS
* [ ] JSTS
* [ ] JSICK
* Pair Classification
* [ ] PAWS-X-ja
* Retrieval
* [ ] JAQKET
* [ ] Mr.TyDi-ja
* [ ] JaGovFaqs-22k
* [ ] NLP Journal title-abs
* [ ] NLP Journal title-intro
* [ ] NLP Journal abs-intro
* Reranking
* [ ] Esci
* [ ] Prefer not to declare


## Evaluation script
<!--
If possible, please provide the script used for evaluation.
Be sure to describe any special settings required by the model.
-->

## Other information
@@ -0,0 +1,18 @@
---
name: Feature Request
about: Request a new feature
title: "[FEAT REQUEST] "
labels: enhancement
assignees: ''

---

## Overview

## Usage scenario

## Proposed solution

## Expected behavior

## Other information
.github/pull_request_template.md → .github/PULL_REQUEST_TEMPLATE/bug_fix.md (8 changes: 8 additions & 0 deletions)
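With pull request templates kept under `.github/PULL_REQUEST_TEMPLATE/` (as this rename does for `bug_fix.md`), GitHub does not offer a chooser when a pull request is opened; a specific template is instead selected with the `template` query parameter, and issue templates can be preselected the same way. The sketch below only builds such URLs. The repository slug, branch names, and the issue template filename are placeholders, and the issue template directory `.github/ISSUE_TEMPLATE/` is an assumption, since this page does not show the issue templates' paths.

```shell
#!/usr/bin/env bash
# Sketch: build GitHub URLs that preselect a template via the `template` query parameter.
# All example arguments are placeholders (hypothetical repo, branch, and file names).

# New pull request from HEAD into BASE, pre-filled with a template from
# .github/PULL_REQUEST_TEMPLATE/.
pr_url() {
  local repo="$1" base="$2" head="$3" template="$4"
  printf 'https://github.com/%s/compare/%s...%s?quick_pull=1&template=%s\n' \
    "$repo" "$base" "$head" "$template"
}

# New issue pre-filled with a template (assumed to live in .github/ISSUE_TEMPLATE/).
issue_url() {
  local repo="$1" template="$2"
  printf 'https://github.com/%s/issues/new?template=%s\n' "$repo" "$template"
}

# The PR templates ask contributors to target the `dev` branch.
pr_url "OWNER/REPO" dev my-feature-branch bug_fix.md
issue_url "OWNER/REPO" bug_report.md
```

Opening the printed compare URL in a browser fills the pull request body from the named template.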
@@ -0,0 +1,36 @@
---
name: Feature update PR
about: PR that adds a new feature
title: "[Feature] "
labels: enhancement
assignees: ''
---

<!--
Thank you for submitting a PR.
Please set the base branch to `dev`.
-->

## Related issues / PRs
<!--
Paste links to the related issues.
-->

## Behavior change after this PR is merged
<!--
Briefly describe what this PR is meant to achieve.
-->

## What was done to achieve the behavior change
<!--
Outline the implementation approach and its details.
-->

## Verification
- [ ] Confirmed that the tests pass
- [ ] Confirmed that the merge target is the dev branch
- [ ] ...

<!--
## Other
-->
@@ -0,0 +1,70 @@
---
name: Leaderboard submission PR
about: PR that adds the evaluation results of your own model to the leaderboard
title: "[Submission] "
labels: "submission"
assignees: ''
---

<!--
Thank you for submitting a PR.
Please set the base branch to `dev`.
-->

## Basic model information
**name**:
**type**: <!-- Backbone model, e.g. BERT, LLaMA... -->
**size**:
**lang**: ja / multilingual

## Model details
<!--
Describe the model in detail, e.g. its training method and training data.
-->


## Seen/unseen declaration
Among the JMTEB evaluation datasets, check each dataset whose training split was used to train the model, or which was used as a validation set for hyperparameter tuning or early stopping.
* Classification
* [ ] Amazon Review Classification
* [ ] Amazon Counterfactual Classification
* [ ] Massive Intent Classification
* [ ] Massive Scenario Classification
* Clustering
* [ ] Livedoor News
* [ ] MewsC-16-ja
* STS
* [ ] JSTS
* [ ] JSICK
* Pair Classification
* [ ] PAWS-X-ja
* Retrieval
* [ ] JAQKET
* [ ] Mr.TyDi-ja
* [ ] JaGovFaqs-22k
* [ ] NLP Journal title-abs
* [ ] NLP Journal title-intro
* [ ] NLP Journal abs-intro
* Reranking
* [ ] Esci
* [ ] Prefer not to declare


## Evaluation script
<!--
If possible, please provide the script used for evaluation.
Be sure to describe any special settings required by the model.
-->

## Other information

## Verification
- [ ] Confirmed that the tests pass
- [ ] Confirmed that the merge target is the dev branch
- [ ] Uploaded the result `json` file(s) to the correct location
- [ ] Updated `leaderboard.md`
- [ ] ...

<!--
## Other
-->
@@ -0,0 +1,17 @@
---
name: Merge to main branch PR
about: dev → main
title: "[Release] "
labels: "version"
assignees: ''
---

## Version number

## Summary of changes

## Issues & PRs included in this update

## Verification
- [ ] Confirmed that the tests pass
- [ ] ...
@@ -0,0 +1,18 @@
---
name: Version update PR
about: For version updates
title: "[Version bump-up] "
labels: "version"
assignees: ''
---

## New version number

## Summary of changes

## Issues & PRs included in this update

## Verification
- [ ] Confirmed that the tests pass
- [ ] Confirmed that the merge target is the dev branch
- [ ] ...
@@ -1,4 +1,5 @@
 MD013: false
 MD040: false
 MD025: false
-MD028: false
+MD028: false
+MD033: false
@@ -0,0 +1,27 @@
# Example scripts

We provide some example scripts for different scenarios.

#### [sentencebert_1gpu.sh](docs/examples/sentencebert_1gpu.sh)

For all-task evaluation with a model that can be loaded with [`SentenceTransformer`](https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py), on a single GPU with `fp16` enabled. The corresponding class in `JMTEB` is [`SentenceBertEmbedder`](src/jmteb/embedders/sbert_embedder.py).

#### [sentencebert_8gpu.sh](docs/examples/sentencebert_8gpu.sh)

For all-task evaluation with a model that can be loaded with [`SentenceTransformer`](https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py), on 8 GPUs in a single node with `fp16` enabled. The corresponding class in `JMTEB` is [`DataParallelSentenceBertEmbedder`](src/jmteb/embedders/data_parallel_sbert_embedder.py).

#### [transformers_embedder_multigpu.sh](docs/examples/transformers_embedder_multigpu.sh)

For all-task evaluation with a model that can be loaded with `AutoModel` in Hugging Face Transformers (including your own custom model, as long as it is registered with `AutoModel`, since `trust_remote_code` is set to `True`), on 8 GPUs in a single node with `bf16` enabled. Note that `torchrun` is required to enable parallelism. The corresponding class in `JMTEB` is [`TransformersEmbedder`](src/jmteb/embedders/transformers_embedder.py).

#### [openai_embedder.sh](docs/examples/openai_embedder.sh)

For all-task evaluation with an OpenAI embedding model through the API. Note that you must export your OpenAI API key before running the evaluation. The corresponding class in `JMTEB` is [`OpenAIEmbedder`](src/jmteb/embedders/openai_embedder.py).

#### [exclude.sh](docs/examples/exclude.sh)

Based on [sentencebert_1gpu.sh](docs/examples/sentencebert_1gpu.sh), with some slow tasks excluded.

#### [include.sh](docs/examples/include.sh)

Based on [sentencebert_1gpu.sh](docs/examples/sentencebert_1gpu.sh), running only a few specified tasks.
@@ -0,0 +1,21 @@
model=$1

echo "Running model: $model"

echo "start"
date "+%Y-%m-%d %H:%M:%S"
echo ""

poetry run python -m jmteb \
--embedder SentenceBertEmbedder \
--embedder.model_name_or_path "$model" \
--embedder.model_kwargs '{"torch_dtype": "torch.float16"}' \
--embedder.device cuda \
--save_dir "results/${model//\//_}" \
--overwrite_cache false \
--evaluators src/jmteb/configs/jmteb.jsonnet \
--eval_exclude "['amazon_review_classification', 'mrtydi', 'jaqket', 'esci']"

echo ""
date "+%Y-%m-%d %H:%M:%S"
echo "end"
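The scripts name their output directory with bash pattern substitution: `${model//\//_}` replaces every `/` in the model name (for example an `org/model` style Hugging Face ID) with `_`, so each model gets one flat directory under `results/`. This expansion is a bash feature, not POSIX `sh`, so the scripts should be run with `bash`. A minimal check, using a made-up model name:

```shell
#!/usr/bin/env bash
# Same expansion as in the scripts; the model name is a hypothetical example.
model="example-org/example-model"
save_dir="results/${model//\//_}"   # every "/" in $model becomes "_"
echo "$save_dir"                    # prints results/example-org_example-model
```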
@@ -0,0 +1,21 @@
model=$1

echo "Running model: $model"

echo "start"
date "+%Y-%m-%d %H:%M:%S"
echo ""

poetry run python -m jmteb \
--embedder SentenceBertEmbedder \
--embedder.model_name_or_path "$model" \
--embedder.model_kwargs '{"torch_dtype": "torch.float16"}' \
--embedder.device cuda \
--save_dir "results/${model//\//_}" \
--overwrite_cache false \
--evaluators src/jmteb/configs/jmteb.jsonnet \
--eval_include "['livedoor_news', 'esci']"

echo ""
date "+%Y-%m-%d %H:%M:%S"
echo "end"
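The task filters (`--eval_include`, `--eval_exclude`) are passed as one string in Python list syntax, such as `"['livedoor_news', 'esci']"`. When the task names should come from shell arguments rather than being hard-coded, a small helper can assemble a string of the same shape. `to_pylist` is a hypothetical helper, not part of JMTEB, and it assumes task names contain no quote characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn shell arguments into a Python-style list string,
# e.g. `to_pylist a b` -> ['a', 'b']. Assumes no quotes inside the names.
to_pylist() {
  local out="" item
  for item in "$@"; do
    # Put ", " between entries, but not before the first one.
    if [ -n "$out" ]; then out+=", "; fi
    out+="'$item'"
  done
  printf '[%s]\n' "$out"
}

tasks=$(to_pylist livedoor_news esci)
echo "$tasks"   # prints ['livedoor_news', 'esci']
```

The result could then stand in for the literal, e.g. `--eval_include "$tasks"`.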
@@ -0,0 +1,20 @@
model=$1

export OPENAI_API_KEY=<your_openai_api_key>

echo "Running OpenAI model: $model"

echo "start"
date "+%Y-%m-%d %H:%M:%S"
echo ""

poetry run python -m jmteb \
--embedder OpenAIEmbedder \
--embedder.model "$model" \
--save_dir "results/${model//\//_}" \
--overwrite_cache false \
--evaluators src/jmteb/configs/jmteb.jsonnet

echo ""
date "+%Y-%m-%d %H:%M:%S"
echo "end"
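The OpenAI script expects the key to be pasted into its `export OPENAI_API_KEY=<your_openai_api_key>` line. An alternative, sketched here as an assumption rather than what this commit does, is to export the key in the calling shell and have the script stop early when it is missing. `require_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail (return 1) when the named environment variable is empty or unset.
require_env() {
  local name="$1"
  # ${!name:-} is bash indirect expansion: the value of the variable whose name is in $name.
  if [ -z "${!name:-}" ]; then
    echo "error: $name is not set; run 'export $name=...' first" >&2
    return 1
  fi
}

# Usage, in place of the hard-coded export line:
#   require_env OPENAI_API_KEY || exit 1
```

This keeps the key out of the script file, so it cannot be committed by accident.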