Improve: batch size setting and multi GPU inference with SentenceTransformers+DP #53
Conversation
tests/embedders/test_dp_sbert.py
```python
OUTPUT_DIM = 128


class TestSentenceBertEmbedder:
```
Suggested change:

```diff
-class TestSentenceBertEmbedder:
+class TestDPSentenceBertEmbedder:
```
```python
def _add_eos_func(self, text: str | list[str]) -> str | list[str]:
    try:
        eos_token = getattr(self.model.savetokenizer, "eos_token")
```
Shouldn't this be `self.model.tokenizer`?

At startup?

Yes, please launch it with python.
LGTM!
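For reference, a minimal sketch of the fixed method. Only the signature and the `try`/`getattr` lines appear in the diff above; the EOS-appending branches below are an assumption about the rest of the method, not the PR's actual code:

```python
def _add_eos_func(self, text: str | list[str]) -> str | list[str]:
    try:
        # Fixed per the review: the tokenizer lives at self.model.tokenizer.
        eos_token = getattr(self.model.tokenizer, "eos_token")
    except AttributeError:
        return text  # tokenizer has no eos_token attribute; leave input as is
    if eos_token is None:
        return text
    if isinstance(text, str):
        return text + eos_token
    return [t + eos_token for t in text]
```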
```python
self.model = self.dp_model.sbert
if max_seq_length:
    self.model.max_seq_length = max_seq_length
self.initital_batch_size = batch_size
```
It looks like `self.initial_batch_size` is never actually used anywhere.
Related Issues / PRs

Behavior changes after merging this PR

What was done to achieve the behavior changes
- Added a `_chunk_size` variable to `TextEmbedder` in src/jmteb/embedders/base.py, and changed the flow to call `TextEmbedder.encode` once per `_chunk_size` texts (see the first sketch below).
- Added src/jmteb/embedders/data_parallel_sbert_embedder.py, which enables multi-GPU inference with sentence-transformers via PyTorch's `DataParallel` (see the second sketch below).

Verification
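A minimal sketch of the chunked encoding described above, assuming `encode` returns a NumPy array; the helper name `encode_in_chunks` is hypothetical, and only `_chunk_size` and `TextEmbedder.encode` come from this PR's description:

```python
import numpy as np


def encode_in_chunks(embedder, texts: list[str]) -> np.ndarray:
    """Hypothetical helper: call embedder.encode once per _chunk_size texts
    so a large corpus never has to pass through encode() in a single call."""
    chunk_size = embedder._chunk_size
    chunks = [
        embedder.encode(texts[i : i + chunk_size])
        for i in range(0, len(texts), chunk_size)
    ]
    return np.concatenate(chunks, axis=0)
```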
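And a minimal sketch of the general `DataParallel` pattern the new embedder file is described as using; the class name and attributes here are assumptions for illustration, not the actual contents of data_parallel_sbert_embedder.py:

```python
import torch
from torch import nn
from sentence_transformers import SentenceTransformer


class DataParallelSentenceBert(nn.Module):
    """Illustrative wrapper: replicate a SentenceTransformer across all
    visible GPUs with torch.nn.DataParallel for inference."""

    def __init__(self, model_name_or_path: str):
        super().__init__()
        self.sbert = nn.DataParallel(SentenceTransformer(model_name_or_path))

    @property
    def module(self) -> SentenceTransformer:
        # The unwrapped model, e.g. for tokenize() or max_seq_length.
        return self.sbert.module

    def forward(self, features: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        # DataParallel scatters each tensor in `features` along the batch
        # dimension across the replicas and gathers the outputs back.
        return self.sbert(features)
```

With this pattern, the batch handed to the wrapped model is split across GPUs, so the caller's batch size can be scaled by `torch.cuda.device_count()`; that interaction is presumably what the batch-size part of this PR's title refers to.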