Enable intfloat/e5-mistral-7b-instruct model with 32k token lens on hpu device #2715

ZhengHongming888 · 2024-06-04T22:59:45Z

This PR is one of a series of tasks enabling Intel Gaudi2 accelerator support for Sentence Transformers inference and training.

It enables the intfloat/e5-mistral-7b-instruct model with 32k-token inputs on the hpu device and is a revision of PR #2656.

There are two parts to the update:

1. More efficient padding for long inputs. Sequence lengths are now padded to a multiple of 128 instead of the next power of 2. Power-of-2 padding becomes increasingly wasteful as the input length grows, so this reduces the padding overhead.
2. Support for the 7B Mistral model with 32k-token inputs on the hpu device. The device-specific options are passed through the arguments of the high-level encode call rather than hard-coded as in the previous PR.
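To illustrate the padding change, here is a minimal sketch comparing the two strategies. It is not the code from this PR: the function names `pad_to_power_of_two` and `pad_to_multiple` are hypothetical and only show the arithmetic behind the overhead difference.

```python
def pad_to_power_of_two(length: int) -> int:
    """Return the smallest power of 2 that is >= length (hypothetical helper)."""
    return 1 << max(length - 1, 0).bit_length()


def pad_to_multiple(length: int, multiple: int = 128) -> int:
    """Return the smallest multiple of `multiple` that is >= length (hypothetical helper)."""
    return ((length + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    # Compare the number of padding tokens each strategy adds.
    for n in (100, 1_000, 20_000):
        p2 = pad_to_power_of_two(n)
        m128 = pad_to_multiple(n)
        print(f"len={n}: power of 2 -> {p2} (+{p2 - n}), multiple of 128 -> {m128} (+{m128 - n})")
```

For a 20,000-token input, power-of-2 padding rounds up to 32,768 (12,768 padding tokens), while a multiple of 128 rounds up to 20,096 (96 padding tokens).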

Example usage for the 7B Mistral model with 32k-token inputs:

hpu_kwargs = {
    "attn_softmax_bf16": True,
    "reuse_cache": True,
    "use_flash_attention": True,
    "flash_attention_recompute": True,
    "flash_attention_causal_mask": True,
}
emb = model.encode(sentences, batch_size=32, kwargs={"hpu_kwargs": hpu_kwargs})

If you have any questions, please comment.

Thanks.

ZhengHongming888 added 2 commits June 4, 2024 15:28

revision for 7b mistral gaudi support

a571cf0

rename gaudi_kwargs by hpu_kwargs

560845a
