Hi Everyone,
I have been moving my model to TGI (Text Generation Inference) to speed up inference.
Some issues I'm facing right now:
I can't pass a full bnb (bitsandbytes) config, which contains settings like double quantization, the compute dtype, loading in 4-bit, and the quant type. I can only pass the quant type nf4.
A complete drop in performance: my adapter performs very badly and doesn't do the task at all, which I suspect is caused by the configuration differences above.
I can't configure the tokenizer for my model. I changed it to pad on the right instead of the left, and I don't see a way to set that here.
There are so many configuration constraints that reproducing my model's behavior is hell.
So, any idea how I can fix or handle this? Is the issue that I'm using QLoRA and only LoRA is supported? Should I export the model with the specific configuration and load it locally instead of configuring it on the spot?
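To make the mismatch concrete, here is a minimal sketch of the settings I mean. The key names follow the `BitsAndBytesConfig` arguments in transformers plus the tokenizer's `padding_side` attribute. The values are illustrative assumptions, not necessarily my exact config, and `unpinned_settings` is a hypothetical helper written just for this post. It only lists which training-time settings I have no way to confirm on the serving side.

```python
# Training-time settings (values are illustrative assumptions).
# Key names mirror transformers' BitsAndBytesConfig arguments and the
# tokenizer's padding_side attribute.
TRAINING_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
    "padding_side": "right",
}

# What I can actually pin when serving: only the quant type (nf4).
# None means "I cannot set this myself, so the effective value is unknown".
SERVING_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": None,
    "bnb_4bit_compute_dtype": None,
    "padding_side": None,
}


def unpinned_settings(training, serving):
    """Map each setting that differs or is unknown at serving time
    to a (training_value, serving_value_or_'unknown') tuple."""
    report = {}
    for name, trained_value in training.items():
        served_value = serving.get(name)
        if served_value is None:
            report[name] = (trained_value, "unknown")
        elif served_value != trained_value:
            report[name] = (trained_value, served_value)
    return report


if __name__ == "__main__":
    mismatches = unpinned_settings(TRAINING_SETTINGS, SERVING_SETTINGS)
    for name, (trained, served) in mismatches.items():
        print(f"{name}: trained with {trained!r}, served with {served!r}")
```

With these assumed values, everything except the 4-bit flag and the quant type ends up in the "unknown" bucket, which is exactly the part I can't reproduce.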