Thank you very much for your work. I ran into a problem when running BLOOM-176B on 8x A100.

I followed the README.md and executed the following command. Specifically, I set do_sample = true and top_k = 1, which I thought was equivalent to greedy search:

python -m inference_server.cli --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": true, "top_k": 1}'

However, the generated outputs of several forward passes were different for the same inputs. This happened occasionally. Do you have any clues or ideas about this?

My env info:
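For context, here is a minimal sketch of the two decoding settings being compared. It uses a small stand-in model and a plain transformers generate call rather than the actual bigscience/bloom deployment or this repo's inference server, so it only illustrates the expectation that top_k = 1 sampling should match greedy search:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model purely for illustration; the issue uses bigscience/bloom on 8x A100.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("BLOOM is a large language model that", return_tensors="pt")

with torch.no_grad():
    # Greedy search: always take the argmax token at each step.
    greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)

    # Sampling restricted to the single most likely token: after top-k filtering
    # the distribution is a point mass, so in exact arithmetic this should
    # reproduce greedy search.
    top1 = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_k=1)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(top1[0], skip_special_tokens=True))
```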
Hi, do_sample = true and top_k = 1 should be fine, but the correct way to do it is just do_sample = False.
This is weird. I don't think this is a bug in the code in this repository, but I will try to give it a shot.
Can you try with just do_sample = False?
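For example, keeping the original flags and only changing generate_kwargs (a sketch adapted from the command quoted above):

python -m inference_server.cli --model_name bigscience/bloom --model_class AutoModelForCausalLM --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'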
Hi @mayank31398, sorry for the late reply.
It was ok with do_sample=False. The results were all the same.
But I still can't figure out why sampling doesn't work properly. Do you know whom, or which repo, I could turn to for help?
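One rough way to narrow that down, sketched here with a small stand-in model rather than the 8-GPU BLOOM setup: fix the RNG seed and repeat the same top_k = 1 sampling call on a single device. If the repeated runs match, the occasional differences in the distributed bf16 run are more likely numerical (logits nearly tied at the top-1 position flipping between runs) than a bug in the sampling logic itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

# Small stand-in model; the original issue runs bigscience/bloom across 8 GPUs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("BLOOM is a large language model that", return_tensors="pt")

outputs = []
for _ in range(3):
    set_seed(42)  # fix the RNG so sampling itself cannot introduce variation
    out = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_k=1)
    outputs.append(tokenizer.decode(out[0], skip_special_tokens=True))

# With a fixed seed (and with top_k=1 even without one, in exact arithmetic),
# all three generations should be identical.
print(len(set(outputs)) == 1)
print(outputs[0])
```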