I've run a number of experiments, and it looks like most of the performance comes from enabling pos_shift.
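To be clear about what I mean by pos_shift, here is a minimal sketch of my understanding. This is an assumption on my part, not the repository's code: with pos_shift enabled, cached tokens get position ids by their slot in the KV cache instead of their position in the original text. The function name `position_ids` and the example cache contents are hypothetical.

```python
def position_ids(kept_indices, pos_shift):
    """Return position ids for the tokens currently in the KV cache.

    kept_indices: original text positions of the tokens still cached
                  (hypothetical input, for illustration only).
    pos_shift=True:  re-number tokens by their slot in the cache (0..len-1).
    pos_shift=False: keep each token's original text position.
    """
    if pos_shift:
        return list(range(len(kept_indices)))
    return list(kept_indices)


# Hypothetical cache: 4 initial tokens + the last 4 tokens of a 1000-token stream.
kept = [0, 1, 2, 3, 996, 997, 998, 999]
print(position_ids(kept, pos_shift=False))  # [0, 1, 2, 3, 996, 997, 998, 999]
print(position_ids(kept, pos_shift=True))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the shift, the relative distances the model sees stay bounded by the cache size no matter how long the stream gets.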
Also, the output generated by the following script looks fine to me:
python examples/run_streaming_llama.py --enable_streaming --recent_size 128 --start_size 0
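For reference, here is what I assume that command keeps in the cache, sketched with a hypothetical helper (`kept_indices` is illustrative, not from the repository). It assumes `--start_size` is the number of initial (attention sink) tokens retained and `--recent_size` the number of most recent tokens. Under that assumption, `--start_size 0` leaves only a sliding window over the last 128 tokens, with no sink tokens at all.

```python
def kept_indices(seq_len, start_size, recent_size):
    """Original positions retained by a start + recent KV cache (hypothetical).

    Keeps the first `start_size` tokens and the last `recent_size` tokens;
    if the sequence still fits, nothing is evicted.
    """
    if seq_len <= start_size + recent_size:
        return list(range(seq_len))
    return list(range(start_size)) + list(range(seq_len - recent_size, seq_len))


# Assumed meaning of --start_size 0 --recent_size 128 after 1000 tokens:
kept = kept_indices(1000, start_size=0, recent_size=128)
print(len(kept), kept[0], kept[-1])  # 128 872 999
```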
Am I doing something wrong? (Could the choice of model or dataset matter?)
Is it fair to conclude that the major factor harming generation performance is incorrectly applied positional encoding?