Run with start_size=0 looks just fine #74

Open
cyr0930 opened this issue Jan 4, 2024 · 0 comments


I've run a number of experiments, and it looks like most of the performance comes from enabling pos_shift. Each command below is followed by the perplexity it reported.

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8
6.840701103210449

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 1 --recent_size 255
29.674755096435547

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 0 --recent_size 256 --enable_pos_shift
8.8959321975708

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 1 --recent_size 255 --enable_pos_shift
7.493190765380859

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 4 --recent_size 252 --enable_pos_shift
7.363883018493652
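
To make the gaps easier to compare, here is a small sketch that expresses each perplexity above as a percentage increase over the first run (the one without --enable_start_recent_kv_cache). The numbers are copied from the outputs above; the dictionary labels and the `relative_increase` helper are my own shorthand for illustration, not part of the repository's scripts.

```python
# Perplexities copied verbatim from the runs above.
# The first run (no --enable_start_recent_kv_cache) is used as the reference.
baseline = 6.840701103210449

# Labels are shorthand for the flag combinations, not actual script options.
runs = {
    "start=1 recent=255 no_pos_shift": 29.674755096435547,
    "start=0 recent=256 pos_shift": 8.8959321975708,
    "start=1 recent=255 pos_shift": 7.493190765380859,
    "start=4 recent=252 pos_shift": 7.363883018493652,
}


def relative_increase(ppl, ref=baseline):
    """Return how far `ppl` is above `ref`, in percent."""
    return (ppl - ref) / ref * 100.0


for label, ppl in runs.items():
    print(f"{label:<34} ppl={ppl:7.3f}  +{relative_increase(ppl):6.1f}% vs. reference")
```

By this measure, start_size=0 with pos_shift is about 30% above the reference, while start_size=1 without pos_shift is more than 300% above it.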

Also, the output generated by the following script looks fine to me:
python examples/run_streaming_llama.py --enable_streaming --recent_size 128 --start_size 0

Am I doing something wrong? (Could the choice of model or dataset matter?)
Is it fair to conclude that the main factor harming generation performance is incorrectly applied positional encoding?
