Run with start_size=0 looks just fine #74

cyr0930 · 2024-01-04T05:52:16Z

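I've run a number of experiments, and it looks like most of the performance comes from enabling pos_shift rather than from keeping the start tokens. The number under each command below is the perplexity it reported.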

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8
6.840701103210449

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 1 --recent_size 255
29.674755096435547

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 0 --recent_size 256 --enable_pos_shift
8.8959321975708

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 1 --recent_size 255 --enable_pos_shift
7.493190765380859

python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3 --num_samples 8 --enable_start_recent_kv_cache --start_size 4 --recent_size 252 --enable_pos_shift
7.363883018493652

The generated output of the following script, run with start_size 0, also looks fine to me:
python examples/run_streaming_llama.py --enable_streaming --recent_size 128 --start_size 0

Am I doing something wrong? (Could the choice of model or dataset matter?)
Is it okay to conclude that the major factor harming generation performance is incorrectly applied positional encoding?
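
To make the question concrete, here is a minimal sketch of how I understand the settings above. It is not code from the repository: `kept_indices` and `cache_positions` are hypothetical helper names, and the sketch assumes that `--start_size` and `--recent_size` keep the first and last tokens of the cache, and that `--enable_pos_shift` assigns positions by slot inside the cache instead of reusing each token's position in the original text.

```python
from typing import List


def kept_indices(seq_len: int, start_size: int, recent_size: int) -> List[int]:
    """Original token indices that survive eviction (hypothetical helper).

    Assumption: the cache keeps the first `start_size` tokens and the last
    `recent_size` tokens, and drops everything in between.
    """
    if seq_len <= start_size + recent_size:
        # Nothing to evict yet: the whole sequence still fits in the cache.
        return list(range(seq_len))
    start = list(range(start_size))
    recent = list(range(seq_len - recent_size, seq_len))
    return start + recent


def cache_positions(
    seq_len: int, start_size: int, recent_size: int, pos_shift: bool
) -> List[int]:
    """Position ids seen by attention for the kept tokens (hypothetical helper).

    Assumption: with pos_shift, positions are 0..len(cache)-1 (slot in cache);
    without it, each kept token retains its original position in the text.
    """
    kept = kept_indices(seq_len, start_size, recent_size)
    if pos_shift:
        return list(range(len(kept)))
    return kept


if __name__ == "__main__":
    # 10 tokens seen so far, cache budget of 4 tokens.
    print(cache_positions(10, 1, 3, pos_shift=False))  # [0, 7, 8, 9]
    print(cache_positions(10, 1, 3, pos_shift=True))   # [0, 1, 2, 3]
    print(cache_positions(10, 0, 4, pos_shift=True))   # [0, 1, 2, 3]
```

Under these assumptions, the runs with start_size 0 and start_size 1 see the same contiguous positions once pos_shift is enabled and differ only in whether the very first token is kept, which is the comparison my numbers above are making.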
