Hi.
I am in the process of adding QuIP inference support to ExllamaV2, and this is the PR.
The problem I am having right now is that my perplexity (ppl) results are somewhat worse than the results in your blog, so I am wondering whether there is something wrong with my implementation or whether there is another explanation.
Ppl benchmarks
Using dataset: [wikitext-2-v1_validation_0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation)
| Model | Perplexity |
|---|---|
| **2-bit** | |
| Llama-2-7b-E8P-2Bit | 8.7339 |
| Llama2-7b-exl2-2.5bpw | 8.0745 |
| Llama-2-13b-E8P-2Bit | 7.1207 |
| Llama2-13b-exl2-2.5bpw | 7.2741 |
| Llama-2-70b-E8P-2Bit | 6.2192 |
| Llama2-70b-exl2-2.5bpw | 5.8270 |
| **4-bit** | |
| Llama-2-7b-HI-4Bit-Packed | 6.0748 |
| Llama2-7b-exl2-4.0bpw | 6.0300 |
| Llama-2-13b-HI-4Bit-Packed | 7.4169 |
| Llama2-13b-exl2-4.0bpw | 5.4905 |
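
For reference, a minimal sketch of the kind of sliding-window perplexity loop that produces numbers like these is below. This is not the exact script behind the table; the model id, context length (2048), and non-overlapping stride are assumptions.

```python
# Hedged sketch of a wikitext-2 perplexity measurement (NOT the exact script
# behind the table above). The model id, context length, and stride are
# placeholders / assumptions.
import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder fp16 baseline
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# Concatenate the validation split into one long token stream.
df = pd.read_parquet("wikitext-2-v1_validation_0000.parquet")
ids = tokenizer("\n\n".join(df["text"]), return_tensors="pt").input_ids.to(model.device)

ctx = stride = 2048  # assumed context length, non-overlapping windows
nlls, n_tokens = [], 0
with torch.no_grad():
    for begin in range(0, ids.size(1), stride):
        chunk = ids[:, begin : begin + ctx]
        if chunk.size(1) < 2:
            break
        out = model(chunk, labels=chunk)
        # out.loss is the mean NLL over (chunk length - 1) predicted tokens.
        n = chunk.size(1) - 1
        nlls.append(out.loss.float() * n)
        n_tokens += n

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"perplexity: {ppl.item():.4f}")
```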
tsengalb99 commented on Dec 7, 2023
quip-sharp/lib/utils/gptq_data_utils.py, line 12 in 6648e56
This is where we sample wikitext2. You should check fp16 results on your dataset for an accurate comparison on that dataset.
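
As a hedged aside on reproducing that comparison: GPTQ-style evaluations commonly build the wikitext-2 token stream from the test split joined with blank lines, roughly as sketched below. The exact split and preprocessing should be checked against the referenced line in gptq_data_utils.py, and the model id here is a placeholder.

```python
# Hedged sketch: GPTQ-style evaluations commonly build the wikitext-2 stream
# from the *test* split joined with blank lines, roughly like this. Check the
# referenced gptq_data_utils.py line for quip-sharp's exact split and
# preprocessing; the model id below is a placeholder.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

# Run the fp16 model and the quantized model over the *same* token stream,
# then compare the fp16 -> quantized ppl deltas rather than absolute numbers,
# since different splits/preprocessing shift the absolute perplexity.
print(ids.shape)
```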