
Low Ppl benchmark results #9

Closed

Description

@waters222

Hi.
I am in the process of adding QuIP inference support to ExLlamaV2, and this is the PR.

The problem I am having right now is that my Ppl test results are somewhat worse than the results in your blog.

So I am wondering whether there is something wrong with my implementation, or whether there is some other explanation.

Ppl Benchmarks

using dataset: [wikitext-2-v1_validation_0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation)

2Bit

| Model | Perplexity |
| --- | --- |
| Llama-2-7b-E8P-2Bit | 8.7339 |
| Llama2-7b-exl2-2.5bpw | 8.0745 |
| Llama-2-13b-E8P-2Bit | 7.1207 |
| Llama2-13b-exl2-2.5bpw | 7.2741 |
| Llama-2-70b-E8P-2Bit | 6.2192 |
| Llama2-70b-exl2-2.5bpw | 5.8270 |

4Bit

| Model | Perplexity |
| --- | --- |
| Llama-2-7b-HI-4Bit-Packed | 6.0748 |
| Llama2-7b-exl2-4.0bpw | 6.0300 |
| Llama-2-13b-HI-4Bit-Packed | 7.4169 |
| Llama2-13b-exl2-4.0bpw | 5.4905 |
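
For reference, a minimal sketch of how this kind of Ppl number is typically computed over the parquet split above. This uses Hugging Face transformers rather than the actual ExLlamaV2 test script; `wikitext_ppl`, `model_id`, and the 2048-token chunking are placeholder assumptions, not necessarily the exact setup used for the table:

```python
import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def wikitext_ppl(model_id, parquet_path, seqlen=2048):
    # Load the validation split from the parquet export and join the text rows.
    text = "\n\n".join(pd.read_parquet(parquet_path)["text"])
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids
    nlls, ntokens = [], 0
    for i in range(0, ids.shape[1] - seqlen + 1, seqlen):
        chunk = ids[:, i : i + seqlen].to(model.device)
        with torch.no_grad():
            # HF returns the mean NLL over the seqlen - 1 shifted targets.
            loss = model(chunk, labels=chunk).loss
        nlls.append(loss.float() * (chunk.shape[1] - 1))
        ntokens += chunk.shape[1] - 1
    return torch.exp(torch.stack(nlls).sum() / ntokens).item()

# e.g. wikitext_ppl("meta-llama/Llama-2-7b-hf", "wikitext-2-v1_validation_0000.parquet")
```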

Activity

tsengalb99 (Contributor) commented on Dec 7, 2023

`def get_wikitext2(nsamples, seed, seqlen, model):`

This is where we sample wikitext2. You should check fp16 results on your dataset for an accurate comparison on that dataset.
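
For context, that function follows the widely used GPTQ-style wikitext-2 loader: calibration samples are random seqlen-token windows from the raw train split, and perplexity is scored on the concatenated test split. The sketch below shows that common pattern, not code copied from this repo, so details such as the loss masking may differ. The split choice matters: wikitext-2-raw vs. wikitext-2-v1 and test vs. validation give different absolute Ppl values, which is why an fp16 baseline on the same data is the fair comparison point:

```python
import random
from datasets import load_dataset
from transformers import AutoTokenizer

def get_wikitext2(nsamples, seed, seqlen, model):
    # Raw (unprocessed-markup) variant of wikitext-2; note this differs from
    # the wikitext-2-v1 validation parquet used above, so absolute Ppl differs too.
    traindata = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    testdata = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")

    tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
    trainenc = tokenizer("\n\n".join(traindata["text"]), return_tensors="pt")
    testenc = tokenizer("\n\n".join(testdata["text"]), return_tensors="pt")

    # Calibration set: nsamples random windows of length seqlen from the train split.
    random.seed(seed)
    trainloader = []
    for _ in range(nsamples):
        i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
        inp = trainenc.input_ids[:, i : i + seqlen]
        tar = inp.clone()
        tar[:, :-1] = -100  # common convention: mask all but the last target token
        trainloader.append((inp, tar))
    return trainloader, testenc
```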

