
Llama 3 speed #585

Open

@freQuensy23-coder

I tested the speed of Llama models using ExLlama and noticed that the 8B models are much slower than the 7B model (this is not the case with other inference engines). Can you tell me what the problem might be?

GPU: A100 80 GB

| Model | tokens/s | tokens, first second | symbols/sec |
|---|---|---|---|
| suzume-llama-3-8B-multilingual-gptq | 63.96 ± 7.37 | 56.48 ± 16.76 | 96.78 ± 14.76 |
| Swallow-7b-instruct-v0.1-gptq | 194.56 ± 22.83 | 166.34 ± 40.23 | 165.04 ± 37.06 |
| shisa-v1-llama3-8b-gptq | 60.01 ± 9.03 | 58.92 ± 14.76 | 90.45 ± 15.50 |
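For context, throughput figures like the ones above (mean ± standard deviation over repeated runs) can be collected with a simple timing loop. A minimal sketch follows; the `generate` callable is a hypothetical stand-in for whatever inference call is being timed (it is not the real ExLlama API), and it is assumed to return the generated text plus the number of new tokens produced:

```python
import time
import statistics

def benchmark(generate, prompt, runs=5, max_new_tokens=128):
    """Time repeated generations and report tokens/s as mean ± stdev.

    `generate(prompt, max_new_tokens)` is a placeholder for the actual
    inference call and must return (text, n_new_tokens).
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        _text, n_tokens = generate(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)  # tokens generated per second
    return statistics.mean(rates), statistics.stdev(rates)
```

Symbols/sec can be computed the same way by dividing `len(text)` by the elapsed time instead of the token count.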
