Open
Description
I benchmarked the speed of Llama models using ExLlama and noticed that 8B models run much slower than 7B models (this is not the case with other inference backends). Can you tell me what the problem might be?
GPU: A100 80GB
| Model | tokens/s | tokens in first second | symbols/s |
|---|---|---|---|
| suzume-llama-3-8B-multilingual-gptq | 63.96 ± 7.37 | 56.48 ± 16.76 | 96.78 ± 14.76 |
| Swallow-7b-instruct-v0.1-gptq | 194.56 ± 22.83 | 166.34 ± 40.23 | 165.04 ± 37.06 |
| shisa-v1-llama3-8b-gptq | 60.01 ± 9.03 | 58.92 ± 14.76 | 90.45 ± 15.50 |
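For reference, a minimal sketch of how the tokens/s figures above could be measured (the `generate_fn` and `dummy_generate` names are placeholders for the real inference call, not ExLlama's actual API):

```python
import time
import statistics

def measure_tokens_per_sec(generate_fn, prompt, num_runs=5):
    """Time several generation runs and report mean / std of tokens per second.

    `generate_fn` is a stand-in for the real inference call; it must
    return the sequence of generated token ids.
    """
    rates = []
    for _ in range(num_runs):
        start = time.perf_counter()
        tokens = generate_fn(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return statistics.mean(rates), statistics.stdev(rates)

# Dummy generator standing in for a real model call.
def dummy_generate(prompt):
    time.sleep(0.01)          # simulate inference latency
    return list(range(64))    # pretend 64 tokens were produced

mean_rate, std_rate = measure_tokens_per_sec(dummy_generate, "Hello")
print(f"{mean_rate:.2f} ± {std_rate:.2f} tokens/s")
```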