Llama 3 speed #585

Open
freQuensy23-coder opened this issue Aug 4, 2024 · 2 comments

Comments

freQuensy23-coder commented Aug 4, 2024

I tested the speed of Llama models using exllama and noticed that the speed of 8B models is much slower than 7B (although this is not the case with other inference engines). Can you tell me what the problem might be?

A100 80GB

| Model | tokens/s | tokens in first second | symbols/s |
| --- | --- | --- | --- |
| suzume-llama-3-8B-multilingual-gptq | 63.96 ± 7.37 | 56.48 ± 16.76 | 96.78 ± 14.76 |
| Swallow-7b-instruct-v0.1-gptq | 194.56 ± 22.83 | 166.34 ± 40.23 | 165.04 ± 37.06 |
| shisa-v1-llama3-8b-gptq | 60.01 ± 9.03 | 58.92 ± 14.76 | 90.45 ± 15.50 |
turboderp (Owner) commented

GPTQ doesn't quantize the output layer, so with the much larger vocabulary of Llama3, the output tensor alone is about 750 MB larger. This adds a significant amount of latency.
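
For a rough sense of scale, here is a back-of-the-envelope sketch (assuming the published configs: 4096 hidden size for both models, a 32000-token vocabulary for Llama 2 7B vs. 128256 for Llama 3 8B, and fp16 storage for the unquantized output layer):

```python
# Back-of-the-envelope size of the unquantized output (lm_head) matrix in fp16.
# Assumes hidden_size = 4096 for both models and 2 bytes per weight.
def lm_head_bytes(vocab_size: int, hidden_size: int = 4096, bytes_per_weight: int = 2) -> int:
    return vocab_size * hidden_size * bytes_per_weight

llama2_7b = lm_head_bytes(32_000)    # ~250 MiB
llama3_8b = lm_head_bytes(128_256)   # ~1000 MiB

print(f"Llama 2 7B lm_head: {llama2_7b / 2**20:.0f} MiB")
print(f"Llama 3 8B lm_head: {llama3_8b / 2**20:.0f} MiB")
print(f"difference:         {(llama3_8b - llama2_7b) / 2**20:.0f} MiB")  # ~752 MiB, i.e. roughly 750 MB
```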

Llama3 also uses GQA, which means you have a similar amount of computation for attention with somewhat lower VRAM overhead, but then the weights that would be assigned to keys/values are instead added to the MLP inner layer, making that slower instead.
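
A per-layer parameter count illustrates the shift (a sketch using the standard published configs: both models have hidden size 4096 and 32 query heads with head dim 128; Llama 2 7B has 32 KV heads and an 11008-wide MLP, while Llama 3 8B has 8 KV heads and a 14336-wide MLP):

```python
# Per-layer parameter counts for the attention projections vs. the MLP,
# using the published model configs (see assumptions in the text above).
def layer_params(hidden, n_heads, n_kv_heads, intermediate, head_dim=128):
    q = hidden * n_heads * head_dim          # q_proj
    k = hidden * n_kv_heads * head_dim       # k_proj (shrinks under GQA)
    v = hidden * n_kv_heads * head_dim       # v_proj (shrinks under GQA)
    o = n_heads * head_dim * hidden          # o_proj
    mlp = 3 * hidden * intermediate          # gate/up/down projections
    return q + k + v + o, mlp

for name, cfg in {
    "Llama 2 7B": (4096, 32, 32, 11008),
    "Llama 3 8B": (4096, 32, 8, 14336),
}.items():
    attn, mlp = layer_params(*cfg)
    print(f"{name}: attention {attn / 1e6:.1f}M, MLP {mlp / 1e6:.1f}M params per layer")
# Llama 2 7B: attention 67.1M, MLP 135.3M
# Llama 3 8B: attention 41.9M, MLP 176.2M
```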

I don't know that this accounts for the entire difference you're seeing, but certainly some of it. You could try with EXL2 models which do have quantized output layers, perhaps. I would need to know more about the hardware setup before I could attempt to reproduce it.
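
For reference, a rough sketch of timing an EXL2 quant with exllamav2's simple generator, adapted from the repository's example scripts (the model path and sampler values are placeholders, and the exact API may differ between versions):

```python
# Sketch: measure generation speed for an EXL2-quantized model with exllamav2.
import time

from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

model_dir = "/models/Llama-3-8B-Instruct-exl2-4.0bpw"  # placeholder path

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

generator.warmup()
max_new_tokens = 256
start = time.time()
generator.generate_simple("The quick brown fox", settings, max_new_tokens)
elapsed = time.time() - start
print(f"{max_new_tokens / elapsed:.1f} tokens/s")
```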

Dan-wanna-M commented

I think quantized output layers definitely make a difference. I am benchmarking my constrained decoding library, and even with an A5000 I obtained a 1.5x speedup in comparison to @freQuensy23-coder's benchmark.
