[Feature] request smoothquant (int8, W8A8) quantization on 40G A100 #2474

Open
2 tasks done
Hao-YunDeng opened this issue Dec 13, 2024 · 5 comments

Hao-YunDeng commented Dec 13, 2024

Checklist

Motivation

We have been using SmoothQuant (INT8, W8A8) quantization on A100 GPUs with TensorRT-LLM and recently tested it with vLLM as well. The performance is good: speed, memory usage, and accuracy all compare favorably with FP16 and other quantization schemes.

Can SGLang also support this quantization on A100 machines? My team is very eager to see it.

Thanks

Related resources

No response

@Hao-YunDeng
Author

AWQ and GPTQ both are W8A16; we need W8A8

@zhyncs
Member

zhyncs commented Dec 13, 2024

AWQ and GPTQ are W4A16.

@HandH1998 and @ispobock are collaborating on the W8A8.
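
For context on the notation: WxAy denotes x-bit weights and y-bit activations, so W8A8 stores and multiplies both in INT8, whereas W4A16 keeps activations in 16-bit floating point. The sketch below is illustrative only, written in plain Python rather than taken from SGLang, vLLM, or TensorRT-LLM, and all function names in it are hypothetical. It shows the SmoothQuant idea of moving activation outliers into the weights with a per-input-channel scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) before symmetric per-tensor INT8 quantization. Real implementations use calibration data, finer-grained scales, and fused INT8 kernels.

```python
# Illustrative sketch only; names are hypothetical, not a library API.

def matmul(X, W):
    """Plain matrix product; X is [tokens][in], W is [in][out]."""
    return [[sum(row[j] * W[j][k] for j in range(len(W)))
             for k in range(len(W[0]))] for row in X]

def smooth_scales(X, W, alpha=0.5, eps=1e-8):
    """Per-input-channel scale: max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    scales = []
    for j in range(len(W)):
        act_max = max(abs(row[j]) for row in X)
        w_max = max(abs(w) for w in W[j])
        scales.append(max(act_max, eps) ** alpha / max(w_max, eps) ** (1 - alpha))
    return scales

def apply_smoothing(X, W, scales):
    """Divide activations and multiply weights by the same scales,
    so the product X @ W is mathematically unchanged."""
    X_s = [[x / s for x, s in zip(row, scales)] for row in X]
    W_s = [[w * s for w in row] for row, s in zip(W, scales)]
    return X_s, W_s

def quantize_int8(M):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for row in M for v in row) or 1.0
    scale = max_abs / 127.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in M]
    return q, scale

def w8a8_matmul(X, W):
    """Quantize both operands to INT8, accumulate in integers, then rescale."""
    qx, sx = quantize_int8(X)
    qw, sw = quantize_int8(W)
    acc = matmul(qx, qw)  # integer accumulation
    return [[v * sx * sw for v in row] for row in acc]

# Second activation channel carries outliers; its weights are small.
X = [[1.0, 100.0], [-2.0, 50.0]]
W = [[0.5, -0.3], [0.01, 0.02]]

scales = smooth_scales(X, W)
X_s, W_s = apply_smoothing(X, W, scales)
exact = matmul(X, W)
approx = w8a8_matmul(X_s, W_s)
```

Here `exact` is the floating-point reference and `approx` is the W8A8 result after smoothing; comparing the two shows how much error the INT8 path introduces for this toy input.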

@Hao-YunDeng
Author

> AWQ and GPTQ are W4A16.
>
> @HandH1998 and @ispobock are collaborating on the W8A8.

Thank you so much for your reply. Will this W8A8 feature be SmoothQuant? If so, when do you expect it to be available? @zhyncs @HandH1998 @ispobock

@ispobock
Collaborator

> > AWQ and GPTQ are W4A16.
> > @HandH1998 and @ispobock are collaborating on the W8A8.
>
> Thank you so much for your reply. Will this W8A8 feature be SmoothQuant? If so, when do you expect it to be available? @zhyncs @HandH1998 @ispobock

  1. Yes, it's SmoothQuant.
  2. Maybe in the next two weeks.

@halexan

halexan commented Dec 17, 2024

Does SGLang support W8A8 quantized models, such as neuralmagic-ent/Qwen2.5-72B-Instruct-quantized.w8a8?

If so, how can I run it?
