Checklist
2. Please use English, otherwise it will be closed.
Motivation
We have been using SmoothQuant (int8, W8A8) quantization on A100 GPUs with TensorRT-LLM and recently tested it with vLLM as well. The performance is good: speed, memory usage, and accuracy are all better than fp16 or other quantization methods.
Could SGLang also support this kind of quantization on A100 machines? My team is very eager to see it.
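For reference, here is a minimal sketch of the SmoothQuant idea we are referring to (not SGLang/vLLM/TensorRT-LLM code; the alpha value, tensor shapes, and per-tensor int8 scales are illustrative assumptions): activation outliers are migrated into the weights with a per-channel scale, after which both activations and weights quantize to int8 with little accuracy loss.

```python
import torch

def smooth_and_quantize(x, w, alpha=0.5):
    # x: activations [tokens, in_features], w: weights [out_features, in_features]
    act_max = x.abs().amax(dim=0).clamp(min=1e-5)      # per-input-channel activation range
    wgt_max = w.abs().amax(dim=0).clamp(min=1e-5)      # per-input-channel weight range
    s = act_max.pow(alpha) / wgt_max.pow(1.0 - alpha)   # SmoothQuant migration scale
    x_s, w_s = x / s, w * s                             # (x/s) @ (s*w)^T == x @ w^T, unchanged

    def to_int8(t):
        scale = t.abs().max() / 127.0                   # simple per-tensor symmetric scale
        return (t / scale).round().clamp(-128, 127).to(torch.int8), scale

    (xq, sx), (wq, sw) = to_int8(x_s), to_int8(w_s)
    # real W8A8 kernels do the matmul in int8; emulated in float here for clarity
    return (xq.float() @ wq.float().t()) * (sx * sw)

x, w = torch.randn(4, 8), torch.randn(16, 8)
print((smooth_and_quantize(x, w) - x @ w.t()).abs().max())  # small quantization error
```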
Thanks
Related resources
No response
Thank you so much for your reply. Is this W8A8 feature going to be SmoothQuant? If so, when do you expect it to be available? @zhyncs @HandH1998 @ispobock
AWQ and GPTQ are W4A16. @HandH1998 and @ispobock are collaborating on W8A8 support.