Home
JiCheng edited this page Nov 22, 2023
Hi there, welcome to QLLM, a flexible tool that supports multiple quantization methods, including GPTQ and AWQ. You can easily quantize a model to 2–8 bits to trade off model size against accuracy. Exporting the quantized model to ONNX and running it with ONNX Runtime are supported as well.