JiCheng edited this page Nov 22, 2023 · 1 revision

Hi there, welcome to QLLM, a flexible tool that supports multiple quantization methods, including GPTQ and AWQ. You can easily quantize a model to 2–8 bits to trade off model size against accuracy. Exporting the quantized model to ONNX and running it with ONNX Runtime are supported as well.
