JiCheng edited this page Nov 22, 2023 · 1 revision

Hi there, welcome to QLLM, a flexible tool that supports multiple quantization methods, including GPTQ and AWQ. You can easily quantize a model to 2–8 bits to trade off model size against accuracy. Exporting the quantized model to ONNX and running it with ONNX Runtime are supported as well.
