v1.13.0: 4-bit quantization, stateful models, Whisper
OpenVINO
Weight-only 4-bit quantization
- Add weight-only 4-bit quantization support by @AlexKoff88 in #469
optimum-cli export openvino --model gpt2 --weight-format int4_sym_g128 ov_model
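For intuition about the `int4_sym_g128` format name, here is a minimal pure-Python sketch of symmetric, group-wise 4-bit weight quantization. It assumes the name means signed 4-bit codes, no zero point, and one scale shared by every 128 consecutive weights. The helper names `quantize_sym_int4` and `dequantize_sym_int4` are hypothetical, and this is a generic illustration of the technique, not the code the exporter actually runs.

```python
def quantize_sym_int4(weights, group_size=128):
    """Quantize a flat list of floats to signed 4-bit codes, one scale per group.

    Illustrative only: a generic symmetric scheme, not the exporter's implementation.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric: no zero point, so [-max_abs, max_abs] maps onto [-7, 7].
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        # Clamp to the signed 4-bit range [-8, 7] after rounding.
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_sym_int4(codes, scales, group_size=128):
    """Recover approximate floats by multiplying each code by its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


# Example: 256 weights form two groups of 128, so two scales are stored.
weights = [(i % 17 - 8) / 10 for i in range(256)]
codes, scales = quantize_sym_int4(weights)
restored = dequantize_sym_int4(codes, scales)
```

In this scheme a smaller group size stores more scales, which costs some extra memory but lets each scale fit its weights more closely.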