Intel® Neural Compressor v1.12 Release
Features
- Quantization
  - Support accuracy-aware AMP (INT8/BF16/FP32) on PyTorch
  - Improve post-training quantization (static & dynamic) on PyTorch
  - Improve post-training quantization on TensorFlow
  - Improve QLinear and QDQ quantization modes on ONNX Runtime
  - Improve accuracy-aware AMP (INT8/FP32) on ONNX Runtime
- Pruning
  - Improve pruning-once-for-all for NLP models
- Sparsity
  - Support experimental sparse kernel for reference examples
Productivity
- Support model deployment by loading INT8 models directly from the Hugging Face model hub
- Improve GUI with optimized model downloading, performance profiling, etc.
Ecosystem
- Highlight simple quantization usage with a few clicks on ONNX Model Zoo
- Upstream INC quantized models (ResNet101, Tiny YOLOv3) to ONNX Model Zoo
Examples
- Add BERT-mini distillation + quantization notebook example
- Add DLRM & SSD-ResNet34 quantization examples on IPEX
- Improve BERT structured sparsity training example
Validated Configurations
- Python 3.8, 3.9, 3.10
- CentOS 8.3, Ubuntu 18.04, and Windows 10
- TensorFlow 2.6.2, 2.7, 2.8
- Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
- PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
- IPEX 1.8.0, 1.9.0, 1.10.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.8.0, 1.9.0, 1.10.0
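
To check an environment against the list above, a small helper along these lines may be useful. This is only a sketch and not part of Intel® Neural Compressor: the `VALIDATED` table and the `is_validated` function are hypothetical names, the table covers only a subset of the entries, and treating a two-part entry such as `2.7` as matching any `2.7.x` patch release is an assumption made here.

```python
import sys

# Subset of the validated versions listed above, copied by hand.
# The dictionary keys are labels chosen for this sketch.
VALIDATED = {
    "python": ["3.8", "3.9", "3.10"],
    "tensorflow": ["2.6.2", "2.7", "2.8"],
    "torch": ["1.8.0+cpu", "1.9.0+cpu", "1.10.0+cpu"],
    "mxnet": ["1.6.0", "1.7.0", "1.8.0"],
    "onnxruntime": ["1.8.0", "1.9.0", "1.10.0"],
}


def is_validated(name, version):
    """Return True if `version` matches a validated entry for `name`.

    Assumption: an entry matches either exactly or as a dotted prefix,
    so "2.7" accepts "2.7" and "2.7.1" but not "2.70".
    """
    for entry in VALIDATED.get(name, []):
        if version == entry or version.startswith(entry + "."):
            return True
    return False


# Example: check the running interpreter using its major.minor version.
current_python = f"{sys.version_info.major}.{sys.version_info.minor}"
print("Python", current_python, "validated:", is_validated("python", current_python))
```

A version that is not in the table is not necessarily unsupported; it only means this release was not validated against it.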