Intel® Neural Compressor v1.12 Release

ftian1 released this 27 May 14:40

aac0a0e

Features

Quantization
- Support accuracy-aware AMP (INT8/BF16/FP32) on PyTorch
- Improve post-training quantization (static & dynamic) on PyTorch
- Improve post-training quantization on TensorFlow
- Improve QLinear and QDQ quantization modes on ONNX Runtime
- Improve accuracy-aware AMP (INT8/FP32) on ONNX Runtime
Pruning
- Improve pruning-once-for-all for NLP models
Sparsity
- Support experimental sparse kernel for reference examples
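
The accuracy-aware AMP items under Quantization can be pictured as a search over precisions that stops at the first one meeting an accuracy target. The sketch below is only an illustration of that idea for a whole model. It is not Intel® Neural Compressor's tuning strategy or API, and `pick_precision`, `evaluate`, and `max_relative_drop` are hypothetical names introduced here.

```python
def pick_precision(evaluate, baseline_accuracy, max_relative_drop=0.01,
                   candidates=("int8", "bf16", "fp32")):
    """Return the first (precision, accuracy) pair that stays within tolerance.

    `evaluate` is a caller-supplied function (an assumption of this sketch)
    that maps a precision name to the accuracy measured at that precision.
    Candidates are tried from lowest to highest precision.
    """
    threshold = baseline_accuracy * (1.0 - max_relative_drop)
    for precision in candidates:
        accuracy = evaluate(precision)
        if accuracy >= threshold:
            return precision, accuracy
    # No candidate met the target: fall back to the FP32 baseline.
    return "fp32", baseline_accuracy


if __name__ == "__main__":
    # Made-up accuracies, used only to exercise the function.
    measured = {"int8": 0.70, "bf16": 0.758, "fp32": 0.76}
    print(pick_precision(measured.get, baseline_accuracy=0.76))
```

With the made-up numbers above, INT8 misses the 1% relative tolerance and BF16 is the first precision that meets it. A real tuner would typically decide precision per operator rather than for the whole model.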

Productivity

- Support model deployment by loading INT8 models directly from the Hugging Face model hub
- Improve GUI with optimized model downloading, performance profiling, etc.

Ecosystem

- Highlight simple quantization usage with a few clicks on ONNX Model Zoo
- Upstream INC quantized models (ResNet101, Tiny YOLOv3) to ONNX Model Zoo

Examples

- Add BERT-mini distillation + quantization notebook example
- Add DLRM & SSD-ResNet34 quantization examples on IPEX
- Improve BERT structured sparsity training example

Validated Configurations

- Python 3.8, 3.9, 3.10
- CentOS 8.3, Ubuntu 18.04, and Windows 10
- TensorFlow 2.6.2, 2.7, 2.8
- Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
- PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
- IPEX 1.8.0, 1.9.0, 1.10.0
- MXNet 1.6.0, 1.7.0, 1.8.0
- ONNX Runtime 1.8.0, 1.9.0, 1.10.0
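
To check an environment against the list above, a small lookup along these lines may help. It is a minimal sketch, not part of the release: the `VALIDATED` table and `is_validated` helper are hypothetical, the keys are assumed to be the usual pip package names, and a two-part entry such as `2.7` is assumed to cover any `2.7.x` patch release.

```python
import sys

# Versions copied from the validated configurations list. Local build
# suffixes such as "+cpu" are left out here and stripped before comparing.
VALIDATED = {
    "python": ["3.8", "3.9", "3.10"],
    "tensorflow": ["2.6.2", "2.7", "2.8"],
    "torch": ["1.8.0", "1.9.0", "1.10.0"],
    "mxnet": ["1.6.0", "1.7.0", "1.8.0"],
    "onnxruntime": ["1.8.0", "1.9.0", "1.10.0"],
}


def is_validated(name, installed):
    """Return True if `installed` matches a listed version for `name`.

    A listed version matches when its dot-separated components are a
    prefix of the installed version's components (an assumption of this
    sketch), so "2.7" matches "2.7.1" but "2.6.2" does not match "2.6.0".
    """
    parts = installed.split("+")[0].split(".")
    for listed in VALIDATED.get(name, []):
        want = listed.split(".")
        if parts[:len(want)] == want:
            return True
    return False


if __name__ == "__main__":
    running = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    print("Python", running, "validated:", is_validated("python", running))
```

A version missing from the table has simply not been validated for this release; the check says nothing about whether it works.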
