Intel® Neural Compressor v1.12 Release

ftian1 released this 27 May 14:40

Features

  • Quantization

    • Support accuracy-aware AMP (INT8/BF16/FP32) on PyTorch
    • Improve post-training quantization (static & dynamic) on PyTorch
    • Improve post-training quantization on TensorFlow
    • Improve QLinear and QDQ quantization modes on ONNX Runtime
    • Improve accuracy-aware AMP (INT8/FP32) on ONNX Runtime
  • Pruning

    • Improve pruning-once-for-all for NLP models
  • Sparsity

    • Support experimental sparse kernel for reference examples
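
As a rough illustration of what the INT8 quantization features above do to a tensor, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It is not Intel® Neural Compressor's API or its actual algorithm; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to signed 8-bit codes.

    Hypothetical helper for illustration; not part of any library.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale maps the largest magnitude onto 127; use 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric INT8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


# Made-up FP32 weights, used only to exercise the helpers.
weights = [0.02, -1.27, 0.5, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The round-trip error per element is bounded by about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

An accuracy-aware flow would typically compare model accuracy before and after such a conversion and keep sensitive operators in BF16 or FP32 when the drop exceeds a chosen tolerance; that tuning loop is not shown here.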

Productivity

  • Support model deployment by loading INT8 models directly from the Hugging Face model hub
  • Improve GUI with optimized model downloading, performance profiling, etc.

Ecosystem

  • Highlight simple quantization usage with a few clicks on ONNX Model Zoo
  • Upstream INC-quantized models (ResNet101, Tiny YOLOv3) to ONNX Model Zoo

Examples

  • Add BERT-mini distillation + quantization notebook example
  • Add DLRM & SSD-ResNet34 quantization examples on IPEX
  • Improve BERT structured sparsity training example

Validated Configurations

  • Python 3.8, 3.9, 3.10
  • CentOS 8.3, Ubuntu 18.04, Windows 10
  • TensorFlow 2.6.2, 2.7, 2.8
  • Intel TensorFlow 1.15.0 UP3, 2.7, 2.8
  • PyTorch 1.8.0+cpu, 1.9.0+cpu, 1.10.0+cpu
  • IPEX 1.8.0, 1.9.0, 1.10.0
  • MXNet 1.6.0, 1.7.0, 1.8.0
  • ONNX Runtime 1.8.0, 1.9.0, 1.10.0