Intel® Extension for Transformers v1.0b Release
Pre-release
- Highlights
- Features
- Productivity
- Examples
- Bug Fixing
- Documentation
- Validated Configurations
Highlights
- Intel® Extension for Transformers provides more compression examples for popular applications such as Stable Diffusion. For Stable Diffusion, we support INT8 quantization with PyTorch and BF16 fine-tuning with Intel® Extension for PyTorch.
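The BF16 fine-tuning path relies on CPU mixed-precision autocast. A minimal sketch of the pattern with stock PyTorch is shown below; the actual release additionally applies `ipex.optimize` from Intel® Extension for PyTorch, which is omitted here to keep the example self-contained (the model and shapes are illustrative, not from the release).

```python
import torch
import torch.nn as nn

# Illustrative model and data, not the Stable Diffusion pipeline itself.
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(4, 8), torch.randn(4, 2)

# Forward pass under CPU BF16 autocast: matmul-heavy ops (e.g. Linear)
# run in bfloat16, while precision-sensitive ops stay in float32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
    loss = nn.functional.mse_loss(out.float(), y)

loss.backward()
optimizer.step()
print(out.dtype)  # bfloat16 inside the autocast region
```

With Intel® Extension for PyTorch installed, the same loop would first pass the model and optimizer through `ipex.optimize(..., dtype=torch.bfloat16)` to enable additional CPU-side optimizations.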
Features
- Pruning/Sparsity
- Transformers-accelerated Neural Engine
- Support inference on Windows (fc580d5)
- Transformers-accelerated Libraries
- Support INT8 Softmax operator (fece837)
Productivity
- Simplify the integration with Alibaba BladeDISC
Examples
- Support INT8 quantization for large language models (T5-base example) with PyTorch
- Support INT8 Vision Transformer examples (ViT-base and ViT-large) in Neural Engine
- Support FP32 LAT example in Neural Engine
- Support INT8 quantization of 5 top Hugging Face TensorFlow models
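Post-training INT8 quantization of a PyTorch model, as used in the examples above, can be sketched with stock PyTorch dynamic quantization. This is a minimal illustration of the technique only; the release's examples drive quantization through their own recipes, and the small model here is a stand-in, not T5-base.

```python
import torch
import torch.nn as nn

# Stand-in FP32 model (the real examples quantize T5-base / ViT models).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

# Dynamic INT8 quantization: Linear weights are stored as qint8 and
# activations are quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 16)
out = qmodel(x)
print(out.shape)  # torch.Size([2, 4]); outputs are dequantized to float32
```

Dynamic quantization keeps accuracy close to FP32 for weight-dominated layers while shrinking the model and speeding up CPU inference; static (calibrated) quantization is the usual next step when activation ranges are stable.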
Bug Fixing
- Fix Protobuf and ONNX version dependency issue
- Fix memory leak in Neural Engine
Documentation
- Create Notebook for Pruning/Compression Orchestration/IPEX Quantization
- Refine the user guide and compression example
Validated Configurations
- CentOS 8.4 & Ubuntu 20.04 & Windows 10
- Python 3.7, 3.8, 3.9
- Intel® Extension for TensorFlow 2.9.1, 2.10.0
- PyTorch 1.11.0+cpu, 1.12.0+cpu, 1.13.0+cpu; Intel® Extension for PyTorch 1.12.0+cpu, 1.13.0+cpu