Intel® Extension for Transformers v1.4.2 Release

Latest

kevinintel released this 24 May 12:23

Highlights
Improvements
Examples
Bug Fixing

Highlights

Support vLLM CPU and IPEX CPU WOQ with Transformers-like API
Support StreamingLLM on Habana Gaudi

Improvements

Add true_sequential for WOQ GPTQ (091f564)
Refine GPU scripts to support OOB mode (f4b3a7b)
Adapt QBits to the latest BesTLA (c169bec)
Improve CPU WOQ scheme setting (fd3ee5cf1)
Enhance voicechat API with multilingual TTS streaming support (98daf37d)
Add internal bias conversion in QBits (7c29f6f1)
Add DynamicQuantConfig, QuantAwareTrainingConfig and StaticQuantConfig (6a15b48, e1f4666d)

Examples

Integrate EAGLE with ITREX (e559929d)

Bug Fixing

Fix token latency (ae7a4ae)
Fix phi3 quantization scripts (2af19c7a6)
Fix is_intel_gpu_available (47d5024)
Fix QLoRA CPU issue due to internal API change (699ffca)
Add scale and weight dtype check for quantization config (307c1a8b)
Fix TF autodistill bug with transformers>=4.37 (8116fbb2)

Validated Configurations

Python 3.10
Ubuntu 22.04
PyTorch 2.2.0+cpu
Intel® Extension for PyTorch 2.2.0+cpu
