
Intel® Extension for Transformers v1.3 Release

@kevinintel kevinintel released this 22 Dec 07:42
· 487 commits to main since this release
6e3a514

Highlights
Publications
Features
Examples
Bug Fixing
Incompatible Changes
Validated Configurations

Highlights

  • LLM Workflow/Neural Chat
    • Achieved Top-1 7B LLM on the Hugging Face Open LLM Leaderboard in Nov’23
    • Released DPO dataset to Hugging Face Space for fine-tuning
    • Published the blog and fine-tuning code on Gaudi2
    • Supported fine-tuning and inference on Gaudi2 and Xeon
    • Updated notebooks for chatbot development and deployment
    • Provided customizable RAG-based chatbot applications
    • Published INT4 chatbot on Hugging Face Space
  • Transformer Extension for Low-bit Inference and Fine-tuning
    • Supported INT4/NF4/FP4/FP8 LLM inference
    • Improved StreamingLLM for efficient endless text generation
    • Demonstrated up to 40x better performance than llama.cpp on Intel Xeon Scalable Processors
    • Supported QLoRA fine-tuning on CPU
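The low-bit support above centers on weight-only quantization of LLM weights. As a rough illustration of the idea only (a minimal sketch, not the library's implementation or API), symmetric per-group INT4 quantization can be written as:

```python
# Illustrative sketch of symmetric per-group INT4 weight-only quantization.
# Function names, group size, and rounding choices are hypothetical.

def quantize_int4(weights, group_size=4):
    """Quantize floats to symmetric INT4 per group; return (qvals, scales)."""
    qvals, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # INT4 signed range is -8..7; use +/-7 for a symmetric scale,
        # falling back to 1.0 for an all-zero group.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales

def dequantize_int4(qvals, scales, group_size=4):
    """Reconstruct approximate floats from INT4 values and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]
```

With 4 bits per weight plus one scale per group, storage drops roughly 4x versus FP16 at the cost of a small, bounded reconstruction error per group.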

Publications

Features

  • LLM Workflow/Neural Chat
    • Support Gaudi model parallelism serving (7f0090)
    • Add PEFT model support in deepspeed sharded mode (370ca3)
    • Support return error code (ea173a)
    • Enhance NeuralChat security (ab43c7, 43e8b9, 6e0386)
    • Support assisted generation for NeuralChat (5ba797)
    • Add codegen RESTful API in NeuralChat (0c77b1)
    • Support multi-card streaming inference on Gaudi (9ad75c)
    • Support multi-CPU RESTful API serving (fec4bb4)
    • Support IPEX INT8 model (e13363)
    • Enable retrieval with URLs as input (9d90e1d)
    • Add NER plugin to NeuralChat (aa5d8a)
    • Integrate PhotoAI backend into NeuralChat (da138c, d7a1d8)
    • Support image-to-image plugin as a service (12ad4c)
    • Support optimized SadTalker video plugin in NeuralChat (7f24c79)
    • Add askdoc retrieval API & example (89cf76)
    • Add side-by-side UI (dbbcc2b)
  • Transformer Extension for Low-bit Inference and Fine-tuning
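The streaming inference features above relate to the StreamingLLM improvement noted in the highlights, which keeps a few initial "attention sink" tokens plus a sliding window of recent tokens in the KV cache so generation can run indefinitely. A toy sketch of that retention policy (names and defaults are illustrative, not the library's API):

```python
# Illustrative sketch of the StreamingLLM KV-cache retention policy:
# keep the first n_sink "attention sink" positions plus a recent window,
# evicting everything in between. Parameters here are hypothetical.

def streaming_kv_positions(total_tokens, n_sink=4, window=8):
    """Return the token positions retained in the KV cache."""
    if total_tokens <= n_sink + window:
        return list(range(total_tokens))
    return list(range(n_sink)) + list(range(total_tokens - window, total_tokens))
```

The cache size stays bounded at `n_sink + window` entries no matter how long generation runs, which is what makes "endless" text generation memory-stable.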

Examples

  • LLM Workflow/Neural Chat
    • Add Mistral, Code-Llama, NeuralChat-7B and Qwen examples (fcee612, 7baa96b, d9a864, 698e58)
    • Add StarCoder, CodeLlama, Falcon and Mistral fine-tuning examples (477018)
    • Add fine-tuning with DeepSpeed example (554fb9)
  • Transformer Extension for Low-bit Inference and Fine-tuning
    • Add ChatGLM and Code-Llama example (130b59)
    • Add weight-only quantization (WOQ) to code-generation example (65a645f)
    • Add ChatGLM2&3 support to text-generation example (4525b)
    • Support Qwen in text-generation example (8f41d4)
    • Add INT4 ONNX whisper example (c7f8173c, e9fc4c2)
    • Support DPO on Habana Gaudi (98d3ce3)
    • Enable fine-tuning for Qwen-7B-Chat on CPU (6bc938)
    • Enable Whisper C++ API (74e92a)
    • Apply the STS task to BAAI/BGE models (0c4c5ed, c399e38)
    • Enable Qwen graph (381331c)
    • Add instruction-tuning Stable Diffusion examples (17f01c6)
    • Enable Mistral-7B (7d1495)
    • Support Falcon-180B (900ebf4)
    • Add Baichuan/Baichuan2 example (98e5f9)
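The DPO examples above optimize a preference loss over chosen/rejected response pairs relative to a frozen reference model. A minimal sketch of the per-pair DPO loss (plain floats for illustration; real training operates on batched sequence log-probabilities, and the function name here is hypothetical):

```python
# Illustrative per-pair DPO (Direct Preference Optimization) loss:
# loss = -log sigmoid(beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)))
import math

def dpo_loss(chosen_logp, rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair given policy and reference log-probs."""
    margin = beta * ((chosen_logp - ref_chosen_logp)
                     - (rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)); loss shrinks as the policy widens the margin
    # between the chosen and rejected responses relative to the reference.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree, the margin is zero and the loss is log 2; favoring the chosen response drives the loss below that baseline.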

Bug Fixing

  • LLM Workflow/Neural Chat
    • Enhance SafetyChecker to resolve missing stopword.txt (5ba797)
    • Enhance multilingual ASR (62d002)
    • Remove haystack dependency (16ff4fb)
    • Fix StarCoder issues for IPEX INT8 and weight-only INT4 (e88c7b)
    • Remove oneDNN env setting for BF16 inference (59ab03)
    • Fix ChatGLM2 model loading issue (4f2169)
    • Fix init issue of LangChain Chroma (fdefe27)
  • Transformer Extension for Low-bit Inference and Fine-tuning
    • Fix WOQ bug with AWQ (565ab4b)
    • Use validation dataset for evaluation (e764bb)
    • Fix gradient issue for QLoRA on seq2seq (ff0465)
    • Fix top-k/top-p post-processing in the Python API (7b4730)
    • Fix PC codegen streaming issue (0f0bf22)
    • Fix Jblas stack overflow on Windows (65af04)
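The top-k/top-p fix above concerns sampling post-processing, which restricts the next-token distribution before sampling. As an illustration of what such a filter does (a sketch under assumed semantics, not the fixed code itself):

```python
# Illustrative top-k / top-p (nucleus) filtering of a probability
# distribution before sampling. top_k=0 disables the top-k cutoff.

def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Zero out tokens outside top-k and the top-p nucleus, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for rank, i in enumerate(order):
        if top_k and rank >= top_k:
            break  # top-k cutoff reached
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break  # nucleus mass reached
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]
```

For example, with `top_k=2` the two most likely tokens keep all the probability mass, rescaled to sum to one.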

Incompatible Changes

  • [Neural Chat] Optimize the structure of NeuralChat example directories (1447e6f)
  • [Transformers Extension for Low-bit Inference] Update baichuan/baichuan2 API (98e5f9)

Validated Configurations

  • Python 3.9, 3.10, 3.11
  • CentOS 8.4 & Ubuntu 20.04 & Windows 10
  • Intel® Extension for TensorFlow 2.13.0, 2.14.0
  • PyTorch 2.1.0+cpu, 2.0.0+cpu
  • Intel® Extension for PyTorch 2.1.0+cpu, 2.0.0+cpu