
Intel® Extension for Transformers v1.3.2 Release

@kevinintel released this 24 Feb 05:47 · 279 commits to main since this release · 9e9e4c7

Highlights

  • Support NeuralChat-TGI serving with Docker (8ebff39)
  • Support NeuralChat-vLLM serving with Docker (1988dd)
  • Support SQL generation in NeuralChat (098aca7)
  • Enable LLaVA MMMU evaluation on Gaudi2 (c30353f)
  • Improve LLM INT4 inference on Intel GPUs (see the sketch after this list)
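
A minimal sketch of the INT4 weight-only path on an Intel GPU, using the extension's Transformers-style API: the model name is only an example, and the `load_in_4bit`/`device_map="xpu"` arguments are assumptions that may vary between releases.

```python
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the "xpu" device backend
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_4bit requests INT4 weight-only quantization; device_map="xpu" targets the Intel GPU
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map="xpu")

inputs = tokenizer("What is weight-only quantization?", return_tensors="pt").to("xpu")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```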

Improvements

  • Minimize dependencies for running a chatbot (a0c9dfe)
  • Remove redundant knowledge id in audio plugin API (9a7353)
  • Update parameters for NeuralSpeed (19fec91)
  • Integrate backend code of Askdoc (c5d4cd)
  • Refine finetuning data preprocessing with static shape for Gaudi2 (3f62ceb)
  • Sync RESTful API with the latest OpenAI protocol (2e1c79); a client sketch follows this list
  • Support WOQ model save and load (1c8078f); see the save/load sketch after this list
  • Extend API for GGUF (7733d4)
  • Enable OpenAI compatible audio API (d62ff9e)
  • Add interface to acquire pack_weight info (18d36ef)
  • Add customized system prompts (04b2f8)
  • Support asymmetric (asym) WOQ scheme (c7f0b70)
  • Update code_lm_eval to bigcode_eval (44f914e)
  • Enable PDF figure-to-text extraction in retrieval (d6a66b3)
  • Enable retrieve-then-rerank pipeline (15feadf)
  • Enable grammar check and query polishing to enhance RAG performance (a63ec0)
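
For the WOQ save-and-load support, a rough sketch through the Transformers-style API could look like the following; the small model id and output directory are placeholders, and reloading the quantized checkpoint directly via `from_pretrained` is an assumption about this release's behavior.

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "facebook/opt-125m"   # small example model
saved_dir = "./opt-125m-woq"       # hypothetical output directory

# Quantize with INT4 weight-only quantization and persist the result
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
model.save_pretrained(saved_dir)
AutoTokenizer.from_pretrained(model_name).save_pretrained(saved_dir)

# Later: reload the already-quantized weights without re-running quantization
model = AutoModelForCausalLM.from_pretrained(saved_dir)
```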
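
Because the RESTful API tracks the OpenAI protocol, a stock OpenAI client can talk to a running NeuralChat server; the base URL, port, API key, and model name below are deployment-specific placeholders rather than values fixed by this release.

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally running NeuralChat endpoint
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Intel/neural-chat-7b-v3-1",  # whatever model the server was started with
    messages=[{"role": "user", "content": "Summarize weight-only quantization in two sentences."}],
)
print(response.choices[0].message.content)
```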

Examples

  • Add Rank-One Model Editing (ROME) implementation and example (8dcf0ea7)
  • Support GPTQ and AWQ models in NeuralChat (5b08de)
  • Add Neural Speed example scripts (6a97d15, 3385c42)
  • Add langchain extension example and update notebook (d40e2f1)
  • Support deepseek-coder models in NeuralChat (e7f5b1d); see the sketch after this list
  • Add autoround examples (71f5e84)
  • Add BGE embedding model fine-tuning example (67bef24)
  • Support DeciLM-7B and DeciLM-7B-instruct in NeuralChat (e6f87ab)
  • Support GGUF models in NeuralChat (a53a33c)
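
As an illustration of the expanded NeuralChat model support, the usual `build_chatbot`/`PipelineConfig` flow should apply; the deepseek-coder checkpoint id is only an example, and hardware and plugin options are omitted.

```python
from intel_extension_for_transformers.neural_chat import PipelineConfig, build_chatbot

# Point the standard NeuralChat pipeline at a deepseek-coder checkpoint (example model id)
config = PipelineConfig(model_name_or_path="deepseek-ai/deepseek-coder-6.7b-instruct")
chatbot = build_chatbot(config)

print(chatbot.predict("Write a Python function that reverses a string."))
```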

Bug Fixes

  • Add trust_remote_code args for lm_eval in the WOQ example (9022eb)
  • Fix CPU WOQ accuracy issue (e530f7)
  • Change the default value for XPU weight-only quantization (4a78ba)
  • Fix Whisper forced_decoder_ids error (09ddad)
  • Fix off-by-one error in masking (525076d)
  • Fix backprop error for text-only examples (9cff14a)
  • Use unk token instead of eos token (6387a0)
  • Fix errors in trainer save (ff501d0)
  • Fix Qdrant bug caused by langchain_core upgrade (eb763e6)
  • Set trainer.save_model state_dict format to safetensors (2eca8c)
  • Fix text-generation example accuracy scripts (a2cfb80)
  • Resolve WOQ quantization error when running neuralchat (6c0bd77)
  • Fix response issue of model.predict (3068496)
  • Fix pydub library import issues (c37dab)
  • Fix chat history issue (7bb3314)
  • Update Gradio app to sync with backend changes (362b7af)

Validated Configurations

  • Python 3.10
  • Ubuntu 22.04
  • Intel® Extension for TensorFlow 2.13.0
  • PyTorch 2.1.0+cpu
  • Intel® Extension for PyTorch 2.1.0+cpu