
Intel® Extension for Transformers v1.3.2 Release

@kevinintel released this 24 Feb 05:47 · 279 commits to main since this release · 9e9e4c7

Highlights

  • Support NeuralChat-TGI serving with Docker (8ebff39)
  • Support NeuralChat-vLLM serving with Docker (1988dd)
  • Support SQL generation in NeuralChat (098aca7)
  • Enable LLaVA MMMU evaluation on Gaudi2 (c30353f)
  • Improve LLM INT4 inference on Intel GPUs (see the sketch after this list)
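
A minimal sketch of the INT4 weight-only path on an Intel GPU, using the extension's Transformers-style API: the model name is only an example, and the `load_in_4bit`/`device_map="xpu"` arguments are assumptions that may vary between releases.

```python
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the "xpu" device backend
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"  # example model id
tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_4bit requests INT4 weight-only quantization; device_map="xpu" targets the Intel GPU
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map="xpu")

inputs = tokenizer("What is weight-only quantization?", return_tensors="pt").to("xpu")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```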

Improvements

  • Minimize dependencies for running a chatbot (a0c9dfe)
  • Remove redundant knowledge id in audio plugin API (9a7353)
  • Update parameters for NeuralSpeed (19fec91)
  • Integrate backend code of Askdoc (c5d4cd)
  • Refine finetuning data preprocessing with static shape for Gaudi2 (3f62ceb)
  • Sync RESTful API with the latest OpenAI protocol (2e1c79); a client sketch follows this list
  • Support WOQ model save and load (1c8078f); see the save/load sketch after this list
  • Extend API for GGUF (7733d4)
  • Enable OpenAI compatible audio API (d62ff9e)
  • Add interface to acquire pack_weight info (18d36ef)
  • Add customized system prompts (04b2f8)
  • Support asymmetric (asym) WOQ scheme (c7f0b70)
  • Update code_lm_eval to bigcode_eval (44f914e)
  • Enable PDF figure-to-text extraction in retrieval (d6a66b3)
  • Enable retrieve-then-rerank pipeline (15feadf)
  • Enable grammar check and query polishing to enhance RAG performance (a63ec0)
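
For the WOQ save-and-load support, a rough sketch through the Transformers-style API could look like the following; the small model id and output directory are placeholders, and reloading the quantized checkpoint directly via `from_pretrained` is an assumption about this release's behavior.

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "facebook/opt-125m"   # small example model
saved_dir = "./opt-125m-woq"       # hypothetical output directory

# Quantize with INT4 weight-only quantization and persist the result
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
model.save_pretrained(saved_dir)
AutoTokenizer.from_pretrained(model_name).save_pretrained(saved_dir)

# Later: reload the already-quantized weights without re-running quantization
model = AutoModelForCausalLM.from_pretrained(saved_dir)
```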
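
Because the RESTful API tracks the OpenAI protocol, a stock OpenAI client can talk to a running NeuralChat server; the base URL, port, API key, and model name below are deployment-specific placeholders rather than values fixed by this release.

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally running NeuralChat endpoint
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Intel/neural-chat-7b-v3-1",  # whatever model the server was started with
    messages=[{"role": "user", "content": "Summarize weight-only quantization in two sentences."}],
)
print(response.choices[0].message.content)
```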

Examples

  • Add Rank-One Model Editing (ROME) implementation and example (8dcf0ea7)
  • Support GPTQ and AWQ models in NeuralChat (5b08de)
  • Add Neural Speed example scripts (6a97d15, 3385c42)
  • Add langchain extension example and update notebook (d40e2f1)
  • Support deepseek-coder models in NeuralChat (e7f5b1d); see the sketch after this list
  • Add autoround examples (71f5e84)
  • Add BGE embedding model fine-tuning example (67bef24)
  • Support DeciLM-7B and DeciLM-7B-instruct in NeuralChat (e6f87ab)
  • Support GGUF models in NeuralChat (a53a33c)
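
As an illustration of the expanded NeuralChat model support, the usual `build_chatbot`/`PipelineConfig` flow should apply; the deepseek-coder checkpoint id is only an example, and hardware and plugin options are omitted.

```python
from intel_extension_for_transformers.neural_chat import PipelineConfig, build_chatbot

# Point the standard NeuralChat pipeline at a deepseek-coder checkpoint (example model id)
config = PipelineConfig(model_name_or_path="deepseek-ai/deepseek-coder-6.7b-instruct")
chatbot = build_chatbot(config)

print(chatbot.predict("Write a Python function that reverses a string."))
```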

Bug Fixes

  • Add trust_remote_code args for lm_eval in the WOQ example (9022eb)
  • Fix CPU WOQ accuracy issue (e530f7)
  • Change the default value for XPU weight-only quantization (4a78ba)
  • Fix Whisper forced_decoder_ids error (09ddad)
  • Fix off-by-one error in masking (525076d)
  • Fix backprop error for text-only examples (9cff14a)
  • Use unk token instead of eos token (6387a0)
  • Fix errors in trainer save (ff501d0)
  • Fix Qdrant bug caused by langchain_core upgrade (eb763e6)
  • Set trainer.save_model state_dict format to safetensors (2eca8c)
  • Fix text-generation example accuracy scripts (a2cfb80)
  • Resolve WOQ quantization error when running neuralchat (6c0bd77)
  • Fix response issue of model.predict (3068496)
  • Fix pydub library import issues (c37dab)
  • Fix chat history issue (7bb3314)
  • Update Gradio app to sync with backend changes (362b7af)

Validated Configurations

  • Python 3.10
  • Ubuntu 22.04
  • Intel® Extension for TensorFlow 2.13.0
  • PyTorch 2.1.0+cpu
  • Intel® Extension for PyTorch 2.1.0+cpu