BigDL-LLM Examples on Intel GPU

This folder contains examples of running BigDL-LLM on Intel GPU:

HF-Transformers-AutoModels: running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs)
QLoRA-FineTuning: running QLoRA finetuning using BigDL-LLM on Intel GPUs
vLLM-Serving: running the vLLM serving framework on Intel GPUs (with BigDL-LLM low-bit optimized models)
Deepspeed-AutoTP: running distributed inference using DeepSpeed AutoTP (with BigDL-LLM low-bit optimized models) on Intel GPUs
PyTorch-Models: running any PyTorch model on BigDL-LLM (with "one-line code change")

System Support

Hardware:

Operating System:

Applying Intel GPU acceleration requires several tool installation and environment preparation steps.

Step 1: install the Intel GPU driver by following our driver installation guide for general-purpose GPU capabilities.

Note: IPEX 2.0.110+xpu requires Intel GPU Driver version Stable 647.21.

Step 2: download and install the Intel® oneAPI Base Toolkit. The oneMKL and DPC++ compiler components are required; the others are optional.

Note: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit version 2023.2.0 or later.
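
The minimum-version requirement in the note above can be checked with a small shell helper. This is only a sketch: `version_ge` is a hypothetical helper name, it relies on GNU `sort -V`, and `INSTALLED_VERSION` is a placeholder value that you would replace with the version reported by your own oneAPI installation.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder: substitute the version of your installed oneAPI Base Toolkit.
INSTALLED_VERSION="2023.2.0"
# Minimum version stated in the note above.
REQUIRED_VERSION="2023.2.0"

if version_ge "$INSTALLED_VERSION" "$REQUIRED_VERSION"; then
    echo "oneAPI $INSTALLED_VERSION meets the minimum $REQUIRED_VERSION"
else
    echo "oneAPI $INSTALLED_VERSION is older than the minimum $REQUIRED_VERSION" >&2
fi
```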

For better performance on Linux, we recommend setting the following environment variables:

export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
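
To avoid retyping these settings in every terminal session, they can be collected in a small script that is sourced before running the examples. The following is a minimal sketch under stated assumptions: the `ONEAPI_SETVARS` variable name is hypothetical, and `/opt/intel/oneapi/setvars.sh` is assumed to be the oneAPI install location, so adjust the path if your toolkit is installed elsewhere.

```shell
# Hypothetical setup script: source it in the shell that will run the examples.

# Assumed oneAPI install location; override ONEAPI_SETVARS if yours differs.
ONEAPI_SETVARS="${ONEAPI_SETVARS:-/opt/intel/oneapi/setvars.sh}"

# Load the oneAPI environment only when the script is actually present.
if [ -f "$ONEAPI_SETVARS" ]; then
    . "$ONEAPI_SETVARS"
else
    echo "oneAPI setvars.sh not found at $ONEAPI_SETVARS; skipping" >&2
fi

# Recommended performance settings from this README.
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

# Print the values so the active configuration is visible.
echo "USE_XETLA=$USE_XETLA"
echo "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=$SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"
```

Sourcing the script (rather than executing it) keeps the exported variables in the current shell, which is what the example programs inherit.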