Enhanced XPU Dockerfiles: Optimized Environment Variables and Documentation (#11506)

* Added SYCL_CACHE_PERSISTENT=1 to xpu Dockerfile

* Update the document to add explanations for environment variables.

* update quickstart
liu-shaojun authored Jul 4, 2024
1 parent 60de428 commit 72b4efa
Showing 5 changed files with 36 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docker/llm/README.md
@@ -25,7 +25,7 @@ Available images in hub are:
| intelanalytics/ipex-llm-serving-cpu:2.1.0-SNAPSHOT | CPU Serving|
| intelanalytics/ipex-llm-serving-xpu:2.1.0-SNAPSHOT | GPU Serving|
| intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.1.0-SNAPSHOT | CPU Finetuning via Docker|
|intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.1.0-SNAPSHOT|CPU Finetuning via Kubernetes|
| intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.1.0-SNAPSHOT|CPU Finetuning via Kubernetes|
| intelanalytics/ipex-llm-finetune-qlora-xpu:2.1.0-SNAPSHOT| GPU Finetuning|

#### Run a Container
3 changes: 3 additions & 0 deletions docker/llm/finetune/xpu/Dockerfile
@@ -4,6 +4,9 @@ ARG https_proxy
ENV TZ=Asia/Shanghai
ARG PIP_NO_CACHE_DIR=false

# When cache is enabled SYCL runtime will try to cache and reuse JIT-compiled binaries.
ENV SYCL_CACHE_PERSISTENT=1

# retrieve oneapi repo public key
RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/intel-oneapi-archive-keyring.gpg > /dev/null && \
echo "deb [signed-by=/usr/share/keyrings/intel-oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main " | tee /etc/apt/sources.list.d/oneAPI.list && \
3 changes: 3 additions & 0 deletions docker/llm/inference-cpp/Dockerfile
@@ -6,6 +6,9 @@ ARG https_proxy
ENV TZ=Asia/Shanghai
ENV PYTHONUNBUFFERED=1

# When cache is enabled SYCL runtime will try to cache and reuse JIT-compiled binaries.
ENV SYCL_CACHE_PERSISTENT=1

# Disable pip's cache behavior
ARG PIP_NO_CACHE_DIR=false

5 changes: 3 additions & 2 deletions docker/llm/inference/xpu/docker/Dockerfile
@@ -5,8 +5,9 @@ ARG https_proxy

ENV TZ=Asia/Shanghai
ENV PYTHONUNBUFFERED=1
ENV USE_XETLA=OFF
ENV SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

# When cache is enabled SYCL runtime will try to cache and reuse JIT-compiled binaries.
ENV SYCL_CACHE_PERSISTENT=1

COPY chat.py /llm/chat.py
COPY benchmark.sh /llm/benchmark.sh
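
The Dockerfile hunks above set `SYCL_CACHE_PERSISTENT=1` (and, in the XPU inference image, `USE_XETLA=OFF` and `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1`) as image defaults via `ENV`. Because `ENV` only sets a default, a value passed with `docker run -e` takes precedence for that container. To confirm which values a running container actually picked up, a small check can be run inside it. This is a minimal sketch, assuming a POSIX shell in the container; the helper name `show_xpu_env` is hypothetical and not part of IPEX-LLM.

```shell
#!/bin/sh
# Hypothetical helper (not part of IPEX-LLM): print the XPU-related variables
# that the Dockerfiles in this commit set via ENV, flagging any that are
# unset or empty in the current shell.
show_xpu_env() {
    for var in SYCL_CACHE_PERSISTENT USE_XETLA SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS; do
        # Indirect expansion via eval keeps this POSIX-sh compatible.
        eval "value=\${$var-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$var" "$value"
        else
            printf '%s is unset or empty\n' "$var"
        fi
    done
}

show_xpu_env
```

For example, starting a container with `-e SYCL_CACHE_PERSISTENT=0` and running the helper inside it should report `SYCL_CACHE_PERSISTENT=0`, because the run-time value overrides the image default.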
26 changes: 26 additions & 0 deletions docs/mddocs/DockerGuides/docker_pytorch_inference_gpu.md
@@ -78,6 +78,32 @@ root@arda-arc12:/# sycl-ls
> bash env-check.sh
> ```

> [!NOTE]
> For optimal performance, we recommend setting the following environment variables according to your hardware.
>
> ```bash
> # Disable XETLA-related code. Only Intel Data Center GPU Max Series supports XETLA, so set this to OFF on all other GPUs.
> # Recommended for use on Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series.
> export USE_XETLA=OFF
>
> # Enable immediate command lists mode for the Level Zero plugin. This can improve performance on Intel Arc™ A-Series Graphics and Intel Data Center GPU Max Series, but the gain depends on the Linux kernel, and some kernels may not provide any speedup.
> # Recommended for Intel Arc™ A-Series Graphics and Intel Data Center GPU Max Series. Non-i915 kernel drivers may cause performance regressions.
> export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
>
> # Control the persistent cache for device-compiled code: '1' turns it on, '0' turns it off.
> # Recommended for all hardware environments. This variable is already set by default in the Docker images.
> export SYCL_CACHE_PERSISTENT=1
>
> # Reduce memory accesses by fusing scaled dot-product attention (SDP) operations.
> # Recommended for use on Intel Data Center GPU Max Series.
> export ENABLE_SDP_FUSION=1
>
> # Disable XMX computation.
> # Recommended for use on integrated GPUs.
> export BIGDL_LLM_XMX_DISABLED=1
> ```

## Run Inference Benchmark
Navigate to the benchmark directory and modify `config.yaml` under the `all-in-one` folder to configure the benchmark.
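
The per-hardware recommendations in the note above can be gathered into a single helper so that the right set is applied in one step. This is a minimal sketch, not an IPEX-LLM tool: the function name `set_ipex_llm_env` and the family labels `max`, `arc`, `flex` and `igpu` are hypothetical, and you are assumed to know your GPU family in advance. The mapping follows the note's comments only.

```shell
#!/bin/sh
# Hypothetical helper (not part of IPEX-LLM): export the environment variables
# recommended in the note for one GPU family. Source this file so the exports
# persist in the calling shell.
set_ipex_llm_env() {
    # Validate the family first so a typo leaves the environment untouched.
    case "$1" in
        max|arc|flex|igpu) ;;
        *)
            echo "unknown GPU family: $1 (expected max, arc, flex or igpu)" >&2
            return 1
            ;;
    esac

    # Start from a clean slate so switching families leaves no stale settings.
    # Note: this also clears values baked into the image via ENV.
    unset USE_XETLA SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS \
          ENABLE_SDP_FUSION BIGDL_LLM_XMX_DISABLED

    # Recommended for all hardware environments.
    export SYCL_CACHE_PERSISTENT=1

    case "$1" in
        max)
            # Data Center GPU Max: XETLA supported, so USE_XETLA stays unset.
            export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
            export ENABLE_SDP_FUSION=1
            ;;
        arc)
            # Arc A-Series: no XETLA; immediate command lists are kernel-dependent.
            export USE_XETLA=OFF
            export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
            ;;
        flex)
            # Data Center GPU Flex: no XETLA.
            export USE_XETLA=OFF
            ;;
        igpu)
            # Integrated GPUs: no XETLA and no XMX computation.
            export USE_XETLA=OFF
            export BIGDL_LLM_XMX_DISABLED=1
            ;;
    esac
}
```

For example, after saving the function to a file such as `ipex_llm_env.sh` (a hypothetical name), running `. ./ipex_llm_env.sh && set_ipex_llm_env arc` applies the Arc settings. If your system uses a non-i915 kernel driver, consider dropping `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` afterwards, since the note warns it may cause regressions there.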