forked from intel-analytics/ipex-llm
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Reconstruct Speculative Decoding example directory (intel-analytics#1…
…1136) * update * update * update
- Loading branch information
Showing
48 changed files
with
79 additions
and
59 deletions.
There are no files selected for viewing
4 changes: 2 additions & 2 deletions
4
.../CPU/Speculative-Decoding/Eagle/README.md → .../CPU/Speculative-Decoding/EAGLE/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,6 @@ | ||
# Self-Speculative Decoding for Large Language Model BF16 Inference using IPEX-LLM on Intel CPUs | ||
You can use IPEX-LLM to run BF16 inference for any Huggingface Transformer model with ***self-speculative decoding*** on Intel CPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. | ||
# Speculative-Decoding Examples on Intel CPU | ||
|
||
## Verified Hardware Platforms | ||
This folder contains examples of running Speculative-Decoding Examples with IPEX-LLM on Intel CPU: | ||
|
||
- Intel Xeon SPR server | ||
|
||
## Recommended Requirements | ||
To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#system-support) for more information. Make sure you have installed `ipex-llm` before: | ||
|
||
```bash | ||
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu | ||
``` | ||
|
||
Moreover, install IPEX 2.1.0, which can be done through `pip install intel_extension_for_pytorch==2.1.0`. | ||
- [Self-Speculation](Self-Speculation): running BF16 inference for Huggingface Transformer model with ***self-speculative decoding*** with IPEX-LLM on Intel CPUs | ||
- [EAGLE](EAGLE): running speculative sampling using ***EAGLE*** (Extrapolation Algorithm for Greater Language-model Efficiency) with IPEX-LLM on Intel CPUs |
15 changes: 15 additions & 0 deletions
15
python/llm/example/CPU/Speculative-Decoding/Self-Speculation/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Self-Speculative Decoding for Large Language Model BF16 Inference using IPEX-LLM on Intel CPUs | ||
You can use IPEX-LLM to run BF16 inference for any Huggingface Transformer model with ***self-speculative decoding*** on Intel CPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. | ||
|
||
## Verified Hardware Platforms | ||
|
||
- Intel Xeon SPR server | ||
|
||
## Recommended Requirements | ||
To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. Make sure you have installed `ipex-llm` before: | ||
|
||
```bash | ||
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu | ||
``` | ||
|
||
Moreover, install IPEX 2.1.0, which can be done through `pip install intel_extension_for_pytorch==2.1.0`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
File renamed without changes.
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
12 changes: 10 additions & 2 deletions
12
.../GPU/Speculative-Decoding/Eagle/README.md → .../GPU/Speculative-Decoding/EAGLE/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,6 @@ | ||
# Self-Speculative Decoding for Large Language Model FP16 Inference using IPEX-LLM on Intel GPUs | ||
You can use IPEX-LLM to run FP16 inference for any Huggingface Transformer model with ***self-speculative decoding*** on Intel GPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. | ||
# Speculative-Decoding Examples on Intel GPU | ||
|
||
## Verified Hardware Platforms | ||
This folder contains examples of running Speculative-Decoding Examples with IPEX-LLM on Intel GPU: | ||
|
||
- Intel Data Center GPU Max Series | ||
|
||
## Recommended Requirements | ||
To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation. See the [GPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for mode details. | ||
|
||
Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered. | ||
|
||
Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities. | ||
> **Note**: IPEX 2.1.10+xpu requires Intel GPU Driver version >= stable_775_20_20231219. | ||
Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional. | ||
> **Note**: IPEX 2.1.10+xpu requires Intel® oneAPI Base Toolkit's version == 2024.0. | ||
## Best Known Configuration on Linux | ||
|
||
For optimal performance on Intel Data Center GPU Max Series, it is recommended to set several environment variables. | ||
```bash | ||
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so | ||
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 | ||
export ENABLE_SDP_FUSION=1 | ||
``` | ||
- [Self-Speculation](Self-Speculation): running BF16 inference for Huggingface Transformer model with ***self-speculative decoding*** with IPEX-LLM on Intel GPUs | ||
- [EAGLE](EAGLE): running speculative sampling using ***EAGLE*** (Extrapolation Algorithm for Greater Language-model Efficiency) with IPEX-LLM on Intel GPUs |
26 changes: 26 additions & 0 deletions
26
python/llm/example/GPU/Speculative-Decoding/Self-Speculation/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Self-Speculative Decoding for Large Language Model FP16 Inference using IPEX-LLM on Intel GPUs | ||
You can use IPEX-LLM to run FP16 inference for any Huggingface Transformer model with ***self-speculative decoding*** on Intel GPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. | ||
|
||
## Verified Hardware Platforms | ||
|
||
- Intel Data Center GPU Max Series | ||
|
||
## Recommended Requirements | ||
To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation. See the [GPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details. | ||
|
||
Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered. | ||
|
||
Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities. | ||
> **Note**: IPEX 2.1.10+xpu requires Intel GPU Driver version >= stable_775_20_20231219. | ||
Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional. | ||
> **Note**: IPEX 2.1.10+xpu requires Intel® oneAPI Base Toolkit's version == 2024.0. | ||
## Best Known Configuration on Linux | ||
|
||
For optimal performance on Intel Data Center GPU Max Series, it is recommended to set several environment variables. | ||
```bash | ||
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so | ||
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 | ||
export ENABLE_SDP_FUSION=1 | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes.
Oops, something went wrong.