docs/mddocs/Quickstart/continue_quickstart.md (2 additions & 1 deletion)
@@ -33,11 +33,12 @@ Visit [Run Ollama with IPEX-LLM on Intel GPU](./ollama_quickstart.md), and follo
> If the `Continue` plugin is not installed on the same machine where Ollama is running (which means `Continue` needs to connect to a remote Ollama service), you must configure the Ollama service to accept connections from any IP address. To achieve this, set or export the environment variable `OLLAMA_HOST=0.0.0.0` before executing the command `ollama serve`.
> [!TIP]
- > If your local LLM is running on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), it is recommended to additionaly set the following environment variable for optimal performance before executing `ollama serve`:
+ > If your local LLM is running on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), setting the following environment variable before starting the service may improve performance.
+ > The environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` determines the usage of immediate command lists for task submission to the GPU. While this mode typically enhances performance, exceptions may occur. Please consider experimenting with and without this environment variable for best performance. For more details, you can refer to [this article](https://www.intel.com/content/www/us/en/developer/articles/guide/level-zero-immediate-command-lists.html).
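Taken together, the notes above amount to something like the following shell session on the Ollama host. This is a minimal sketch assembled from the variables mentioned above, not the guide's verbatim commands:

```bash
# Let a remote `Continue` plugin connect (see the note above); omit if Continue runs locally.
export OLLAMA_HOST=0.0.0.0

# Optional, per the tip above for Intel Arc A-Series on Linux (kernel 6.2);
# try with and without it, since it can occasionally hurt performance.
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

ollama serve
```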
# Available low_bit formats include sym_int4, sym_int8, fp16, etc.
source /opt/intel/oneapi/setvars.sh
export USE_XETLA=OFF
+ # [optional] under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
#### Launch multiple workers
Sometimes we may want to start multiple workers for the best performance. For running on CPU, you may want to separate the workers across different sockets. Assuming each socket has 48 physical cores, you may want to start two workers using the following example:
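The worker commands themselves are elided in this hunk; as a rough illustration only, one way to pin two workers to separate sockets looks like the sketch below, where `<worker-launch-command>` is a placeholder for the project's actual worker command and the `LD_PRELOAD` path is an assumption based on the `libtcmalloc.so` note that follows:

```bash
# Hypothetical sketch only; substitute the real worker command for <worker-launch-command>.
# Preload tcmalloc (see the note below); the library path is an assumption.
export LD_PRELOAD=${CONDA_PREFIX}/lib/libtcmalloc.so

# Worker 1: bind to the 48 physical cores and local memory of socket 0.
numactl -C 0-47 -m 0 <worker-launch-command> &

# Worker 2: bind to the 48 physical cores and local memory of socket 1.
# Remember to give each worker its own port.
numactl -C 48-95 -m 1 <worker-launch-command> &
```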
Please note that `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
> [!NOTE]
> Please refer to [this guide](../Overview/install_gpu.md#runtime-configuration-1) for more details regarding runtime configuration.
+ > [!NOTE]
+ > The environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` determines the usage of immediate command lists for task submission to the GPU. While this mode typically enhances performance, exceptions may occur. Please consider experimenting with and without this environment variable for best performance. For more details, you can refer to [this article](https://www.intel.com/content/www/us/en/developer/articles/guide/level-zero-immediate-command-lists.html).
docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md (13 additions & 4 deletions)
@@ -51,6 +51,7 @@ To use GPU acceleration, several environment variables are required or recommend
```bash
source /opt/intel/oneapi/setvars.sh
export SYCL_CACHE_PERSISTENT=1
+ # [optional] under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
# [optional] if you want to run on a single GPU, use the command below to limit the run to one GPU; this may improve performance
export ONEAPI_DEVICE_SELECTOR=level_zero:0
@@ -62,44 +63,48 @@ To use GPU acceleration, several environment variables are required or recommend
```cmd
set SYCL_CACHE_PERSISTENT=1
+ rem under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```
> [!TIP]
> When your machine has multiple GPUs and you want to run on one of them, you need to set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` varies based on your requirement. For more details, you can refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+ > [!NOTE]
+ > The environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` determines the usage of immediate command lists for task submission to the GPU. While this mode typically enhances performance, exceptions may occur. Please consider experimenting with and without this environment variable for best performance. For more details, you can refer to [this article](https://www.intel.com/content/www/us/en/developer/articles/guide/level-zero-immediate-command-lists.html).
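As a concrete illustration of the tip above, a minimal sketch follows; the device index is machine-specific and chosen only as an example:

```bash
# List the devices visible to SYCL and note the Level Zero index of the GPU you want.
sycl-ls

# Run on that GPU only; replace 1 with the index reported on your machine.
export ONEAPI_DEVICE_SELECTOR=level_zero:1
```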
##### Run llama3
Under your current directory, execute the command below to do inference with Llama3:
- For **Linux users**:
```bash
- ./main -m <model_dir>/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun doing something" -t 8 -e -ngl 33 --color --no-mmap
+ ./llama-cli -m <model_dir>/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun doing something" -c 1024 -t 8 -e -ngl 33 --color --no-mmap
```
- For **Windows users**:
Please run the following command in Miniforge Prompt.
```cmd
- main -m <model_dir>/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun doing something" -e -ngl 33 --color --no-mmap
+ llama-cli -m <model_dir>/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun doing something" -c 1024 -e -ngl 33 --color --no-mmap
```
Under your current directory, you can also execute the command below to have an interactive chat with Llama3:
# [optional] under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
# [optional] if you want to run on a single GPU, use the command below to limit the run to one GPU; this may improve performance
export ONEAPI_DEVICE_SELECTOR=level_zero:0
@@ -147,6 +153,7 @@ Launch the Ollama service:
set ZES_ENABLE_SYSMAN=1
set OLLAMA_NUM_GPU=999
set SYCL_CACHE_PERSISTENT=1
+ rem under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
ollama serve
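The hunk above shows the Windows (cmd) variant; on Linux, the corresponding exports would look roughly like the sketch below, assembled from the variables already shown in this document rather than copied from the guide's Linux block:

```bash
source /opt/intel/oneapi/setvars.sh
export ZES_ENABLE_SYSMAN=1
export OLLAMA_NUM_GPU=999
export SYCL_CACHE_PERSISTENT=1
# [optional] under most circumstances, the following environment variable may improve
# performance, but sometimes this may also cause performance degradation
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

ollama serve
```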
@@ -160,6 +167,8 @@ Launch the Ollama service:
> [!TIP]
> When your machine has multiple GPUs and you want to run on one of them, you need to set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` varies based on your requirement. For more details, you can refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+ > [!NOTE]
+ > The environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` determines the usage of immediate command lists for task submission to the GPU. While this mode typically enhances performance, exceptions may occur. Please consider experimenting with and without this environment variable for best performance. For more details, you can refer to [this article](https://www.intel.com/content/www/us/en/developer/articles/guide/level-zero-immediate-command-lists.html).