
Commit 7d8f3a0

Merge branch 'main' of https://github.com/intel-analytics/ipex-llm into test_transformers_41

2 parents: 428e62b + 75b19f8

File tree: 19 files changed (+624 additions, −265 deletions)

.github/workflows/llm-c-evaluation.yml

Lines changed: 6 additions & 6 deletions
@@ -12,10 +12,10 @@ permissions:
 on:
   # schedule:
   #   - cron: "00 15 * * *" # GMT time, 15:00 GMT == 23:00 Beijing Time
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-c-evaluation.yml"
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-c-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -204,7 +204,7 @@ jobs:
          pip install pandas==1.5.3

      - name: Download ceval results
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: ceval_results
          path: results
@@ -259,7 +259,7 @@
          fi

      - name: Download ceval results
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: results_${{ needs.set-matrix.outputs.date }}
          path: ${{ env.ACC_FOLDER }}

.github/workflows/llm-harness-evaluation.yml

Lines changed: 6 additions & 6 deletions
@@ -12,10 +12,10 @@ permissions:
 on:
   # schedule:
   #   - cron: "30 12 * * *" # GMT time, 12:30 GMT == 20:30 China
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-harness-evaluation.yml"
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-harness-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -220,7 +220,7 @@ jobs:
          pip install --upgrade pip
          pip install jsonlines pytablewriter regex
      - name: Download all results
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: harness_results
          path: results
@@ -260,7 +260,7 @@
          fi

      - name: Download harness results
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: harness_results
          path: ${{ env.ACC_FOLDER}}/${{ env.DATE }}

.github/workflows/llm-ppl-evaluation.yml

Lines changed: 6 additions & 6 deletions
@@ -12,10 +12,10 @@ permissions:
 on:
   # schedule:
   #   - cron: "00 12 * * *" # GMT time, 12:00 GMT == 20:00 China
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-ppl-evaluation.yml"
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-ppl-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -206,7 +206,7 @@ jobs:
          pip install --upgrade pip
          pip install jsonlines pytablewriter regex
      - name: Download all results
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: ppl_results
          path: results
@@ -245,7 +245,7 @@
          fi

      - name: Download ppl results
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: ppl_results
          path: ${{ env.ACC_FOLDER}}/${{ env.DATE }}

.github/workflows/llm-whisper-evaluation.yml

Lines changed: 6 additions & 6 deletions
@@ -12,10 +12,10 @@ permissions:
 on:
   # schedule:
   #   - cron: "00 13 * * *" # GMT time, 13:00 GMT == 21:00 China
-  pull_request:
-    branches: [main]
-    paths:
-      - ".github/workflows/llm-whisper-evaluation.yml"
+  # pull_request:
+  #   branches: [main]
+  #   paths:
+  #     - ".github/workflows/llm-whisper-evaluation.yml"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
     inputs:
@@ -176,14 +176,14 @@

      - name: Download all results for nightly run
        if: github.event_name == 'schedule'
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: whisper_results
          path: ${{ env.NIGHTLY_FOLDER}}/${{ env.OUTPUT_PATH }}

      - name: Download all results for pr run
        if: github.event_name == 'pull_request'
-       uses: actions/download-artifact@v3
+       uses: actions/download-artifact@4.1.7
        with:
          name: whisper_results
          path: ${{ env.PR_FOLDER}}/${{ env.OUTPUT_PATH }}

README.md

Lines changed: 1 addition & 1 deletion
@@ -319,7 +319,7 @@ Over 50 models have been optimized/verified on `ipex-llm`, including *LLaMA/LLaM
 | MiniCPM-V | | [link](python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V) |
 | MiniCPM-V-2 | | [link](python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2) |
 | MiniCPM-Llama3-V-2_5 | | [link](python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-Llama3-V-2_5) |
-| MiniCPM-V-2_6 | | [link](python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2_6) |
+| MiniCPM-V-2_6 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/minicpm-v-2_6) | [link](python/llm/example/GPU/HuggingFace/Multimodal/MiniCPM-V-2_6) |

 ## Get Support
 - Please report a bug or raise a feature request by opening a [Github Issue](https://github.com/intel-analytics/ipex-llm/issues)

docs/mddocs/Quickstart/graphrag_quickstart.md

Lines changed: 80 additions & 19 deletions
@@ -9,12 +9,16 @@ The [GraphRAG project](https://github.com/microsoft/graphrag) is designed to lev
 - [Setup Python Environment for GraphRAG](#3-setup-python-environment-for-graphrag)
 - [Index GraphRAG](#4-index-graphrag)
 - [Query GraphRAG](#5-query-graphrag)
+- [Query GraphRAG](#5-query-graphrag)
+- [Troubleshooting](#troubleshooting)

 ## Quickstart

 ### 1. Install and Start `Ollama` Service on Intel GPU

-Follow the steps in [Run Ollama with IPEX-LLM on Intel GPU Guide](./ollama_quickstart.md) to install and run Ollama on Intel GPU. Ensure that `ollama serve` is running correctly and can be accessed through a local URL (e.g., `https://127.0.0.1:11434`).
+Follow the steps in [Run Ollama with IPEX-LLM on Intel GPU Guide](./ollama_quickstart.md) to install `ipex-llm[cpp]==2.1.0` and run Ollama on Intel GPU. Ensure that `ollama serve` is running correctly and can be accessed through a local URL (e.g., `https://127.0.0.1:11434`).
+
+**Please note that for GraphRAG, we highly recommend using the stable version of ipex-llm through `pip install ipex-llm[cpp]==2.1.0`**.

 ### 2. Prepare LLM and Embedding Model

@@ -57,13 +61,17 @@ conda create -n graphrag-local-ollama python=3.10
 conda activate graphrag-local-ollama

 pip install -e .
+pip install future

 pip install ollama
 pip install plotly
 ```

 in which `pip install ollama` is for enabling restful APIs through python, and `pip install plotly` is for visualizing the knowledge graph.

+> [!NOTE]
+> Please note that the Python environment for GraphRAG setup here is separate from the one for Ollama server on Intel GPUs.
+
 ### 4. Index GraphRAG

 The environment is now ready for GraphRAG with local LLMs and embedding models running on Intel GPUs. Before querying GraphRAG, it is necessary to first index GraphRAG, which could be a resource-intensive operation.
@@ -114,24 +122,25 @@ Prepare the input corpus, and then initialize the workspace:
 #### Update `settings.yml`

 In the `settings.yml` file inside the `ragtest` folder, add the configuration `request_timeout: 1800.0` for `llm`. Besides, if you would like to use LLMs or embedding models other than `mistral` or `nomic-embed-text`, you are required to update the `settings.yml` in the `ragtest` folder accordingly:
->
-> ```yml
-> llm:
->   api_key: ${GRAPHRAG_API_KEY}
->   type: openai_chat
->   model: mistral # change it accordingly if using another LLM
->   model_supports_json: true
->   request_timeout: 1800.0 # add this configuration; you could also increase the request_timeout
->   api_base: http://localhost:11434/v1
->
-> embeddings:
->   async_mode: threaded
->   llm:
->     api_key: ${GRAPHRAG_API_KEY}
->     type: openai_embedding
->     model: nomic_embed_text # change it accordingly if using another embedding model
->     api_base: http://localhost:11434/api
-> ```
+
+
+```yml
+llm:
+  api_key: ${GRAPHRAG_API_KEY}
+  type: openai_chat
+  model: mistral # change it accordingly if using another LLM
+  model_supports_json: true
+  request_timeout: 1800.0 # add this configuration; you could also increase the request_timeout
+  api_base: http://localhost:11434/v1
+
+embeddings:
+  async_mode: threaded
+  llm:
+    api_key: ${GRAPHRAG_API_KEY}
+    type: openai_embedding
+    model: nomic_embed_text # change it accordingly if using another embedding model
+    api_base: http://localhost:11434/api
+```

 #### Conduct GraphRAG indexing

@@ -197,3 +206,55 @@ The Transformer model has been very successful in various natural language proce

 Since its initial introduction, the Transformer model has been further developed and improved upon. Variants of the Transformer architecture, such as BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT Pretraining Approach), have achieved state-of-the-art performance on a wide range of natural language processing tasks [Data: Reports (1, 2, 34, 46, 64, +more)].
 ```
+
+### Troubleshooting
+
+#### `failed to find free space in the KV cache, retrying with smaller n_batch` when conducting GraphRAG Indexing, and `JSONDecodeError` when querying GraphRAG
+
+If you observe the Ollama server log showing `failed to find free space in the KV cache, retrying with smaller n_batch` while conducting GraphRAG indexing, and receive `JSONDecodeError` when querying GraphRAG, try increasing the context length for the LLM model and then index/query GraphRAG again.
+
+Here is how to make the LLM model support a larger context. First, create a file named `Modelfile`:
+
+```
+FROM mistral:latest
+PARAMETER num_ctx 4096
+```
+
+> [!TIP]
+> Here we increase `num_ctx` to 4096 as an example. You could adjust it accordingly.
+
+and then use the following commands to create a new model in Ollama named `mistral:latest-nctx4096`:
+
+- For **Linux users**:
+
+  ```bash
+  ./ollama create mistral:latest-nctx4096 -f Modelfile
+  ```
+
+- For **Windows users**:
+
+  Please run the following command in Miniforge or Anaconda Prompt.
+
+  ```cmd
+  ollama create mistral:latest-nctx4096 -f Modelfile
+  ```
+
+Finally, update `settings.yml` inside the `ragtest` folder to use the `llm` model `mistral:latest-nctx4096`:
+
+```yml
+llm:
+  api_key: ${GRAPHRAG_API_KEY}
+  type: openai_chat
+  model: mistral:latest-nctx4096 # change it accordingly if using another LLM, or an LLM model with larger num_ctx
+  model_supports_json: true
+  request_timeout: 1800.0 # add this configuration; you could also increase the request_timeout
+  api_base: http://localhost:11434/v1
+
+embeddings:
+  async_mode: threaded
+  llm:
+    api_key: ${GRAPHRAG_API_KEY}
+    type: openai_embedding
+    model: nomic_embed_text # change it accordingly if using another embedding model
+    api_base: http://localhost:11434/api
+```
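The quickstart above expects `ollama serve` to be reachable at its local URL before GraphRAG indexing or querying starts. As a rough sanity check from the GraphRAG Python environment (not part of this commit), the `ollama` package installed in step 3 can be used to confirm that the server responds and that the configured LLM answers; the host and model name below are assumptions taken from the `settings.yml` example and should be adjusted to your own setup.

```python
# Hypothetical sanity check: confirm the local Ollama server used by GraphRAG
# responds before indexing or querying. Host and model name are assumptions.
import ollama

client = ollama.Client(host="http://localhost:11434")

# Show the models the server currently knows about
# (e.g. mistral and nomic-embed-text, if they have been pulled for GraphRAG).
print(client.list())

# One-off chat request against the LLM configured in settings.yml.
reply = client.chat(
    model="mistral:latest",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(reply["message"]["content"])
```

If this check fails, revisit the Ollama quickstart before moving on to indexing.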
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# MiniCPM-V-2_6
In this directory, you will find examples of how you can apply IPEX-LLM INT4 optimizations on MiniCPM-V-2_6 models. For illustration purposes, we utilize [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) as a reference MiniCPM-V-2_6 model.

## 0. Requirements
To run these examples with IPEX-LLM, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

## Example: Predict Tokens using `chat()` API
In the example [chat.py](./chat.py), we show a basic use case for a MiniCPM-V-2_6 model to predict the next N tokens using the `chat()` API, with IPEX-LLM INT4 optimizations.
### 1. Install
We suggest using conda to manage the environment:

On Linux:

```bash
conda create -n llm python=3.11
conda activate llm

# install ipex-llm with 'all' option
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cpu
pip install transformers==4.40.0 trl
```
On Windows:

```cmd
conda create -n llm python=3.11
conda activate llm

pip install --pre --upgrade ipex-llm[all]
pip install torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cpu
pip install transformers==4.40.0 trl
```

### 2. Run

- chat without streaming mode:
  ```
  python ./chat.py --prompt 'What is in the image?'
  ```
- chat in streaming mode:
  ```
  python ./chat.py --prompt 'What is in the image?' --stream
  ```

> [!TIP]
> For chatting in streaming mode, it is recommended to set the environment variable `PYTHONUNBUFFERED=1`.


Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the MiniCPM-V-2_6 model (e.g. `openbmb/MiniCPM-V-2_6`) to be downloaded, or the path to the huggingface checkpoint folder. It defaults to `'openbmb/MiniCPM-V-2_6'`.
- `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be inferred. It defaults to `'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'`.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It defaults to `'What is in the image?'`.
- `--stream`: flag to chat in streaming mode

> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, an *X*B model saved in 16-bit will require approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference.
>
> Please select the appropriate size of the MiniCPM model based on the capabilities of your machine.

#### 2.1 Client
On a client Windows machine, it is recommended to run directly with full utilization of all cores:
```cmd
python ./chat.py
```

#### 2.2 Server
For optimal performance on a server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.

E.g. on Linux,
```bash
# set IPEX-LLM env variables
source ipex-llm-init

# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./chat.py
```

#### 2.3 Sample Output
#### [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
```log
Inference time: xxxx s
-------------------- Input Image --------------------
http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
-------------------- Input Prompt --------------------
What is in the image?
-------------------- Chat Output --------------------
The image features a young child holding a white teddy bear dressed in pink. The background includes some red flowers and what appears to be a stone wall.
```

```log
-------------------- Input Image --------------------
http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
-------------------- Input Prompt --------------------
图片里有什么?
-------------------- Stream Chat Output --------------------
图片中有一个小女孩,她手里拿着一个穿着粉色裙子的白色小熊玩偶。背景中有红色花朵和石头结构,可能是一个花园或庭院。
```

The sample input image (fetched from the [COCO dataset](https://cocodataset.org/#explore?id=264959)) is:

<a href="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"><img width=400px src="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg" ></a>
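For reference, a minimal loading flow matching the description above (INT4 conversion of the linear layers plus the `chat()` API) might look like the sketch below. This is not the chat.py shipped in this commit; the `chat()` call follows the openbmb/MiniCPM-V-2_6 model card and its exact signature is an assumption that may differ between model revisions, so check the shipped chat.py for the actual arguments.

```python
# A rough sketch, assuming the openbmb/MiniCPM-V-2_6 chat() interface;
# not the chat.py from this commit.
import requests
from io import BytesIO
from PIL import Image
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel

model_path = "openbmb/MiniCPM-V-2_6"
image_url = "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"

# load_in_4bit=True converts the linear layers to INT4, as noted above.
model = AutoModel.from_pretrained(
    model_path,
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Fetch the sample image and build a single-turn message in the model-card format.
image = Image.open(BytesIO(requests.get(image_url).content)).convert("RGB")
msgs = [{"role": "user", "content": [image, "What is in the image?"]}]

# Non-streaming chat: returns the generated answer as a string.
response = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(response)
```

For the streaming mode above, the model card suggests passing `stream=True` to `chat()` and iterating over the returned generator, though chat.py may handle this differently.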
