Commit 071c53e

Improve the quick_start.md
1 parent 7a82557 commit 071c53e


docs/source/getting-started/quick_start.md

Lines changed: 8 additions & 2 deletions
@@ -40,6 +40,8 @@ You can use our official offline example script to run offline inference as follows:

```bash
cd examples/
+# Change the model path to your own model path
+export MODEL_PATH=/home/models/Qwen2.5-14B-Instruct
python offline_inference.py
```
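For context, a minimal sketch of what an offline inference script driven by `MODEL_PATH` could look like. This is illustrative only: the actual contents of `examples/offline_inference.py` are not shown in this diff, and the prompt text below is made up.

```python
# Illustrative sketch only; the real examples/offline_inference.py may differ.
import os

from vllm import LLM, SamplingParams

# Read the model path exported in the step above (fallback matches the docs example)
model_path = os.environ.get("MODEL_PATH", "/home/models/Qwen2.5-14B-Instruct")

llm = LLM(model=model_path)
sampling_params = SamplingParams(temperature=0.0, max_tokens=32)

# Generate a completion for a sample prompt and print the text
outputs = llm.generate(["Shanghai is a"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```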

@@ -58,7 +60,11 @@ export PYTHONHASHSEED=123456
Run the following command to start the vLLM server with the Qwen/Qwen2.5-14B-Instruct model:

```bash
-vllm serve /home/models/Qwen2.5-14B-Instruct \
+# Change the model path to your own model path
+export MODEL_PATH=/home/models/Qwen2.5-14B-Instruct
+vllm serve ${MODEL_PATH} \
+--trust-remote-code \
+--served-model-name vllm_cpu_offload \
--max-model-len 20000 \
--tensor-parallel-size 2 \
--gpu_memory_utilization 0.87 \
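Once the server is started, one way to confirm that the model is registered under the new `--served-model-name` is to query the OpenAI-compatible `/v1/models` endpoint. A small sketch using only the standard library, assuming the server from the command above is listening on localhost:7800:

```python
# Quick check that the vLLM server exposes the model under its served name
# (assumes the server started above is listening on port 7800)
import json
import urllib.request

with urllib.request.urlopen("http://localhost:7800/v1/models") as resp:
    models = json.load(resp)

# Expect "vllm_cpu_offload" to appear among the model ids
print([m["id"] for m in models["data"]])
```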
@@ -95,7 +101,7 @@ After successfully starting the vLLM server, you can interact with the API as follows:
curl http://localhost:7800/v1/completions \
-H "Content-Type: application/json" \
-d '{
-"model": "/home/models/Qwen2.5-14B-Instruct",
+"model": "vllm_cpu_offload",
"prompt": "Shanghai is a",
"max_tokens": 7,
"temperature": 0
