
[NPU][LNL] LLM inference on LNL NPU is very slow #1563

Open
johnysh opened this issue Jan 16, 2025 · 3 comments


johnysh commented Jan 16, 2025

[OS]: Windows 11
[Platform]: Intel(R) Core(TM) Ultra 7 258V @ 2.20 GHz
[RAM]: 32 GB
[NPU driver]: 32.0.100.3104
ENV:

https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html

pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
pip install openvino==2024.6 openvino-tokenizers==2024.6 openvino-genai==2024.6

PIP LIST:
openvino 2024.6.0
openvino-genai 2024.6.0.0
openvino-telemetry 2024.5.0
openvino-tokenizers 2024.6.0.0
optimum 1.23.3
optimum-intel 1.19.0

Code:
https://github.com/openvinotoolkit/openvino.genai

CMD:
optimum-cli export openvino -m TheBloke/Llama-2-7B-Chat-GPTQ Llama-2-7B-Chat-GPTQ

python\benchmark_genai>python ./benchmark_genai.py -m Llama-2-7B-Chat-GPTQ -d NPU
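For reference, the benchmark script above boils down to a single `openvino_genai.LLMPipeline` call where the device string selects CPU, GPU, or NPU. A minimal sketch, assuming the openvino-genai 2024.6 Python API and the exported model directory from the `optimum-cli` command above (guarded so it degrades gracefully when the package is not installed):

```python
# Minimal sketch of the core call benchmark_genai.py makes.
# Assumes openvino-genai 2024.6 and the "Llama-2-7B-Chat-GPTQ"
# directory produced by the optimum-cli export above.
try:
    import openvino_genai as ov_genai

    # The device string is the only change needed to switch
    # between "CPU", "GPU" (iGPU), and "NPU".
    pipe = ov_genai.LLMPipeline("Llama-2-7B-Chat-GPTQ", "NPU")
    answer = pipe.generate("What is OpenVINO?", max_new_tokens=64)
    print(answer)
except ImportError:
    # openvino-genai not available in this environment
    answer = None
```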

Result:

(screenshot: benchmark_genai.py output on NPU showing the very slow generation; raw numbers not recoverable from the page)

@Wan-Intel

Could you please run the following command and share the result with us?
python\benchmark_genai>python ./benchmark_genai.py -m Llama-2-7B-Chat-GPTQ -d CPU


johnysh commented Jan 20, 2025

CPU Result:

Load time: 1260.00 ms
Generate time: 1786.82 ± 66.43 ms
Tokenization time: 0.48 ± 0.04 ms
Detokenization time: 0.43 ± 0.04 ms
TTFT: 288.04 ± 43.60 ms
TPOT: 78.86 ± 19.94 ms
Throughput: 12.68 ± 3.21 tokens/s

iGPU Result:

Load time: 77268.00 ms
Generate time: 917.43 ± 2.66 ms
Tokenization time: 0.54 ± 0.04 ms
Detokenization time: 0.56 ± 0.00 ms
TTFT: 52.42 ± 0.39 ms
TPOT: 45.47 ± 2.41 ms
Throughput: 21.99 ± 1.17 tokens/s
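As a sanity check, the reported throughput is consistent with the per-token latency: throughput (tokens/s) is roughly 1000 / TPOT (ms/token). A quick check in pure Python, using the numbers from the two result blocks above:

```python
# Verify reported throughput against TPOT for the CPU and iGPU runs above:
# throughput (tok/s) ~= 1000 / TPOT (ms per output token).
results = {
    "CPU":  {"tpot_ms": 78.86, "reported_tps": 12.68},
    "iGPU": {"tpot_ms": 45.47, "reported_tps": 21.99},
}
for device, r in results.items():
    derived_tps = 1000.0 / r["tpot_ms"]
    print(f"{device}: derived {derived_tps:.2f} tok/s "
          f"vs reported {r['reported_tps']} tok/s")
```

Both derived values match the reported throughput to two decimal places, so the NPU slowdown in the original report should show up as a proportionally large TPOT.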

@Wan-Intel

Thanks for sharing the information. I'll escalate the case to the relevant team, and we'll provide an update as soon as possible.

@Wan-Intel Wan-Intel added the PSE label Jan 23, 2025
@avitial avitial self-assigned this Jan 31, 2025