Call for tuning results on CLBlast to achieve faster llama.cpp prompt performance #1688
tangjinchuan
started this conversation in
General
Replies: 1 comment
Thanks for the information. I ran the tuner and added the results for Adreno 640, which are now merged into CLBlast.
Dear all,
Please forgive me if this seems to be spam.
llama.cpp can run on non-CUDA GPUs with the help of CLBlast, and each GPU architecture may need different parameters to achieve the best matrix multiplication performance. In many cases, CLBlast achieves higher GEMM performance after tuning. I am a volunteer helping the CLBlast project collect tuning results for different GPUs. It would be great if you could run the CLBlast tuner on your GPU (all GPUs are very welcome; the tutorials are linked below) and report the tuning results as a zip file on the CLBlast site here:
New tuning results · Issue #1 · CNugteren/CLBlast · GitHub
This could help make CLBlast and llama.cpp faster!
Official manual on running the tuner (especially for Linux/macOS users):
CLBlast/tuning.md at master · CNugteren/CLBlast · GitHub
My thread and file for running the tuner easily on Windows:
CNugteren/CLBlast#1 (comment)
Best wishes,
Jinchuan Tang