Call for tuning results on CLBlast to achieve faster llama.cpp prompt performance #1688

tangjinchuan · 2023-06-04T08:13:29Z

Dear all,

Please forgive me if this seems to be spam.

llama.cpp can run on non-CUDA GPUs with the help of CLBlast, but each GPU architecture needs different parameters to achieve the best matrix multiplication performance. In many cases, CLBlast achieves higher GEMM performance after tuning. I am volunteering to help the CLBlast project collect tuning results for different GPUs. It would be great if you could run the CLBlast tuner on your GPU (all GPUs are very welcome; the tutorials are linked below) and report the tuning results as a zip file on the CLBlast site here:

New tuning results · Issue #1 · CNugteren/CLBlast · GitHub

This could help to make CLBlast and llama.cpp faster!!!

Official manual on running the tuner (especially for Linux/macOS users):

CLBlast/tuning.md at master · CNugteren/CLBlast · GitHub

My thread and file for running the tuner easily in Windows:

CNugteren/CLBlast#1 (comment)
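
For the "report the tuning results as a zip file" step, here is a minimal Python sketch that packs the tuner output into one archive. It assumes the tuners wrote their results as JSON files named like `clblast_*.json` into a single folder; that pattern, the function name `zip_tuning_results`, and the archive name in the usage note are illustrative assumptions, so adjust them to whatever your tuner run actually produced.

```python
from pathlib import Path
import zipfile


def zip_tuning_results(results_dir, archive_path, pattern="clblast_*.json"):
    """Pack tuner output files matching `pattern` into one zip archive.

    `pattern` is an assumption about how the tuner names its output files;
    change it if your files are named differently.
    Returns the sorted list of file names that were added.
    """
    results_dir = Path(results_dir)
    files = sorted(p for p in results_dir.glob(pattern) if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {results_dir}")

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store each file under its bare name so the archive has no folder prefix.
            zf.write(path, arcname=path.name)

    return [p.name for p in files]
```

For example, `zip_tuning_results(".", "tuning_results.zip")` would collect the matching files from the current folder into `tuning_results.zip`, which can then be attached to the CLBlast issue.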

Best wishes,

Jinchuan Tang

ghost · 2023-06-29T13:38:57Z

Thanks for the information. I ran the tuner and submitted the results for Adreno 640, which are now merged into CLBlast.
