Refactoring: add helper class to bind qnn tensor -> ggml tensor #2

chraac · 2024-06-17T04:17:18Z

Self Reported Review Complexity:
- Review Complexity : Low
- Review Complexity : Medium
- Review Complexity : High
I have read the contributing guidelines

As I said in your upstream PR, it would be better to have a helper for wrapping a ggml_tensor into a Qnn_Tensor_t.
So I have created this PR for it.
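
To illustrate the idea, here is a minimal sketch of such a binder, not the code in this PR. `fake_ggml_tensor`, `fake_qnn_tensor`, `count_rank` and `tensor_binder` are hypothetical stand-ins invented for this example; the real `ggml_tensor` and `Qnn_Tensor_t` have many more fields, and the dimension order used here is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for ggml_tensor.
struct fake_ggml_tensor {
    int64_t ne[4];   // number of elements per dimension
    void *  data;    // host buffer
    size_t  nbytes;  // size of the host buffer in bytes
};

// Hypothetical, heavily simplified stand-in for Qnn_Tensor_t.
struct fake_qnn_tensor {
    uint32_t   rank;
    uint32_t * dimensions;
    void *     client_buf;
    uint32_t   client_buf_size;
};

// Highest dimension index holding more than one element, plus one.
static uint32_t count_rank(const fake_ggml_tensor & t) {
    uint32_t rank = 1;
    for (uint32_t i = 1; i < 4; ++i) {
        if (t.ne[i] > 1) {
            rank = i + 1;
        }
    }
    return rank;
}

// RAII binder: while it is alive, the QNN-side tensor describes the ggml
// buffer; on destruction the previous QNN-side state is restored.
// The binder never owns or frees either tensor.
class tensor_binder {
  public:
    tensor_binder(const fake_ggml_tensor & src, fake_qnn_tensor & dst) : _dst(dst), _saved(dst) {
        for (size_t i = 0; i < _dims.size(); ++i) {
            _dims[i] = static_cast<uint32_t>(src.ne[i]);
        }
        _dst.rank            = count_rank(src);
        _dst.dimensions      = _dims.data();
        _dst.client_buf      = src.data;
        _dst.client_buf_size = static_cast<uint32_t>(src.nbytes);
    }

    ~tensor_binder() { _dst = _saved; }

    // Non-copyable: _dst.dimensions points into this object's _dims.
    tensor_binder(const tensor_binder &)             = delete;
    tensor_binder & operator=(const tensor_binder &) = delete;

  private:
    fake_qnn_tensor &       _dst;    // tensor being bound (not owned)
    fake_qnn_tensor         _saved;  // state to restore on destruction
    std::array<uint32_t, 4> _dims{}; // storage backing _dst.dimensions
};
```

With something like this, an op such as add or matmul would declare one binder per operand at the top of the function, so the unbinding happens on every exit path, including early returns on error.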

Ran the tests on the CPU backend; they pass.

Ran them on the NPU backend as well; they also pass.

chraac · 2024-06-17T04:24:11Z

ggml-qnn.cpp

+                QNN_LOG_WARN("alloc rpcmem failure, %s\n", strerror(errno));
+                QNN_LOG_DEBUG("tensor%p name %s", _qnn_tensor, QNN_TENSOR_GET_NAME(*_qnn_tensor));
+                _context = nullptr;
+                // TODO: should we free the tensor here?


Should we free the _qnn_tensor created by tensorCreateGraphTensor (line 1979) here?

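No need. The Qualcomm QNN SDK does not seem to provide a function for that. According to Qualcomm's documentation, the SDK appears to manage these internal resources itself.

I had already noticed the problem you mention in this PR, but I don't yet understand why it happens. There is very little technical material on the Qualcomm QNN SDK; for now there is only the SDK reference manual.

My guess is that some synchronization step is missing, but there really isn't much information to go on.

I'm not sure. I have run all sorts of experiments and have not found any useful public reference material.

Judging from what is publicly available, several companies in China have already implemented Qualcomm NPU acceleration, and 面壁智能 (ModelBest), which released Open MiniCPM-V, is one of them. If you are an employee of a commercial company, you could contact them. For an independent developer like me, without an NDA with QTI or technical support from Qualcomm, getting this fully working is likely to be quite hard: for example, I have no idea what those error codes actually mean.

I have used Qualcomm's GPU profiler before, and it was full of bugs too. I feel problems like this have to wait until Qualcomm fixes them; if we try to work around them, we will waste a lot of time for nothing.

I agree. It would be best to get technical support from Qualcomm.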

zhouwg · 2024-06-17T13:38:33Z

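Thanks for your PR. We can discuss this problem in my personal learning and study project: https://github.com/zhouwg/kantv/tree/ggml-qnn-refine/core/ggml/llamacpp/tests/ggml-qnn

I don't know your background. Are you an independent developer with time on your hands, like me, or an employee of an AI company? If you work for a company, you could contact 面壁智能 (ModelBest); they have probably already achieved Qualcomm NPU acceleration.

I didn't expect you to still be interested in this problem. I am no longer interested in the upstream project. If you like, you can work on ggml-qnn issues in my learning and research project; we can use Chinese there, which is more convenient, and contribute the results to the community at the end. There is no need to go to the trouble of submitting upstream (in fact, ever since those several ggml-rpc.cpp PRs were merged into the master branch, I have been very disappointed).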

chraac · 2024-06-17T14:29:06Z

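> Thanks for your PR. We can discuss this problem in my personal learning and study project: https://github.com/zhouwg/kantv/tree/ggml-qnn-refine/core/ggml/llamacpp/tests/ggml-qnn
>
> I don't know your background. Are you an independent developer with time on your hands, like me, or an employee of an AI company? If you work for a company, you could contact 面壁智能 (ModelBest); they have probably already achieved Qualcomm NPU acceleration.
>
> I didn't expect you to still be interested in this problem. I am no longer interested in the upstream project. If you like, you can work on ggml-qnn issues in my learning and research project; we can use Chinese there, which is more convenient, and contribute the results to the community at the end. There is no need to go to the trouble of submitting upstream (in fact, ever since those several ggml-rpc.cpp PRs were merged into the master branch, I have been very disappointed; the upstream project has entered garbage time, with no core improvements).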

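Personally I am still fairly optimistic about this. I have read that series of rpc PRs too, and I use rpc day to day; I don't think it conflicts with the qnn work.
I am also positive about adding a qnn backend to llama.cpp. This PR is indeed large, so review is slow. Please don't lose heart, and keep going! If there is a chance, it is still best to get it merged upstream, since that saves a lot of trouble.

I am doing this purely as a hobby, mainly to learn and exchange new ideas. So I don't plan to contact the vendor; I am just working from public material.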

chraac · 2024-06-17T15:26:28Z

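Let me share a bit of my own experience. When I first started submitting PRs I felt the same way: some review comments seemed aimed at me personally. After more interactions I came to terms with it. Seen from the other side, everyone is doing open source out of interest, not as a job, and the reviewers may be hobbyists too. Starting from that assumption, collaboration tends to go better.
I also often run into long PR review times (I once had a large PR drag on for several months). There may be no real solution, since everyone does this in their spare time on an irregular schedule. I usually write down my reasoning and add comments to each part of the code, which makes review easier.
PR priority has no real solution either. Some PRs may be important while the community lacks enough people with the relevant background to review them. That does happen, and your PR seems to be in that position, so it may be very hard for other reviewers. Please don't lose heart.
A community will always have different voices, and some people will object; that is a voice too. Still, the community is fairly democratic internally. ~~Over the years, I have come to think that having a dissenting voice is quite important.~~
Also, don't sell yourself short. Contributing to open source in your spare time already puts you ahead of many people. Staring at code and comments every day does make it easy to get emotional. That happens; all one can do is try to keep it under control so it doesn't affect judgment, and try not to let it spill over elsewhere (this is really hard, and I can't manage it either), because that tends to drag uninvolved people in.
There is also the language issue. We all work with LLMs ourselves, and LLMs now handle translation between languages very well; used well, they can remove a lot of the language barrier.

Thanks for the reply. What you have achieved in your spare time is already good work. Keep it up!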

chraac · 2024-06-19T02:48:49Z

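@zhouwg, when you have time, could you take a look at the PR and, if there are no problems, merge it into your branch?
People are still following this backend, so please don't give up.
I will also find some time to keep improving this branch.
Thanks again for all your efforts!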

chraac added 6 commits June 16, 2024 22:28

init the test array with const values

5e18cdc

add ggml_qnn_tensor_binder

6c68adc

use tensor wrapper in add

37bb926

use tensor wrapper in matmul

36e41a1

use ggml_qnn_tensor_reader for output tensor

a5679dd

use ggml_qnn_tensor_writer for all parameters

5fe7b87

github-actions bot added the testing label Jun 17, 2024

chraac mentioned this pull request Jun 17, 2024

ggml-qnn: add Qualcomm QNN(Qualcomm Neural Network,aka Qualcomm AI Engine Direct) backend ggerganov/llama.cpp#6869

Closed

4 tasks


rename

9456bba

fix todo

65a14d9

chraac force-pushed the dev-function-to-map-tensor branch from 4d70039 to 65a14d9 Compare June 18, 2024 15:09

make the constant condition first

aeef0c6

chraac requested a review from zhouwg June 19, 2024 02:48

remove TODO

dfe159f

chraac force-pushed the dev-function-to-map-tensor branch from 7a77028 to dfe159f Compare June 19, 2024 03:16

chraac added 7 commits June 19, 2024 14:39

split logger function, tensors and backend from main qnn source

9932062

remove reference of g_qnn_mgr in qnn_instance

3c491a3

fix compiling error

3fe07eb

rename

37a1585

move qnn helper function into utility files

ff0359d

fix op handle checker

e1056da

split qnn ops into file

c9e99bd
