
[question] SA8295 and 8775 fallback to default TFLite GPU backend. #4

Open
ecccccsgo opened this issue Nov 18, 2024 · 3 comments

@ecccccsgo

Hi,

You are doing a great thing by making Whisper available on Android.

Recently I have been hoping to deploy an LLM on the SA8295 and SA8775 NPUs 🌵, but there are few tutorials on this. Then I found your great work deploying Whisper on the NPU with a TFLite delegate, and I think it will help me understand more.

  1. First I tested it on the automotive chipsets SA8295 and SA8775, and got a fallback to the GPU:
Compiled Nov 18 2024 07:38:18
AXIE TFLite Runner
Argmax, Inc.

lib dir: /data/local/tmp/lib
cache dir: /data/local/tmp/cache
Creating cache directory: /data/local/tmp/cache
INFO: Initialized TensorFlow Lite runtime.
INFO: [QNN Delegate] Caching: cache_filename of /data/local/tmp/cache/qnn_binary_8316569388647923909.bin isn't available, caching in SAVE MODE.
ERROR: [QNN Delegate] Failed to create device_handle for Backend ID 6, error=14001
ERROR: Restored original execution plan after delegate application failure.
INFO: [QNN Delegate] Caching: cache_filename of /data/local/tmp/cache/qnn_binary_1695503246574442769.bin isn't available, caching in SAVE MODE.
ERROR: [QNN Delegate] Failed to create device_handle for Backend ID 6, error=14001
ERROR: Restored original execution plan after delegate application failure.
INFO: [QNN Delegate] Caching: cache_filename of /data/local/tmp/cache/qnn_binary_279480402058969599.bin isn't available, caching in SAVE MODE.
ERROR: [QNN Delegate] Failed to create device_handle for Backend ID 6, error=14001
ERROR: Restored original execution plan after delegate application failure.
INFO: [QNN Delegate] Caching: cache_filename of /data/local/tmp/cache/qnn_binary_18177980569190481050.bin isn't available, caching in SAVE MODE.
ERROR: [QNN Delegate] Failed to create device_handle for Backend ID 6, error=14001
ERROR: Restored original execution plan after delegate application failure.
postproc vocab size: 51864
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
VERBOSE: Replacing 27 out of 42 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 12 partitions for the whole graph.
ERROR: Following operations are not supported by GPU delegate:
DELEGATE TfLiteXNNPackDelegate: Operation is not supported.
LOG_SOFTMAX: Operation is not supported.
SCATTER_ND: Operation is not supported.
ZEROS_LIKE: Operation is not supported.
3 operations will run on the GPU, and the remaining 18 operations will run on the CPU.
VERBOSE: Replacing 3 out of 21 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 3 partitions for the whole graph.
ERROR: TfLiteGpuDelegate Init: Tensor "model/tf.math.reduce_logsumexp/ReduceLogSumExp/Log" has bad input dims size: 0.
INFO: Created 0 GPU delegate kernels.
ERROR: TfLiteGpuDelegate Prepare: delegate is not initialized
ERROR: Node number 48 (TfLiteGpuDelegateV2) failed to prepare.
ERROR: Restored original execution plan after delegate application failure.
Falling back to default TFLite GPU backend..tflite_init done..
Text Out:
<|0|> I 'm not ... <|2|> <|2|> I 'm not ... <|4|> <|6|> I 'm not ... <|8|> <|10|> I 'm not ... <|12|> <|15|> I 'm not ... <|17|> 

Text Out:
<|18.29|> I love you , baby . <|22.29|> <|22.29|> I love you , baby . <|26.29|> <|26.29|> I love you , baby . <|30.29|> 

Could you help me find out the reason?

  2. I ran with whisper_tiny and the results don't look good. In your testing, do the small models always repeat words and perform badly?

  3. BTW, I notice that development of whisper-ios is more active. I wonder whether the situation for Android development is more complicated?

Looking forward to your reply, thank you~

@keith4ever
Collaborator

Hi @ecccccsgo ,

Thanks a lot for trying out WhisperKit Android!
It's very interesting that you're trying it on the SA8295/8775. Is that an automotive development platform?
Are you running it from a Linux or Android CLI? We obviously haven't tried that platform, but it nevertheless seems to run (after the delegate fallbacks) without crashing, which is a great start.

  1. Please note that we have only tested on four QCOM SoCs: SM8650, 8550, 8450, and 8350.
    If an SoC is too old to support its NPU (HTP), the runtime tries to fall back to the generic TFLite GPU delegate, and ultimately to the generic (XNNPACK) CPU; I'm assuming that's what's happening on your system.
    So it's great that it runs without crashing, but the output doesn't look right.

  2. Yes, it obviously seems to hallucinate and repeat itself when things are not right.
    The first thing I can suggest: do you mind posting the .wav audio file you're trying? I'm guessing it may not be in the required audio format (which should be mono, 16 kHz, S16LE PCM).

  3. You're correct that our whisper-ios is well ahead and more mature. The reason is that it targets a different HW platform (Apple SoCs) with different SW support.
    Unfortunately, we're unable to carry most of that maturity into the Android ecosystem. We do have verified pre/post-processing algorithms and Whisper models, but currently we only use part of them, and even the models are in a different format (TensorFlow Lite). The Android AI ecosystem has various limits and lacks maturity at the moment. That doesn't mean Android is inherently more complicated; it just started later and, as you know, faces challenges from fragmented HW/SW stacks.
    But we're rapidly developing more and making it more advanced, so please keep checking on our progress.
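To check whether an input file already matches the format mentioned in item 2 (mono, 16 kHz, S16LE PCM), here is a minimal sketch using only Python's stdlib `wave` module; the helper name `check_whisper_wav` is my own for illustration, not part of WhisperKit:

```python
import wave

def check_whisper_wav(path: str) -> bool:
    """Return True if the WAV file is mono, 16 kHz, 16-bit PCM (S16LE)."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels() == 1          # mono
            and wf.getframerate() == 16000  # 16 kHz sample rate
            and wf.getsampwidth() == 2      # 16-bit samples (2 bytes each)
        )
```

Note that `wave.open` only parses standard PCM WAV headers, so a compressed or headerless file will typically raise an exception rather than return False.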

@ecccccsgo
Author

Hi @keith4ever, thank you for your reply. The SA8295/8775 are automotive development platforms, and I found Whisper with NPU support for these chipsets here: https://aihub.qualcomm.com/models/whisper_base_en?domain=Audio&chipsets=SA8295P . Then I found your repo, which uses the TFLite delegate approach to run on the GPU. :)

For the first point, I will check the NPU support.

For the second, I input a .wav file, but I'm not sure about the mono, 16 kHz, S16LE PCM requirements.

For the last one, I've been confused about deploying models on the NPU of QCOM chipsets recently... and there are few tutorials :(

@keith4ever
Collaborator

Thanks for the info.
Yes, if the SoC's NPU isn't supported for any reason, it will fall back to the TFLite GPU delegate (which may be backed by OpenCL kernels).

If you can't share your .wav file, please try converting it to the supported format with this ffmpeg command:
ffmpeg -i input.wav -c:a pcm_s16le -ac 1 -ar 16000 output.wav
and try again with output.wav. I'm almost sure that is the root cause.

We'll support all major audio codecs (.m4a, .mp3, .flac, ...) in the next release, as well as other sample rates, channel layouts, and PCM WAV formats.
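If ffmpeg isn't handy, a known-good clip in the expected format can also be generated with Python's stdlib. This is just an illustrative sketch (the `write_test_tone` helper is hypothetical, not part of WhisperKit), useful for ruling out format issues with a file whose properties are certain:

```python
import math
import struct
import wave

def write_test_tone(path: str, seconds: float = 1.0, freq: float = 440.0) -> None:
    """Write a mono, 16 kHz, 16-bit PCM (S16LE) sine tone to `path`."""
    rate = 16000                 # 16 kHz sample rate
    n = int(rate * seconds)      # total number of frames
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(rate)
        frames = b"".join(
            # half-amplitude sine, packed as little-endian signed 16-bit
            struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n)
        )
        wf.writeframes(frames)
```

A tone file obviously won't transcribe to meaningful text, but if the pipeline stops repeating garbage on it, that points at the original file's format rather than the model.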
