We added support for Amlogic NPU (Neural Processing Unit) acceleration in version v3.9.0. You'll be amazed to see UltimateALPR running at up to 64fps at High Definition (HD/720p) resolution on a $99 ARM device (Khadas VIM3). The engine can run at up to 90fps on low-resolution images.

This guide focuses on how to use UltimateALPR on the Khadas VIM3, but any SBC (Single Board Computer) with an Amlogic NPU will work fine (e.g. Banana Pi).

Operating system

Your Khadas VIM3 will likely come with Android 9 installed on the eMMC. Unfortunately that's a 32-bit Android OS and not suitable for high-performance applications. You'll need to install a Linux AArch64 OS from the Khadas website: https://docs.khadas.com/linux/firmware/Vim3UbuntuFirmware.html. We're using version 4.9 (https://doubango.org/khadas_images/VIM3_Ubuntu-server-focal_Linux-4.9_arm64_SD-USB_V1.0.9-211217.img.xz), but other versions should work as long as they are not Mainline Kernel images, which do not support the NPU. Make sure to install the right Linux version (see above).

You don't need to overwrite the Android OS on the eMMC: install the Linux OS on an external SD card instead. Your Khadas will choose the OS on the SD card at boot time. This is the safest way to test NPU acceleration on Linux without touching the OS on the eMMC. Once you're happy with the result, you can install the Linux OS on the eMMC, which is faster than the SD card (memory read/write). To boot Android (on the eMMC) again, just remove the SD card.

Running uname -a on our device prints: Linux Khadas 4.9.241 #22 SMP PREEMPT Fri Dec 17 17:34:50 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
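A quick way to confirm that the running kernel belongs to the 4.9 series used in this guide is sketched below. The helper name is_vendor_4_9 is hypothetical (it is not part of the SDK), and a 4.9 release string is only a hint that the NPU-capable image is installed, not a guarantee.

```shell
#!/bin/sh
# Succeed if the kernel release string ($1) belongs to the 4.9 series,
# e.g. "4.9.241" as printed by the device used in this guide.
# is_vendor_4_9 is a hypothetical helper name, not part of the SDK.
is_vendor_4_9() {
    case "$1" in
        4.9|4.9.*) return 0 ;;
        *)         return 1 ;;
    esac
}

release="$(uname -r)"
if is_vendor_4_9 "$release"; then
    echo "kernel $release: matches the 4.9 series used in this guide"
else
    echo "kernel $release: not 4.9, NPU support may be missing (e.g. Mainline Kernel images)"
fi
```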

We do not recommend upgrading your OS. More at https://groups.google.com/g/doubango-ai/c/Q8C6cZnObtU

Enabling NPU acceleration

To enable NPU acceleration:

  • you'll need to set the JSON configuration entry npu_enabled to true (it's already true by default). This can be done with the command-line parameter --npu_enabled true when using the recognizer or the benchmark application.
  • your hardware name must be listed in supported_hardware.txt (case-insensitive). If it isn't, edit the file to add it. To find your hardware name, run cat /proc/cpuinfo | grep Hardware
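The hardware-name check described above could be scripted roughly as follows. This is a minimal sketch: the helper names hardware_name and is_supported are hypothetical, the path to supported_hardware.txt depends on your SDK checkout, and since the file's exact layout isn't described here the check is a plain case-insensitive substring match.

```shell
#!/bin/sh
# Extract the value of the "Hardware" line from cpuinfo-formatted text on stdin.
hardware_name() {
    sed -n 's/^Hardware[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Succeed if the name ($1) appears in the list file ($2), ignoring case.
# Assumption: a substring match is good enough; an empty name never matches.
is_supported() {
    [ -n "$1" ] && grep -qiF -- "$1" "$2"
}

# Usage on the device (the file path is an assumption, adjust as needed):
#   name="$(hardware_name < /proc/cpuinfo)"
#   if is_supported "$name" ./supported_hardware.txt; then
#       echo "'$name' is listed"
#   else
#       echo "'$name' is not listed, add it to supported_hardware.txt"
#   fi
```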

Benchmarking

We'll run the benchmark sample application on Khadas VIM3 to see how fast UltimateALPR is on that device. We'll run the benchmark with and without NPU acceleration to see the boost.

Check list

  • make sure your device has enough power
  • make sure your CPU isn't throttling or overheating: cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
  • make sure that your CPU frequency governor is performance and not powersave: cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  • if your performance numbers aren't as good as the ones reported here, unplug your device and let it cool down
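The two sysfs checks from the list can be combined into one per-CPU report, as in the sketch below. The function names and the CPUFREQ_ROOT override are assumptions made for illustration; the sysfs files are the ones named above, and reading cpuinfo_cur_freq may require root on some systems (the report then shows "unknown").

```shell
#!/bin/sh
# Root of the per-CPU sysfs tree; overridable, which is an illustration-only convenience.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu}"

# Print "<cpu> <governor> <current frequency in kHz>" for every CPU exposing cpufreq.
cpufreq_report() {
    for dir in "$CPUFREQ_ROOT"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        cpu="$(basename "$(dirname "$dir")")"
        gov="$(cat "$dir/scaling_governor" 2>/dev/null)"
        freq="$(cat "$dir/cpuinfo_cur_freq" 2>/dev/null)"
        echo "$cpu ${gov:-unknown} ${freq:-unknown}"
    done
}

# Succeed only if no reported CPU uses a governor other than "performance".
all_performance() {
    cpufreq_report | awk 'BEGIN { bad = 0 } $2 != "performance" { bad = 1 } END { exit bad }'
}

cpufreq_report
```

Calling all_performance before a benchmark run gives a simple pass/fail signal; a frequency that drops between runs suggests throttling.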

Running on Khadas VIM3 Basic edition

The benchmark application is run on a Khadas VIM3 Basic edition (Linux 4.9) using a 720p (1280x720) image. This is a large image; you can try a smaller one to see how fast the engine would be. Notice how fast the engine is when parallel mode is enabled. Please note that parallel mode isn't available in Python; you'll have to use C++, Java, C# or any other language.

To run the benchmark application with 0.2 positive rate (20% of the images will have plates) for 100 loops:

cd ultimateALPR-SDK/binaries/linux/aarch64
chmod +x benchmark
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./benchmark \
    --positive ../../../assets/images/lic_us_1280x720.jpg \
    --negative ../../../assets/images/london_traffic.jpg \
    --assets ../../../assets \
    --npu_enabled true \
    --charset latin \
    --loops 100 \
    --rate 0.2 \
    --parallel true
  • Use --npu_enabled true or --npu_enabled false to enable or disable NPU acceleration.
  • Use --parallel true to enable parallel mode, or --parallel false to use sequential mode instead.
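To compare all combinations of NPU setting, mode and positive rate, the command above can be generated in a loop. The sketch below only prints the command lines (benchmark_cmd is a hypothetical helper name); it reuses the flags and asset paths shown above and assumes it is run from binaries/linux/aarch64.

```shell
#!/bin/sh
# Print one benchmark command line.
#   $1 = NPU setting (true|false), $2 = parallel setting (true|false), $3 = positive rate
benchmark_cmd() {
    printf 'LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./benchmark --positive %s --negative %s --assets %s --npu_enabled %s --charset latin --loops 100 --rate %s --parallel %s\n' \
        ../../../assets/images/lic_us_1280x720.jpg \
        ../../../assets/images/london_traffic.jpg \
        ../../../assets \
        "$1" "$3" "$2"
}

# Emit the 20 combinations: NPU on/off, parallel/sequential, five positive rates.
for npu in true false; do
    for parallel in true false; do
        for rate in 0.0 0.2 0.5 0.7 1.0; do
            benchmark_cmd "$npu" "$parallel" "$rate"
        done
    done
done
```

Saved as, say, sweep.sh (a hypothetical file name), the output can be reviewed first and then executed with sh sweep.sh | sh. Let the device cool down between runs so that earlier runs don't skew later ones.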

Performance numbers

| | 0.0 rate | 0.2 rate | 0.5 rate | 0.7 rate | 1.0 rate |
| --- | --- | --- | --- | --- | --- |
| Khadas VIM3 Basic, Linux 4.9, NPU, Parallel mode | 1560 millis, 64.08 fps | 1797 millis, 55.63 fps | 1876 millis, 53.29 fps | 2162 millis, 46.25 fps | 2902 millis, 34.45 fps |
| Khadas VIM3 Basic, Linux 4.9, NPU, Sequential mode | 1776 millis, 56.30 fps | 3443 millis, 29.04 fps | 6009 millis, 16.63 fps | 7705 millis, 12.97 fps | 10275 millis, 9.73 fps |
| Khadas VIM3 Basic, Linux 4.9, CPU, Parallel mode | 4187 millis, 23.88 fps | 4414 millis, 22.65 fps | 4824 millis, 20.72 fps | 5189 millis, 19.26 fps | 5740 millis, 17.42 fps |
| Khadas VIM3 Basic, Linux 4.9, CPU, Sequential mode | 4184 millis, 23.89 fps | 5972 millis, 16.74 fps | 8513 millis, 11.74 fps | 10258 millis, 9.74 fps | 12867 millis, 7.77 fps |

  • When parallel mode is enabled, detection runs on the NPU while OCR runs on the CPU in parallel.
  • Notice how, with the NPU, parallel mode is about 3.5 times faster than sequential mode when rate=1.0 (all 100 images have plates): 34.45 fps versus 9.73 fps.
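The ratios between rows of the table can be recomputed from the fps figures with a few lines of shell. This is a small arithmetic sketch; speedup is a hypothetical helper name.

```shell
#!/bin/sh
# Ratio of two throughput figures (fps), rounded to two decimals.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

speedup 34.45 9.73    # NPU parallel vs NPU sequential at rate 1.0 -> 3.54
speedup 34.45 7.77    # NPU parallel vs CPU sequential at rate 1.0 -> 4.43
speedup 64.08 23.88   # NPU parallel vs CPU parallel at rate 0.0   -> 2.68
```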