benchmark


This application checks that everything is working and running as fast as expected. You can use it to verify the advertised maximum frame rates (237fps on Intel Xeon, 152fps on Jetson NX, 47fps on Snapdragon 855 and 12fps on Raspberry Pi 4). It's open source and doesn't require registration or a license key.

More information about the benchmark rules is available at https://www.doubango.org/SDKs/anpr/docs/Benchmark.html.

Dependencies

The SDK is developed in C++11. You'll need glibc 2.27+ on Linux and Microsoft Visual C++ 2015 Redistributable (x64) - 14.0.24123 (any later version is ok) on Windows. You most likely already have these dependencies on your machine, as almost every program requires them.

If you're planning to use OpenVINO, you'll need the Intel C++ Compiler Redistributable (choose the newest). Please note that OpenVINO is packaged in the SDK as a plugin and loaded (dlopen) at runtime. The engine will fail to load the plugin if the Intel C++ Compiler Redistributable is missing on your machine, but the program will still work as expected with TensorFlow as fallback. We highly recommend using OpenVINO to speed up inference. See benchmark numbers with/without OpenVINO at https://www.doubango.org/SDKs/anpr/docs/Benchmark.html#core-i7-windows.

Debugging missing dependencies

To check if all dependencies are present:
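On Linux, one way to do this is to run ldd on the SDK's shared library and look for entries marked "not found". The sketch below is a minimal helper, not part of the SDK: the function name missing_deps is hypothetical, and the library path in the usage comment is inferred from the linker flags used in the build commands (-L../../../binaries/<yourOS>/<yourArch> -lultimate_alpr-sdk), so adjust it to your layout.

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints the name of
# every shared library that the dynamic loader could not resolve.
missing_deps() {
    awk '/not found/ { print $1 }'
}

# Usage (Linux x86_64, from the SDK root; the path is an assumption):
#   ldd binaries/linux/x86_64/libultimate_alpr-sdk.so | missing_deps
# No output means every dependency was found.
```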

GPGPU acceleration

  • On x86-64, GPGPU acceleration is disabled by default. Check here for more information on how to enable it.
  • We highly recommend enabling NVIDIA TensorRT (--trt_enabled true). Enabling TensorRT will disable OpenVINO.
  • On NVIDIA Jetson (AArch64), GPGPU acceleration is always enabled. Check here for more information.

Performance numbers

These performance numbers were obtained using version 3.13 with parallel mode enabled. You can use any later version. Note the boost when OpenVINO is enabled on machines without a GPU.

Some performance numbers on mid-range GPU (GTX 1070), high-range ARM CPU (Galaxy S10+), low-range ARM CPU (Raspberry Pi 4) devices using 720p (1280x720) images:

| Configuration | 0.0 rate | 0.2 rate | 0.5 rate | 0.7 rate | 1.0 rate |
| --- | --- | --- | --- | --- | --- |
| AMD Ryzen 7 3700X 8-Core + RTX 3060 (Ubuntu 20, OpenVINO disabled, TensorRT enabled) | 201 millis, 497.40 fps | 238 millis, 419.71 fps | 291 millis, 343.12 fps | 333 millis, 299.41 fps | 379 millis, 263.36 fps |
| AMD Ryzen 7 3700X 8-Core + RTX 3060 (Ubuntu 20, OpenVINO enabled, TensorRT disabled) | 615 millis, 162.54 fps | 679 millis, 147.13 fps | 740 millis, 135.01 fps | 773 millis, 129.21 fps | 809.18 millis, 123.58 fps |
| AMD Ryzen 7 3700X 8-Core + RTX 3060 (Ubuntu 20, OpenVINO disabled, TensorRT disabled) | 961 millis, 103.97 fps | 1047 millis, 95.46 fps | 1206 millis, 82.90 fps | 1325 millis, 75.45 fps | 1434.16 millis, 69.72 fps |
| Intel® Xeon® E3 1230v5 + GTX 1070 (Ubuntu 18, OpenVINO enabled, TensorRT disabled) | 737 millis, 135.62 fps | 809 millis, 123.55 fps | 903 millis, 110.72 fps | 968 millis, 103.22 fps | 1063 millis, 94.07 fps |
| Intel® Xeon® E3 1230v5 + GTX 1070 (Ubuntu 18, OpenVINO disabled, TensorRT disabled) | 711 millis, 140.51 fps | 828 millis, 120.76 fps | 1004 millis, 99.53 fps | 1127 millis, 88.70 fps | 1292 millis, 77.38 fps |
| i7-4790K (Windows 8, OpenVINO enabled) | 758 millis, 131.78 fps | 1110 millis, 90.07 fps | 1597 millis, 62.58 fps | 1907 millis, 52.42 fps | 2399 millis, 41.66 fps |
| i7-4790K (Windows 8, OpenVINO disabled) | 2427 millis, 41.18 fps | 2658 millis, 37.60 fps | 2999 millis, 33.34 fps | 3360 millis, 29.75 fps | 3607 millis, 27.72 fps |
| i7-4770HQ (Windows 10, OpenVINO enabled) | 1094 millis, 91.35 fps | 1674 millis, 59.71 fps | 2456 millis, 40.71 fps | 2923 millis, 34.21 fps | 4255 millis, 23.49 fps |
| i7-4770HQ (Windows 10, OpenVINO disabled) | 4129 millis, 24.21 fps | 4486 millis, 22.28 fps | 4916 millis, 20.34 fps | 5460 millis, 18.31 fps | 5740 millis, 17.42 fps |
| Khadas VIM3 Basic (Linux 4.9, NPU, Parallel mode) | 1560 millis, 64.08 fps | 1797 millis, 55.63 fps | 1876 millis, 53.29 fps | 2162 millis, 46.25 fps | 2902 millis, 34.45 fps |
| Khadas VIM3 Basic (Linux 4.9, NPU, Sequential mode) | 1776 millis, 56.30 fps | 3443 millis, 29.04 fps | 6009 millis, 16.63 fps | 7705 millis, 12.97 fps | 10275 millis, 9.73 fps |
| Khadas VIM3 Basic (Linux 4.9, CPU, Parallel mode) | 4187 millis, 23.88 fps | 4414 millis, 22.65 fps | 4824 millis, 20.72 fps | 5189 millis, 19.26 fps | 5740 millis, 17.42 fps |
| Khadas VIM3 Basic (Linux 4.9, CPU, Sequential mode) | 4184 millis, 23.89 fps | 5972 millis, 16.74 fps | 8513 millis, 11.74 fps | 10258 millis, 9.74 fps | 12867 millis, 7.77 fps |
| RockPi 4B (Ubuntu Server 18.04) | 7588 millis, 13.17 fps | 8008 millis, 12.48 fps | 8606 millis, 11.61 fps | 9213 millis, 10.85 fps | 9798 millis, 10.20 fps |
| Raspberry Pi 4 (Raspbian Buster) | 81890 millis, 12.21 fps | 89770 millis, 11.13 fps | 115190 millis, 8.68 fps | 122950 millis, 8.13 fps | 141460 millis, 7.06 fps |
| Jetson Xavier NX (JetPack 5.1.0) | 657 millis, 152 fps | 744 millis, 134 fps | 837 millis, 119 fps | 961 millis, 104 fps | 1068 millis, 93 fps |
| Jetson Nano B01 (JetPack 4.4.1) | 2920 millis, 34 fps | 3102 millis, 32 fps | 3274 millis, 30 fps | 3415 millis, 29 fps | 3727 millis, 27 fps |
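
Each entry above pairs the total elapsed time with the resulting frame rate. The figures are consistent with fps = loops * 1000 / millis, with 100 loops for most rows (for example 100 * 1000 / 657 is about 152 fps on Jetson Xavier NX) and, apparently, 1000 loops for the Raspberry Pi 4 row. The loop counts are inferred from the numbers rather than stated anywhere, and fps_from_millis is a hypothetical helper for checking your own runs.

```shell
# Hypothetical helper: frames per second from a loop count and the total
# elapsed time in milliseconds, printed with two decimals.
fps_from_millis() {
    awk -v loops="$1" -v millis="$2" 'BEGIN { printf "%.2f\n", loops * 1000 / millis }'
}

fps_from_millis 100 657      # Jetson Xavier NX, rate 0.0: prints 152.21
fps_from_millis 1000 81890   # Raspberry Pi 4, rate 0.0 (assuming 1000 loops): prints 12.21
```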

Some notes:

  • The above numbers show that the best case is 'AMD Ryzen 7 3700X 8-Core + RTX 3060 + TensorRT enabled'. In that case the GPU (TensorRT, CUDA) is used for all modules (detection, classification and OCR).
  • When TensorRT is disabled, we still use the GPU via TensorFlow. Note the large difference between TensorRT and TensorFlow.
  • Although Raspberry Pi 4 has a 64-bit CPU, Raspbian OS uses a 32-bit kernel, which means we're losing many SIMD optimizations.
  • On RockPi 4B the code is 5 times faster when parallel processing is enabled.
  • On NVIDIA Jetson the code is 3 times faster when parallel processing is enabled.
  • On Khadas VIM3 the code is almost 4 times faster when parallel processing is enabled.
  • On Android devices we have noticed that parallel processing can speed up the pipeline by up to 120% on some devices, while on Raspberry Pi the gain is marginal.
  • Both i7 CPUs are 6+ years old (2014); we chose them so that everyone can easily find them at the lowest possible price.
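
The parallel-versus-sequential ratios quoted in these notes can be checked against the performance numbers. As a sketch (speedup is a hypothetical helper name), dividing the sequential elapsed time by the parallel one for Khadas VIM3 with the NPU at rate 1.0 gives roughly 3.5, in line with "almost 4 times faster":

```shell
# Hypothetical helper: how many times faster the second run is than the
# first, given both elapsed times in milliseconds.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 10275 2902   # Khadas VIM3, NPU, rate 1.0, sequential vs parallel: prints 3.54
```

At rate 0.0 the same ratio is only about 1.14 (1776 / 1560), so the gain depends strongly on how many frames contain a plate.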

Pre-built binaries

If you don't want to build this sample yourself, use the pre-built versions:

On Windows, the easiest way to try this sample is to navigate to binaries/windows/x86_64 and run binaries/windows/x86_64/benchmark.bat. You can edit these files to use your own images and configuration options.

Building

This sample contains a single C++ source file and is easy to build. The documentation about the C++ API is at https://www.doubango.org/SDKs/anpr/docs/cpp-api.html.

Windows

You'll need Visual Studio to build the code. The VS project is at benchmark.vcxproj. Open it.

  1. You will need to change the "Command Arguments" as shown in the image below. Default value: --loops 100 --rate 0.2 --positive $(ProjectDir)..\..\..\assets\images\lic_us_1280x720.jpg --negative $(ProjectDir)..\..\..\assets\images\london_traffic.jpg --assets $(ProjectDir)..\..\..\assets --charset latin
  2. You will need to change the "Environment" variable as shown in the image below. Default value: PATH=$(VCRedistPaths)%PATH%;$(ProjectDir)..\..\..\binaries\windows\x86_64

[Image: VC++ config]

You're now ready to build and run the sample.

Generic GCC

The following is a generic GCC command:

cd ultimateALPR-SDK/samples/c++/benchmark

g++ benchmark.cxx -O3 -I../../../c++ -L../../../binaries/<yourOS>/<yourArch> -lultimate_alpr-sdk -o benchmark
  • You have to replace yourOS and yourArch with the correct values. For example, on Linux x86_64 they would be linux and x86_64 respectively. On Linux aarch64 they would be linux and aarch64 respectively.
  • If you're cross compiling, you'll have to replace g++ with the compiler for the correct triplet. For example, on a Linux host targeting Android ARM64, the compiler would be aarch64-linux-android-g++.
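
On a generic Linux machine the two placeholders usually match what uname reports. The sketch below builds the library directory from them; sdk_bin_dir is a hypothetical helper, and the mapping is an assumption that holds for the linux/x86_64 and linux/aarch64 cases above but not for dedicated folders such as binaries/raspbian/armv7l or binaries/jetson/aarch64.

```shell
# Hypothetical helper: prints the SDK binaries folder for a given OS and
# architecture, defaulting to what `uname` reports on the current machine.
sdk_bin_dir() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    # The SDK folders are lowercase ("linux"), while uname prints "Linux".
    os=$(printf '%s' "$os" | tr '[:upper:]' '[:lower:]')
    printf '../../../binaries/%s/%s\n' "$os" "$arch"
}

# Usage, from ultimateALPR-SDK/samples/c++/benchmark:
#   g++ benchmark.cxx -O3 -I../../../c++ -L"$(sdk_bin_dir)" -lultimate_alpr-sdk -o benchmark
```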

Raspberry Pi (Raspbian OS)

To build the sample for Raspberry Pi you can either do it on the device itself or cross compile it on Windows, Linux or OSX machines. For more information on how to install the toolchain for cross compilation please check here.

cd ultimateALPR-SDK/samples/c++/benchmark

arm-linux-gnueabihf-g++ benchmark.cxx -O3 -I../../../c++ -L../../../binaries/raspbian/armv7l -lultimate_alpr-sdk -o benchmark
  • On Windows: replace arm-linux-gnueabihf-g++ with arm-linux-gnueabihf-g++.exe
  • If you're building on the device itself: replace arm-linux-gnueabihf-g++ with g++ to use the default GCC

Testing

After building the application you can test it on your local machine.

Usage

Benchmark is a command line application with the following usage:

benchmark \
      --positive <path-to-image-with-a-plate> \
      --negative <path-to-image-without-a-plate> \
      [--assets <path-to-assets-folder>] \
      [--charset <recognition-charset:latin/korean/chinese>] \
      [--num_threads <number of threads:[1, inf]>] \
      [--car_noplate_detect_enabled <whether-to-enable-detecting-cars-with-no-plate:true/false>] \
      [--ienv_enabled <whether-to-enable-IENV:true/false>] \
      [--openvino_enabled <whether-to-enable-OpenVINO:true/false>] \
      [--openvino_device <openvino-device-to-use>] \
      [--npu_enabled <whether-to-enable-NPU-acceleration:true/false>] \
      [--trt_enabled <whether-to-enable-TensorRT-acceleration:true/false>] \
      [--simd_enabled <whether-to-enable-SIMD-acceleration:true/false>] \
      [--klass_lpci_enabled <whether-to-enable-LPCI:true/false>] \
      [--klass_vcr_enabled <whether-to-enable-VCR:true/false>] \
      [--klass_vmmr_enabled <whether-to-enable-VMMR:true/false>] \
      [--klass_vbsr_enabled <whether-to-enable-VBSR:true/false>] \
      [--loops <number-of-times-to-run-the-loop:[1, inf]>] \
      [--rate <positive-rate:[0.0, 1.0]>] \
      [--parallel <whether-to-enable-parallel-mode:true/false>] \
      [--rectify <whether-to-enable-rectification-layer:true/false>] \
      [--tokenfile <path-to-license-token-file>] \
      [--tokendata <base64-license-token-data>]

Options surrounded with [] are optional.

The maximum frame rate (140fps on GTX 1070, 47fps on Snapdragon 855 and 12fps on Raspberry Pi 4) is obtained using --rate 0.0, which means evaluating the negative (no license plate) image only. The minimum frame rate is obtained using --rate 1.0, which means evaluating the positive image only (every image in the video stream has a license plate). In real life, very few frames from a video stream will contain a license plate (--rate < 0.01).
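
Since the rate sets the share of iterations that use the positive image, the elapsed time at an intermediate rate can be roughly estimated by interpolating between the two extremes. This is only an approximation suggested by the published numbers, not how the benchmark computes anything, and estimate_millis is a hypothetical helper. For the RTX 3060 configuration with TensorRT enabled (201 millis at rate 0.0, 379 millis at rate 1.0) it gives 290 millis at rate 0.5, close to the measured 291 millis:

```shell
# Hypothetical helper: linear estimate of the elapsed time (millis) at a
# given positive rate, from the times measured at rate 0.0 and rate 1.0.
estimate_millis() {
    awk -v neg="$1" -v pos="$2" -v rate="$3" \
        'BEGIN { printf "%.0f\n", (1 - rate) * neg + rate * pos }'
}

estimate_millis 201 379 0.5   # prints 290
```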

Examples

  • For example, on Raspberry Pi you may call the benchmark application using the following command:
LD_LIBRARY_PATH=../../../binaries/raspbian/armv7l:$LD_LIBRARY_PATH ./benchmark \
    --positive ../../../assets/images/lic_us_1280x720.jpg \
    --negative ../../../assets/images/london_traffic.jpg \
    --assets ../../../assets \
    --charset latin \
    --loops 100 \
    --rate 0.2 \
    --parallel true \
    --rectify false
  • On NVIDIA Jetson, you'll need to generate the models as explained here, put the device on maximum performance mode (sudo nvpmodel -m 2 && sudo jetson_clocks), then run:
LD_LIBRARY_PATH=../../../binaries/jetson/aarch64:$LD_LIBRARY_PATH ./benchmark \
    --positive ../../../assets/images/lic_us_1280x720.jpg \
    --negative ../../../assets/images/london_traffic.jpg \
    --assets ../../../assets \
    --charset latin \
    --loops 100 \
    --rate 0.2 \
    --parallel true \
    --rectify false
  • On Linux x86_64, you may use the following command:
LD_LIBRARY_PATH=../../../binaries/linux/x86_64:$LD_LIBRARY_PATH ./benchmark \
    --positive ../../../assets/images/lic_us_1280x720.jpg \
    --negative ../../../assets/images/london_traffic.jpg \
    --assets ../../../assets \
    --charset latin \
    --loops 100 \
    --rate 0.2 \
    --parallel true
  • On Linux aarch64, you may use the following command:
LD_LIBRARY_PATH=../../../binaries/linux/aarch64:$LD_LIBRARY_PATH ./benchmark \
    --positive ../../../assets/images/lic_us_1280x720.jpg \
    --negative ../../../assets/images/london_traffic.jpg \
    --assets ../../../assets \
    --charset latin \
    --loops 100 \
    --rate 0.2 \
    --parallel true
  • On Windows x86_64, you may use the following command:
benchmark.exe ^
    --positive ../../../assets/images/lic_us_1280x720.jpg ^
    --negative ../../../assets/images/london_traffic.jpg ^
    --assets ../../../assets ^
    --charset latin ^
    --loops 100 ^
    --rate 0.2 ^
    --parallel true
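
To reproduce the five rate columns of the performance numbers in one go, the Linux commands can be wrapped in a loop. This is a sketch rather than an official script: run_all_rates and the BENCHMARK_BIN variable are hypothetical names, and the flags are the ones used in the examples above. Extra arguments are passed through to the benchmark.

```shell
# Hypothetical wrapper: runs the benchmark once per positive rate used in
# the performance numbers. BENCHMARK_BIN defaults to ./benchmark.
run_all_rates() {
    bin=${BENCHMARK_BIN:-./benchmark}
    for rate in 0.0 0.2 0.5 0.7 1.0; do
        "$bin" \
            --positive ../../../assets/images/lic_us_1280x720.jpg \
            --negative ../../../assets/images/london_traffic.jpg \
            --assets ../../../assets \
            --charset latin \
            --loops 100 \
            --rate "$rate" \
            "$@"
    done
}

# Usage on Linux x86_64:
#   export LD_LIBRARY_PATH=../../../binaries/linux/x86_64:$LD_LIBRARY_PATH
#   run_all_rates --parallel true
```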

If you're cross compiling the application, make sure to copy the application and both the assets and binaries folders to the target device.