Siva edited this page Jan 4, 2023 · 24 revisions

About

This page contains the benchmark results for several popular image classification models. We auto-tune all listed models on target platforms and benchmark the inference performance (time cost per image).
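
As a rough illustration of what "time cost per image" means, the sketch below times a callable that runs one inference at batch size 1 and reports the mean latency in milliseconds. It is not the script used to produce the numbers on this page (see Reproduce for that); `measure_latency_ms` and `fake_inference` are hypothetical names, and the warm-up and repeat counts are assumptions.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=3, repeat=10):
    """Return (mean, standard deviation) of the latency of run_once(), in ms."""
    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def fake_inference():
    # Hypothetical stand-in for one forward pass on a single image.
    sum(i * i for i in range(10_000))


mean_ms, std_ms = measure_latency_ms(fake_inference)
print(f"{mean_ms:.3f} ms +/- {std_ms:.3f} ms per image")
```

In a real measurement, `run_once` would wrap the compiled model's inference call on the target device.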

Content

ARM CPU

Note: On a board with a big.LITTLE architecture, we use all of the big cores and none of the little cores. On any other board, we use all cores. The device specifications below list only the cores that are used.
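
The core-selection rule in the note can be sketched as a small helper. This is only an illustration: `select_benchmark_cores` is a hypothetical function, and it assumes that the big cores are the ones with the highest maximum frequency, which may not hold on every SoC. The little-core frequency in the first example is an illustrative value, not a figure from this page.

```python
def select_benchmark_cores(core_max_freq_mhz):
    """Return the ids of the cores to benchmark on.

    core_max_freq_mhz maps a core id to its maximum frequency in MHz.
    Assumption: on a big.LITTLE board, the big cores are those with the
    highest maximum frequency. On a homogeneous board all cores tie, so
    every core is returned.
    """
    top = max(core_max_freq_mhz.values())
    return sorted(cid for cid, freq in core_max_freq_mhz.items() if freq == top)


# Hypothetical big.LITTLE layout: four little cores plus two big cores at 1.8 GHz.
big_little = {0: 1400, 1: 1400, 2: 1400, 3: 1400, 4: 1800, 5: 1800}
print(select_benchmark_cores(big_little))  # [4, 5]

# Homogeneous layout: four identical cores at 1.2 GHz.
homogeneous = {0: 1200, 1: 1200, 2: 1200, 3: 1200}
print(select_benchmark_cores(homogeneous))  # [0, 1, 2, 3]
```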

Devices

  • Firefly-RK3399: 2 x Cortex-A72 1.8 GHz
  • Raspberry Pi 3B: 4 x Cortex-A53 1.2 GHz
  • Huawei P20 Pro / Mate 10 Pro (SoC: HiSilicon Kirin 970): 4 x Cortex-A73 2.36 GHz
  • Google Pixel 2 (SoC: Qualcomm Snapdragon 835): 4 x Kryo 2.35 GHz
  • PYNQ: 2 x Cortex-A9 650 MHz

Results

  • dtype = float32, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | squeezenet-v1.0 | squeezenet-v1.1 | vgg-16 | vgg-19 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Raspberry Pi 3B | 610.2 | 2074.2 | 121.8 | 104.8 | 320.0 | 726.0 | 185.1 | 94.0 | 1772.0 | 2119.8 |
| Firefly-RK3399 | 336.8 | 1304.4 | 77.9 | 64.8 | 158.6 | 403.2 | 94.3 | 48.2 | 903.5 | 1086.0 |
| Huawei P20 Pro | 179.7 | 444.7 | 41.3 | 33.4 | 77.4 | 232.5 | 51.4 | 26.0 | 486.3 | 729.4 |
| Google Pixel 2 | 161.0 | 434.8 | 39.6 | 29.3 | 66.0 | 181.1 | 47.3 | 23.0 | 397.1 | 485.0 |
| Xilinx PYNQ | 2887.0 | 9691.7 | 721.4 | 513.3 | 1231.7 | 3585.5 | 913.0 | 478.3 | -1.0 | -1.0 |

Mobile GPU

Devices

  • Mali-T860 MP4: on the Firefly-RK3399, with its frequency locked to 800 MHz.

Results

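  • dtype = float32, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | squeezenet-v1.0 | squeezenet-v1.1 | vgg-16 | vgg-19 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mali-T860 | 410.6 | 784.7 | 79.5 | 77.7 | 127.3 | 354.7 | 111.0 | 62.5 | 673.2 | 792.1 |

  • dtype = float16, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | squeezenet-v1.0 | squeezenet-v1.1 | vgg-16 | vgg-19 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mali-T860 | 295.4 | 464.9 | 52.9 | 60.7 | 84.3 | 221.0 | 77.3 | 46.7 | 405.6 | 472.8 |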
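
To compare the two Mali-T860 result sets, the float16 speedup per model can be computed as the float32 latency divided by the float16 latency. The sketch below copies the values from the two result sets above; the helper name `speedups` is hypothetical.

```python
MODELS = [
    "densenet-121", "inception-v3", "mobilenet", "mobilenet-v2", "resnet-18",
    "resnet-50", "squeezenet-v1.0", "squeezenet-v1.1", "vgg-16", "vgg-19",
]
# Mali-T860 latencies in ms at batch_size = 1, copied from the results above.
FP32_MS = [410.6, 784.7, 79.5, 77.7, 127.3, 354.7, 111.0, 62.5, 673.2, 792.1]
FP16_MS = [295.4, 464.9, 52.9, 60.7, 84.3, 221.0, 77.3, 46.7, 405.6, 472.8]


def speedups(baseline_ms, faster_ms):
    """Return the per-model ratio baseline / faster (values above 1 mean faster)."""
    return [b / f for b, f in zip(baseline_ms, faster_ms)]


for model, ratio in zip(MODELS, speedups(FP32_MS, FP16_MS)):
    print(f"{model:<16} {ratio:.2f}x")
```

On these numbers the float16 speedup ranges from about 1.28x (mobilenet-v2) to about 1.69x (inception-v3).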

NVIDIA GPU

Devices

  • Jetson TX2: in Max-N mode, 1.3 GHz
  • GTX 1080 Ti, GTX TITAN X

Results

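  • dtype = float32, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | vgg-16 | vgg-19 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTX 1080 Ti | 3.6 | 5.8 | 0.7 | 1.0 | 1.1 | 2.8 | 4.2 | 4.8 |
| GTX TITAN X | 5.8 | 9.9 | 1.0 | 1.6 | 1.6 | 4.3 | 6.3 | 7.4 |
| Jetson TX2 | 26.8 | 45.7 | 5.2 | 8.8 | 9.6 | 26.2 | 58.2 | 68.8 |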

AMD GPU

  • dtype = float32, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | resnet-18 | resnet-50 | vgg-16 | vgg-19 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vega FE | 5.8 | 8.9 | 1.0 | 1.6 | 4.5 | 6.3 | 7.2 |

Adreno GPU

Devices

  • Snapdragon 8 Gen 1: Adreno 730

Results

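  • batch_size = 1 (unit: ms)

| dtype | Resnet 18 | Resnet 34 | Resnet 50 | VGG-16 | VGG-19 | Densenet-121 | Inception V3 | MobilenetV1 | Squeezenet-v1.0 | Squeezenet-v1.1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP32 | 9.56 | 15.37 | 18.25 | 54.20 | 108.71 | 27.33 | 39.54 | 3.82 | 6.89 | 3.24 |
| FP16 | 6.94 | 11.94 | 13.77 | 34.58 | 41.23 | 11.93 | 30.13 | 2.72 | 4.75 | 2.52 |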

Reproduce

See the README at https://github.com/dmlc/tvm/tree/master/apps/benchmark for instructions on how to reproduce these numbers.
