docs/articles_en/get-started/learn-openvino/openvino-samples/benchmark-tool.rst (88 additions, 91 deletions)
@@ -31,12 +31,12 @@ Basic Usage
 Before running ``benchmark_app``, make sure the ``openvino_env`` virtual
 environment is activated, and navigate to the directory where your model is located.
 
-The benchmarking application works with models in the OpenVINO IR
+The benchmark application works with models in the OpenVINO IR
 (``model.xml`` and ``model.bin``) and ONNX (``model.onnx``) formats.
 Make sure to :doc:`convert your models <../../../openvino-workflow/model-preparation/convert-model-to-ir>`
 if necessary.
 
-To run benchmarking with default options on a model, use the following command:
+To run a benchmark with default options on a model, use the following command:
 
 .. code-block:: sh
 
@@ -57,54 +57,47 @@ Basic Usage
 :doc:`Benchmark Python Tool <benchmark-tool>` is available,
 and you should follow the usage instructions on that page instead.
 
-The benchmarking application works with models in the OpenVINO IR, TensorFlow,
+The benchmark application works with models in the OpenVINO IR, TensorFlow,
 TensorFlow Lite, PaddlePaddle, PyTorch and ONNX formats. If you need it,
 OpenVINO also allows you to :doc:`convert your models <../../../openvino-workflow/model-preparation/convert-model-to-ir>`.
 
-To run benchmarking with default options on a model, use the following command:
+To run a benchmark with default options on a model, use the following command:
 
 .. code-block:: sh
 
    ./benchmark_app -m model.xml
 
 
-By default, the application will load the specified model onto the CPU and perform
-inference on batches of randomly-generated data inputs for 60 seconds. As it loads,
-it prints information about the benchmark parameters. When benchmarking is completed,
-it reports the minimum, average, and maximum inference latency and the average throughput.
+By default, the application loads the specified model and performs
+inference on batches of randomly-generated data inputs on CPU for 60 seconds.
+It displays information about the benchmark parameters as it loads the model.
+When the benchmark is completed, it reports the minimum, average, and maximum inference
+latency and the average throughput.
 
 You may be able to improve benchmark results beyond the default configuration by
 configuring some of the execution parameters for your model. For example, you can
 use "throughput" or "latency" performance hints to optimize the runtime for higher
 FPS or reduced inference time. Read on to learn more about the configuration
-options available with ``benchmark_app``.
-
-
-
-
+options available for ``benchmark_app``.
 
 Configuration Options
 #####################
 
-The benchmark app provides various options for configuring execution parameters.
-This section covers key configuration options for easily tuning benchmarking to
-achieve better performance on your device. A list of all configuration options
+You can easily configure and fine-tune benchmarks with various execution parameters,
+for example, to achieve better performance on your device. The list of all configuration options
 is given in the :ref:`Advanced Usage <advanced-usage-benchmark>` section.
 
 Performance hints: latency and throughput
 +++++++++++++++++++++++++++++++++++++++++
 
-The benchmark app allows users to provide high-level "performance hints" for
-setting latency-focused or throughput-focused inference modes. This hint causes
-the runtime to automatically adjust runtime parameters, such as the number of
-processing streams and inference batch size, to prioritize for reduced latency
-or high throughput.
+With high-level "performance hints", which automatically adjust parameters such as the
+number of processing streams and inference batch size, you can aim for low-latency
+or high-throughput inference.
 
 The performance hints do not require any device-specific settings and they are
-completely portable between devices. Parameters are automatically configured
-based on whichever device is being used. This allows users to easily port
-applications between hardware targets without having to re-determine the best
-runtime parameters for the new device.
+completely portable between devices. The parameters are automatically configured
+based on the device. Therefore, you can easily port applications between hardware targets
+without having to re-determine the best runtime parameters for a new device.
 
 If not specified, throughput is used as the default. To set the hint explicitly,
 use ``-hint latency`` or ``-hint throughput`` when running ``benchmark_app``:
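For reference, a minimal sketch of the hint usage described above, assuming a ``model.xml`` in the working directory (the document's full tab-set with per-platform commands follows in the file but is outside this hunk):

.. code-block:: sh

   # Tune runtime parameters for low per-request latency
   ./benchmark_app -m model.xml -hint latency

   # Tune runtime parameters for high throughput (the default when no hint is given)
   ./benchmark_app -m model.xml -hint throughput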
@@ -129,8 +122,12 @@ use ``-hint latency`` or ``-hint throughput`` when running ``benchmark_app``:
 
 .. note::
 
-   It is up to the user to ensure the environment on which the benchmark is running is optimized for maximum performance. Otherwise, different results may occur when using the application in different environment settings (such as power optimization settings, processor overclocking, thermal throttling).
-   When you specify single options multiple times, only the last value will be used. For example, the ``-m`` flag:
+   Make sure the environment is optimized for maximum performance when the benchmark is running.
+   Otherwise, different environment settings, such as power optimization settings, processor
+   overclocking, or thermal throttling, may give different results.
+
+   When you specify single options multiple times, only the last value will be used.
+   For example, the ``-m`` flag:
 
 .. tab-set::
 
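To illustrate the note above, a sketch with hypothetical model file names, showing that only the last ``-m`` value takes effect:

.. code-block:: sh

   # Only the last -m value is used, so model2.xml is the model that gets benchmarked
   ./benchmark_app -m model1.xml -m model2.xml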
@@ -154,9 +151,9 @@ Latency
 --------------------
 
 Latency is the amount of time it takes to process a single inference request.
-In applications where data needs to be inferenced and acted on as quickly as
-possible (such as autonomous driving), low latency is desirable. For conventional
-devices, lower latency is achieved by reducing the amount of parallel processing
+Low latency is useful in applications where data needs to be inferred and acted on
+as quickly as possible (such as autonomous driving). For conventional
+devices, low latency is achieved by reducing the number of parallel processing
 streams so the system can utilize as many resources as possible to quickly calculate
 each inference request. However, advanced devices like multi-socket CPUs and modern
 GPUs are capable of running multiple inference requests while delivering the same latency.
@@ -169,10 +166,10 @@ processing streams and inference batch size to achieve the best latency.
 Throughput
 --------------------
 
-Throughput is the amount of data an inference pipeline can process at once, and
-it is usually measured in frames per second (FPS) or inferences per second. In
-applications where large amounts of data needs to be inferenced simultaneously
-(such as multi-camera video streams), high throughput is needed. To achieve high
+Throughput is the amount of data processed by an inference pipeline at a time.
+It is usually measured in frames per second (FPS) or inferences per second. High
+throughput is beneficial for applications where large amounts of data need to be
+inferred simultaneously (such as multi-camera video streams). To achieve high
 throughput, the runtime focuses on fully saturating the device with enough data
 to process. It utilizes as much memory and as many parallel streams as possible
 to maximize the amount of data that can be processed simultaneously.
@@ -191,13 +188,14 @@ determined using performance hints, see
 Device
 ++++++++++++++++++++
 
-To set which device benchmarking runs on, use the ``-d <device>`` argument. This
-will tell ``benchmark_app`` to run benchmarking on that specific device. The benchmark
-app supports CPU and GPU devices. In order to use GPU, the system
-must have the appropriate drivers installed. If no device is specified, ``benchmark_app``
-will default to using ``CPU``.
+The benchmark app supports CPU and GPU devices. To run a benchmark on a chosen device,
+set the ``-d <device>`` argument. When run with default parameters, ``benchmark_app``
+creates 4 and 16 inference requests for CPU and GPU, respectively.
+
+In order to use GPU, the system must have the appropriate drivers installed. If no
+device is specified, ``benchmark_app`` will use ``CPU`` by default.
 
-For example, to run benchmarking on GPU, use:
+For example, to run a benchmark on GPU, use:
 
 .. tab-set::
 
@@ -216,16 +214,17 @@ For example, to run benchmarking on GPU, use:
       ./benchmark_app -m model.xml -d GPU
 
 
-You may also specify ``AUTO`` as the device, in which case the ``benchmark_app`` will
-automatically select the best device for benchmarking and support it with the
-CPU at the model loading stage. This may result in increased performance, thus,
-should be used purposefully. For more information, see the
+You may also specify ``AUTO`` as the device to let ``benchmark_app``
+automatically select the best device for benchmarking and support it with
+CPU when loading the model. You can use ``AUTO`` when you aim for better performance.
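A minimal sketch of the ``AUTO`` case described above, using the same assumed ``model.xml``:

.. code-block:: sh

   # Let OpenVINO select the most suitable device, with CPU assisting while the model loads
   ./benchmark_app -m model.xml -d AUTO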
0 commit comments