
Commit 61b6da8

[DOCS] preparing 2025.0 pass 4 (#28657)
1 parent aa274e4 commit 61b6da8

2 files changed (+96, -91 lines)

docs/articles_en/get-started/learn-openvino/openvino-samples/benchmark-tool.rst

Lines changed: 88 additions & 91 deletions
@@ -31,12 +31,12 @@ Basic Usage
 Before running ``benchmark_app``, make sure the ``openvino_env`` virtual
 environment is activated, and navigate to the directory where your model is located.

-The benchmarking application works with models in the OpenVINO IR
+The benchmark application works with models in the OpenVINO IR
 (``model.xml`` and ``model.bin``) and ONNX (``model.onnx``) formats.
 Make sure to :doc:`convert your models <../../../openvino-workflow/model-preparation/convert-model-to-ir>`
 if necessary.

-To run benchmarking with default options on a model, use the following command:
+To run a benchmark with default options on a model, use the following command:

 .. code-block:: sh
@@ -57,54 +57,47 @@ Basic Usage
 :doc:`Benchmark Python Tool <benchmark-tool>` is available,
 and you should follow the usage instructions on that page instead.

-The benchmarking application works with models in the OpenVINO IR, TensorFlow,
+The benchmark application works with models in the OpenVINO IR, TensorFlow,
 TensorFlow Lite, PaddlePaddle, PyTorch and ONNX formats. If you need it,
 OpenVINO also allows you to :doc:`convert your models <../../../openvino-workflow/model-preparation/convert-model-to-ir>`.

-To run benchmarking with default options on a model, use the following command:
+To run a benchmark with default options on a model, use the following command:

 .. code-block:: sh

    ./benchmark_app -m model.xml


-By default, the application will load the specified model onto the CPU and perform
-inference on batches of randomly-generated data inputs for 60 seconds. As it loads,
-it prints information about the benchmark parameters. When benchmarking is completed,
-it reports the minimum, average, and maximum inference latency and the average throughput.
+By default, the application loads the specified model and performs
+inference on batches of randomly-generated data inputs on CPU for 60 seconds.
+It displays information about the benchmark parameters as it loads the model.
+When the benchmark is completed, it reports the minimum, average, and maximum inference
+latency and the average throughput.

 You may be able to improve benchmark results beyond the default configuration by
 configuring some of the execution parameters for your model. For example, you can
 use "throughput" or "latency" performance hints to optimize the runtime for higher
 FPS or reduced inference time. Read on to learn more about the configuration
-options available with ``benchmark_app``.
-
-
-
-
+options available for ``benchmark_app``.

 Configuration Options
 #####################

-The benchmark app provides various options for configuring execution parameters.
-This section covers key configuration options for easily tuning benchmarking to
-achieve better performance on your device. A list of all configuration options
+You can easily configure and fine-tune benchmarks with various execution parameters,
+for example to achieve better performance on your device. The list of all configuration options
 is given in the :ref:`Advanced Usage <advanced-usage-benchmark>` section.

 Performance hints: latency and throughput
 +++++++++++++++++++++++++++++++++++++++++

-The benchmark app allows users to provide high-level "performance hints" for
-setting latency-focused or throughput-focused inference modes. This hint causes
-the runtime to automatically adjust runtime parameters, such as the number of
-processing streams and inference batch size, to prioritize for reduced latency
-or high throughput.
+With high-level "performance hints", which automatically adjust parameters such as the
+number of processing streams and inference batch size, you can aim for low-latency
+or high-throughput inference.

 The performance hints do not require any device-specific settings and they are
-completely portable between devices. Parameters are automatically configured
-based on whichever device is being used. This allows users to easily port
-applications between hardware targets without having to re-determine the best
-runtime parameters for the new device.
+completely portable between devices. The parameters are automatically configured
+based on the device. Therefore, you can easily port applications between hardware targets
+without having to re-determine the best runtime parameters for a new device.

 If not specified, throughput is used as the default. To set the hint explicitly,
 use ``-hint latency`` or ``-hint throughput`` when running ``benchmark_app``:
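For instance, a latency-focused run could look like the following (a minimal sketch using the ``-hint`` flag described above; ``model.xml`` is a placeholder path):

.. code-block:: sh

   ./benchmark_app -m model.xml -hint latency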
@@ -129,8 +122,12 @@ use ``-hint latency`` or ``-hint throughput`` when running ``benchmark_app``:

 .. note::

-   It is up to the user to ensure the environment on which the benchmark is running is optimized for maximum performance. Otherwise, different results may occur when using the application in different environment settings (such as power optimization settings, processor overclocking, thermal throttling).
-   When you specify single options multiple times, only the last value will be used. For example, the ``-m`` flag:
+   Make sure the environment is optimized for maximum performance when the benchmark is running.
+   Otherwise, different environment settings, such as power optimization settings, processor
+   overclocking, or thermal throttling, may give different results.
+
+   When you specify single options multiple times, only the last value will be used.
+   For example, the ``-m`` flag:

 .. tab-set::

@@ -154,9 +151,9 @@ Latency
 --------------------

 Latency is the amount of time it takes to process a single inference request.
-In applications where data needs to be inferenced and acted on as quickly as
-possible (such as autonomous driving), low latency is desirable. For conventional
-devices, lower latency is achieved by reducing the amount of parallel processing
+Low latency is useful in applications where data needs to be inferred and acted on
+as quickly as possible (such as autonomous driving). For conventional
+devices, low latency is achieved by reducing the amount of parallel processing
 streams so the system can utilize as many resources as possible to quickly calculate
 each inference request. However, advanced devices like multi-socket CPUs and modern
 GPUs are capable of running multiple inference requests while delivering the same latency.
@@ -169,10 +166,10 @@ processing streams and inference batch size to achieve the best latency.
 Throughput
 --------------------

-Throughput is the amount of data an inference pipeline can process at once, and
-it is usually measured in frames per second (FPS) or inferences per second. In
-applications where large amounts of data needs to be inferenced simultaneously
-(such as multi-camera video streams), high throughput is needed. To achieve high
+Throughput is the amount of data processed by an inference pipeline at a time.
+It is usually measured in frames per second (FPS) or inferences per second. High
+throughput is beneficial for applications where large amounts of data need to be
+inferred simultaneously (such as multi-camera video streams). To achieve high
 throughput, the runtime focuses on fully saturating the device with enough data
 to process. It utilizes as much memory and as many parallel streams as possible
 to maximize the amount of data that can be processed simultaneously.
@@ -191,13 +188,14 @@ determined using performance hints, see
 Device
 ++++++++++++++++++++

-To set which device benchmarking runs on, use the ``-d <device>`` argument. This
-will tell ``benchmark_app`` to run benchmarking on that specific device. The benchmark
-app supports CPU and GPU devices. In order to use GPU, the system
-must have the appropriate drivers installed. If no device is specified, ``benchmark_app``
-will default to using ``CPU``.
+The benchmark app supports CPU and GPU devices. To run a benchmark on a chosen device,
+set the ``-d <device>`` argument. When run with default parameters, ``benchmark_app``
+creates 4 and 16 inference requests for CPU and GPU, respectively.
+
+In order to use GPU, the system must have the appropriate drivers installed. If no
+device is specified, ``benchmark_app`` will use ``CPU`` by default.

-For example, to run benchmarking on GPU, use:
+For example, to run a benchmark on GPU, use:

 .. tab-set::

@@ -216,16 +214,17 @@ For example, to run benchmarking on GPU, use:
    ./benchmark_app -m model.xml -d GPU


-You may also specify ``AUTO`` as the device, in which case the ``benchmark_app`` will
-automatically select the best device for benchmarking and support it with the
-CPU at the model loading stage. This may result in increased performance, thus,
-should be used purposefully. For more information, see the
+You may also specify ``AUTO`` as the device to let ``benchmark_app``
+automatically select the best device for benchmarking and support it with
+CPU when loading the model. You can use ``AUTO`` when you aim for better performance.
+For more information, see the
 :doc:`Automatic device selection <../../../openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection>` page.
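As a quick sketch of that option (``model.xml`` is a placeholder path):

.. code-block:: sh

   ./benchmark_app -m model.xml -d AUTO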

 .. note::

    * If either the latency or throughput hint is set, it will automatically configure streams,
-     batch sizes, and the number of parallel infer requests for optimal performance, based on the specified device.
+     batch sizes, and the number of parallel infer requests for optimal performance,
+     based on the specified device.

    * Optionally, you can specify the number of parallel infer requests with the ``-nireq``
      option. Setting a high value may improve throughput at the expense
@@ -234,13 +233,13 @@ should be used purposefully. For more information, see the
 Number of iterations
 ++++++++++++++++++++

-By default, the benchmarking app will run for a predefined duration, repeatedly
+By default, the benchmark app will run for a predefined duration, repeatedly
 performing inference with the model and measuring the resulting inference speed.
 There are several options for setting the number of inference iterations:

 * Explicitly specify the number of iterations the model runs, using the
   ``-niter <number_of_iterations>`` option.
-* Set how much time the app runs for, using the ``-t <seconds>`` option.
+* Set the ``-t <seconds>`` option to run the app for a specified amount of time.
 * Set both of them (execution will continue until both conditions are met).
 * If neither ``-niter`` nor ``-t`` are specified, the app will run for a
   predefined duration that depends on the device.
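As a quick sketch combining both options (``model.xml`` is a placeholder path and the values are chosen only for illustration):

.. code-block:: sh

   ./benchmark_app -m model.xml -niter 100 -t 30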
@@ -251,17 +250,18 @@ average latency and throughput.
 Maximum inference rate
 ++++++++++++++++++++++

-By default, the benchmarking app will run inference at maximum rate based on device capabilities.
-The maximum inferance rate can be configured by ``-max_irate <MAXIMUM_INFERENCE_RATE>`` option.
-Tweaking this value allow better accuracy in power usage measurement by limiting the number of executions.
+By default, the benchmark app will run inference at maximum rate based on the device capabilities.
+The maximum inference rate can be configured by the ``-max_irate <MAXIMUM_INFERENCE_RATE>`` option.
+Limiting the number of executions with this parameter may result in more accurate
+power usage measurements and reduced power consumption.
+
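For example, to cap the rate at roughly 10 inferences per second (a sketch; the limit value is arbitrary and ``model.xml`` is a placeholder path):

.. code-block:: sh

   ./benchmark_app -m model.xml -max_irate 10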

 Inputs
 ++++++++++++++++++++

-The benchmark tool runs benchmarking on user-provided input images in
+The tool runs benchmarks on user-provided input images in
 ``.jpg``, ``.bmp``, or ``.png`` formats. Use ``-i <PATH_TO_INPUT>`` to specify
-the path to an image or a folder of images. For example, to run benchmarking on
-an image named ``test1.jpg``, use:
+the path to an image or a folder of images:

 .. tab-set::

@@ -280,15 +280,15 @@ an image named ``test1.jpg``, use:
    ./benchmark_app -m model.xml -i test1.jpg


-The tool will repeatedly loop through the provided inputs and run inference on
-them for the specified amount of time or a number of iterations. If the ``-i``
+The tool will repeatedly loop through the provided inputs and run inference
+for the specified amount of time or the number of iterations. If the ``-i``
 flag is not used, the tool will automatically generate random data to fit the
 input shape of the model.

 Examples
 ++++++++++++++++++++

-For more usage examples (and step-by-step instructions on how to set up a model for benchmarking),
+For more usage examples and step-by-step instructions,
 see the :ref:`Examples of Running the Tool <examples-of-running-the-tool-python>` section.

 .. _advanced-usage-benchmark:
@@ -301,10 +301,9 @@ Advanced Usage
 By default, OpenVINO samples, tools and demos expect input with BGR channels
 order. If you trained your model to work with RGB order, you need to manually
 rearrange the default channel order in the sample or demo application or reconvert
-your model using model conversion API with ``reverse_input_channels`` argument
-specified. For more information about the argument, refer to When to Reverse
-Input Channels section of Converting a Model to Intermediate Representation (IR).
-
+your model.
+For more information, refer to the **Color Conversion** section of
+:doc:`Preprocessing API <../../../openvino-workflow/running-inference/optimize-inference/optimize-preprocessing/preprocessing-api-details>`.

 Per-layer performance and logging
 +++++++++++++++++++++++++++++++++
@@ -313,27 +312,25 @@ The application also collects per-layer Performance Measurement (PM) counters fo
 each executed infer request if you enable statistics dumping by setting the
 ``-report_type`` parameter to one of the possible values:

-* ``no_counters`` report includes configuration options specified, resulting
-  FPS and latency.
-* ``average_counters`` report extends the ``no_counters`` report and additionally
-  includes average PM counters values for each layer from the network.
-* ``detailed_counters`` report extends the ``average_counters`` report and
+* ``no_counters`` - includes specified configuration options, resulting FPS and latency.
+* ``average_counters`` - extends the ``no_counters`` report and additionally
+  includes average PM counters values for each layer from the model.
+* ``detailed_counters`` - extends the ``average_counters`` report and
   additionally includes per-layer PM counters and latency for each executed infer request.

-Depending on the type, the report is stored to ``benchmark_no_counters_report.csv``,
+Depending on the type, the report is saved to the ``benchmark_no_counters_report.csv``,
 ``benchmark_average_counters_report.csv``, or ``benchmark_detailed_counters_report.csv``
-file located in the path specified in ``-report_folder``. The application also
-saves executable graph information serialized to an XML file if you specify a
-path to it with the ``-exec_graph_path`` parameter.
+file located in the path specified with ``-report_folder``. The application also
+saves executable graph information to an XML file, located in a folder
+specified with the ``-exec_graph_path`` parameter.
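For instance, to collect average per-layer counters into a custom folder and dump the executable graph (a sketch; the ``reports`` folder, ``exec_graph.xml`` file name, and ``model.xml`` path are placeholders):

.. code-block:: sh

   ./benchmark_app -m model.xml -report_type average_counters -report_folder reports -exec_graph_path exec_graph.xml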

 .. _all-configuration-options-python-benchmark:

 All configuration options
 +++++++++++++++++++++++++

-Running the application with the ``-h`` or ``--help`` option yields the
-following usage message:
-
+Run the application with the ``-h`` or ``--help`` flags to get information on
+available options and parameters:

 .. tab-set::

@@ -605,8 +602,7 @@ following usage message:
    }


-
-Running the application with the empty list of options yields the usage message given above and an error message.
+The help information is also displayed when you run the application without any parameters.

 More information on inputs
 ++++++++++++++++++++++++++
@@ -626,18 +622,18 @@ Examples of Running the Tool
 ############################

 This section provides step-by-step instructions on how to run the Benchmark Tool
-with the ``asl-recognition`` Intel model on CPU or GPU devices. It uses random data as the input.
+with the ``asl-recognition`` Intel model on CPU or GPU devices. It uses random data as input.

 .. note::

    Internet access is required to execute the following steps successfully. If you
-   have access to the Internet through a proxy server only, please make sure that
-   it is configured in your OS environment.
+   have access to the Internet through a proxy server only, make sure
+   it is configured in your OS.

-Run the tool, specifying the location of the OpenVINO Intermediate Representation
-(IR) model ``.xml`` file, the device to perform inference on, and a performance hint.
-The following commands demonstrate examples of how to run the Benchmark Tool
-in latency mode on CPU and throughput mode on GPU devices:
+Run the tool, specifying the location of the ``.xml`` model file of OpenVINO Intermediate
+Representation (IR), the inference device and a performance hint.
+The following examples show how to run the Benchmark Tool
+on CPU and GPU in latency and throughput mode respectively:

 * On CPU (latency mode):

@@ -678,14 +674,15 @@ in latency mode on CPU and throughput mode on GPU devices:


 The application outputs the number of executed iterations, total duration of execution,
-latency, and throughput. Additionally, if you set the ``-report_type`` parameter,
-the application outputs a statistics report. If you set the ``-pc`` parameter,
-the application outputs performance counters. If you set ``-exec_graph_path``,
-the application reports executable graph information serialized. All measurements
-including per-layer PM counters are reported in milliseconds.
+latency, and throughput. Additionally, if you set the parameters:
+
+* ``-report_type`` - the application outputs a statistics report,
+* ``-pc`` - the application outputs performance counters,
+* ``-exec_graph_path`` - the application reports serialized executable graph information.
+
+All measurements including per-layer PM counters are reported in milliseconds.

-An example of the information output when running ``benchmark_app`` on CPU in
-latency mode is shown below:
+An example of running ``benchmark_app`` on CPU in latency mode and its output are shown below:

 .. tab-set::

@@ -827,11 +824,11 @@ latency mode is shown below:
    [ INFO ] Throughput: 91.12 FPS


-The Benchmark Tool can also be used with dynamically shaped networks to measure
+The Benchmark Tool can also be used with dynamically shaped models to measure
 expected inference time for various input data shapes. See the ``-shape`` and
 ``-data_shape`` argument descriptions in the :ref:`All configuration options <all-configuration-options-python-benchmark>`
-section to learn more about using dynamic shapes. Here is a command example for
-using ``benchmark_app`` with dynamic networks and a portion of the resulting output:
+section to learn more about using dynamic shapes. Below is an example of
+using ``benchmark_app`` with dynamic models and a portion of the resulting output:

 .. tab-set::


docs/articles_en/openvino-workflow-generative/inference-with-genai.rst

Lines changed: 8 additions & 0 deletions
@@ -575,6 +575,14 @@ compression is done by NNCF at the model export stage. The exported model contai
 information necessary for execution, including the tokenizer/detokenizer and the generation
 config, ensuring that its results match those generated by Hugging Face.

+.. note::
+
+   To use the meta-llama/Llama-2-7b-chat-hf model, you need to accept the license agreement:
+   you must be a registered user of the 🤗 Hugging Face Hub, so visit the Hugging Face model card,
+   carefully read the terms of use, and click the accept button. You will also need an access
+   token for the code below to run. For more information on access tokens, refer to this section
+   of the documentation. Refer to this document to learn how to log in to the Hugging Face Hub.
+
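One possible way to authenticate on the machine is the Hugging Face CLI (a minimal sketch; it assumes the ``huggingface_hub`` package is installed and that you have already generated an access token):

.. code-block:: sh

   huggingface-cli login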
 The `LLMPipeline` is the main object to setup the model for text generation. You can provide the
 converted model to this object, specify the device for inference, and provide additional
 parameters.
