Update deepsparse-readme.md (#846)
* Update deepsparse-readme.md

* Update deepsparse-readme.md

* Add files via upload

* Update deepsparse-readme.md

* Update deepsparse-readme.md

* Add files via upload

* Update deepsparse-readme.md

* Update deepsparse-readme.md

* Add files via upload

* Add files via upload

* Update deepsparse-readme.md

* Update deepsparse-readme.md

* Add files via upload

* Update deepsparse-readme.md

* Delete batch-1.png

* Delete batch-64.png

* Update deepsparse-readme.md

Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Mark Kurtz <[email protected]>
3 people authored Jan 5, 2023
1 parent 6e98f33 commit a3900c0
Showing 3 changed files with 75 additions and 78 deletions.
Binary file added cluster-bar.png
153 changes: 75 additions & 78 deletions examples/ultralytics-yolo/ultralytics-readmes/deepsparse-readme.md
<div style="display: flex; flex-direction: column; align-items: center;">
  <h1>
    DeepSparse &nbsp;&nbsp;<img alt="tool icon" src="https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/icon-deepsparse.png" />
  </h1>
</div>

Welcome to software-delivered AI.

This guide explains how to deploy YOLOv5 with Neural Magic's DeepSparse.

DeepSparse is an inference runtime with exceptional performance on CPUs. For instance, compared to ONNX Runtime's baseline, DeepSparse offers a 3.7x speed-up at batch size 1 and a 5.8x speed-up at batch size 64 for YOLOv5s!

<p align="center">
  <img width="60%" src="cluster-bar.png">
</p>

For the first time, your deep learning workloads can meet the performance demands of production without the complexity and costs of hardware accelerators.

Put simply, DeepSparse gives you the performance of GPUs and the simplicity of software:
- **Flexible Deployments**: Run consistently across cloud, data center, and edge with any hardware provider from Intel to AMD to ARM
- **Infinite Scalability**: Scale vertically to 100s of cores, out with standard Kubernetes, or fully-abstracted with Serverless
- **Easy Integration**: Clean APIs for integrating your model into an application and monitoring it in production

**[Start your 90 day Free Trial](https://neuralmagic.com/deepsparse-free-trial/?utm_campaign=free_trial&utm_source=ultralytics_github).**

## How DeepSparse Achieves GPU-Class Performance

DeepSparse takes advantage of model sparsity to gain its performance speedup.

Sparsification through pruning and quantization is a broadly studied technique, allowing order-of-magnitude reductions in the size and compute needed to execute a network while maintaining high accuracy. DeepSparse is sparsity-aware, meaning it skips the zeroed-out parameters, shrinking the amount of compute in a forward pass. Since the sparse computation is now memory bound, DeepSparse executes the network depth-wise, breaking the problem into Tensor Columns, vertical stripes of computation that fit in cache.

<p align="center">
<img width="60%" src="tensor-columns.png">
</p>

Sparse networks with compressed computation, executed depth-wise in cache, allow DeepSparse to deliver GPU-class performance on CPUs!
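
To build intuition for why skipping zeroed-out parameters helps, here is a toy sketch of the arithmetic — it only counts multiply-adds and is not how DeepSparse's kernels are actually implemented:

```python
import numpy as np

rng = np.random.default_rng(0)

# a toy "layer": a 512x512 weight matrix pruned to ~90% sparsity
weights = rng.standard_normal((512, 512))
weights[rng.random(weights.shape) < 0.90] = 0.0

dense_macs = weights.size                # multiply-adds a dense kernel performs per input vector
sparse_macs = np.count_nonzero(weights)  # multiply-adds remaining if zeros are skipped

print(f"dense MACs:  {dense_macs}")
print(f"sparse MACs: {sparse_macs}  (~{dense_macs / sparse_macs:.0f}x fewer)")
```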

### How Do I Create A Sparse Version of YOLOv5 Trained on My Data?

Neural Magic's open-source model repository, SparseZoo, contains pre-sparsified checkpoints of each YOLOv5 model. Using SparseML, which is integrated with Ultralytics, you can fine-tune a sparse checkpoint onto your data with a single CLI command.

[Checkout Neural Magic's YOLOv5 documentation for more details](https://docs.neuralmagic.com/use-cases/object-detection/sparsifying).
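
For reference, the fine-tuning command takes roughly the following shape. The flag values, SparseZoo stub, and dataset file below are illustrative assumptions — consult the linked documentation for the exact invocation:

```bash
# sparse transfer learning: start from a pre-sparsified checkpoint and
# fine-tune onto a custom dataset while preserving the sparsity structure
sparseml.yolov5.train \
  --weights zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none \
  --recipe zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none \
  --data your-dataset.yaml
```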

We will walk through an example benchmarking and deploying a sparse version of YOLOv5s with DeepSparse.

### Install DeepSparse

Run the following to install DeepSparse. We recommend you use a virtual environment with Python.

```bash
pip install deepsparse[server,yolo,onnxruntime]
```
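
For example, a fresh environment can be set up like this before installing (the directory name `deepsparse-env` is arbitrary):

```bash
python3 -m venv deepsparse-env       # create an isolated environment
source deepsparse-env/bin/activate   # activate it (Linux/macOS)
pip install "deepsparse[server,yolo,onnxruntime]"
```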

### Collect an ONNX File

DeepSparse accepts a model in the ONNX format, passed either as:
- A SparseZoo stub which identifies an ONNX file in the SparseZoo
- A local path to an ONNX model in a filesystem

The examples below will use the standard dense YOLOv5s and pruned-quantized YOLOv5s checkpoints, identified by the following SparseZoo stubs:
```bash
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned35_quant-none-vnni  # < pruned for VNNI machines
```
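
If you already have an ONNX export of your model on disk, the tools shown below accept a local path in place of a stub — for example (the file path here is illustrative):

```bash
deepsparse.benchmark ./yolov5s-custom.onnx -s sync -b 1
```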

### Benchmark Performance

We will compare DeepSparse's throughput to ONNX Runtime's throughput on YOLOv5s, using DeepSparse's benchmarking script.

The benchmarks were run on an AWS `c6i.8xlarge` instance (16 cores).
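
The `c6i` instances are Intel Ice Lake machines with AVX-512 VNNI support, which matters for the 4-block pruned-quantized results below. On Linux you can check whether your own CPU exposes VNNI with something like:

```bash
lscpu | grep -c avx512_vnni   # prints a non-zero count if VNNI is available
```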

#### Batch 1 Performance Comparison

ONNX Runtime achieves 49 images/sec with dense YOLOv5s:

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 1 -nstreams 1 -e onnxruntime

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
> Batch Size: 1
> Scenario: sync
> Throughput (items/sec): 48.8549
> Latency Mean (ms/batch): 20.4613
> Latency Median (ms/batch): 20.4192
```
DeepSparse achieves 135 images/sec with a pruned-quantized YOLOv5s, **a 2.8x performance gain over ONNX Runtime!**

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none -s sync -b 1 -nstreams 1

> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
> Batch Size: 1
> Scenario: sync
> Throughput (items/sec): 135.0647
> Latency Mean (ms/batch): 7.3895
> Latency Median (ms/batch): 7.2398
```

Since `c6i.8xlarge` instances have VNNI instructions, DeepSparse's throughput can be pushed further if weights are pruned in blocks of 4. DeepSparse achieves 180 images/sec with a 4-block pruned-quantized YOLOv5s, a **3.7x performance gain over ONNX Runtime!**
```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned35_quant-none-vnni -s sync -b 1 -nstreams 1
> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned35_quant-none-vnni
> Batch Size: 1
> Scenario: sync
> Throughput (items/sec): 179.6016
> Latency Mean (ms/batch): 5.5615
> Latency Median (ms/batch): 5.5458
```

#### Batch 64 Performance Comparison
In latency-insensitive scenarios with large batch sizes, DeepSparse's performance relative to ONNX Runtime is even stronger.

ONNX Runtime achieves 42 images/sec with dense YOLOv5s:
```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none -s sync -b 64 -nstreams 1 -e onnxruntime
> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none
> Batch Size: 64
> Scenario: sync
> Throughput (items/sec): 41.5560
> Latency Mean (ms/batch): 1538.6640
> Latency Median (ms/batch): 1538.0362
```

DeepSparse achieves 239 images/sec with pruned-quantized YOLOv5s, a **5.8x performance improvement over ONNX Runtime**!

```bash
deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none -s sync -b 64 -nstreams 1
> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
> Batch Size: 64
> Scenario: sync
> Throughput (items/sec): 239.0854
> Latency Mean (ms/batch): 267.6703
> Latency Median (ms/batch): 267.3194
```
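
As a sanity check, the headline speedups follow directly from the reported throughputs (a quick sketch using the numbers above):

```python
# reported throughputs (items/sec) from the benchmark runs above
ort_b1, ds_b1_pruned65, ds_b1_vnni = 48.8549, 135.0647, 179.6016
ort_b64, ds_b64_pruned65 = 41.5560, 239.0854

print(f"batch 1,  pruned65_quant:       {ds_b1_pruned65 / ort_b1:.1f}x")   # ~2.8x
print(f"batch 1,  pruned35_quant-vnni:  {ds_b1_vnni / ort_b1:.1f}x")       # ~3.7x
print(f"batch 64, pruned65_quant:       {ds_b64_pruned65 / ort_b64:.1f}x") # ~5.8x
```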

### Deploy a Model
To try the deployment examples below, pull down a sample image for the example and save it as `basilica.jpg` with the following command:
```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```

#### Annotate CLI
You can also use the `annotate` command to have the engine save an annotated photo on disk. Try `--source 0` to annotate your live webcam feed!
```bash
deepsparse.object_detection.annotate --model_filepath zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --source basilica.jpg
```
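
As noted above, the source does not have to be an image file; a webcam feed can be annotated instead (assuming device `0` is your webcam):

```bash
deepsparse.object_detection.annotate --model_filepath zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --source 0
```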

Running the above command will create an `annotation-results` folder and save the annotated image inside.

<p align = "center">
<img src="https://github.com/neuralmagic/deepsparse/blob/d31f02596ebff2ec62761d0bc9ca14c4663e8858/src/deepsparse/yolo/sample_images/basilica-annotated.jpg" alt="annotated" width="60%"/>
</p>

#### Python API

`Pipelines` wrap pre-processing and output post-processing around the runtime, providing a clean interface for adding DeepSparse to an application.
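
The Python snippet is collapsed in this diff view past `Pipeline.create(`; the following is a minimal, complete sketch built from the visible fragment — the inference call and its `iou_thres`/`conf_thres` arguments are illustrative assumptions rather than documented defaults:

```python
from deepsparse import Pipeline

# list of images in the local filesystem
images = ["basilica.jpg"]

# create the Pipeline from a SparseZoo stub (downloads the ONNX file)
model_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none"
yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path=model_stub,
)

# run inference on the images; returns bounding boxes, scores, and labels
pipeline_outputs = yolo_pipeline(images=images, iou_thres=0.6, conf_thres=0.001)
print(pipeline_outputs)
```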

#### HTTP Server

Spin up the Server with the pruned-quantized YOLOv5s:
```bash
deepsparse.server \
--task yolo \
--model_path zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned65_quant-none
```

An example request, using Python's `requests` package:
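
The request snippet is collapsed in this view; below is a minimal sketch reconstructed around the visible `labels = annotations["labels"]` line. The port (5543) and the `/predict/from_files` endpoint are assumed to be the Server defaults — check the Server's startup output if your configuration differs.

```python
import json

import requests

# list of images for inference (local files on the client side)
path = ["basilica.jpg"]
files = [("request", open(img, "rb")) for img in path]

# send the request over HTTP to the /predict/from_files endpoint
url = "http://0.0.0.0:5543/predict/from_files"
resp = requests.post(url=url, files=files)

# the response is returned as JSON
annotations = json.loads(resp.text)  # dictionary of annotation results
bounding_boxes = annotations["boxes"]
labels = annotations["labels"]
```
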
## Get Started With DeepSparse

**Research or Testing?** DeepSparse Community is free for research and testing. Get started with our [Documentation](https://docs.neuralmagic.com/).

**Want to Try DeepSparse Enterprise?** [Start your 90 day free trial](https://neuralmagic.com/deepsparse-free-trial/?utm_campaign=free_trial&utm_source=ultralytics_github).
