This directory provides examples of how to deploy Deep CNNs on FPGAs using Xilinx Python APIs.
All examples provided within this directory exercise precompiled versions of GoogLeNet or ResNet50 whose components are stored in the local ./data directory.
A "compiled model" consists of low-level hardware instructions and quantization parameters.
Compiler outputs: a JSON file of hardware instructions (*_56.json or *_28.json) and a model_data directory with preprocessed floating-point weights.
Quantizer outputs: a JSON file containing scaling factors for each layer in the corresponding network (*_8b.json or *_16b.json).
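As a rough illustration of the naming convention above, the following sketch sorts JSON files into compiler and quantizer outputs by suffix. The helper names (`classify_artifact`, `group_artifacts`) and the example file names are hypothetical and not part of the xfDNN API; only the suffixes come from the text above.

```python
from pathlib import Path

# Suffixes as described above; everything else in this sketch is an assumption.
COMPILER_SUFFIXES = ("_56.json", "_28.json")
QUANTIZER_SUFFIXES = ("_8b.json", "_16b.json")


def classify_artifact(filename):
    """Return 'compiler', 'quantizer', or None based on the file name suffix."""
    name = Path(filename).name
    if name.endswith(COMPILER_SUFFIXES):
        return "compiler"
    if name.endswith(QUANTIZER_SUFFIXES):
        return "quantizer"
    return None


def group_artifacts(data_dir):
    """Group the JSON files found directly in data_dir by artifact kind."""
    groups = {"compiler": [], "quantizer": []}
    for path in sorted(Path(data_dir).glob("*.json")):
        kind = classify_artifact(path.name)
        if kind is not None:
            groups[kind].append(path.name)
    return groups


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the suffix check.
    for name in ("googlenet_v1_56.json", "googlenet_v1_8b.json", "multinet.json"):
        print(name, "->", classify_artifact(name))
```

Calling `group_artifacts("./data")` would list whatever matching files exist in the local data directory, assuming they sit at its top level.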
Important Notes:
- The final layers of the network (fully connected, softmax) are run on the CPU, because the FPGA does not support them.
- The streaming_classify example requires the ImageNet validation set. Once you have downloaded it, pass the path to its directory with the flag
-d <DIRECTORY>
- Amazon AWS EC2 F1 requires root privileges to load the FPGA; use the documented workaround.
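To make the first note concrete, here is a minimal pure-Python sketch of the kind of work left to the CPU: a fully connected layer followed by softmax, applied to the flattened activation returned by the FPGA. This is not the code used by the examples; the function names and the tiny weights are made up for illustration.

```python
import math


def fully_connected(activation, weights, bias):
    """Compute y[j] = sum_i activation[i] * weights[j][i] + bias[j]."""
    return [
        sum(a * w for a, w in zip(activation, row)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical 2-element activation and 3-class layer.
    activation = [1.0, 2.0]
    weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    bias = [0.0, 0.0, 0.0]
    probabilities = softmax(fully_connected(activation, weights, bias))
    print(probabilities)
```

In a real deployment the activation would have the length given by the flattened FPGA output size, and the weights would come from the trained model.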
The following three examples of applications using the Python xfDNN API are provided:
- A Test Classification example that demonstrates how to run inference on a single image "dog.jpg"
- A Streaming Classification example that streams images from disk through the FPGA for classification.
- A Multi-Network example that shows different DNNs running independently on multiple processing elements on the FPGA.
To run any of the three examples, use the provided run.sh bash script.
- Navigate to the ml-suite/examples/deployment_modes directory:
$ cd ml-suite/examples/deployment_modes
- Familiarize yourself with the script usage:
$ ./run.sh -h
The key parameters are:
- -p platform - Valid values are alveo-u200, alveo-u250, aws, nimbix, 1525, 1525-ml
  - Note: This switch is generally no longer needed, because the platform is auto-detected
- -t test - Valid values are test_classify, streaming_classify, or multinet
- -m model - Valid values are googlenet_v1 or resnet50; the default is googlenet_v1
- -c compiler optimized - This flag runs the network with a compiler optimization for max throughput or min latency
- -g - This flag enables accuracy checking given a golden result text file.
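If you drive run.sh from Python, the switches listed above can be assembled into an argument list. The sketch below only builds and prints the command; it does not run it. `build_run_command` is a hypothetical helper, and it uses only switches that appear in this README (-t, -m, -p, -c, -g, -d).

```python
import shlex


def build_run_command(test, model=None, platform=None, optimize=None,
                      golden=False, image_dir=None):
    """Assemble a run.sh invocation as a list suitable for subprocess.run."""
    cmd = ["./run.sh", "-t", test]
    if model:
        cmd += ["-m", model]
    if platform:
        # Usually unnecessary, because the platform is auto-detected.
        cmd += ["-p", platform]
    if optimize:
        cmd += ["-c", optimize]
    if golden:
        cmd.append("-g")
    if image_dir:
        cmd += ["-d", image_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_run_command("test_classify", model="resnet50")
    # Print a shell-safe rendering of the command instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the deployment_modes directory would launch the example, assuming run.sh is executable there.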
- Single Image Classification on Alveo, ResNet50 v1:
$ ./run.sh -t test_classify -m resnet50
- Single Image Classification on AWS, GoogLeNet:
$ ./run.sh -t test_classify -m googlenet_v1
- Streaming Image Classification on Alveo, GoogLeNet:
$ ./run.sh -t streaming_classify -d $HOME/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min
- Streaming Image Classification on Alveo, throughput optimized, reporting accuracy on the ImageNet validation set:
$ ./run.sh -t streaming_classify -g -c throughput -d $HOME/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min
- Streaming Image Classification in FPGA-only mode on Alveo (image pre-processing and the output accuracy check are skipped), throughput optimized:
$ ./run.sh -t streaming_classify_fpgaonly -c throughput -d $HOME/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min
- Streaming Image Classification FPGA-only mode with live pipeline performance report:
To exit the streaming report view, press CTRL-Z and type kill -9 %%.
$ ./run.sh -t streaming_classify_fpgaonly -c throughput -d $HOME/CK-TOOLS/dataset-imagenet-ilsvrc2012-val-min -x -v | python $MLSUITE_ROOT/xfdnn/rt/scripts/speedometer.py
- Multinet Image Classification on Alveo:
$ ./run.sh -t multinet
Take a look at the following scripts to understand the examples:
- run.sh
- test_classify.py
- mp_classify.py
- test_classify_async_multinet.py
The Python scripts use the argument parser defined in xdnn_io.py:
- --xclbin - Defines which FPGA binary overlay to use. The available binaries are stored in overlaybins
- --netcfg - FPGA instructions generated by the compiler for the network being run
- --quantizecfg - Path to the JSON file to use for quantization (the JSON file contains scaling parameters)
- --fpgaoutsz - Flattened size of the final activation computed by the FPGA (the FPGA does not run FC layers or softmax)
- --datadir - Path to the data files for the network (weights)
- --labels - Path to a text file containing line-separated labels
- --golden - Path to a text file containing line-separated correct labels
- --images - Directory with image files to classify (only applicable to streaming_classify)
- --jsoncfg - Path to the JSON file used to define separate networks (only applicable to multinet)
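For orientation, the flags above could be declared with argparse roughly as follows. This is an illustrative mirror of the documented flag names, not the actual parser in xdnn_io.py; the types, defaults, and help strings are assumptions, and the file names in the usage line are placeholders.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flag names documented above."""
    parser = argparse.ArgumentParser(
        description="Illustrative mirror of the documented flags")
    parser.add_argument("--xclbin", help="FPGA binary overlay to use")
    parser.add_argument("--netcfg", help="compiler-generated FPGA instructions (JSON)")
    parser.add_argument("--quantizecfg", help="quantization scaling parameters (JSON)")
    # Treating the size as an int is an assumption made for this sketch.
    parser.add_argument("--fpgaoutsz", type=int,
                        help="flattened size of the final FPGA activation")
    parser.add_argument("--datadir", help="directory with the network weights")
    parser.add_argument("--labels", help="text file with line-separated labels")
    parser.add_argument("--golden", help="text file with line-separated correct labels")
    parser.add_argument("--images", help="image directory (streaming_classify only)")
    parser.add_argument("--jsoncfg", help="JSON file defining networks (multinet only)")
    return parser


if __name__ == "__main__":
    # Placeholder values, chosen only to show how the flags parse.
    args = build_parser().parse_args(
        ["--fpgaoutsz", "1024", "--labels", "labels.txt"])
    print(args.fpgaoutsz, args.labels, args.images)
```

Flags that are not passed come back as `None` here, which makes it easy to see which ones a given example actually sets.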
For Multinet deployments, the different models/networks are set in the --jsoncfg file. For the Multinet example given above, see multinet.json for how the arguments are set.
$ ./run.sh -t test_classify
=============== pyXDNN =============================
[XBLAS] # kernels: 1
Linux:4.4.0-121-generic:#145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018:x86_64
Distribution: Ubuntu 16.04.2 LTS
GLIBC: 2.23
---
---
CL_PLATFORM_VENDOR Xilinx
CL_PLATFORM_NAME Xilinx
CL_DEVICE_0: 0x21f60c0
CL_DEVICES_FOUND 1, using 0
loading /opt/ml-suite/overlaybins/1525/overlay_4.xclbin
[XBLAS] kernel0: kernelSxdnn_0
[XDNN] loading xclbin settings from /opt/ml-suite/overlaybins/1525/overlay_4.xclbin.json
[XDNN] using custom DDR banks 0, 3
Loading weights/bias/quant_params to FPGA...
[XDNN] kernel configuration
[XDNN] num cores : 2
[XDNN] dsp array width : 96
[XDNN] axi data width (in 32bits) : 16
[XDNN] img mem size : 9 MB
[XDNN] max instr num : 1536
[XDNN] max xbar entries : 4096
[XDNN] version : 3.1
[XDNN] 8-bit mode : 1
---------- Prediction 1/2 for /opt/ml-suite/examples/deployment_modes/dog.jpg ----------
0.5986 "n02112018 Pomeranian"
0.2033 "n02123394 Persian cat"
0.0319 "n02492035 capuchin, ringtail, Cebus capucinus"
0.0271 "n02085620 Chihuahua"
0.0198 "n02123597 Siamese cat, Siamese"
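If you want to post-process this console output, the prediction lines can be pulled out with a regular expression. The sketch below assumes each prediction has the form shown above (a score, then a quoted string made of a synset ID and a label); `parse_predictions` is a hypothetical helper, not part of the ML Suite.

```python
import re

# Matches lines such as: 0.5986 "n02112018 Pomeranian"
PREDICTION = re.compile(r'^\s*(\d*\.\d+)\s+"(n\d+)\s+(.+)"\s*$')


def parse_predictions(log_text):
    """Return (score, synset_id, label) tuples for each prediction line."""
    results = []
    for line in log_text.splitlines():
        match = PREDICTION.match(line)
        if match:
            score, synset_id, label = match.groups()
            results.append((float(score), synset_id, label))
    return results


if __name__ == "__main__":
    sample = "\n".join([
        "---------- Prediction 1/2 for dog.jpg ----------",
        '0.5986 "n02112018 Pomeranian"',
        '0.0319 "n02492035 capuchin, ringtail, Cebus capucinus"',
    ])
    for score, synset_id, label in parse_predictions(sample):
        print(synset_id, score, label)
```

Lines that do not match the pattern, such as the banner and the [XDNN] configuration lines, are skipped.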