VAD (Vectorized Scene Representation for Efficient Autonomous Driving) is an end-to-end vectorized paradigm for autonomous driving.
In this demo, we will use VAD-Tiny as our deployment target.
| Method | Backbone precision | Head precision | Framework | avg. L2 | Latency (ms) |
|---|---|---|---|---|---|
| VAD-Tiny | fp16 | fp32 | TensorRT | 0.78 | 90.2 (on Orin) |
```
cd /workspace
git clone https://github.com/hustvl/VAD.git
git clone https://github.com/NVIDIA/DL4AGX.git
```

Please follow the instructions in the official repo (install.md, prepare-dataset.md) to set up the VAD environment first.
Then download VAD_tiny.pth from Google Drive to the folder /workspace/VAD/ckpts.
You may verify your installation with:

```
cd /workspace/VAD
CUDA_VISIBLE_DEVICES=0 python tools/test.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```

This command is expected to output the benchmark results. This environment will be referred to as the torch container.
NOTE You may need to adjust the original repo to reproduce the correct results:

- Use the following `img_norm_cfg`:

```python
img_norm_cfg = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
```

- If you get an attribute error in /workspace/VAD/projects/mmdet3d_plugin/core/bbox/structures/lidar_box3d.py, replace the import:

```python
# from mmdet3d.ops.roiaware_pool3d import points_in_boxes_gpu
from mmdet3d.ops import points_in_boxes_all as points_in_boxes_gpu
```
- For the best user experience, we recommend using torch >= 1.12. You may also build the docker image with the given ./dockerfile. Here is an example command line; you may change the volume-mapping arguments according to your setup.

```
cd /workspace/DL4AGX/AV-Solutions/vad-trt
docker build --network=host -f dockerfile . -t vad-trt
docker run --name=vad-trt -d -it --rm --shm-size=4096m --privileged --gpus all --network=host \
  -v /workspace:/workspace -v <path to nuscenes>:/data \
  vad-trt /bin/bash
```
To set up the deployment environment, you may run the following commands.

```
cd /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval
ln -s /workspace/VAD/data data # create a soft link to the data folder
export PYTHONPATH=.:/workspace/VAD
```

As VAD is a temporal model, its inference behavior differs between the first frame and subsequent frames. When a frame has a previous frame, the model first applies a temporal warp to the previous BEV feature map and then concatenates the warped feature map with the current one.
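To build intuition for this temporal step, here is a minimal sketch, not the actual VAD implementation: it shifts the previous BEV feature map by the ego translation (nearest-cell, border-replicated) and concatenates it with the current one. The cell size, shapes, and the `+x`/`+y` conventions are hypothetical; the real model also handles rotation and uses bilinear sampling.

```python
import numpy as np

def warp_prev_bev(prev_bev, delta_xy, cell_m=0.5):
    """Shift a (C, H, W) BEV feature map by the ego translation.

    delta_xy: ego motion in meters between frames (hypothetical convention:
    x shifts columns, y shifts rows). cell_m: meters per BEV cell (made up).
    This nearest-cell shift with border replication is only for intuition.
    """
    dy = int(round(delta_xy[1] / cell_m))
    dx = int(round(delta_xy[0] / cell_m))
    _, h, w = prev_bev.shape
    ys = np.clip(np.arange(h) + dy, 0, h - 1)  # replicate at the border
    xs = np.clip(np.arange(w) + dx, 0, w - 1)
    return prev_bev[:, ys][:, :, xs]

prev_bev = np.random.rand(4, 8, 8).astype(np.float32)
cur_bev = np.random.rand(4, 8, 8).astype(np.float32)
# first frame: no prev, use cur_bev alone; later frames: warp, then concat
fused = np.concatenate([warp_prev_bev(prev_bev, (1.0, 0.0)), cur_bev], axis=0)
print(fused.shape)  # (8, 8, 8): channels of warped prev + current
```

This is why two ONNX graphs are exported below: the first-frame graph has no `prev_bev` input, while the subsequent-frame graph does.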
To export the ONNX for the first frame, you may run:

```
python export_no_prev.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```

To export the ONNX for the subsequent frames, you may run:

```
python export_prev.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```

After these two commands, you are expected to see vadv1.extract_img_feat, vadv1.pts_bbox_head.forward, and vadv1_prev.pts_bbox_head.forward under /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval/scratch. Each folder contains dumped input and output tensors in binary format, and an ONNX file beginning with sim_.
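If you want to inspect one of the dumped tensors, a small helper like the following can be used. This is a sketch under the assumption that the dumps are headerless flat binaries; the dtype and shape must come from the export script, and the file name and shape below are hypothetical.

```python
import numpy as np

def load_dumped_tensor(path, shape, dtype=np.float32):
    """Read a raw binary tensor dump and view it with the expected shape.

    Assumes a headerless flat binary; a size mismatch usually means the
    wrong dtype or shape was assumed for this file.
    """
    arr = np.fromfile(path, dtype=dtype)
    expected = int(np.prod(shape))
    if arr.size != expected:
        raise ValueError(f"got {arr.size} elements, expected {expected}")
    return arr.reshape(shape)

# hypothetical usage; the actual file names and shapes come from the exporter:
# bev = load_dumped_tensor("scratch/vadv1.pts_bbox_head.forward/bev_embed.bin",
#                          (10000, 1, 256))
```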
We provide test_tensorrt.py to run the benchmark with TensorRT. It produces results similar to the original PyTorch benchmark.
- To prepare the dependencies for the benchmark:

```
pip install pycuda numpy==1.23
pip install <TensorRT Root>/python/tensorrt-<version>-cp38-none-linux_aarch64.whl
```

- Build the plugins for the benchmark:
```
export TRT_ROOT=<path to your tensorrt dir>
cd /workspace/DL4AGX/AV-Solutions/vad-trt/plugins/
mkdir build && cd build
cmake ..
make
```

- Then build the TensorRT engines:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TRT_ROOT>/lib
export PATH=$PATH:<TRT_ROOT>/bin
mkdir /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval/vadv1.extract_img_feat
mkdir /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval/vadv1.pts_bbox_head.forward
mkdir /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval/vadv1_prev.pts_bbox_head.forward
cd /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval
# build image encoder
trtexec --onnx=scratch/vadv1.extract_img_feat/sim_vadv1.extract_img_feat.onnx \
  --staticPlugins=../plugins/build/libplugins.so \
  --profilingVerbosity=detailed --dumpProfile \
  --separateProfileRun --useSpinWait --useManagedMemory \
  --fp16 \
  --saveEngine=vadv1.extract_img_feat/vadv1.extract_img_feat.fp16.engine
# build heads
trtexec --onnx=scratch/vadv1.pts_bbox_head.forward/sim_vadv1.pts_bbox_head.forward.onnx \
  --staticPlugins=../plugins/build/libplugins.so \
  --profilingVerbosity=detailed --dumpProfile \
  --separateProfileRun --useSpinWait --useManagedMemory \
  --saveEngine=vadv1.pts_bbox_head.forward/vadv1.pts_bbox_head.forward.engine
# build heads with prev_bev
trtexec --onnx=scratch/vadv1_prev.pts_bbox_head.forward/sim_vadv1_prev.pts_bbox_head.forward.onnx \
  --staticPlugins=../plugins/build/libplugins.so \
  --profilingVerbosity=detailed --dumpProfile \
  --separateProfileRun --useSpinWait --useManagedMemory \
  --saveEngine=vadv1_prev.pts_bbox_head.forward/vadv1_prev.pts_bbox_head.forward.engine
```

- Run the benchmark with TensorRT:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TRT_ROOT>/lib
python test_tensorrt.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```

As we replace the backend from PyTorch with TensorRT while keeping other parts such as data loading and evaluation unchanged, you are expected to see outputs similar to the PyTorch benchmark.
This model is to be deployed on NVIDIA DRIVE Orin with TensorRT 8.6.13.3.
We recommend using the NVIDIA DRIVE docker image drive-agx-orin-linux-aarch64-sdk-build-x86:6.0.10.0-0009 as the cross-compile environment. This container will be referred to as the build container.
To launch the docker on the host x86 machine, you may run:
```
docker run --gpus all -it --network=host --rm \
  -v /workspace:/workspace \
  nvcr.io/drive/driveos-sdk/drive-agx-orin-linux-aarch64-sdk-build-x86:latest
```

To gain access to the docker image and the corresponding TensorRT, please join the DRIVE AGX SDK Developer Program. You can find more details on the NVIDIA DRIVE site.
Similar to the benchmark section, we run the following commands inside the build container to build the plugins for NVIDIA DRIVE Orin.
```
# inside cross-compile environment
cd /workspace/DL4AGX/AV-Solutions/vad-trt/plugins/
mkdir -p build+orin && cd build+orin
cmake .. -DTARGET=aarch64
make
```

If everything goes well, you will see libplugins.so under vad-trt/plugins/build+orin/.
NOTE If you encounter `No CMAKE_CUDA_COMPILER could be found.`, run the command below to help cmake locate nvcc:

```
export PATH=$PATH:/usr/local/cuda/bin
```
Similar to what we did when building plugins, you may run the following commands inside the build container.
```
# inside cross-compile environment
cd /workspace/DL4AGX/AV-Solutions/vad-trt/app
bash setup_dep.sh # download dependencies (stb, cuOSD)
mkdir -p build+orin && cd build+orin
cmake -DTARGET=aarch64 .. && make
```

You are expected to see vad_app under vad-trt/app/build+orin/.
In this demo run, we will set up everything under the folder vad-trt/app/demo.

- Prepare the plugin and the app:

```
cd /workspace/DL4AGX/AV-Solutions/vad-trt/
cp plugins/build+orin/libplugins.so app/demo/
cp app/build+orin/vad_app app/demo
```

- Prepare the input data and ONNX files. In the torch container environment, run:

```
cd /workspace/DL4AGX/AV-Solutions/vad-trt/export_eval
python save_data.py /workspace/VAD/projects/configs/VAD/VAD_tiny_stage_2.py /workspace/VAD/ckpts/VAD_tiny.pth --launcher none --eval bbox --tmpdir tmp
```

This will dump the necessary data files to vad-trt/export_eval/demo_data/<numbers>/.
We can then move them with:

```
cd /workspace/DL4AGX/AV-Solutions/vad-trt/
cp -r export_eval/demo_data/ app/demo/data
cp -r export_eval/scratch/vadv1.extract_img_feat/ app/demo/onnx_files
cp -r export_eval/scratch/vadv1_prev.pts_bbox_head.forward/ app/demo/onnx_files
mkdir /workspace/DL4AGX/AV-Solutions/vad-trt/app/demo/engines
```

Now the demo folder should be organized as:
```
├── config.json
├── data/
│   ├── 1/
│   ├── 2/
│   └── ...
├── libplugins.so
├── onnx_files/
│   ├── vadv1.extract_img_feat
│   ├── vadv1_prev.pts_bbox_head.forward
│   └── vadv1.pts_bbox_head.forward
├── engines/
├── simhei.ttf
└── viz/
```
You may utilize trtexec to build the engines from the ONNX files on NVIDIA DRIVE Orin.
```
cd AV-Solutions/vad-trt/app/demo/
# build image encoder
trtexec --onnx=onnx_files/vadv1.extract_img_feat/sim_vadv1.extract_img_feat.onnx \
  --staticPlugins=./libplugins.so \
  --profilingVerbosity=detailed --dumpProfile \
  --separateProfileRun --useSpinWait --useManagedMemory \
  --fp16 \
  --saveEngine=engines/vadv1.extract_img_feat.fp16.engine
# build heads
trtexec --onnx=onnx_files/vadv1_prev.pts_bbox_head.forward/sim_vadv1_prev.pts_bbox_head.forward.onnx \
  --staticPlugins=./libplugins.so \
  --profilingVerbosity=detailed --dumpProfile \
  --separateProfileRun --useSpinWait --useManagedMemory \
  --saveEngine=engines/vadv1_prev.pts_bbox_head.forward.engine
```

To run the demo app, simply call:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TRT_ROOT>/lib
cd AV-Solutions/vad-trt/app/demo
./vad_app config.json
```

You may then find the visualization results under vad-trt/app/demo/viz in jpg format.
Similar to uniad-trt/README.md#results, we only show detection and planning results.
You may find the video demo in demo.webm.

- VAD and its related code are licensed under Apache-2.0.
- cuOSD and its related code are licensed under MIT.
