This sample application demonstrates single-view 3D tracking with the DeepStream SDK. Given the camera matrix of a static camera and a human model, the application estimates and tracks object states in the 3D physical world. It can precisely recover the full-body bounding box, foot location, and body convex hull even under partial occlusion. For algorithm and setup details, please refer to the DeepStream Single View 3D Tracking Documentation.
This sample application can be run on both x86 and Jetson platforms inside a DeepStream container. Check here for DeepStream container setup.
- Download the latest DeepStream container image from NGC (e.g., DS 7.1 in the example below).

  ```bash
  export DS_IMG_NAME="nvcr.io/nvidia/deepstream:7.1-triton-multiarch"
  docker pull $DS_IMG_NAME
  ```
- Git clone the `deepstream_reference_apps` repository to the host machine, and enter the single-view 3D tracking directory inside the repository.

  ```bash
  git clone https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps.git
  cd deepstream_reference_apps/deepstream-tracker-3d
  ```
- Download the NVIDIA pretrained PeopleNet model for detection from [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/peoplenet/files?version=deployable_quantized_onnx_v2.6.3) (e.g., PeopleNet v2.6.3 in the example below).

  ```bash
  # current directory: deepstream_reference_apps/deepstream-tracker-3d
  mkdir -p models/PeopleNet
  cd models/PeopleNet
  wget --no-check-certificate --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/deployable_quantized_onnx_v2.6.3/zip -O peoplenet_deployable_quantized_onnx_v2.6.3.zip
  unzip peoplenet_deployable_quantized_onnx_v2.6.3.zip
  ```
  The model files are now stored in the `PeopleNet` directory as

  ```
  deepstream-tracker-3d
  ├── configs
  ├── streams
  └── models
      └── PeopleNet
          ├── labels.txt
          ├── resnet34_peoplenet.onnx
          └── resnet34_peoplenet_int8.txt
  ```
Launch the container from the current directory, and execute the 3D tracking pipeline inside the container. The current config requires a display because it uses an EGL sink to visualize the overlay results. To run through SSH without a display, change `type=2` to `type=1` in the `[sink0]` group of `deepstream_app_source1_3d_tracking.txt`. Users can check the DeepStream sink group documentation for the usage of each sink type.
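As a reference, here is a minimal sketch of what the `[sink0]` group could look like after that change; the exact keys in the shipped config may differ.

```
[sink0]
enable=1
# 1=FakeSink, 2=EglSink (requires a display), 3=File, 4=RTSPStreaming
type=1
sync=0
```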
```bash
cd ../..
sudo xhost + # give container access to display
# current directory: deepstream_reference_apps/deepstream-tracker-3d
docker run --gpus all -it --rm --net=host --privileged \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -v $(pwd):/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-tracker-3d \
  -e DISPLAY=$DISPLAY $DS_IMG_NAME
```
Inside the container, run the following commands. Note that when `deepstream-app` is launched for the first time, it builds the model engine files, which may take a few minutes depending on the hardware platform.
```bash
# Install prerequisites
cd /opt/nvidia/deepstream/deepstream/
bash user_additional_install.sh

# Download ReID model
export REID_DIR="/opt/nvidia/deepstream/deepstream/samples/models/Tracker"
mkdir -p $REID_DIR
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/reidentificationnet/versions/deployable_v1.0/files/resnet50_market1501.etlt' -P $REID_DIR

# Run 3D tracking pipeline
cd /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-tracker-3d/configs
mkdir -p track_results
deepstream-app -c deepstream_app_source1_3d_tracking.txt
```
When the pipeline is launched, DeepStream shows an output video like the one below while processing the input video. The bounding boxes are reprojected from 3D to 2D and cover the full human body even under partial occlusion. The result video is saved as `out.mp4`.
The extracted metadata (e.g., bounding box, frame number, target ID) is saved in both extended MOT and KITTI formats. For a detailed explanation, please refer to DeepStream Tracker Miscellaneous Data Output.
The MOT results can be found in the `track_dump_0.txt` file, which contains all the object metadata for one stream; the data format is defined below. The foot image position and the convex hull of the projected cylindrical human model are defined in video frame coordinates, and can be used to draw the visualization figures below. Users can create such an overlay video or image with their favorite tools, such as OpenCV. The foot world position is defined on the 3D world ground plane corresponding to the 3x4 camera projection matrix.
| frame number (starting from 1) | object unique id | bbox left | bbox top | bbox width | bbox height | confidence | Foot World Position X | Foot World Position Y | blank | class id | tracker state | visibility | Foot Image Position X | Foot Image Position Y | ConvexHull points relative to bbox center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| unsigned int | long unsigned int | int | int | int | int | float | float | float | int | unsigned int | int | float | float | float | ints separated by vertical bars |
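The two foot fields are related by the camera model: the foot world position (X, Y) lies on the ground plane (Z = 0), so projecting the homogeneous ground-plane point through the 3x4 matrix lands at the foot image position. Below is a minimal sketch with hypothetical matrix values; substitute the `projectionMatrix_3x4` from your `camInfo.yml`.

```python
import numpy as np

# Hypothetical 3x4 camera projection matrix; use the real
# projectionMatrix_3x4 values from camInfo.yml instead.
P = np.array([[1000.0,    0.0, 960.0, 0.0],
              [   0.0, 1000.0, 540.0, 0.0],
              [   0.0,    0.0,   1.0, 0.0]])

X, Y = -171.080, 1058.295                 # Foot World Position from the MOT dump
u, v, w = P @ np.array([X, Y, 0.0, 1.0])  # foot lies on the ground plane, Z = 0
print(u / w, v / w)                       # approximates Foot Image Position X, Y
```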
Sample output is shown below. The green dot in the visualization figure is plotted at `(Foot Image Position X, Foot Image Position Y)`, and the cylinder is plotted by connecting the convex hull points. For example, the bbox in the first line of the output file has center (1366 + 134/2, -195 + 290/2) = (1433, -50), so its convex hull points are (1433 - 94, -50 - 170), (1433 - 87, -50 - 176), ..., (1433 - 23, -50 + 190). Users can plot the foot location and convex hull on their own, as shown in the figure below; a minimal OpenCV sketch follows the sample output.
```
1,1,1366,-195,134,290,1.000,-171.080,1058.295,-1,0,2,0.220,1458,80,-94|-170|-87|-176|-71|-183|-49|-191|-23|-196|0|-200|18|-201|29|-198|95|165|95|173|85|183|66|191|42|198|16|202|-4|201|-18|197|-23|190
2,1,1365,-194,135,290,0.989,-170.646,1045.254,-1,0,2,0.230,1458,84,-94|-170|-87|-176|-71|-183|-49|-191|-23|-196|0|-200|18|-201|29|-198|95|165|95|173|85|183|66|191|42|198|16|202|-4|201|-18|197|-23|190
3,1,1366,-196,134,290,0.860,-170.679,1054.089,-1,0,2,0.229,1458,82,-94|-170|-87|-176|-71|-183|-49|-191|-23|-196|0|-200|18|-201|29|-198|95|165|95|173|85|183|66|191|42|198|16|202|-4|201|-18|197|-23|190
...
```
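Here is a minimal sketch (not part of the sample app) of such an overlay using OpenCV; `frame_000001.png` is a hypothetical frame extracted from the input video.

```python
import cv2

# Overlay the foot point and convex hull from the first MOT line onto a frame.
frame = cv2.imread("frame_000001.png")  # hypothetical extracted video frame

with open("track_dump_0.txt") as f:
    fields = f.readline().strip().split(",")

left, top, w, h = (int(v) for v in fields[2:6])
foot = (int(float(fields[13])), int(float(fields[14])))
offsets = [int(v) for v in fields[15].split("|")]

# Convex hull points are stored as offsets relative to the bbox center.
cx, cy = left + w // 2, top + h // 2
hull = [(cx + offsets[i], cy + offsets[i + 1]) for i in range(0, len(offsets), 2)]

cv2.circle(frame, foot, 5, (0, 255, 0), -1)  # green foot dot
for i, p in enumerate(hull):
    cv2.line(frame, p, hull[(i + 1) % len(hull)], (0, 255, 255), 2)
cv2.imwrite("overlay_000001.png", frame)
```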
The KITTI results can be found in the `track_results` folder. One file is created per frame per stream, and the data format is defined below.
| object Label | object Unique Id | blank | blank | blank | bbox left | bbox top | bbox right | bbox bottom | blank | blank | blank | blank | blank | blank | blank | confidence | visibility (optional) | Foot Image Position X (optional) | Foot Image Position Y (optional) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| string | long unsigned int | float | int | float | float | float | float | float | float | float | float | float | float | float | float | float | float | float | float |
Each frame is saved as `track_results/00_000_xxxxxx.txt`. Sample output for a frame is shown below. Note that if an object is recovered from past-frame data, it will not have the visibility and foot position fields in the KITTI dump.
```
person 1 0.0 0 0.0 1365.907227 -196.875290 1500.249146 93.554581 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.890186 0.229870 1458.166992 81.585060
person 6 0.0 0 0.0 1419.655151 72.774818 1647.446167 561.540894 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.841420 0.408513 1575.264160 531.851746
person 0 0.0 0 0.0 1008.387817 36.228714 1202.886353 421.609314 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.632065 0.504738 1148.117188 399.992645
...
```
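Because the last three fields are optional, any parser should tolerate variable-length rows. A minimal sketch, assuming the file layout produced above:

```python
from pathlib import Path

# Parse every per-frame KITTI dump, tolerating the optional trailing fields.
for path in sorted(Path("track_results").glob("00_000_*.txt")):
    for line in path.read_text().splitlines():
        f = line.split()
        label, obj_id = f[0], int(f[1])
        bbox = tuple(float(v) for v in f[5:9])  # left, top, right, bottom
        confidence = float(f[16])
        # Objects recovered from past-frame data omit these fields.
        visibility = float(f[17]) if len(f) > 17 else None
        foot = (float(f[18]), float(f[19])) if len(f) > 19 else None
        print(path.name, label, obj_id, bbox, confidence, visibility, foot)
```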
To run single-view 3D tracking on other videos, the following changes are required (a sketch of the edited config groups follows this list).

- In `deepstream_app_source1_3d_tracking.txt`, change `uri=file://../streams/Retail02_short.mp4` to the URI of the new video, and change `width=1920`, `height=1080`, `tracker-width=1920`, `tracker-height=1080` to the new video's resolution.
- Generate the camera projection matrix for the new video, and change `projectionMatrix_3x4` in `camInfo.yml` to the new matrix.
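For example, here is a hedged sketch of the affected entries for a hypothetical 1280x720 video named `my_video.mp4`; group names follow the standard `deepstream-app` config, so check the shipped file for the exact layout.

```
[source0]
uri=file://../streams/my_video.mp4

[streammux]
width=1280
height=720

[tracker]
tracker-width=1280
tracker-height=720
```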
Note: Multiple streams can run at the same time. Add all the sources to `deepstream_app_source1_3d_tracking.txt`, and list all the camera information files in `config_tracker_NvDCF_accuracy_3D.yml`:
```yaml
cameraModelFilepath: # In order of the source streams
  - 'camInfo-01.yml'
  - 'camInfo-02.yml'
  - ...
```