
RangeNet++ deployment with recent TensorRT versions (8 to 10), libtorch, and CUDA programming.


Natsu-Akatsuki/RangeNet-TensorRT


🎉 This project has been a pleasure, allowing me to repay technical debt, learn how to locate bugs during model deployment, gain experience with GitHub Actions, and explore CUDA programming. I greatly appreciate the valuable feedback from others that has contributed to improving the project. I hope that this project will be of use to you.

1. Purpose

  1. Use newer dependencies and APIs. Specifically, we deploy RangeNet in an environment with TensorRT 8+ and Ubuntu 20.04+, remove the Boost dependency, manage TensorRT objects and GPU memory with smart pointers, and provide ROS demos.

  2. Faster performance. Resolve the reduced segmentation accuracy when using FP16 (issue#9), achieving a significant speedup without sacrificing accuracy. Preprocess data with CUDA, and perform KNN post-processing with libtorch (refer to here).


2. Installation

2.1 Docker installation

We provide a Docker-based installation; see docker/README.md for details.

2.2 Source installation

Step 1: Download and Extract libtorch

Note

Using the Torch library from Conda was observed to slow down the post-processing stage from 6 ms to 30 ms.

$ wget -c https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.10.2%2Bcu113.zip -O libtorch.zip
$ unzip libtorch.zip

Step 2: Set up the deep learning environment (install NVIDIA driver, CUDA, TensorRT, cuDNN). The tested configurations are listed below. At least 3000 MB of GPU memory is required.

| Ubuntu | GPU | TensorRT | CUDA | cuDNN | Tested |
|---|---|---|---|---|---|
| 20.04 | TITAN RTX | 8.2.3 | 11.4.r11.4 | 8.2.4 | ✔️ |
| 20.04 | NVIDIA GeForce RTX 3060 | 8.4.1.5 | 11.3.r11.3 | 8.0.5 | ✔️ |
| 20.04 | NVIDIA GeForce RTX 3060, NVIDIA GeForce RTX 4070 | 10.6.0.26 | 11.1.105 | 8.0.5.39 | ✔️ |
| 20.04 | NVIDIA GeForce RTX 3060, NVIDIA GeForce RTX 4070 | 10.6.0.26 | 12.4.r12.4 | 9.1.0.70-1 | ✔️ |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.2.5.1 | 11.3.r11.3 | 8.8.0 | ✔️ |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.4.1.5 | 11.3.r11.3 | 8.8.0 | ✔️ |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.4.3.1 | 11.3.r11.3 | 8.8.0 | ✔️ |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.6.1.6 | 11.3.r11.3 | 8.8.0 | ✔️ |
| 22.04 | NVIDIA GeForce RTX 3060 | 10.6.0.26 | 11.3.r11.3 | 8.8.0 | ✔️ |

Note

Choose a CUDA version that supports your GPU's Compute Capability. For example, to target Compute Capability 8.9 (sm_89), you need CUDA 11.8 or later.

You can look up your GPU's Compute Capability at https://developer.nvidia.com/cuda-gpus#compute.

| GPU Hardware Architecture | Compute Capability | Relevant GPUs | Minimum CUDA Version |
|---|---|---|---|
| Ampere | 8.6 (sm_86) | RTX 3060, RTX 3070, RTX 3080, RTX 3090 | CUDA 11.1 |
| Ada Lovelace | 8.9 (sm_89) | RTX 4080, RTX 4090 | CUDA 11.8 |
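You can check which table row applies to your machine with a small bash helper. It reads the local Compute Capability and looks up the minimum CUDA version. This is a sketch only:

- The function names are hypothetical.
- Only the two architectures listed in the table are mapped.
- The `compute_cap` query field assumes a reasonably recent NVIDIA driver.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository).

# Print the first GPU's Compute Capability, e.g. "8.6".
# Assumes a driver whose nvidia-smi supports the compute_cap query field.
local_compute_cap() {
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1 | tr -d ' '
}

# Map a Compute Capability ("8.6" or "86") to the minimum CUDA version
# from the table; unknown values print "unknown" and return 1.
min_cuda_for_cc() {
  local cc="${1//./}"   # strip the dot so both spellings are accepted
  case "$cc" in
    86) echo "11.1" ;;
    89) echo "11.8" ;;
    *)  echo "unknown"; return 1 ;;
  esac
}

# Usage (requires an NVIDIA GPU):
#   min_cuda_for_cc "$(local_compute_cap)"
```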

Note

Your CUDA version must also be supported by your NVIDIA driver: each driver release supports CUDA only up to a certain maximum version.

| NVIDIA Driver Version | Maximum CUDA Version |
|---|---|
| 545 | CUDA 12.3 |
| 550 | CUDA 12.4 |
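Two commands report the versions you need to compare:

- `nvidia-smi` prints, in its header line (`CUDA Version: ...`), the highest CUDA version the installed driver supports.
- `nvcc --version` reports the installed toolkit (`release X.Y`).

The bash sketch below extracts both and checks that the toolkit does not exceed what the driver supports. The function names are hypothetical, and the comparison relies on GNU `sort -V`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository); requires GNU coreutils.

# Read nvidia-smi output on stdin; print the driver's maximum CUDA version.
driver_max_cuda() {
  sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# Read `nvcc --version` output on stdin; print the toolkit version.
toolkit_cuda() {
  sed -n 's/.*release \([0-9.]*\),.*/\1/p' | head -n 1
}

# Succeed if version $1 <= version $2 (GNU sort -V ordering).
version_le() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (requires the NVIDIA driver and the CUDA toolkit):
#   drv=$(nvidia-smi | driver_max_cuda)
#   tk=$(nvcc --version | toolkit_cuda)
#   if version_le "$tk" "$drv"; then echo "OK"; else echo "driver too old for CUDA $tk"; fi
```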

Add the following environment variables to ~/.bashrc:

# Example configuration:

# >>> Deep Learning Configuration >>>
# Import CUDA environment
CUDA_PATH=/usr/local/cuda/bin
CUDA_LIB_PATH=/usr/local/cuda/lib64

# Import TensorRT environment
export TENSORRT_DIR=${HOME}/Application/TensorRT-8.4.1.5/
TENSORRT_PATH=${TENSORRT_DIR}/bin
TENSORRT_LIB_PATH=${TENSORRT_DIR}/lib

# Import libtorch environment
export Torch_DIR=${HOME}/Application/libtorch/share/cmake/Torch

export PATH=${PATH}:${CUDA_PATH}:${TENSORRT_PATH}
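# Avoid a leading ':' (which would add the current directory) when LD_LIBRARY_PATH is initially empty
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CUDA_LIB_PATH}:${TENSORRT_LIB_PATH}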
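After editing ~/.bashrc, a quick sanity check helps catch a mistyped path before CMake fails with a less obvious error. This sketch uses a hypothetical function name. It assumes the standard layouts: the TensorRT tarball ships `include/NvInfer.h`, and `Torch_DIR` points at the directory holding `TorchConfig.cmake`.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check (not part of this repository) for TENSORRT_DIR and Torch_DIR.
check_deep_learning_env() {
  local status=0
  if [ ! -f "${TENSORRT_DIR:-}/include/NvInfer.h" ]; then
    echo "TENSORRT_DIR (${TENSORRT_DIR:-unset}) has no include/NvInfer.h" >&2
    status=1
  fi
  if [ ! -f "${Torch_DIR:-}/TorchConfig.cmake" ]; then
    echo "Torch_DIR (${Torch_DIR:-unset}) has no TorchConfig.cmake" >&2
    status=1
  fi
  return "$status"
}

# Usage: source ~/.bashrc && check_deep_learning_env && echo "environment looks OK"
```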

Step 3 (optional, only needed for the ROS components): Install ROS1 (Noetic) or ROS2 (Humble).

# Install ROS
$ ...
# Install extra dependency
$ sudo apt install ros-${ROS_DISTRO}-pcl-ros

Step 4: Install apt and Python packages

$ sudo apt install build-essential python3-dev python3-pip apt-utils git cmake libboost-all-dev libyaml-cpp-dev libopencv-dev python3-empy libfmt-dev
$ pip install catkin_tools trollius numpy

Step 5: Clone the Repository

$ git clone https://github.com/Natsu-Akatsuki/RangeNet-TensorRT ~/rangenet/src/rangenet/

Step 6: Download the model file and datasets.

# Download model files
$ wget -c https://github.com/Natsu-Akatsuki/RangeNet-TensorRT/releases/download/v0.0.0-alpha/model.onnx -O ~/rangenet/src/rangenet/model/model.onnx

Download datasets: see Baidu Cloud.

Directory structure (under ~/rangenet/src/rangenet/)
.
├── model
│   ├── arch_cfg.yaml
│   ├── data_cfg.yaml
│   └── model.onnx
└── data
    ├── 000000.pcd
    ├── kitti_2011_09_30_drive_0027_synced
    └── kitti_2011_09_30_drive_0027_synced.bag
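To confirm that the model files are in place before the first run, the hypothetical helper below reports any of them that are missing or empty under the layout above. By default it assumes the repository was cloned to ~/rangenet/src/rangenet as in Step 5.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): report missing or empty
# model files under a given root (default: the clone path from Step 5).
check_model_files() {
  local root="${1:-$HOME/rangenet/src/rangenet}" missing=0 f
  for f in model/arch_cfg.yaml model/data_cfg.yaml model/model.onnx; do
    if [ ! -s "$root/$f" ]; then   # -s: file exists and is non-empty
      echo "missing or empty: $root/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_model_files && echo "model files present"
```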
    

3. Usage

Note

The first run may take some time to generate the TensorRT optimized engine.

Note

The project uses set(CMAKE_CUDA_STANDARD 17), which requires CMake 3.18 or later, but the default CMake version on Ubuntu 20.04 is 3.16.3. As a low-effort workaround, install a newer CMake for your user with pip:

$ pip3 install --user cmake==3.18
$ echo 'export PATH=${HOME}/.local/bin:${PATH}' >> ~/.bashrc
$ source ~/.bashrc
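The pip-installed CMake only takes effect once ~/.local/bin comes before /usr/bin on PATH, so it is worth confirming which version is actually picked up. The sketch below uses hypothetical function names and GNU `sort -V`. It parses the first line of `cmake --version`, which reads `cmake version X.Y.Z`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository); requires GNU sort -V.

# Succeed if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version of the first cmake on PATH, e.g. "3.16.3".
cmake_version() {
  cmake --version | head -n 1 | awk '{print $3}'
}

# Usage:
#   version_ge "$(cmake_version)" 3.18 || echo "cmake $(cmake_version) is older than 3.18"
```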
🔧 Usage 1: Run with ROS1 or ROS2


# >>> ROS1 >>>
$ cd ~/rangenet/
# Use -Wno-dev to suppress PCL warnings
$ catkin build --cmake-args -Wno-dev
$ source devel/setup.bash
$ roslaunch rangenet_pp ros1_rangenet.launch
$ roslaunch rangenet_pp ros1_bag.launch

# >>> ROS2 >>>
$ cd ~/rangenet/
$ colcon build --symlink-install
$ source install/setup.bash
$ ros2 launch rangenet_pp ros2_rangenet.launch
$ ros2 launch rangenet_pp ros2_bag.launch
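Which build path applies depends on the sourced ROS environment. ROS 1 and ROS 2 setup scripts export ROS_VERSION=1 or ROS_VERSION=2, and Usage 2 below unsets ROS_VERSION for a plain CMake build. The hypothetical helper below only prints the matching build command from this README. It does not run anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the build command
# from this README that matches the current ROS_VERSION.
build_cmd_for_ros() {
  case "${ROS_VERSION:-}" in
    1) echo "catkin build --cmake-args -Wno-dev" ;;   # ROS1 workspace
    2) echo "colcon build --symlink-install" ;;       # ROS2 workspace
    *) echo "cmake -Wno-dev .. && make -j4" ;;        # no ROS: plain CMake (Usage 2)
  esac
}

# Usage: build_cmd_for_ros
```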
🔧 Usage 2: Predict single-frame point clouds (PCD format)

Note

The PCD point cloud fields must be xyzi (x, y, z, intensity), and the intensity values should be normalized to the range [0, 1].
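If your raw intensities are not yet normalized, an awk pass can rescale them. This is a minimal sketch under these assumptions, all of which are noted in the code:

- The PCD is ASCII (`DATA ascii`); binary PCDs are not handled.
- The FIELDS line is exactly `x y z intensity`.
- Raw intensities lie in [0, MAX], where MAX is a parameter defaulting to 255.

The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Assumptions: ASCII PCD ("DATA ascii"), FIELDS "x y z intensity",
# raw intensities in [0, MAX] with MAX defaulting to 255.
normalize_pcd_intensity() {
  local src="$1" dst="$2" max="${3:-255}"
  awk -v max="$max" '
    /^FIELDS/ && $5 != "intensity" {
      print "expected FIELDS x y z intensity" > "/dev/stderr"; exit 1
    }
    in_data && NF >= 4 {
      $4 = $4 / max              # scale intensity into [0, 1]
      if ($4 > 1) $4 = 1         # clamp values above the assumed maximum
      print; next
    }
    { print }                    # header lines pass through unchanged
    /^DATA ascii/ { in_data = 1 }
  ' "$src" > "$dst"
}

# Usage: normalize_pcd_intensity raw.pcd normalized.pcd 255
```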

# Modify the parameters in config/infer.yaml
$ cd ~/rangenet/src/rangenet/
$ mkdir build
$ cd build

# To display inference time: cmake -DPERFORMANCE_LOG=ON .. && make
$ unset ROS_VERSION && cmake -Wno-dev .. && make -j4
$ ./demo
Step Time
Preprocessing 1.51363 ms
Inference 21.8513 ms
Postprocessing 4.98176 ms

4. FAQ

Issue 1: [libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1:

The ONNX model file is incomplete (for example, the download was interrupted). Re-download the model.
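One way to tell whether a download is complete is to compare the local file size with the size the server advertises. This sketch assumes the final (post-redirect) response carries a `Content-Length` header; if none is reported, the check simply fails. The function names are hypothetical, and the helpers need curl and GNU stat.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository); require curl and GNU stat.

# Print the Content-Length of the final response after following redirects.
remote_size() {
  curl -sIL "$1" | tr -d '\r' | awk 'tolower($1) == "content-length:" { n = $2 } END { print n }'
}

# Succeed if file $1 exists and is exactly $2 bytes long.
size_matches() {
  [ -f "$1" ] && [ "$(stat -c %s "$1")" = "$2" ]
}

# Usage:
#   url=https://github.com/Natsu-Akatsuki/RangeNet-TensorRT/releases/download/v0.0.0-alpha/model.onnx
#   f=~/rangenet/src/rangenet/model/model.onnx
#   size_matches "$f" "$(remote_size "$url")" || echo "model.onnx looks incomplete; re-download it"
```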

Issue 2: Segmentation fault [Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)] when visualizing single point cloud frames in Ubuntu 22.04 using PCL.

Use PCL 1.13.0 or later, and set the PCL_DIR variable in cmake/ThirdParty.cmake accordingly. See here for more details.

Roadmap

  • Test ROS1 demo
  • Resolve issue#8 (2023.07.01)
  • Add English documentation (2024.11.19)
  • Explain why using FP16 leads to precision degradation (see more here) (2024.11.28)
  • Provide a Docker environment (2024.11.30)
  • Add Pybind11 implementation
  • Resolve non-reproducibility
  • Refactor code to follow coding standards and improve readability