FM-Fusion: Instance-aware Semantic Mapping
Boosted by Vision-Language Foundation Models

IEEE RA-L 2024
Chuhao Liu1, Ke Wang2,*, Jieqi Shi1, Zhijian Qiao1, and Shaojie Shen1

1HKUST Aerial Robotics Group    2Chang'an University, China   
*Corresponding Author

arXiv | YouTube

News

  • [27th Nov 2024] Added support for semantic reconstruction using the SLABIM dataset.
  • [25th Oct 2024] Published the code and RGB-D sequences from ScanNet and SgSlam.
  • [5th Jan 2024] Paper accepted by IEEE RA-L.

FM-Fusion utilizes RAM, GroundingDINO, and SAM to reconstruct an instance-aware semantic map. Boosted by these vision foundation models, FM-Fusion can reconstruct semantic instances in real-world cluttered indoor environments.

The following instructions explain how to run FM-Fusion on our RGB-D sequences or ScanNet sequences. If you find it useful, please cite our paper:

@article{liu2024fmfusion,
  title={{FM-Fusion}: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models},
  author={Liu, Chuhao and Wang, Ke and Shi, Jieqi and Qiao, Zhijian and Shen, Shaojie},
  journal={IEEE Robotics and Automation Letters (RA-L)},
  year={2024},
  volume={9},
  number={3},
  pages={2232-2239}
}

Table of Contents

  1. Install
  2. Download Data
  3. Run Instance-aware Semantic Mapping
  4. Use it in Your RGB-D Camera
  5. Acknowledgement
  6. License

1. Install

Install dependency packages from the Ubuntu repositories,

sudo apt-get install libboost-dev libomp-dev libeigen3-dev

Install Open3D from its source code (Install Tutorial),

git clone https://github.com/isl-org/Open3D
cd Open3D
mkdir build && cd build
cmake -DBUILD_SHARED_LIBS=ON ..
make -j12
sudo make install

Follow the official tutorials to install OpenCV, GLOG, and jsoncpp. To keep it compatible with ROS, please install OpenCV 3.4.x; a rough sketch of this step is given below.
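
A minimal sketch, assuming GLOG and jsoncpp come from the Ubuntu package repositories and OpenCV is built from the official 3.4 branch (package names and branch are assumptions; defer to the official tutorials for your setup):

# GLOG and jsoncpp from the Ubuntu repositories
sudo apt-get install libgoogle-glog-dev libjsoncpp-dev
# OpenCV 3.4.x built from the official 3.4 branch
git clone -b 3.4 https://github.com/opencv/opencv.git
cd opencv && mkdir build && cd build
cmake ..
make -j12
sudo make install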

Clone and compile FM-Fusion,

git clone git@github.com:HKUST-Aerial-Robotics/FM-Fusion.git
cd FM-Fusion
mkdir build && cd build
cmake .. -DINSTALL_FMFUSION=ON
make -j12
make install

Install the ROS node program, which renders the semantic instance map in Rviz. Install the ROS platform following its official guidance. Then, build the ROS node we provide,

git submodule update --init --recursive
cd catkin_ws && catkin_make
source devel/setup.bash

2. Download Data

We provide two datasets for evaluation: SgSlam (captured using an Intel Realsense L515) and ScanNet. Their sequences are available for download.

Please check the data format for an illustration of the data in each sequence. After downloading the scans folder of each dataset, open uncompress_data.py and set the data directories to your local directories. Then, un-compress the sequence data.

python scripts/uncompress_data.py
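
If you prefer to un-compress by hand, the following is an equivalent sketch, assuming the sequences are distributed as .zip archives inside the scans folder (the archive format and the ${SGSLAM_DATAROOT} path are assumptions; scripts/uncompress_data.py is the supported way):

cd ${SGSLAM_DATAROOT}/scans
for f in *.zip; do unzip -o "$f"; done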

3. Run Instance-aware Semantic Mapping

a. Run with Rviz.

In launch/semantic_mapping.launch, set the dataset directories to your local paths. Then, launch Rviz and the ROS node,

roslaunch sgloop_ros visualize.launch
roslaunch sgloop_ros semantic_mapping.launch

It should incrementally reconstruct the semantic map and render the results in Rviz. At the end of the sequence, the program saves the output results; the output format is illustrated in the data format. To visualize previously reconstructed results, open launch/render_semantic_map.launch and set the result_folder directory accordingly. Then,

roslaunch sgloop_ros render_semantic_map.launch

Tips: If you are running the program on a remote server, you can use the ROS multi-machine support. After setting the ROS master following the ROS tutorial (see the sketch below), you can launch visualize.launch on your local machine and semantic_mapping.launch on the server, so you can still visualize the result on your local machine.
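
A minimal sketch of that network setup (the IP addresses are placeholders; follow the ROS multi-machine tutorial for details), run in every terminal on both machines before launching:

# On the remote server, which runs roscore and semantic_mapping.launch
export ROS_MASTER_URI=http://192.168.1.10:11311   # IP of the machine running roscore
export ROS_IP=192.168.1.10                        # this machine's own IP
# On the local machine, which runs visualize.launch
export ROS_MASTER_URI=http://192.168.1.10:11311
export ROS_IP=192.168.1.20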

b. Run without visualization.

If you do not need the ROS node for visualization, you can skip its installation in the instructions above. Then, simply run the C++ executable and the results will be saved at ${SGSLAM_DATAROOT}/output. The output directory can be set before running the program.

./build/src/IntegrateInstanceMap --config config/realsense.yaml --root ${SGSLAM_DATAROOT}/scans/ab0201_03a --output ${SGSLAM_DATAROOT}/output
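
For example, assuming the dataset was un-compressed to a local folder (the path below is a placeholder; replace it with your own dataset root):

export SGSLAM_DATAROOT=/path/to/SgSlam
mkdir -p ${SGSLAM_DATAROOT}/output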

4. Use it in Your RGB-D Camera

For the SgSlam dataset, we use an Intel Realsense L515 camera and a DJI A3 flight controller to collect the data sequences. Details of the hardware suite can be found in this paper. You can also collect your own dataset using a similar hardware suite.

a. Prepare RGB-D and Camera poses.

We use VINS-Mono to compute visual-inertial odometry (VIO). We save the camera poses of its keyframes in a pose folder.

b. Run RAM, GroundingDINO and SAM.

The three models are combined and run in Grounded-SAM. Please find our adapted Grounded-SAM here. It should generate a prediction folder as explained in the data format. Then, you can run the semantic mapping on your own dataset.

5. Acknowledgement

The hardware used in SgSlam is supported by Luqi Wang. The lidar-camera hardware used in SLABIM is supported by Skyland Innovation. In our program, we use and adapt Open3D to reconstruct instance sub-volumes. The vision foundation models RAM, GroundingDINO, and SAM provide instance segmentation on images.

6. License

The source code is released under the GPLv3 license. For technical issues, please contact Chuhao LIU ([email protected]).
