Chuhao Liu1, Ke Wang2,*, Jieqi Shi1, Zhijian Qiao1, and Shaojie Shen1
1HKUST Aerial Robotics Group
2Chang'an University, China
*Corresponding Author
- [27th Nov 2024] Support semantic reconstruction using SLABIM dataset.
- [25th Oct 2024] Publish code and RGB-D sequences from ScanNet and SgSlam.
- [5th Jan 2024] Paper accepted by IEEE RA-L.
FM-Fusion utilizes RAM, GroundingDINO and SAM to reconstruct an instance-aware semantic map. Boosted by these vision foundation models, FM-Fusion can reconstruct semantic instances in real-world cluttered indoor environments.
The following instructions explain how to run FM-Fusion on our RGB-D sequences or ScanNet sequences. If you find it useful, please cite our paper.
@article{
title={{FM-Fusion}: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models},
author={Liu, Chuhao and Wang, Ke and Shi, Jieqi and Qiao, Zhijian and Shen, Shaojie},
journal={IEEE Robotics and Automation Letters(RA-L)},
year={2024},
volume={9},
number={3},
pages={2232-2239}
}
- Install
- Download Data
- Run Instance-aware Semantic Mapping
- Use it in Your RGB-D Camera
- Acknowledgement
- License
Install the dependency packages from the Ubuntu repositories.
sudo apt-get install libboost-dev libomp-dev libeigen3-dev
Install Open3D from its source code. (Install Tutorial)
git clone https://github.com/isl-org/Open3D
cd Open3D
mkdir build && cd build
cmake -DBUILD_SHARED_LIBS=ON ..
make -j12
sudo make install
Follow the official tutorials to install OpenCV, GLOG, and jsoncpp.
To make it compatible with ROS, please install OpenCV 3.4.xx.
Clone and compile FM-Fusion:
git clone git@github.com:HKUST-Aerial-Robotics/FM-Fusion.git
cd FM-Fusion
mkdir build && cd build
cmake .. -DINSTALL_FMFUSION=ON
make -j12
make install
Install the ROS node program, which renders the semantic instance map in Rviz. First install ROS following its official guidance. Then build the ROS node we provide:
git submodule update --init --recursive
cd catkin_ws && catkin_make
source devel/setup.bash
We provide two datasets for evaluation: SgSlam (captured using an Intel Realsense L515) and ScanNet. Their sequences can be downloaded:
Please check the data format for a description of the data in each sequence.
After downloading the scans folder of each dataset, open scripts/uncompress_data.py and set the data directories to your local directories. Then uncompress the sequence data.
python scripts/uncompress_data.py
In launch/semantic_mapping.launch, set the directories to your local dataset directories. Then launch Rviz and the ROS node:
roslaunch sgloop_ros visualize.launch
roslaunch sgloop_ros semantic_mapping.launch
It should incrementally reconstruct the semantic map and render the results in Rviz. At the end of the sequence, the program saves the output results; the output format is described in the data format. To visualize results that were reconstructed previously, open launch/render_semantic_map.launch and set the result_folder directory accordingly. Then run:
roslaunch sgloop_ros render_semantic_map.launch
Tip: If you are running the program on a remote server, you can use the ROS multi-machine setup. After setting the ROS master following the ROS tutorial, launch visualize.launch on your local machine and semantic_mapping.launch on the server. You can then still visualize the results on your local machine.
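As a rough illustration of the multi-machine tip, the sketch below builds the two environment variables that ROS 1 reads for networking, ROS_MASTER_URI and ROS_IP. The helper names and the IP addresses are hypothetical and not part of FM-Fusion; 11311 is the default roscore port. Follow the ROS tutorial for the authoritative setup.

```python
def ros_network_env(master_host, local_ip, port=11311):
    """Return the ROS 1 networking variables for one machine.

    master_host: address of the machine running roscore (assumption: you
                 choose which machine that is, per the ROS tutorial).
    local_ip:    address of the machine these variables are set on.
    """
    return {
        "ROS_MASTER_URI": f"http://{master_host}:{port}",
        "ROS_IP": local_ip,
    }


def as_export_lines(env):
    """Format the variables as shell `export` lines to paste into a terminal."""
    return [f"export {key}={value}" for key, value in env.items()]


if __name__ == "__main__":
    # Hypothetical addresses: the server runs roscore, the laptop runs Rviz.
    for line in as_export_lines(ros_network_env("192.168.1.10", "192.168.1.20")):
        print(line)
```

Both machines point ROS_MASTER_URI at the same master, while ROS_IP differs per machine.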
If you do not need the ROS node for visualization, you can skip its installation in the instructions above. Then simply run the C++ executable, and the results will be saved at ${SGSLAM_DATAROOT}/output. The output directory can be set before running the program.
./build/src/IntegrateInstanceMap --config config/realsense.yaml --root ${SGSLAM_DATAROOT}/scans/ab0201_03a --output ${SGSLAM_DATAROOT}/output
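To process several sequences without ROS, the command above can be assembled in a small script. This is a minimal sketch: the function name build_mapping_command and the fallback data root /data/sgslam are hypothetical, while the executable path and the --config, --root and --output flags are taken from the command above.

```python
import os
import shlex
from pathlib import Path


def build_mapping_command(dataroot, sequence,
                          config="config/realsense.yaml",
                          executable="./build/src/IntegrateInstanceMap"):
    """Return the argument list that maps one sequence under <dataroot>/scans."""
    root = Path(dataroot)
    return [
        executable,
        "--config", config,
        "--root", str(root / "scans" / sequence),
        "--output", str(root / "output"),
    ]


if __name__ == "__main__":
    # Assumption: SGSLAM_DATAROOT is exported as in the command above.
    dataroot = os.environ.get("SGSLAM_DATAROOT", "/data/sgslam")
    for sequence in ["ab0201_03a"]:
        command = build_mapping_command(dataroot, sequence)
        # Print the command; pass it to subprocess.run(command, check=True)
        # from the repository root to actually execute it.
        print(shlex.join(command))
```

Building the arguments as a list avoids shell quoting problems when a dataset path contains spaces.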
In the SgSlam dataset, we use an Intel Realsense L515 camera and a DJI A3 flight controller to collect the data sequences. Details of the hardware suite can be found in this paper. You can also collect your own dataset using a similar hardware suite.
We use VINS-Mono to compute visual-inertial odometry (VIO). We save the camera poses of its keyframes in a pose folder.
b. Run RAM, GroundingDINO and SAM.
The three models are combined and run in Grounded-SAM. Please find the Grounded-SAM version we adopted here. It should generate a prediction folder as explained in the data format. Then you can run the semantic mapping on your dataset.
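Before running the mapping on a self-collected sequence, it can help to confirm that the two folders named above exist. The following is a minimal sketch: the function missing_folders and the example path are hypothetical, and it checks only the pose and prediction folders, since the full sequence layout is defined in the data format rather than here.

```python
from pathlib import Path

# The two folders named in the text above; other required data is not checked.
REQUIRED_FOLDERS = ("pose", "prediction")


def missing_folders(sequence_dir, required=REQUIRED_FOLDERS):
    """Return the names of required sub-folders absent from sequence_dir."""
    sequence = Path(sequence_dir)
    return [name for name in required if not (sequence / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical path to a self-collected sequence.
    missing = missing_folders("/data/my_dataset/scans/my_sequence")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Sequence folder contains pose and prediction.")
```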
The hardware used in SgSlam is supported by Luqi Wang. The lidar-camera hardware used in SLABIM is supported by Skyland Innovation. In our program, we use and adapt Open3D to reconstruct instance sub-volumes. The vision foundation models RAM, GroundingDINO, and SAM provide instance segmentation on images.
The source code is released under the GPLv3 license. For technical issues, please contact Chuhao LIU ([email protected]).