Project Page | Paper | Arxiv | Video
InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes
Zesong Yang, Bangbang Yang, Wenqi Dong, Chenxuan Cao, Liyuan Cui, Yuewen Ma, Zhaopeng Cui, Hujun Bao
ICCV 2025
teaser.mp4
## Installation of Scene Decomposition
```shell
conda create -n instascene python=3.9 -y
conda activate instascene
pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install --extra-index-url=https://pypi.nvidia.com "cudf-cu11==24.2.*" "cuml-cu11==24.2.*"
pip install -r requirements.txt
```
Install CropFormer for instance-level segmentation.
```shell
cd semantic_modules/CropFormer
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
cd ..
pip install -r requirements.txt
pip install -U openmim
mim install mmcv
mkdir ckpts
```
Manually download the CropFormer checkpoint into `semantic_modules/CropFormer/ckpts`.
## Installation of In-situ Generation
Please follow the steps below to process your custom dataset, or directly download our preprocessed datasets.
- It's OK to use other 2D segmentation models, but make sure the input masks don't exhibit overly complex hierarchical relationships; otherwise, our method will default to the finest level.
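To illustrate what "default to the finest level" can mean in practice, here is a minimal sketch that discards any mask which almost fully contains a smaller mask, keeping only the finest level of a mask hierarchy. `keep_finest_masks` and its threshold are hypothetical names, not part of the InstaScene codebase; it assumes boolean masks of identical shape, as a 2D segmenter would produce.

```python
# Hypothetical helper: reduce overlapping instance masks to the finest level.
import numpy as np

def keep_finest_masks(masks, contain_thresh=0.95):
    """Drop any mask that almost fully contains another, smaller mask."""
    keep = []
    for i, mi in enumerate(masks):
        area_i = mi.sum()
        is_parent = False
        for j, mj in enumerate(masks):
            if i == j:
                continue
            inter = np.logical_and(mi, mj).sum()
            area_j = mj.sum()
            # mj sits almost entirely inside mi and is strictly smaller:
            # mi is a coarse parent, so discard it.
            if area_j > 0 and inter / area_j >= contain_thresh and area_j < area_i:
                is_parent = True
                break
        if not is_parent:
            keep.append(mi)
    return keep
```

A full-image mask containing two object masks would be dropped, while the disjoint fine masks survive.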
```shell
cd semantic_modules/CropFormer
bash run_segmentation.sh "$DATA_DIR"
cd ../..
```
Follow the original 2DGS repository to train the 2DGS model.
```shell
python train.py -s data/3dovs/bed -m output/3dovs/bed/train_2dgs
```
An optional monocular normal prior (StableNormal) can be used to enhance reconstruction quality.
## Prepare Normal Priors
```shell
cd semantic_modules
git clone https://github.com/Stable-X/StableNormal && cd StableNormal
pip install -r requirements.txt
mv ../inference_stablenormal.py ./
python inference_stablenormal.py "$DATA_DIR"
cd ../..
```
## Training 2DGS with Normal Priors
```shell
python train.py -s data/3dovs/bed --w_normal_prior stablenormal_normals -m output/3dovs/bed/train_2dgs
```
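For intuition, a normal prior typically enters training as an angular penalty between rendered normals and the StableNormal predictions. The sketch below shows one common form of such a loss (mean `1 - cos` of the per-pixel angle); the function name is hypothetical and the actual `--w_normal_prior` loss in the codebase may differ.

```python
# Hypothetical sketch of a normal-prior loss over per-pixel normal maps.
import numpy as np

def normal_prior_loss(rendered, prior):
    """Mean (1 - cos angle) between unit normals, arrays of shape (H, W, 3)."""
    r = rendered / np.linalg.norm(rendered, axis=-1, keepdims=True)
    p = prior / np.linalg.norm(prior, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(r * p, axis=-1)))
```

The loss is 0 when rendered and prior normals agree everywhere and reaches 2 when they point in opposite directions.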
Put the trained `point_cloud.ply` file into the `$DATA_DIR` directory. After successfully executing the above steps, the data directory should be structured as follows:
```
data
|——————3D_OVS
|      |——————bed
|             |——————point_cloud.ply
|             |——————images
|             |      |——————00.jpg
|             |      |——————...
|             |——————sam
|             |      |——————mask
|             |             |——————00.png
|             |             |——————...
|             |——————sparse
|             |      |——————0
|             |             |——————cameras.bin
|             |             |——————...
|             |——————(optional) stablenormal_normals
|                    |——————00.png
|                    |——————...
|      |——————...
```
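A quick way to verify a custom dataset matches this layout is a small existence check over the required entries. This is a hypothetical convenience script following the tree above, not a utility shipped with InstaScene.

```python
# Hypothetical sanity check for the expected dataset layout.
import os

REQUIRED = [
    "point_cloud.ply",
    "images",
    os.path.join("sam", "mask"),
    os.path.join("sparse", "0", "cameras.bin"),
]

def check_dataset(root):
    """Return the list of required entries missing under `root`."""
    return [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    import sys
    missing = check_dataset(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("OK" if not missing else f"missing: {missing}")
```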
Note that for simple scenes, such as 3D-OVS (simple object-centered scenes without overlap), there is no need to use spatial relationships to obtain robust semantic priors as shown in our supplementary material. Single-view contrastive learning is sufficient to achieve strong performance.
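As a rough illustration of single-view contrastive learning on rendered features: pixels sharing a 2D mask ID are pulled together in feature space while pixels from different masks are pushed apart. The margin-based form below is a minimal numpy sketch under assumed names, not the repo's exact loss.

```python
# Hypothetical sketch of a single-view contrastive loss over pixel features.
import numpy as np

def contrastive_loss(feats, ids, margin=1.0):
    """Pull same-instance feature pairs together, push different ones apart."""
    n = len(feats)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(feats[i] - feats[j])
            if ids[i] == ids[j]:
                loss += d ** 2                      # pull same-instance pairs
            else:
                loss += max(0.0, margin - d) ** 2   # push different instances
            pairs += 1
    return loss / pairs
```

When features of the same instance coincide and different instances are at least `margin` apart, the loss is zero.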
We train the model on an NVIDIA Tesla A100 GPU (40GB) for 10,000 iterations, which takes about 20 minutes and less than 8GB of GPU memory.
- Reduce GPU memory and speed up training with `--sample_batchsize 8*1024` or `-r 2`.
- Use `--gram_feat_3d` for a more robust feature field in complex scenes.
- It's normal for training to get stuck at the DBScan filter stage, since the background Gaussian points may be divided into multiple regions.
- Use `--consider_negative_labels` to suppress floaters during background segmentation.
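To make the DBScan filter stage concrete: the idea is to cluster the selected Gaussian centers spatially and drop small floater clusters, keeping the dominant one. The sketch below uses a naive O(n²) flood-fill stand-in for DBSCAN (the pipeline installs cuML's GPU DBSCAN for the real thing); function and parameter names are hypothetical.

```python
# Hypothetical DBSCAN-style filter: keep the largest spatial cluster of points.
import numpy as np

def largest_cluster(points, eps=0.1):
    """Group points whose neighbors lie within `eps`; return the biggest group."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = cur
        while stack:
            i = stack.pop()
            # expand to all yet-unlabeled points within eps of point i
            d = np.linalg.norm(points - points[i], axis=1)
            for j in np.where((d < eps) & (labels == -1))[0]:
                labels[j] = cur
                stack.append(j)
        cur += 1
    counts = np.bincount(labels)
    return points[labels == counts.argmax()]
```

If the background splits into several regions of similar size, this stage has many large clusters to sift through, which is why it can appear to hang.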
```shell
python train_semantic.py -s data/lerf/waldo_kitchen \
    -m train_semanticgs \
    --use_seg_feature --iterations 10000 \
    --load_filter_segmap --consider_negative_labels
```
After completing the training, we provide a GUI modified from Omniseg3D for real-time interactive segmentation.
The `point_cloud.ply` in our preprocessed datasets already has pretrained semantic features.
```shell
python semantic_gui.py \
    --ply_path data/lerf/waldo_kitchen/point_cloud.ply \
    --interactive_note lerf_waldo_kitchen \
    --use_colmap_camera \
    --source_path data/lerf/waldo_kitchen --resolution 1
```
- `Left Mouse` for changing the rendering view
- `Click Mode` + `Threshold` (e.g. 0.9) + `Right Mouse` for segmentation
- `Clear Edit` for clearing the segmentation cache
- `Delete 3D` for removing the chosen Gaussians
- `Segment 3D` for keeping only the chosen Gaussians
- `Reload Data` for reloading the Gaussian model
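Conceptually, the click-plus-threshold selection amounts to comparing each Gaussian's semantic feature against the feature under the clicked pixel and keeping those above a cosine-similarity threshold. The sketch below illustrates that idea with hypothetical names; the GUI's actual selection logic follows Omniseg3D.

```python
# Hypothetical sketch of click-based selection by feature similarity.
import numpy as np

def select_by_click(features, clicked_feature, threshold=0.9):
    """Boolean mask of Gaussians whose features match the clicked one."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    q = clicked_feature / np.linalg.norm(clicked_feature)
    return f @ q > threshold
```

Lowering the threshold grows the selection outward from the clicked instance, which is why the GUI exposes it as a slider.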
🔥 Feel free to raise any requests, including support for additional datasets or broader applications of segmentation~
- Release project page and paper.
- Release scene decomposition code.
- Release in-situ generation code.
Some code is modified from Omniseg3D, MaskClustering, and 2DGS; thanks to the authors for their valuable works.
If you find this code useful for your research, please use the following BibTeX entry.
```bibtex
@inproceedings{yang2025instascene,
  title={InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes},
  author={Yang, Zesong and Yang, Bangbang and Dong, Wenqi and Cao, Chenxuan and Cui, Liyuan and Ma, Yuewen and Cui, Zhaopeng and Bao, Hujun},
  booktitle={ICCV},
  year={2025}
}
```