
[ICCV 2025] InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes


InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes,
Zesong Yang, Bangbang Yang, Wenqi Dong, Chenxuan Cao, Liyuan Cui, Yuewen Ma, Zhaopeng Cui, Hujun Bao
ICCV 2025

(Teaser video: teaser.mp4)

Pipeline

Installation

  • Installation of Scene Decomposition.
conda create -n instascene python=3.9 -y
conda activate instascene 

pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

pip install --extra-index-url=https://pypi.nvidia.com "cudf-cu11==24.2.*" "cuml-cu11==24.2.*"

pip install -r requirements.txt

Install CropFormer for instance-level segmentation.

cd semantic_modules/CropFormer
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
cd ..
pip install -r requirements.txt
pip install -U openmim
mim install mmcv
mkdir ckpts

Manually download the CropFormer checkpoint into semantic_modules/CropFormer/ckpts.

  • Installation of in-situ generation (code not yet released; see ToDos).

Data Preprocessing

Please follow the steps below to process your custom dataset, or directly download our preprocessed datasets.

1. Run instance-level segmentation.

  • It's fine to use other 2D segmentation models, but make sure the input masks don't exhibit overly complex hierarchical relationships; otherwise, our method defaults to the finest level.
cd semantic_modules/CropFormer
bash run_segmentation.sh "$DATA_DIR"
cd ../..

2. Training 2DGS.

Follow the original repository to train the 2DGS model.

python train.py -s data/3dovs/bed -m output/3dovs/bed/train_2dgs

An optional monocular normal prior (StableNormal) is available to enhance reconstruction quality.

## Prepare Normal Priors
cd semantic_modules
git clone https://github.com/Stable-X/StableNormal && cd StableNormal
pip install -r requirements.txt
mv ../inference_stablenormal.py ./
python inference_stablenormal.py "$DATA_DIR"
cd ../..

## Training 2DGS with Normal Priors 
python train.py -s data/3dovs/bed --w_normal_prior stablenormal_normals -m output/3dovs/bed/train_2dgs
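To give an intuition for how such a normal prior typically supervises training, here is a hedged sketch of a prior loss combining an L1 term with a cosine (angular) term between rendered and predicted normal maps. This is illustrative only; the actual loss used in train.py may differ, and the function name is our own.

```python
# Illustrative sketch (not the repository's exact loss): penalize both the
# per-channel L1 difference and the angular deviation between the rendered
# normal map and the StableNormal prediction.
import numpy as np

def normal_prior_loss(rendered, prior):
    """rendered, prior: (H, W, 3) arrays of unit normals."""
    # L1 term: absolute per-channel difference, averaged over pixels.
    l1 = np.abs(rendered - prior).sum(axis=-1).mean()
    # Cosine term: 1 - dot product, zero when normals agree exactly.
    cos = (1.0 - (rendered * prior).sum(axis=-1)).mean()
    return l1 + cos
```

Identical normal maps give a loss of zero; flipped or noisy normals increase both terms.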

Put the trained point_cloud.ply file into the $DATA_DIR directory. After successfully executing the above steps, the data directory should be structured as follows:

data
   |——————3D_OVS
   |   |——————bed
   |      |——————point_cloud.ply
   |      |——————images
   |         |——————00.jpg
   |         ...
   |      |——————sam
   |         |——————mask
   |            |——————00.png
   |            ...
   |      |——————sparse
   |         |——————0
   |            |——————cameras.bin
   |            ...
   |      |——————(optional) stablenormal_normals
   |         |——————00.png
   |         ...
   |     ...
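Before launching training, it can save time to verify that a scene folder matches the layout above. The following is a minimal sketch using only the paths listed in this README; `check_scene` is a helper name of our own, not part of the repository.

```python
# Sketch: verify a scene folder matches the expected layout before training.
from pathlib import Path

REQUIRED = [
    "point_cloud.ply",
    "images",
    "sam/mask",
    "sparse/0/cameras.bin",
]
OPTIONAL = ["stablenormal_normals"]  # normal priors are optional

def check_scene(scene_dir):
    """Return the list of required entries missing from one scene folder."""
    scene = Path(scene_dir)
    return [rel for rel in REQUIRED if not (scene / rel).exists()]

if __name__ == "__main__":
    import tempfile
    # Build a minimal fake scene and confirm nothing is reported missing.
    with tempfile.TemporaryDirectory() as tmp:
        scene = Path(tmp) / "bed"
        (scene / "images").mkdir(parents=True)
        (scene / "sam" / "mask").mkdir(parents=True)
        (scene / "sparse" / "0").mkdir(parents=True)
        (scene / "point_cloud.ply").touch()
        (scene / "sparse" / "0" / "cameras.bin").touch()
        print(check_scene(scene))  # -> []
```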

Training with Spatial Contrastive Learning

Note that for simple scenes such as 3D-OVS (simple, object-centered, without overlap), there is no need to use spatial relationships to obtain robust semantic priors, as shown in our supplementary material. Single-view contrastive learning is sufficient to achieve strong performance.
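To make the single-view contrastive idea concrete, here is a hedged sketch (not the repository's implementation): features of pixels sharing a 2D instance mask are pulled toward their mask's mean feature, while mean features of different masks are pushed apart up to a margin. Function and variable names are illustrative.

```python
# Illustrative single-view contrastive loss over per-pixel features.
import numpy as np

def contrastive_loss(feats, mask_ids, margin=1.0):
    """feats: (N, D) pixel features; mask_ids: (N,) instance id per pixel."""
    ids = np.unique(mask_ids)
    # One mean ("center") feature per instance mask.
    centers = np.stack([feats[mask_ids == i].mean(axis=0) for i in ids])
    # Pull term: distance of each pixel feature to its own mask center.
    pull = np.mean([
        np.linalg.norm(feats[mask_ids == i] - c, axis=1).mean()
        for i, c in zip(ids, centers)
    ])
    # Push term: penalize pairs of mask centers closer than `margin`.
    push, pairs = 0.0, 0
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            d = np.linalg.norm(centers[a] - centers[b])
            push += max(0.0, margin - d)
            pairs += 1
    return pull + push / max(pairs, 1)
```

When every pixel already sits exactly on its mask center and centers are farther apart than the margin, the loss is zero.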

We train the model on an NVIDIA Tesla A100 GPU (40 GB) for 10,000 iterations, which takes about 20 minutes and uses less than 8 GB of GPU memory.

  • Reduce GPU memory usage and training time with a smaller --sample_batchsize (e.g., 8 * 1024) or with -r 2.
  • Use --gram_feat_3d for a more robust feature field in complex scenes.
  • It's normal for training to pause at the DBSCAN filter stage, since the background Gaussian points may be divided into multiple regions.
  • Use --consider_negative_labels to suppress floaters during background segmentation.
python train_semantic.py -s data/lerf/waldo_kitchen \
                         -m train_semanticgs \
                         --use_seg_feature --iterations 10000 \
                         --load_filter_segmap --consider_negative_labels

After training completes, we provide a GUI modified from Omniseg3D for real-time interactive segmentation. The point_cloud.ply in our preprocessed datasets already contains pretrained semantic features.

python semantic_gui.py \
  --ply_path data/lerf/waldo_kitchen/point_cloud.ply \
  --interactive_note lerf_waldo_kitchen \
  --use_colmap_camera \
  --source_path data/lerf/waldo_kitchen --resolution 1
  • Left Mouse for changing the rendering view
  • Click Mode + 0.9 Threshold + Right Mouse for segmenting the clicked instance
  • Clear Edit for clearing the segmentation cache
  • Delete 3D for removing the chosen Gaussians
  • Segment 3D for keeping only the chosen Gaussians
  • Reload Data for reloading the Gaussian model
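Conceptually, the click-plus-threshold selection compares the feature of the clicked Gaussian against all per-Gaussian semantic features and selects those whose cosine similarity exceeds the threshold. Here is a hedged sketch of that idea; the names are illustrative, not the GUI's internals.

```python
# Sketch: threshold-based selection by cosine similarity to a clicked Gaussian.
import numpy as np

def select_by_click(features, clicked_idx, threshold=0.9):
    """features: (N, D) per-Gaussian features. Returns a boolean selection."""
    # Normalize so the dot product equals cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f[clicked_idx]
    return sim > threshold
```

Raising the threshold toward 1.0 tightens the selection to near-identical features; lowering it grows the selected region.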
(Demo videos: Screencast.2025-07-24.13_31_27.mp4, Feishu20250723-192829.mp4)

ToDos

🔥 Feel free to raise any requests, including support for additional datasets or broader applications of segmentation~

  • Release project page and paper.
  • Release scene decomposition code.
  • Release in-situ generation code.

Acknowledgements

Some code is adapted from Omniseg3D, MaskClustering, and 2DGS++; thanks to the authors for their valuable work.

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{yang2025instascene,
    title={InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes},
    author={Yang, Zesong and Yang, Bangbang and Dong, Wenqi and Cao, Chenxuan and Cui, Liyuan and Ma, Yuewen and Cui, Zhaopeng and Bao, Hujun},
    booktitle={ICCV},
    year={2025}
}
