
MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

Yuedong Chen  ·  Chuanxia Zheng  ·  Haofei Xu  ·  Bohan Zhuang
Andrea Vedaldi  ·  Tat-Jen Cham  ·  Jianfei Cai

NeurIPS 2024


Demo video: mvsplat360.mp4

Installation

To get started, create a conda virtual environment using Python 3.10+ and install the requirements:

conda create -n mvsplat360 python=3.10
conda activate mvsplat360
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 xformers==0.0.25.post1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
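
To confirm the environment is set up correctly, you can run a quick sanity check like the sketch below (optional; it only verifies that the packages installed above import cleanly and that a GPU is visible).

# Optional environment sanity check: confirms PyTorch, torchvision and xformers
# import correctly and that CUDA sees at least one GPU.
import torch
import torchvision
import xformers

print("torch:", torch.__version__)              # expected 2.2.2+cu118
print("torchvision:", torchvision.__version__)  # expected 0.17.2+cu118
print("xformers:", xformers.__version__)        # expected 0.0.25.post1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))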

Acquiring Datasets

This project mainly uses the DL3DV and RealEstate10K datasets.

The dataset structure aligns with our previous work, MVSplat. You may refer to the script convert_dl3dv.py for converting the DL3DV-10K dataset to the torch chunks used in this project.

You might also want to check out DepthSplat's DATASETS.md, which provides detailed instructions on pre-processing DL3DV and RealEstate10K for use here (both projects share the same code base, derived from pixelSplat).

A pre-processed tiny subset of DL3DV (containing 5 scenes) is provided here for quick reference. To use it, simply download it and unzip it to datasets/dl3dv_tiny.
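
To verify the download, the chunks can be inspected directly with PyTorch. The sketch below assumes the pixelSplat-style layout, where each .torch file stores a list of per-scene records; the datasets/dl3dv_tiny path matches the unzip location above, but any subfolder structure inside it is an assumption.

# Minimal sketch: peek into one pre-processed chunk (pixelSplat-style layout assumed).
from pathlib import Path
import torch

chunk_paths = sorted(Path("datasets/dl3dv_tiny").rglob("*.torch"))
print(f"found {len(chunk_paths)} chunk file(s)")

chunk = torch.load(chunk_paths[0])  # typically a list of per-scene dicts
print("type:", type(chunk), "entries:", len(chunk))
if isinstance(chunk, list) and isinstance(chunk[0], dict):
    print("keys of first scene:", list(chunk[0].keys()))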

Running the Code

Evaluation

To render novel views:

  • get the pretrained model dl3dv_480p.ckpt and save it to /checkpoints

  • run the following:

# dl3dv; requires at least 22G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 \
wandb.name=dl3dv_480P_ctx5_tgt56 \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=outputs/dl3dv_480p.ckpt
  • the rendered novel views will be stored under outputs/test/{wandb.name}

To evaluate the quantitative performance, refer to compute_dl3dv_metrics.py.
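
compute_dl3dv_metrics.py is the canonical evaluation script. For a quick spot check, a per-image PSNR can also be computed directly from the saved renderings; the sketch below is illustrative only, and the color/ and gt/ sub-folder names are assumptions about how outputs/test/{wandb.name} is organised, not the script's actual layout.

# Illustrative PSNR spot check between one rendered view and its ground truth.
# The folder layout (color/ vs. gt/) is an assumption for this example.
import numpy as np
from PIL import Image

def psnr(pred: np.ndarray, gt: np.ndarray) -> float:
    # Standard PSNR for 8-bit images: 10 * log10(MAX^2 / MSE)
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

pred = np.array(Image.open("outputs/test/dl3dv_480P_ctx5_tgt56/color/0000.png"))
gt = np.array(Image.open("outputs/test/dl3dv_480P_ctx5_tgt56/gt/0000.png"))
print(f"PSNR: {psnr(pred, gt):.2f} dB")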

To render videos from a pretrained model, run the following:

# dl3dv; requires at least 38G VRAM
python -m src.main +experiment=dl3dv_mvsplat360_video \
wandb.name=dl3dv_480P_ctx5_tgt56_video \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=outputs/dl3dv_480p.ckpt 

Training

  • Download the pretrained encoder weights from MVSplat and save them to checkpoints/re10k.ckpt.
  • Download the SVD pretrained weights from generative-models and save them to checkpoints/svd.safetensors (a quick sanity-check sketch for both files is given after this list).
  • Run the following:
# train mvsplat360; requires at least 80G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 dataset.roots=[datasets/dl3dv]
  • To fine-tune from our released model, append checkpointing.load=outputs/dl3dv_480p.ckpt and checkpointing.resume=false to the above command.
  • You can also set up your wandb account here for logging. Have fun.
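
As a quick sanity check that both weight files were downloaded intact, they can be opened before launching training. The sketch below assumes re10k.ckpt is a standard PyTorch Lightning checkpoint (a dict containing a state_dict entry) and that svd.safetensors is readable with the safetensors library; neither assumption reflects this repo's actual loading code.

# Sanity-check the downloaded weights before training (sketch only; the loading
# logic inside this repo may differ). Assumes a Lightning-style .ckpt and a
# safetensors file.
import torch
from safetensors.torch import load_file

mvsplat = torch.load("checkpoints/re10k.ckpt", map_location="cpu")
print("re10k.ckpt top-level keys:", list(mvsplat.keys()))
print("encoder tensors:", len(mvsplat.get("state_dict", {})))

svd = load_file("checkpoints/svd.safetensors")
print("svd.safetensors tensors:", len(svd))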

BibTeX

@inproceedings{chen2024mvsplat360,
    title     = {MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views},
    author    = {Chen, Yuedong and Zheng, Chuanxia and Xu, Haofei and Zhuang, Bohan and Vedaldi, Andrea and Cham, Tat-Jen and Cai, Jianfei},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    year      = {2024},
}

Acknowledgements

The project is based on MVSplat, pixelSplat, UniMatch and generative-models. Many thanks to these projects for their excellent contributions!