Yuedong Chen · Chuanxia Zheng · Haofei Xu · Bohan Zhuang · Andrea Vedaldi · Tat-Jen Cham · Jianfei Cai
(Teaser video: mvsplat360.mp4)
To get started, create a conda virtual environment using Python 3.10+ and install the requirements:
conda create -n mvsplat360 python=3.10
conda activate mvsplat360
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 xformers==0.0.25.post1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
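
If you want a quick sanity check that the CUDA build of PyTorch and xformers imported correctly, the optional snippet below (a suggestion, not part of the repo) can be run from a Python shell:

# optional sanity check for the environment
import torch, torchvision, xformers
print("torch", torch.__version__, "| torchvision", torchvision.__version__, "| xformers", xformers.__version__)
print("CUDA available:", torch.cuda.is_available())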
This project mainly uses DL3DV and RealEstate10K datasets.
The dataset structure aligns with that of our previous work, MVSplat. You may refer to the script convert_dl3dv.py for converting the DL3DV-10K dataset into the torch chunks used in this project.
You might also want to check out DepthSplat's DATASETS.md, which provides detailed instructions on pre-processing DL3DV and RealEstate10K for use here (both projects build on the same pixelSplat code base).
A pre-processed tiny subset of DL3DV (containing 5 scenes) is provided here for quick reference. To use it, simply download it and unzip it to datasets/dl3dv_tiny.
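
If you are unsure whether your pre-processed data is laid out correctly, the short sketch below peeks inside one chunk. It assumes the pixelSplat-style chunk format (each .torch file holding a list of per-scene dicts with a scene "key", per-frame "cameras", and JPEG-encoded "images"); the exact keys and the chunk path are assumptions, so adjust them to whatever your datasets/dl3dv_tiny directory actually contains.

# hedged sketch: inspect one pre-processed .torch chunk (keys and path are assumptions)
import torch

chunk = torch.load("datasets/dl3dv_tiny/test/000000.torch", map_location="cpu")
print(f"{len(chunk)} scene(s) in this chunk")
for scene in chunk:
    print(scene.get("key"), "->", len(scene["images"]), "frames,",
          "camera tensor shape:", tuple(scene["cameras"].shape))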
To render novel views:
- get the pretrained model dl3dv_480p.ckpt and save it to /checkpoints
- run the following:
# dl3dv; requires at least 22G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 \
wandb.name=dl3dv_480P_ctx5_tgt56 \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=outputs/dl3dv_480p.ckpt
- the rendered novel views will be stored under outputs/test/{wandb.name}
To evaluate the quantitative performance, kindly refer to compute_dl3dv_metrics.py
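
For reference, the sketch below illustrates the kind of computation such a script performs: averaging PSNR / SSIM / LPIPS over paired rendered and ground-truth images with torchmetrics. It is not the repo's compute_dl3dv_metrics.py, and the directory layout and file naming are assumptions.

# hedged sketch of PSNR/SSIM/LPIPS evaluation (paths and layout are assumptions)
from pathlib import Path
import torch
from torchvision.io import read_image
from torchmetrics.functional import peak_signal_noise_ratio, structural_similarity_index_measure
from torchmetrics.image import LearnedPerceptualImagePatchSimilarity

render_dir = Path("outputs/test/dl3dv_480P_ctx5_tgt56")  # hypothetical output layout
gt_dir = Path("datasets/dl3dv_tiny_gt")                  # hypothetical ground-truth dump
lpips = LearnedPerceptualImagePatchSimilarity(net_type="vgg", normalize=True)

psnr_vals, ssim_vals, lpips_vals = [], [], []
with torch.no_grad():
    for render_path in sorted(render_dir.rglob("*.png")):
        gt_path = gt_dir / render_path.relative_to(render_dir)
        pred = read_image(str(render_path)).float().unsqueeze(0) / 255.0  # 1x3xHxW in [0, 1]
        gt = read_image(str(gt_path)).float().unsqueeze(0) / 255.0
        psnr_vals.append(peak_signal_noise_ratio(pred, gt, data_range=1.0))
        ssim_vals.append(structural_similarity_index_measure(pred, gt, data_range=1.0))
        lpips_vals.append(lpips(pred, gt))

print(f"PSNR {torch.stack(psnr_vals).mean():.2f} | "
      f"SSIM {torch.stack(ssim_vals).mean():.4f} | "
      f"LPIPS {torch.stack(lpips_vals).mean():.4f}")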
To render videos from a pretrained model, run the following:
# dl3dv; requires at least 38G VRAM
python -m src.main +experiment=dl3dv_mvsplat360_video \
wandb.name=dl3dv_480P_ctx5_tgt56_video \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=outputs/dl3dv_480p.ckpt
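
If you want to assemble saved frames into an mp4 yourself (for example at a different frame rate), here is a minimal, optional sketch using imageio; the frame directory below is only a guess at where outputs land under outputs/test/{wandb.name}, so adjust it. It requires pip install imageio imageio-ffmpeg.

# hedged sketch: re-encode saved frames into an mp4 (frame directory is an assumption)
from pathlib import Path
import imageio.v2 as imageio

frame_dir = Path("outputs/test/dl3dv_480P_ctx5_tgt56_video")  # hypothetical location
frames = [imageio.imread(p) for p in sorted(frame_dir.glob("*.png"))]
imageio.mimsave("mvsplat360_tour.mp4", frames, fps=30)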
To train MVSplat360:
- Download the encoder pretrained weight from MVSplat and save it to checkpoints/re10k.ckpt
- Download the SVD pretrained weight from generative-models and save it to checkpoints/svd.safetensors
- Run the following:
# train mvsplat360; requires at least 80G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 dataset.roots=[datasets/dl3dv]
- To fine-tune from our released model, append checkpointing.load=outputs/dl3dv_480p.ckpt and checkpointing.resume=false to the above command.
- You can also set up your wandb account here for logging (see the optional snippet below). Have fun.
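
If you have not used Weights & Biases on the machine before, the optional snippet below (an assumption about your setup, equivalent to running wandb login in a shell) authenticates once so that training runs are logged under your account.

# optional: authenticate with Weights & Biases before training
import wandb

wandb.login()  # prompts for, or reuses, your WANDB API key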
@inproceedings{chen2024mvsplat360,
title = {MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views},
author = {Chen, Yuedong and Zheng, Chuanxia and Xu, Haofei and Zhuang, Bohan and Vedaldi, Andrea and Cham, Tat-Jen and Cai, Jianfei},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2024},
}
The project is based on MVSplat, pixelSplat, UniMatch and generative-models. Many thanks to these projects for their excellent contributions!