Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

¹Huazhong University of Science and Technology  ²Nanyang Technological University  ³Great Bay University
*Equal Contribution  Corresponding Authors


TL;DR: Free4D is a tuning-free framework for 4D scene generation.

(Teaser figure: Free4D introduction)

(Video demo: 4D interactive viewer)

🌟 Abstract

We present Free4D, a novel tuning-free framework for 4D scene generation from a single image. Existing methods either focus on object-level generation, making scene-level generation infeasible, or rely on large-scale multi-view video datasets for expensive training, with limited generalization ability due to the scarcity of 4D scene data. In contrast, our key insight is to distill pre-trained foundation models for consistent 4D scene representation, which offers promising advantages such as efficiency and generalizability. 1) To achieve this, we first animate the input image using image-to-video diffusion models followed by 4D geometric structure initialization. 2) To turn this coarse structure into spatial-temporal consistent multiview videos, we design an adaptive guidance mechanism with a point-guided denoising strategy for spatial consistency and a novel latent replacement strategy for temporal coherence. 3) To lift these generated observations into consistent 4D representation, we propose a modulation-based refinement to mitigate inconsistencies while fully leveraging the generated information. The resulting 4D representation enables real-time, controllable spatial-temporal rendering, marking a significant advancement in single-image-based 4D scene generation.

🛠️ Installation

Clone Free4D

git clone https://github.com/TQTQliu/Free4D.git
cd Free4D

Setup environments

# Create conda environment
conda create -n free4d python=3.11
conda activate free4d
pip install -r requirements.txt
pip install torch==2.4.1 torchvision==0.19.1

# Softmax-splatting
pip install git+https://github.com/Free4D/splatting

# PyTorch3D
conda install https://anaconda.org/pytorch3d/pytorch3d/0.7.8/download/linux-64/pytorch3d-0.7.8-py311_cu121_pyt241.tar.bz2

# Gaussian Splatting renderer
pip install -e lib/submodules/depth-diff-gaussian-rasterization
pip install -e lib/submodules/simple-knn

# Install colmap on headless server
conda install conda-forge::colmap
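
Optionally, you can sanity-check the environment before proceeding. This is not part of the official setup; it only confirms that the core packages import, that CUDA is visible, and that COLMAP is on the PATH.

# (Optional) quick environment check
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import pytorch3d; print(pytorch3d.__version__)"
colmap help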

Download pretrained models

sh scripts/download_ckpt.sh

🚀 Usage

For a single image or text input, we first use an off-the-shelf video generation model (e.g., KLing, Wan, or CogVideo) to obtain a single-view video. Then, run the following commands for 4D generation. We have provided some pre-generated video frames in data/vc. Below, we take the scene fox as an example.
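If you start from your own generated video instead of the provided frames, you can extract it into data/vc/<scene> first. A minimal sketch with ffmpeg: the scene name my_scene, the sampling rate, and the zero-padded PNG naming are assumptions modeled on the provided data/vc/fox layout, not an official specification; mirror whatever layout the provided scenes use.

# (Example) extract frames from your own single-view video into data/vc/<scene>
# my_scene, fps=16, and the %05d.png pattern are assumptions; match the layout of data/vc/fox
mkdir -p data/vc/my_scene
ffmpeg -i my_video.mp4 -vf "fps=16" data/vc/my_scene/%05d.png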

# Multi-View Video Generation
sh scripts/run_mst.sh fox

# organize data
python lib/utils/organize_mst.py -i output/vc/fox/fox -o data/gs/fox

# colmap
sh ./scripts/colmap.sh fox

# 4DGS training
python train_mst.py -s data/gs/fox --expname fox

# 4DGS rendering
python render.py --model_path output/gs/fox

The rendered multi-view video will be saved in output/gs/fox/test. For camera trajectory setup, please refer here.

Here is an alternative solution that does not use MonST3R, but instead employs DUSt3R and optical flow.

# Multi-View Generation for the first frame
sh scripts/run_dst.sh fox

# organize data
python lib/utils/organize_dst.py -vd data/vc/fox -mv output/vc/fox_dst/0000 -o data/gs/fox_dst

# colmap
sh ./scripts/colmap.sh fox_dst

# 4DGS training and Flow-guided Multi-View Video Generation
python train_dst.py -s data/gs/fox_dst --expname fox_dst

# 4DGS rendering
python render.py --model_path output/gs/fox_dst

📚 Citation

If you find our work useful for your research, please consider citing our paper:

 @article{liu2025free4d,
     title={Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency},
     author={Liu, Tianqi and Huang, Zihao and Chen, Zhaoxi and Wang, Guangcong and Hu, Shoukang and Shen, Liao and Sun, Huiqiang and Cao, Zhiguo and Li, Wei and Liu, Ziwei},
     journal={arXiv preprint arXiv:2503.20785},
     year={2025}
 }

♥️ Acknowledgement

This work is built on many amazing open-source projects shared by 4DGaussians, ViewCrafter, MonST3R, DUSt3R, and VistaDream. Thanks to all the authors for their excellent contributions!

📧 Contact

If you have any questions, please feel free to contact Tianqi Liu (tq_liu at hust.edu.cn).