- [Apr 14, 2024] Pretrained autoencoders and LiDMs for different tasks are released!
- [Apr 5, 2024] Our codebase, a detailed study of our autoencoder design, and the pretrained models are released!
We provide a conda environment named `lidar_diffusion`:

```bash
sh init/create_env.sh
conda activate lidar_diffusion
```
Overview of evaluation metrics:

Perceptual Metrics (generation & reconstruction) | Statistical Metrics (generation only) | Distance Metrics (reconstruction only) |
---|---|---|
FRID, FSVD, FPVD | JSD, MMD | CD, EMD |
To standardize the evaluation of LiDAR generative models, we provide a self-contained and mostly CUDA-accelerated evaluation toolbox in the directory `./lidm/eval/`. It implements and integrates various evaluation metrics, including:
- Perceptual metrics:
- Fréchet Range Image Distance (FRID)
- Fréchet Sparse Volume Distance (FSVD)
- Fréchet Point-based Volume Distance (FPVD)
- Statistical metrics:
- Minimum Matching Distance (MMD)
- Jensen-Shannon Divergence (JSD)
- Statistical pairwise metrics (for reconstruction only):
- Chamfer Distance (CD)
- Earth Mover's Distance (EMD)
For more details about setup and usage, please refer to the Evaluation Toolbox README.
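For reference, the three perceptual metrics (FRID, FSVD, FPVD) all follow the usual Fréchet-distance recipe: extract features from real and generated samples with a pretrained backbone (range-image, sparse-volume, or point-based), fit a Gaussian to each feature set, and compare the two Gaussians. Below is a minimal NumPy/SciPy sketch of that final comparison step; the function name and array shapes are illustrative only, and the toolbox's actual API is documented in its README:

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between two feature sets of shape (N, D)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    diff = mu_a - mu_b
    # The matrix square root may pick up tiny imaginary parts; keep the real part.
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```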
To test different tasks below, please download the pretrained LiDM and its corresponding autoencoder:
Encoder | rFRID(↓) | rFSVD(↓) | rFPVD(↓) | CD(↓) | EMD(↓) | Checkpoint | Rec. Results on val (Point Cloud) | Comment |
---|---|---|---|---|---|---|---|---|
f_c2_p4 | 2.15 | 20.2 | 16.2 | 0.160 | 0.203 | [Google Drive] (205MB) | [Video] | |
f_c2_p4* | 2.06 | 20.3 | 15.7 | 0.092 | 0.176 | [Google Drive] (205MB) | [Video] | *: w/o logarithm scaling |
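The `*` variant above disables the logarithmic scaling of range values that the default autoencoder applies before encoding. As a rough, hedged illustration only (the constants and exact transform here are assumptions; the real parameters are defined in the released configs), log scaling of depth could look like:

```python
import numpy as np

# Hypothetical log scaling of LiDAR range values into [0, 1]; max_depth is an
# assumed constant, not necessarily the value used by the released models.
def log_scale(depth: np.ndarray, max_depth: float = 63.0) -> np.ndarray:
    return np.log2(depth + 1.0) / np.log2(max_depth + 1.0)


def inv_log_scale(x: np.ndarray, max_depth: float = 63.0) -> np.ndarray:
    return np.exp2(x * np.log2(max_depth + 1.0)) - 1.0
```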
Method | Encoder | FRID(↓) | FSVD(↓) | FPVD(↓) | JSD(↓) | MMD (10^-4, ↓) | Checkpoint | Output LiDAR Point Clouds |
---|---|---|---|---|---|---|---|---|
LiDAR-GAN | - | 1222 | 183.4 | 168.1 | 0.272 | 4.74 | - | [2k samples] |
LiDAR-VAE | - | 199.1 | 129.9 | 105.8 | 0.237 | 7.07 | - | [2k samples] |
ProjectedGAN | - | 149.7 | 44.7 | 33.4 | 0.188 | 2.88 | - | [2k samples] |
UltraLiDAR§ | - | 370.0 | 72.1 | 66.6 | 0.747 | 17.12 | - | [2k samples] |
LiDARGen (1160s)† | - | 129.0 | 39.2 | 33.4 | 0.188 | 2.88 | - | [2k samples] |
LiDARGen (50s)† | - | 2051 | 480.6 | 400.7 | 0.506 | 9.91 | - | [2k samples] |
LiDM (50s) | f_c2_p4 | 135.8 | 37.9 | 28.7 | 0.211 | 3.87 | [Google Drive] (3.9GB) | [2k samples] |
LiDM (50s) | f_c2_p4* | 125.1 | 38.8 | 29.0 | 0.211 | 3.84 | [Google Drive] (3.9GB) | [2k samples] |
NOTE:
- Each method is evaluated with 2,000 randomly generated samples.
- †: samples generated by the officially released pretrained model in the LiDARGen GitHub repo.
- §: samples borrowed from the UltraLiDAR implementation.
- All results above are computed with our evaluation toolbox. For more details, please refer to the Evaluation Toolbox README.
- Each `.pcd` file is a list of point clouds stored with the `joblib` package. To load these files, use `joblib.load(path)`, as in the sketch below.
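A minimal loading example (the path below is a placeholder for wherever you saved a downloaded sample file):

```python
import joblib

# Placeholder path: point this at whichever downloaded samples.pcd you want to inspect.
clouds = joblib.load("models/baseline/kitti/lidargen/samples.pcd")
print(len(clouds), clouds[0].shape)  # a list of per-sample point arrays
```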
To evaluate the above methods (except LiDM) yourself, download our provided `.pcd` files from the Output column to the directory `./models/baseline/kitti/[method]/`:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/baseline/kitti/[method]/samples.pcd --baseline --eval
```
To evaluate LiDM with the given `.pcd` files:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/lidm/kitti/[method]/samples.pcd --eval
```
Task | Encoder | Dataset | FRID(↓) | FSVD(↓) | Checkpoint | Output |
---|---|---|---|---|---|---|
Semantic Map to LiDAR | f_c2_p4* | SemanticKITTI | 11.8 | 19.1 | [Google Drive] (3.9GB) | [log.tar.gz] (2.1GB) |
Camera to LiDAR | f_c2_p4* | KITTI-360 | 38.9 | 32.1 | [Google Drive] (7.5GB) | [log.tar.gz] (5.4GB) |
Text to LiDAR | f_c2_p4* | zero-shot | - | - | From Camera-to-LiDAR | - |
NOTE:
- The output `log.tar.gz` contains input conditions (`.png`), generated range images (`.png`), generated point clouds (`.txt`), and a collection of all output point clouds (`.pcd`); see the sketch below for one way to inspect it.
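A minimal sketch of unpacking a downloaded log archive and reading one of the generated `.txt` point clouds. The archive path and the whitespace-separated `.txt` layout are assumptions; only the listed file types come from the note above.

```python
import glob
import tarfile

import numpy as np

# Extract the archive into a local directory (path is a placeholder).
with tarfile.open("log.tar.gz", "r:gz") as tar:
    tar.extractall("log")

# Each generated point cloud is assumed to be a plain-text file, one point per row.
txt_files = sorted(glob.glob("log/**/*.txt", recursive=True))
points = np.loadtxt(txt_files[0])
print(points.shape)
# The bundled .pcd collection can be loaded with joblib.load(...) as shown earlier.
```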
For full details of our studies on the design of LiDAR Compression, please refer to LiDAR Compression Design README.
Tip: Downloading the video instead of watching it with Google Drive's built-in video player provides a better visualization.
Curvewise Factor | Patchwise Factor | Output Size | rFRID(↓) | rFSVD(↓) | #Params (M) | Visualization of Reconstruction (val) |
---|---|---|---|---|---|---|
N/A | N/A | Ground Truth | - | - | - | [Range Image], [Point Cloud] |
4 | 1 | 64x256x2 | 0.2 | 12.9 | 9.52 | [Range Image], [Point Cloud] |
8 | 1 | 64x128x3 | 0.9 | 21.2 | 10.76 | [Range Image], [Point Cloud] |
16 | 1 | 64x64x4 | 2.8 | 31.1 | 12.43 | [Range Image], [Point Cloud] |
32 | 1 | 64x32x8 | 16.4 | 49.0 | 13.72 | [Range Image], [Point Cloud] |
1 | 2 | 32x512x2 | 1.5 | 25.0 | 2.87 | [Range Image], [Point Cloud] |
1 | 4 | 16x256x4 | 0.6 | 15.4 | 12.45 | [Range Image], [Point Cloud] |
1 | 8 | 8x128x16 | 17.7 | 35.7 | 15.78 | [Range Image], [Point Cloud] |
1 | 16 | 4x64x64 | 37.1 | 68.7 | 16.25 | [Range Image], [Point Cloud] |
2 | 2 | 32x256x3 | 0.4 | 11.2 | 13.09 | [Range Image], [Point Cloud] |
4 | 2 | 32x128x4 | 3.9 | 19.6 | 14.35 | [Range Image], [Point Cloud] |
8 | 2 | 32x64x8 | 8.0 | 25.3 | 16.06 | [Range Image], [Point Cloud] |
16 | 2 | 32x32x16 | 21.5 | 54.2 | 17.44 | [Range Image], [Point Cloud] |
2 | 4 | 16x128x8 | 2.5 | 16.9 | 15.07 | [Range Image], [Point Cloud] |
4 | 4 | 16x128x16 | 13.8 | 29.5 | 16.86 | [Range Image], [Point Cloud] |
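A quick way to read the table: assuming 64x1024 input range images, the curvewise factor compresses the horizontal (scan) direction only, while the patchwise factor compresses both axes, so the output spatial size is roughly H/p x W/(c·p). This is a sketch of that relationship, not the exact implementation, and it matches most rows above (channel counts are set per model and not derived here):

```python
# Relation between the two downsampling factors and the latent spatial size,
# assuming a 64x1024 input range image (an assumption, not taken from the code).
def latent_hw(curve_factor: int, patch_factor: int, in_h: int = 64, in_w: int = 1024):
    return in_h // patch_factor, in_w // (curve_factor * patch_factor)


assert latent_hw(4, 1) == (64, 256)   # matches the "4 | 1 | 64x256x2" row
assert latent_hw(2, 4) == (16, 128)   # matches the "2 | 4 | 16x128x8" row
```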
To run sampling with pretrained models (and to evaluate your results with the flag `--eval`), first download our provided pretrained autoencoders to the directory `./models/first_stage_models/kitti/[model_name]` and the pretrained LiDMs to the directory `./models/lidm/kitti/[model_name]`:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -r models/lidm/kitti/[model_name]/model.ckpt -n 2000 --eval
```
To check the conditional results on a full sequence of semantic maps (sequence '08'), please refer to this video.

Before running this task, set up the SemanticKITTI dataset first to provide semantic labels as input.
To run sampling on pretrained models (and to evaluate your results with the flag `--eval`):

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/sem2lidar/model.ckpt -d kitti [--eval]
```
Before running this task, set up the KITTI-360 dataset first to provide camera images as input.
To run sampling on pretrained models:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti [--eval]
```
To run sampling on pretrained models:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/text2lidar.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti -p "an empty road with no object"
```
Besides, to train your own LiDAR Diffusion Models, just run the following commands (for example, to train both the autoencoder and the LiDM on four GPUs):

```bash
# train an autoencoder
python main.py -b configs/autoencoder/kitti/autoencoder_c2_p4.yaml -t --gpus 0,1,2,3

# train an LiDM
python main.py -b configs/lidar_diffusion/kitti/uncond_c2_p4.yaml -t --gpus 0,1,2,3
```
To debug the training process, just add the flag `-d`:

```bash
python main.py -b path/to/your/config.yaml -t --gpus 0, -d
```
To resume your training from an existing log directory or an existing checkpoint file, use the flag `-r`:

```bash
# using a log directory
python main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/log

# or, using a checkpoint
python main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/ckpt/file
```
- Our codebase for the diffusion models builds heavily on Latent Diffusion.
If you find this project useful in your research, please consider citing:
```bibtex
@inproceedings{ran2024towards,
  title={Towards Realistic Scene Generation with LiDAR Diffusion Models},
  author={Ran, Haoxi and Guizilini, Vitor and Wang, Yue},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
```