Rui Li1 · Tobias Fischer1 · Mattia Segu1 · Marc Pollefeys1
Luc Van Gool1 · Federico Tombari2,3
1ETH Zürich · 2Google · 3Technical University of Munich
CVPR 2024
This work presents Know-Your-Neighbors (KYN), a single-view 3D reconstruction method that disambiguates occluded scene geometry using vision-language semantics and spatial reasoning.
# create a Python virtual environment and install dependencies
python -m venv kyn
source kyn/bin/activate
pip install -r requirements.txt
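After installation, a quick sanity check along the lines below can confirm that PyTorch is importable and a CUDA device is visible. This is only a convenience snippet, not part of the repository; it assumes `requirements.txt` installs `torch`.

```python
# Sanity check: verify that PyTorch is installed and a GPU is visible.
# Assumes torch is pulled in by requirements.txt.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```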
Download our pre-trained model and the LSeg model, and put them into `./checkpoints`. Then run the demo:
python scripts/demo.py --img media/example/0000.png --model_path checkpoints/kyn.pt --save_path /your/save/path
Here, `--img` specifies the input image path, `--model_path` is the model checkpoint path, and `--save_path` is the directory where the resulting depth map, BEV map, and 3D voxel grids are stored.
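If you want to inspect the demo outputs in Python, a minimal sketch is shown below. The file names and formats under `--save_path` (here assumed to be NumPy arrays `depth.npy` and `bev.npy`) are hypothetical; check `scripts/demo.py` for what is actually written.

```python
# Sketch: loading and viewing saved demo outputs.
# File names/formats are hypothetical and depend on scripts/demo.py.
import numpy as np
import matplotlib.pyplot as plt

save_path = "/your/save/path"               # directory passed via --save_path
depth = np.load(f"{save_path}/depth.npy")   # hypothetical depth map file
bev = np.load(f"{save_path}/bev.npy")       # hypothetical BEV map file

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4))
ax0.imshow(depth, cmap="magma")
ax0.set_title("Predicted depth")
ax1.imshow(bev, cmap="gray")
ax1.set_title("BEV occupancy")
plt.show()
```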
We use the KITTI-360 dataset and process it as follows:
- Register at https://www.cvlibs.net/datasets/kitti-360/index.php and download perspective images, fisheye images, raw Velodyne scans, calibrations, and vehicle poses. The required KITTI-360 official scripts & data are:
download_2d_fisheye.zip download_2d_perspective.zip download_3d_velodyne.zip calibration.zip data_poses.zip
- Preprocess with the Python script below. It rectifies the fisheye views, resizes all images, and stores them in separate folders:
python datasets/kitti_360/preprocess_kitti_360.py --data_path ./KITTI-360 --save_path ./KITTI-360
- The final folder structure should look like:
KITTI-360
├── calibration
├── data_poses
├── data_2d_raw
│   ├── 2013_05_28_drive_0003_sync
│   │   ├── image_00
│   │   │   ├── data_192x640
│   │   │   └── data_rect
│   │   ├── image_01
│   │   ├── image_02
│   │   │   ├── data_192x640_0x-15
│   │   │   └── data_rgb
│   │   └── image_03
│   └── ...
└── data_3d_raw
    ├── 2013_05_28_drive_0003_sync
    └── ...
- The data directory is set to `./KITTI-360` by default.
- Download and unzip the pre-computed GT occupancy maps into `./KITTI-360`. Alternatively, you can compute and store your own GT occupancy maps by setting `read_gt_occ_path: ''` and specifying `save_gt_occ_map_path` in `configs/eval_kyn.yaml` (see the config sketch after this list).
- Download and unzip the object labels into `./KITTI-360`.
- Download our pre-trained model and the LSeg model, and put them into `./checkpoints`.
- Run the following command for evaluation:
python eval.py -cn eval_kyn
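If you prefer to change the two GT-occupancy keys programmatically rather than editing the YAML by hand, a sketch is shown below. It assumes the Hydra/OmegaConf-style config layout suggested by the `-cn` flag, with `read_gt_occ_path` and `save_gt_occ_map_path` as top-level keys; adjust it to the actual structure of `configs/eval_kyn.yaml`.

```python
# Sketch: switch eval_kyn.yaml from reading pre-computed GT occupancy maps
# to computing and saving them. Assumes top-level keys; the real config
# layout is defined by configs/eval_kyn.yaml.
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/eval_kyn.yaml")
cfg.read_gt_occ_path = ""                             # recompute GT occupancy
cfg.save_gt_occ_map_path = "./KITTI-360/gt_occ_maps"  # hypothetical cache path
OmegaConf.save(cfg, "configs/eval_kyn.yaml")
```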
Run the following command to generate 3D voxel models on the KITTI-360 test set:
python scripts/gen_kitti360_voxel.py -cn gen_voxel
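To inspect a generated voxel model, something like the sketch below can be used. The output location and storage format (here assumed to be a 3D NumPy occupancy array) are assumptions; check `scripts/gen_kitti360_voxel.py` for the actual output files.

```python
# Sketch: loading and visualizing one generated voxel grid.
# Path and format are hypothetical; see gen_kitti360_voxel.py for the
# real output layout.
import numpy as np
import matplotlib.pyplot as plt

voxels = np.load("result/voxels/0000.npy")   # hypothetical file, shape (X, Y, Z)
occupied = voxels > 0.5                      # binarize soft occupancy values

ax = plt.figure().add_subplot(projection="3d")
ax.voxels(occupied, edgecolor="k", linewidth=0.1)
ax.set_title("Predicted 3D occupancy")
plt.show()
```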
Download the LSeg model and put it into `./checkpoints`. Then run:
torchrun --nproc_per_node=<num_of_gpus> train.py -cn train_kyn
where `<num_of_gpus>` denotes the number of available GPUs. Models will be saved in `./result` by default.
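For example, on a single machine with four GPUs:
torchrun --nproc_per_node=4 train.py -cn train_kyn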
Please cite our paper if you use the code in this repository:
@inproceedings{li2024know,
title={Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning},
author={Li, Rui and Fischer, Tobias and Segu, Mattia and Pollefeys, Marc and Van Gool, Luc and Tombari, Federico},
booktitle={CVPR},
year={2024}
}