🏡 Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li¹ · Tobias Fischer¹ · Mattia Segu¹ · Marc Pollefeys¹
Luc Van Gool¹ · Federico Tombari²,³

¹ETH Zürich · ²Google · ³Technical University of Munich

CVPR 2024

Paper PDF · Project Page · Hugging Face

This work presents Know-Your-Neighbors (KYN), a single-view 3D reconstruction method that disambiguates occluded scene geometry by utilizing Vision-Language semantics and spatial reasoning.

[Teaser figure]

🔗 Environment Setup

# python virtual environment
python -m venv kyn
source kyn/bin/activate
pip install -r requirements.txt
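
After installing, a quick sanity check can confirm the environment is usable. This is a minimal sketch; it only assumes PyTorch is among the pinned requirements, which the torchrun command used for training below relies on:

```python
# Minimal environment check: PyTorch import and CUDA visibility.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```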

🚀 Quick Start

Download our pre-trained model and the LSeg model, and put them into ./checkpoints. Then run the demo:

python scripts/demo.py --img media/example/0000.png --model_path checkpoints/kyn.pt --save_path /your/save/path

Here, --img specifies the input image path, --model_path is the model checkpoint path, and --save_path is where the resulting depth map, BEV map, and 3D voxel grid are stored.
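
To run the demo over a whole folder of images, a small driver can loop over the same command. The sketch below only uses the flags documented above; the media/example input folder and outputs/ save location are placeholders:

```python
# Hypothetical batch driver for the demo script; only the documented
# CLI flags are used, and the input/output paths are placeholders.
import subprocess
from pathlib import Path

for img in sorted(Path("media/example").glob("*.png")):
    subprocess.run(
        [
            "python", "scripts/demo.py",
            "--img", str(img),
            "--model_path", "checkpoints/kyn.pt",
            "--save_path", f"outputs/{img.stem}",
        ],
        check=True,
    )
```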

📁 Dataset Setup

We use the KITTI-360 dataset and process it as follows:

  1. Register at https://www.cvlibs.net/datasets/kitti-360/index.php and download perspective images, fisheye images, raw Velodyne scans, calibrations, and vehicle poses. The required KITTI-360 official scripts & data are:
    download_2d_fisheye.zip
    download_2d_perspective.zip
    download_3d_velodyne.zip
    calibration.zip
    data_poses.zip
    
  2. Preprocess with the Python script below. It rectifies the fisheye views, resizes all images, and stores them in separate folders:
    python datasets/kitti_360/preprocess_kitti_360.py --data_path ./KITTI-360 --save_path ./KITTI-360
    
  3. The final folder structure should look like:
    KITTI-360
       ├── calibration
       ├── data_poses
       ├── data_2d_raw
       │   ├── 2013_05_28_drive_0003_sync
       │   │   ├── image_00
       │   │   │    ├── data_192x640
       │   │   │    └── data_rect
       │   │   ├── image_01
       │   │   ├── image_02
       │   │   │    ├── data_192x640_0x-15
       │   │   │    └── data_rgb
       │   │   └── image_03
       │   └── ...
       └── data_3d_raw
               ├── 2013_05_28_drive_0003_sync
               └── ...
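
Once preprocessing has finished, the sketch below checks that the dataset root matches the layout above. It only verifies the top-level directories from the tree and assumes ./KITTI-360 as the root:

```python
# Check that the processed KITTI-360 root contains the expected
# top-level directories from the structure shown above.
from pathlib import Path

root = Path("./KITTI-360")
expected = ["calibration", "data_poses", "data_2d_raw", "data_3d_raw"]
missing = [name for name in expected if not (root / name).is_dir()]
print("Missing directories:", missing if missing else "none")
```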
    

📊 Evaluation

Quantitative Evaluation

  1. The data directory is set to ./KITTI-360 by default.
  2. Download and unzip the pre-computed GT occupancy maps into ./KITTI-360. You can also compute and store your own GT occupancy maps by setting read_gt_occ_path: '' and specifying save_gt_occ_map_path in configs/eval_kyn.yaml.
  3. Download and unzip the object labels to ./KITTI-360.
  4. Download our pre-trained model and the LSeg model, and put them into ./checkpoints.
  5. Run the following command for evaluation:
    python eval.py -cn eval_kyn
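
Before launching the evaluation, a small pre-flight check can catch missing downloads. This sketch only verifies the paths named in the steps above (the kyn.pt checkpoint name follows the Quick Start command); the unzipped GT occupancy maps and object labels are not checked by file name since the archive contents may vary:

```python
# Pre-flight check: dataset root, checkpoint folder, and KYN checkpoint.
from pathlib import Path

for path in ["./KITTI-360", "./checkpoints", "./checkpoints/kyn.pt"]:
    print(f"{path}: {'ok' if Path(path).exists() else 'MISSING'}")
```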

Voxel Visualization

Run the following command to generate 3D voxel models on the KITTI-360 test set:

python scripts/gen_kitti360_voxel.py -cn gen_voxel
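
How the voxel models are stored depends on the script's output settings. If they are written as standard mesh files (e.g., .ply), a viewer such as Open3D can give a quick look; the file name below is a placeholder and Open3D is an extra dependency, so treat this as an optional sketch:

```python
# Optional: inspect a generated voxel model, assuming a standard .ply mesh
# (the path is a placeholder). Requires: pip install open3d
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("outputs/voxels/0000.ply")
o3d.visualization.draw_geometries([mesh])
```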

💻 Training

Download the LSeg model and put it into ./checkpoints. Then run:

torchrun --nproc_per_node=<num_of_gpus> train.py -cn train_kyn

where <num_of_gpus> denotes the number of available GPUs. Models will be saved in ./result by default.
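
If you prefer not to hard-code the GPU count, a tiny launcher can fill it in from the visible CUDA devices. This is a sketch that simply wraps the torchrun command shown above:

```python
# Launch training with --nproc_per_node taken from the visible CUDA devices;
# this only wraps the documented torchrun command.
import subprocess
import torch

num_gpus = max(torch.cuda.device_count(), 1)
subprocess.run(
    ["torchrun", f"--nproc_per_node={num_gpus}", "train.py", "-cn", "train_kyn"],
    check=True,
)
```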

📰 Citation

Please cite our paper if you use the code in this repository:

@inproceedings{li2024know,
      title={Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning}, 
      author={Li, Rui and Fischer, Tobias and Segu, Mattia and Pollefeys, Marc and Van Gool, Luc and Tombari, Federico},
      booktitle={CVPR},
      year={2024}
}