Open Vocabulary Monocular 3D Object Detection

Jin Yao, Hao Gu, Xuweiyi Chen, Jiayun Wang, Zezhou Cheng

Website | Paper

Zero-shot predictions on COCO (demo figure)

Installation

We use CUDA 12.1.1. Run

conda create -n ovmono3d python=3.8.20
conda activate ovmono3d

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121

to create the environment and install PyTorch.
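To confirm that the CUDA 12.1 build of PyTorch was picked up, a quick sanity check (a minimal sketch, not part of the official setup):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"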

Run

sh setup.sh

to install the additional dependencies and download the model checkpoints for OVMono3D-LIFT and the other foundation models.

Demo

Run

python demo/demo.py --config-file configs/OVMono3D_dinov2_SFP.yaml \
	--input-folder datasets/coco_examples \
	--labels-file datasets/coco_examples/labels.json \
	--threshold 0.45 \
	MODEL.ROI_HEADS.NAME ROIHeads3DGDINO \
	MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
	OUTPUT_DIR output/coco_examples 

to get the results for the example COCO images.

You can also try your own images and prompted category labels; see labels.json for the expected label-file format. If you know the camera intrinsics, pass them as arguments using --focal-length <float> and --principal-point <float> <float>, as in the example below. Check demo.py for more details.
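For example, a run on your own image folder with known intrinsics might look like the following (the input folder, label file, intrinsic values, and output directory here are placeholders, not files shipped with the repository):

python demo/demo.py --config-file configs/OVMono3D_dinov2_SFP.yaml \
	--input-folder path/to/your_images \
	--labels-file path/to/your_labels.json \
	--focal-length 1024.0 \
	--principal-point 512.0 384.0 \
	--threshold 0.45 \
	MODEL.ROI_HEADS.NAME ROIHeads3DGDINO \
	MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
	OUTPUT_DIR output/your_images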

Data

Please follow the instructions in Omni3D to set up the datasets.
Run

sh ./download_data.sh

to download our pre-processed OVMono3D 2D predictions (12 GB after unzipping).

Evaluation

To run inference and evaluation of OVMono3D-LIFT, use the following command:

python tools/train_net.py --eval-only  --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 2 \
    OUTPUT_DIR  output/ovmono3d_lift  \
    MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
    TEST.CAT_MODE "novel" \
    DATASETS.ORACLE2D_FILES.EVAL_MODE "target_aware"

TEST.CAT_MODE selects the category set to evaluate: novel, base, or all.

DATASETS.ORACLE2D_FILES.EVAL_MODE selects the evaluation protocol: target_aware or previous_metric.
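For instance, to evaluate the base categories under the previous_metric protocol, the same command can be reused with those two overrides (a sketch; the output directory is a placeholder and the GPU count should match your setup):

python tools/train_net.py --eval-only  --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 2 \
    OUTPUT_DIR  output/ovmono3d_lift_base  \
    MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
    TEST.CAT_MODE "base" \
    DATASETS.ORACLE2D_FILES.EVAL_MODE "previous_metric"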

To run inference and evaluation of OVMono3D-GEO, use the following commands:

python tools/ovmono3d_geo.py
python tools/eval_ovmono3d_geo.py

Training

To run training of OVMono3D-LIFT, use the following command:

python tools/train_net.py --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 8 \
    OUTPUT_DIR  output/ovmono3d_lift \
    VIS_PERIOD 500 TEST.EVAL_PERIOD 2000 \
    MODEL.STABILIZE  0.03 \
    SOLVER.BASE_LR 0.012 \
    SOLVER.CHECKPOINT_PERIOD 1000 \
    SOLVER.IMS_PER_BATCH 64 

The hyperparameters above are the ones used in our experiments. They can be adjusted to suit your setup (see the sketch below), but performance may vary across configurations.
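For example, on fewer GPUs you would typically shrink the batch size and scale the base learning rate roughly in proportion (a hedged sketch, not a configuration we have validated):

python tools/train_net.py --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 2 \
    OUTPUT_DIR  output/ovmono3d_lift_2gpu \
    VIS_PERIOD 500 TEST.EVAL_PERIOD 2000 \
    MODEL.STABILIZE  0.03 \
    SOLVER.BASE_LR 0.003 \
    SOLVER.CHECKPOINT_PERIOD 1000 \
    SOLVER.IMS_PER_BATCH 16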

Citing

If you find this work useful for your research, please cite:

@article{yao2024open,
  title={Open Vocabulary Monocular 3D Object Detection},
  author={Yao, Jin and Gu, Hao and Chen, Xuweiyi and Wang, Jiayun and Cheng, Zezhou},
  journal={arXiv preprint arXiv:2411.16833},
  year={2024}
}

Please also consider citing Omni3D and the datasets it builds on.

BibTeX
@inproceedings{brazil2023omni3d,
  author =       {Garrick Brazil and Abhinav Kumar and Julian Straub and Nikhila Ravi and Justin Johnson and Georgia Gkioxari},
  title =        {{Omni3D}: A Large Benchmark and Model for {3D} Object Detection in the Wild},
  booktitle =    {CVPR},
  address =      {Vancouver, Canada},
  month =        {June},
  year =         {2023},
  organization = {IEEE},
}
@inproceedings{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {CVPR},
  year = {2012}
}
@inproceedings{caesar2020nuscenes,
  title={{nuScenes}: A multimodal dataset for autonomous driving},
  author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
  booktitle={CVPR},
  year={2020}
}
@inproceedings{song2015sun,
  title={{SUN RGB-D}: A {RGB-D} scene understanding benchmark suite},
  author={Song, Shuran and Lichtenberg, Samuel P and Xiao, Jianxiong},
  booktitle={CVPR},
  year={2015}
}
@inproceedings{dehghan2021arkitscenes,
  title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
  author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
  booktitle={NeurIPS Datasets and Benchmarks Track (Round 1)},
  year={2021},
}
@inproceedings{hypersim,
  author    = {Mike Roberts AND Jason Ramapuram AND Anurag Ranjan AND Atulit Kumar AND
                 Miguel Angel Bautista AND Nathan Paczan AND Russ Webb AND Joshua M. Susskind},
  title     = {{Hypersim}: {A} Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
  booktitle = {ICCV},
  year      = {2021},
}
@article{objectron2021,
  title={Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations},
  author={Ahmadyan, Adel and Zhang, Liangkai and Ablavatski, Artsiom and Wei, Jianing and Grundmann, Matthias},
  journal={CVPR},
  year={2021},
}