We use CUDA 12.1.1. Run
conda create -n ovmono3d python=3.8.20
conda activate ovmono3d
pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121
to create the conda environment and install PyTorch.
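To double-check that the CUDA 12.1 build of PyTorch was installed (an optional sanity check, not a required step), run
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
which should print something like 2.4.1+cu121 12.1 True. If it reports False or a CPU-only version, the wheel was likely pulled from the default index instead of the cu121 one.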
Run
sh setup.sh
to install additional dependencies and download the model checkpoints for OVMono3D-LIFT and the other foundation models.
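If you want to confirm that the downloads completed (the exact set of files may vary; the path below assumes the checkpoints are placed in checkpoints/, as in the commands that follow), run
ls -lh checkpoints/
and check that checkpoints/ovmono3d_lift.pth is present, since it is the file referenced by MODEL.WEIGHTS below.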
Run
python demo/demo.py --config-file configs/OVMono3D_dinov2_SFP.yaml \
--input-folder datasets/coco_examples \
--labels-file datasets/coco_examples/labels.json \
--threshold 0.45 \
MODEL.ROI_HEADS.NAME ROIHeads3DGDINO \
MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
OUTPUT_DIR output/coco_examples
to get the results for the example COCO images.
You can also try your own images with prompted category labels; see labels.json for the format of the label file. If you know the camera intrinsics, you can pass them as arguments using the convention --focal-length <float> and --principal-point <float> <float>. Check demo.py for more details, and see the example below.
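For example, a run on your own images could look like the following, where the input folder, label file, and intrinsic values are placeholders you would replace with your own (only the paths and intrinsics differ from the COCO example above):
python demo/demo.py --config-file configs/OVMono3D_dinov2_SFP.yaml \
--input-folder datasets/my_images \
--labels-file datasets/my_images/labels.json \
--focal-length 1000.0 \
--principal-point 960.0 540.0 \
--threshold 0.45 \
MODEL.ROI_HEADS.NAME ROIHeads3DGDINO \
MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
OUTPUT_DIR output/my_images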
Please follow the instructions in Omni3D to set up the datasets.
Run
sh ./download_data.sh
to download our pre-processed OVMono3D 2D predictions (12 GB after unzipping).
To run inference and evaluation of OVMono3D-LIFT, use the following command:
python tools/train_net.py --eval-only --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 2 \
OUTPUT_DIR output/ovmono3d_lift \
MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
TEST.CAT_MODE "novel" \
DATASETS.ORACLE2D_FILES.EVAL_MODE "target_aware"
TEST.CAT_MODE denotes the category set to be evaluated: novel, base, or all.
DATASETS.ORACLE2D_FILES.EVAL_MODE denotes the evaluation protocol: target_aware or previous_metric.
An example with the alternative settings is shown below.
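For example, to evaluate the base categories under the previous_metric protocol (only the two options and the output directory change relative to the command above; output/ovmono3d_lift_base is an arbitrary choice):
python tools/train_net.py --eval-only --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 2 \
OUTPUT_DIR output/ovmono3d_lift_base \
MODEL.WEIGHTS checkpoints/ovmono3d_lift.pth \
TEST.CAT_MODE "base" \
DATASETS.ORACLE2D_FILES.EVAL_MODE "previous_metric"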
To run inference and evaluation of OVMono3D-GEO, use the following commands:
python tools/ovmono3d_geo.py
python tools/eval_ovmono3d_geo.py
To run training of OVMono3D-LIFT, use the following command:
python tools/train_net.py --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 8 \
OUTPUT_DIR output/ovmono3d_lift \
VIS_PERIOD 500 TEST.EVAL_PERIOD 2000 \
MODEL.STABILIZE 0.03 \
SOLVER.BASE_LR 0.012 \
SOLVER.CHECKPOINT_PERIOD 1000 \
SOLVER.IMS_PER_BATCH 64
The training hyperparameters above are the ones used in our experiments. You can customize them to suit your setup, but note that performance may vary across configurations.
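For example, if you only have 2 GPUs, a common heuristic (not a setting we have validated) is to scale the batch size and base learning rate down linearly with the GPU count:
python tools/train_net.py --config-file configs/OVMono3D_dinov2_SFP.yaml --num-gpus 2 \
OUTPUT_DIR output/ovmono3d_lift_2gpu \
VIS_PERIOD 500 TEST.EVAL_PERIOD 2000 \
MODEL.STABILIZE 0.03 \
SOLVER.BASE_LR 0.003 \
SOLVER.CHECKPOINT_PERIOD 1000 \
SOLVER.IMS_PER_BATCH 16
Keeping the per-GPU batch size at 8 (64/8 in the original setting, 16/2 here) preserves per-GPU memory usage, but the result may not match the 8-GPU run.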
If you find this work useful for your research, please kindly cite:
@article{yao2024open,
title={Open Vocabulary Monocular 3D Object Detection},
author={Yao, Jin and Gu, Hao and Chen, Xuweiyi and Wang, Jiayun and Cheng, Zezhou},
journal={arXiv preprint arXiv:2411.16833},
year={2024}
}
Please also consider citing the awesome work of Omni3D and the datasets used in Omni3D.
BibTeX
@inproceedings{brazil2023omni3d,
author = {Garrick Brazil and Abhinav Kumar and Julian Straub and Nikhila Ravi and Justin Johnson and Georgia Gkioxari},
title = {{Omni3D}: A Large Benchmark and Model for {3D} Object Detection in the Wild},
booktitle = {CVPR},
address = {Vancouver, Canada},
month = {June},
year = {2023},
organization = {IEEE},
}
@inproceedings{Geiger2012CVPR,
author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
booktitle = {CVPR},
year = {2012}
}
@inproceedings{caesar2020nuscenes,
title={{nuScenes}: A multimodal dataset for autonomous driving},
author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
booktitle={CVPR},
year={2020}
}
@inproceedings{song2015sun,
title={{SUN RGB-D}: A {RGB-D} scene understanding benchmark suite},
author={Song, Shuran and Lichtenberg, Samuel P and Xiao, Jianxiong},
booktitle={CVPR},
year={2015}
}
@inproceedings{dehghan2021arkitscenes,
title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
booktitle={NeurIPS Datasets and Benchmarks Track (Round 1)},
year={2021},
}
@inproceedings{hypersim,
author = {Mike Roberts and Jason Ramapuram and Anurag Ranjan and Atulit Kumar and
Miguel Angel Bautista and Nathan Paczan and Russ Webb and Joshua M. Susskind},
title = {{Hypersim}: {A} Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
booktitle = {ICCV},
year = {2021},
}
@inproceedings{objectron2021,
title={Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations},
author={Ahmadyan, Adel and Zhang, Liangkai and Ablavatski, Artsiom and Wei, Jianing and Grundmann, Matthias},
booktitle={CVPR},
year={2021},
}