> [!IMPORTANT]
> 🌟 Stay up to date at [opendrivelab.com](https://opendrivelab.com)!
This is the official repository for **Detect Anything 3D in the Wild**, a promptable 3D detection foundation model capable of detecting any novel object under arbitrary camera configurations using only monocular inputs.
- 📌 TODO
- 🚀 Getting Started
- 📦 Checkpoints
- 📁 Dataset Preparation
- 🏋️‍♂️ Training
- 🔍 Inference
- 🌐 Launch Online Demo
- 📚 Citation
## 📌 TODO

- [x] Release full code
- [x] Provide training and inference scripts
- [x] Release the model weights
- [ ] Provide full conversion scripts for constructing DA3D locally
- [ ] Simplify the inference process
- [ ] Provide a tutorial for creating customized datasets and finetuning
## 🚀 Getting Started

```bash
conda create -n detany3d python=3.8
conda activate detany3d
```
✅ (1) Install Segment Anything (SAM)
Follow the official instructions to install SAM and download its checkpoints.
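For reference, the official repository supports installing SAM directly from source (verify against their current instructions):

```bash
# Install Segment Anything from the official repository
pip install git+https://github.com/facebookresearch/segment-anything.git
```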
✅ (2) Install UniDepth
Follow the UniDepth setup guide to compile and install all necessary packages.
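A common from-source setup looks like the sketch below, assuming the official `lpiccinelli-eth/UniDepth` repository; follow their guide for CUDA-specific extras:

```bash
# Clone and install UniDepth from source
git clone https://github.com/lpiccinelli-eth/UniDepth.git
cd UniDepth
pip install -e .
```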
✅ (3) Clone and configure GroundingDINO
```bash
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
```
👉 The exact dependency versions are listed in our `requirements.txt`.
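To install the pinned dependencies in one step (assuming you run this from the repository root):

```bash
# Install the exact dependency versions used by this repo
pip install -r requirements.txt
```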
## 📦 Checkpoints

Please download the third-party checkpoints from the following sources:

- SAM checkpoint: download `sam_vit_h.pth` from the official SAM GitHub Releases.
- UniDepth / DINO checkpoints: available via Google Drive.
```
detany3d_private/
├── checkpoints/
│   ├── sam_ckpts/
│   │   └── sam_vit_h.pth
│   ├── unidepth_ckpts/
│   │   └── unidepth.pth
│   ├── dino_ckpts/
│   │   └── dino_swin_large.pth
│   └── detany3d_ckpts/
│       └── detany3d.pth
```
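A quick way to lay out these folders is sketched below; the source paths are illustrative, so adjust them to wherever you downloaded the files:

```bash
# Create the checkpoint directories expected by the layout above
mkdir -p checkpoints/{sam_ckpts,unidepth_ckpts,dino_ckpts,detany3d_ckpts}

# Move the downloaded weights into place (source paths are illustrative)
mv ~/Downloads/sam_vit_h.pth       checkpoints/sam_ckpts/
mv ~/Downloads/unidepth.pth        checkpoints/unidepth_ckpts/
mv ~/Downloads/dino_swin_large.pth checkpoints/dino_ckpts/
mv ~/Downloads/detany3d.pth        checkpoints/detany3d_ckpts/
```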
GroundingDINO's checkpoint should be downloaded from its official repo and placed as instructed in their documentation.
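For example, the Swin-T checkpoint is distributed via the GroundingDINO releases page (URL current as of writing; check their repository if it moves):

```bash
# Download the GroundingDINO Swin-T checkpoint from the official releases
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```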
## 📁 Dataset Preparation

The `data/` directory should follow the structure below:
```
data/
├── DA3D_pkls/               # DA3D processed pickle files
├── kitti/
│   ├── test_depth_front/
│   ├── ImageSets/
│   ├── training/
│   └── testing/
├── nuscenes/
│   ├── nuscenes_depth/
│   └── samples/
├── 3RScan/
│   └── <token folders>/     # e.g., 10b17940-3938-...
├── hypersim/
│   ├── depth_in_meter/
│   └── ai_XXX_YYY/          # e.g., ai_055_009
├── waymo/
│   └── kitti_format/        # KITTI-format data for Waymo
│       ├── validation_depth_front/
│       ├── ImageSets/
│       ├── training/
│       └── testing/
├── objectron/
│   ├── train/
│   └── test/
├── ARKitScenes/
│   ├── Training/
│   └── Validation/
├── cityscapes3d/
│   ├── depth/
│   └── leftImg8bit/
├── SUNRGBD/
│   ├── realsense/
│   ├── xtion/
│   ├── kv1/
│   └── kv2/
```
The downloads for `kitti`, `nuscenes`, `hypersim`, `objectron`, `arkitscenes`, and `sunrgbd` follow the Omni3D convention. Please refer to the Omni3D repository for details on how to organize and preprocess these datasets.
🗂️ The `DA3D_pkls` (minimal metadata for inference) can be downloaded from Google Drive.
🧩 Note: This release currently supports a minimal inference-only version. Conversion scripts for the full dataset and all depth-related files will be provided later.
⚠️ Depth files are not required for inference. You can safely set `depth_path = None` in `detany3d_dataset.py` to bypass depth loading.
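As a minimal sketch of what that bypass looks like (the helper name here is hypothetical; adapt to the actual loading code in `detany3d_dataset.py`):

```python
# Illustrative sketch, not the repository's actual code:
depth_path = None  # disable depth loading for inference-only runs

if depth_path is not None:
    depth = load_depth(depth_path)  # hypothetical depth-loading helper
else:
    depth = None  # downstream code tolerates a missing depth map at inference
```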
## 🏋️‍♂️ Training

```bash
torchrun \
    --nproc_per_node=8 \
    --master_addr=${MASTER_ADDR} \
    --master_port=${MASTER_PORT} \
    --nnodes=8 \
    --node_rank=${RANK} \
    ./train.py \
    --config_path ./detect_anything/configs/train.yaml
```
## 🔍 Inference

```bash
torchrun \
    --nproc_per_node=8 \
    --master_addr=${MASTER_ADDR} \
    --master_port=${MASTER_PORT} \
    --nnodes=1 \
    --node_rank=${RANK} \
    ./train.py \
    --config_path ./detect_anything/configs/inference_indomain_gt_prompt.yaml
```
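`${MASTER_ADDR}`, `${MASTER_PORT}`, and `${RANK}` are the standard `torchrun` rendezvous variables; on a cluster they are usually provided by the scheduler. For a single-machine run you can set them yourself, e.g. (values are illustrative):

```bash
# Single-machine rendezvous settings for the torchrun commands above
export MASTER_ADDR=127.0.0.1   # this machine hosts the rendezvous
export MASTER_PORT=29500       # any free TCP port
export RANK=0                  # node rank is 0 when there is only one node
```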
After inference, a file named `{dataset}_output_results.json` will be generated in the `exps/<your_exp_dir>/` directory.
⚠️ Due to compatibility issues between `pytorch3d` and the current environment, we recommend copying the output JSON file into the evaluation scripts of repositories such as Omni3D or OVMono3D for standardized metric evaluation.
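Before copying the file over, a quick sanity check can confirm it was written correctly. The path below is illustrative, and since the JSON schema is not documented here, this only inspects the top-level structure:

```python
import json

# Load the inference output and report its top-level shape
with open("exps/my_exp/kitti_output_results.json") as f:
    results = json.load(f)

print(type(results).__name__)             # dict or list, depending on the schema
print(len(results), "top-level entries")
```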
TODO: Evaluation on zero-shot datasets currently requires manually modifying the Omni3D or OVMono3D repositories and is not yet fully supported here. We plan to release a merged evaluation script in this repository to make direct evaluation more convenient.
## 🌐 Launch Online Demo

```bash
python ./deploy.py
```
## 📚 Citation

If you find this repository useful, please consider citing:

```bibtex
@article{zhang2025detect,
  title={Detect Anything 3D in the Wild},
  author={Zhang, Hanxue and Jiang, Haoran and Yao, Qingsong and Sun, Yanan and Zhang, Renrui and Zhao, Hao and Li, Hongyang and Zhu, Hongzi and Yang, Zetong},
  journal={arXiv preprint arXiv:2504.07958},
  year={2025}
}
```