<div align="center">
<h1>Depth Anything V2</h1>

[**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2†</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup>
<br>
[**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>

<sup>1</sup>HKU   <sup>2</sup>TikTok
<br>
†project lead *corresponding author

<a href=""><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
<a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
<a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-green' alt='Benchmark'></a>
</div>

This work presents Depth Anything V2. Compared with V1, this version produces significantly more fine-grained and more robust depth predictions. Compared with Stable Diffusion (SD)-based models, it is far more efficient and lightweight.


## News

- **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.

## Pre-trained Models

We provide **four models** of varying scales for robust relative depth estimation:

| Model | Params | Checkpoint |
|:-|-:|:-:|
| Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
| Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
| Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
| Depth-Anything-V2-Giant | 1.3B | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Giant/resolve/main/depth_anything_v2_vitg.pth?download=true) |

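Each checkpoint pairs a DINOv2 encoder of the corresponding scale with a matching decoder width, reflected in the `encoder`, `features`, and `out_channels` arguments of the `DepthAnythingV2` constructor. As a minimal sketch, you could keep these settings in a lookup table; only the `vitg` row below is confirmed by the code snippet in the next section, while the other three rows are assumptions modeled on typical ViT-S/B/L widths, so please verify them against `run.py` in this repository before relying on them:

```python
# Per-encoder constructor settings for DepthAnythingV2 (sketch).
# Only the 'vitg' entry is confirmed by the snippet below; the other rows
# are assumptions -- check run.py in this repository for the exact values.
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64,  'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]},
}

encoder = 'vitl'  # pick the scale whose checkpoint you downloaded
# model = DepthAnythingV2(**model_configs[encoder])
```

Keeping the settings in one place means switching scales only changes the `encoder` string and the checkpoint path.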

### Code snippet for using our models
```python
import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

# Take Depth-Anything-V2-Giant as an example
model = DepthAnythingV2(encoder='vitg', features=384, out_channels=[1536, 1536, 1536, 1536])
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitg.pth', map_location='cpu'))
model.eval()

raw_img = cv2.imread('your/image/path')
depth = model.infer_img(raw_img)  # HxW raw depth map
```
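The value returned by `infer_img` is a raw floating-point map, so it usually needs to be normalized before saving or displaying. A minimal sketch, assuming `depth` is the HxW NumPy array produced above (the min-max normalization and `INFERNO` colormap are illustrative choices, not necessarily what `run.py` uses):

```python
import cv2
import numpy as np

# Scale the raw depth values to 0-255 so they can be stored as an 8-bit image.
depth_u8 = ((depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 255.0).astype(np.uint8)

cv2.imwrite('depth_gray.png', depth_u8)  # grayscale depth map
cv2.imwrite('depth_color.png', cv2.applyColorMap(depth_u8, cv2.COLORMAP_INFERNO))  # pseudo-colored depth map
```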

## Usage

### Installation

```bash
git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt
```

### Running

```bash
python run.py --encoder <vits | vitb | vitl | vitg> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]
```
Options:
- `--img-path`: You can either 1) point it to a directory containing all the images of interest, 2) point it to a single image, or 3) point it to a text file listing the image paths.
- `--input-size` (optional): By default, we use an input size of `518` for model inference. **You can increase the size for even more fine-grained results.**
- `--pred-only` (optional): Only save the predicted depth map, without the raw image alongside it.
- `--grayscale` (optional): Save the depth map in grayscale, without applying the color palette.

For example:
```bash
python run.py --encoder vitg --img-path assets/examples --outdir depth_vis
```

**If you want to use Depth Anything V2 on videos:**

```bash
python run_video.py --encoder vitg --video-path assets/examples_video --outdir video_depth_vis
```

*Please note that our larger models produce more temporally consistent results on videos than the smaller ones.*

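`run_video.py` handles this end to end, but if you prefer to drive the model from your own code, a simple per-frame loop is a reasonable starting point. A minimal sketch, assuming the `model` object built in the code snippet above and a hypothetical input path (per-frame min-max normalization is used here for brevity and can flicker between frames):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('assets/examples_video/your_video.mp4')  # hypothetical path
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
writer = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    depth = model.infer_img(frame)  # HxW raw depth map for this frame
    d = ((depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 255.0).astype(np.uint8)
    vis = cv2.applyColorMap(d, cv2.COLORMAP_INFERNO)
    if writer is None:
        h, w = vis.shape[:2]
        writer = cv2.VideoWriter('video_depth_vis.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    writer.write(vis)

cap.release()
if writer is not None:
    writer.release()
```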

### Gradio demo

To use our Gradio demo locally:

```bash
python app.py
```
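For reference, a local demo along these lines can be wired up in a few lines of Gradio. This is only a hedged sketch of the idea, not the repository's `app.py`; it assumes a `model` loaded as in the code snippet earlier, and the colormap and BGR/RGB handling are illustrative choices:

```python
import cv2
import gradio as gr
import numpy as np

def predict(image_rgb):
    # Gradio delivers RGB arrays, while the earlier snippet feeds the model BGR images from cv2.imread.
    depth = model.infer_img(cv2.cvtColor(image_rgb, cv2.COLOR_RGB2BGR))
    d = ((depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 255.0).astype(np.uint8)
    return cv2.cvtColor(cv2.applyColorMap(d, cv2.COLORMAP_INFERNO), cv2.COLOR_BGR2RGB)

demo = gr.Interface(fn=predict, inputs=gr.Image(), outputs=gr.Image(), title='Depth Anything V2')
demo.launch()
```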

You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).

**Note:** Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)). In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use intermediate features instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.

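To make the note above concrete, here is a hedged illustration of the two ways of tapping a DINOv2 backbone, using the `get_intermediate_layers` API of the `facebookresearch/dinov2` torch.hub models. The layer indices are illustrative assumptions, not the exact ones used in this repository; for the tap points actually used here, check the `DepthAnythingV2` implementation imported in the snippet earlier.

```python
import torch

# Load a small DINOv2 backbone from torch.hub (downloads weights on first use).
encoder = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
encoder.eval()

x = torch.randn(1, 3, 518, 518)  # spatial size must be a multiple of the 14-pixel patch size

with torch.no_grad():
    # V1-style tapping: features from the last four transformer blocks.
    last_four = encoder.get_intermediate_layers(x, n=4)
    # V2-style tapping: features from blocks spread across the network depth
    # (indices below are an illustrative choice for the 12-block ViT-S).
    intermediate = encoder.get_intermediate_layers(x, n=[2, 5, 8, 11])
```
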
## Fine-tuned to Metric Depth Estimation

Please refer to [metric depth estimation](./metric_depth).


## DA-2K Evaluation Benchmark

Please refer to [DA-2K benchmark](./DA-2K.md).


## Citation

If you find this project useful, please consider citing:

```bibtex
@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:},
  year={2024}
}
```