Commit 2cbc36a

Initial commit
1 parent a0c63fc commit 2cbc36a

73 files changed, 91693 additions and 2 deletions

DA-2K.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
# DA-2K Evaluation Benchmark
2+
3+
## Introduction
4+
5+
![DA-2K](assets/DA-2K.png)
6+
7+
DA-2K is proposed in [Depth Anything V2](https://depth-anything-v2.github.io) to evaluate relative depth estimation. It covers eight representative scenarios: `indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, and `object`. It consists of 1K diverse, high-quality images and 2K precise pair-wise relative depth annotations.
8+
9+
Please refer to our [paper](https://depth-anything-v2.github.io) for details on how this benchmark was constructed.
10+
11+
12+
## Usage
13+
14+
Please first [download the benchmark]().
15+
16+
All annotations are stored in [annotations.json](./annotations.json). The annotation file is a JSON object where each key is the path to an image file, and the value is a list of annotations associated with that image. Each annotation describes two points and identifies which point is closer to the camera. The structure is detailed below:
17+
18+
```
19+
{
20+
"image_path": [
21+
{
22+
"point1": [h1, w1], # (vertical position, horizontal position)
23+
"point2": [h2, w2], # (vertical position, horizontal position)
24+
"closer_point": "point1" # we always set "point1" as the closer one
25+
},
26+
...
27+
],
28+
...
29+
}
30+
```
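As a rough illustration of how these annotations could be scored, the sketch below computes pair-wise accuracy for a predictor. It is not the official evaluation script. The names `pair_is_correct`, `pairwise_accuracy`, and `predict` are hypothetical, and the sketch assumes the predictor returns a 2D map indexable as `depth_map[h][w]`. Whether larger values mean "closer" depends on the model (a disparity-like output would need `larger_is_closer=True`), so that is left as a flag.

```python
def pair_is_correct(depth_map, annotation, larger_is_closer=False):
    """Return True if the prediction agrees with one pair-wise annotation."""
    h1, w1 = annotation["point1"]  # (vertical, horizontal)
    h2, w2 = annotation["point2"]
    d1 = depth_map[h1][w1]
    d2 = depth_map[h2][w2]
    if larger_is_closer:
        predicted = "point1" if d1 > d2 else "point2"
    else:
        predicted = "point1" if d1 < d2 else "point2"
    # Ties fall through to "point2" in this sketch.
    return predicted == annotation["closer_point"]


def pairwise_accuracy(annotations, predict, larger_is_closer=False):
    """Fraction of annotated pairs that the predictor orders correctly.

    `annotations` follows the JSON structure above; `predict` is a
    hypothetical callable mapping an image path to a 2D depth map.
    """
    correct = 0
    total = 0
    for image_path, pairs in annotations.items():
        depth_map = predict(image_path)
        for pair in pairs:
            correct += pair_is_correct(depth_map, pair, larger_is_closer)
            total += 1
    return correct / total if total else 0.0


# Synthetic 2x2 example in which smaller values mean "closer".
toy_annotations = {
    "toy.png": [
        {"point1": [0, 0], "point2": [1, 1], "closer_point": "point1"},
        {"point1": [0, 1], "point2": [1, 0], "closer_point": "point1"},
    ]
}
toy_depth = [[1.0, 5.0], [2.0, 9.0]]
accuracy = pairwise_accuracy(toy_annotations, lambda path: toy_depth)
print(accuracy)  # 0.5: the first pair is ordered correctly, the second is not
```

With real data, `annotations` would come from `json.load` on `annotations.json`, and `predict` would wrap the model under test.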
31+
32+
To visualize the annotations:
33+
```bash
34+
python visualize.py [--scene-type <type>]
35+
```
36+
37+
**Options**
38+
- `--scene-type <type>` (optional): Specify the scene type (`indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, or `object`). Omit this argument or set `<type>` to `""` to include all scene types.
39+
40+
## Citation
41+
42+
If you find this benchmark useful, please consider citing:
43+
44+
```bibtex
45+
@article{depth_anything_v2,
46+
title={Depth Anything V2},
47+
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
48+
journal={arXiv preprint arXiv:},
49+
year={2024}
50+
}
51+
```

README.md

Lines changed: 125 additions & 2 deletions
@@ -1,2 +1,125 @@
1-
# Depth-Anything-V2
2-
Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
1+
<div align="center">
2+
<h1>Depth Anything V2</h1>
3+
4+
[**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2&dagger;</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup>
5+
<br>
6+
[**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>
7+
8+
<sup>1</sup>HKU&emsp;&emsp;&emsp;<sup>2</sup>TikTok
9+
<br>
10+
&dagger;project lead&emsp;*corresponding author
11+
12+
<a href=""><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
13+
<a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
14+
<a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
15+
<a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-green' alt='Benchmark'></a>
16+
</div>
17+
18+
This work presents Depth Anything V2. Compared with V1, this version produces significantly more fine-grained and robust depth predictions. Compared with models built on Stable Diffusion (SD), it is much more efficient and lightweight.
19+
20+
![teaser](assets/teaser.png)
21+
22+
## News
23+
24+
- **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.
25+
26+
27+
## Pre-trained Models
28+
29+
We provide **four models** of varying scales for robust relative depth estimation:
30+
31+
| Model | Params | Checkpoint |
32+
|:-|-:|:-:|
33+
| Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
34+
| Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
35+
| Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
36+
| Depth-Anything-V2-Giant | 1.3B | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Giant/resolve/main/depth_anything_v2_vitg.pth?download=true) |
37+
38+
39+
### Code snippet to use our models
40+
```python
41+
import cv2
42+
import torch
43+
44+
from depth_anything_v2.dpt import DepthAnythingV2
45+
46+
# take depth-anything-v2-giant as an example
47+
model = DepthAnythingV2(encoder='vitg', features=384, out_channels=[1536, 1536, 1536, 1536])
48+
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitg.pth', map_location='cpu'))
49+
model.eval()
50+
51+
raw_img = cv2.imread('your/image/path')
52+
depth = model.infer_image(raw_img) # HxW raw depth map
53+
```
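The snippet above hardcodes the giant configuration. For the other encoders, one possible approach is a small lookup table. The values below are copied from `model_configs` in `app.py` in this commit, and the checkpoint naming follows the download links in the table above. The helper name `build_model_spec` and the `checkpoint_dir` parameter are hypothetical.

```python
# Constructor arguments per encoder, copied from app.py in this commit.
MODEL_CONFIGS = {
    "vits": {"encoder": "vits", "features": 64, "out_channels": [48, 96, 192, 384]},
    "vitb": {"encoder": "vitb", "features": 128, "out_channels": [96, 192, 384, 768]},
    "vitl": {"encoder": "vitl", "features": 256, "out_channels": [256, 512, 1024, 1024]},
    "vitg": {"encoder": "vitg", "features": 384, "out_channels": [1536, 1536, 1536, 1536]},
}


def build_model_spec(encoder, checkpoint_dir="checkpoints"):
    """Return (constructor kwargs, checkpoint path) for a given encoder name.

    Hypothetical helper: it only assembles the arguments and does not
    import torch or load any weights.
    """
    if encoder not in MODEL_CONFIGS:
        raise ValueError(f"unknown encoder {encoder!r}; expected one of {sorted(MODEL_CONFIGS)}")
    kwargs = dict(MODEL_CONFIGS[encoder])
    checkpoint = f"{checkpoint_dir}/depth_anything_v2_{encoder}.pth"
    return kwargs, checkpoint


kwargs, checkpoint = build_model_spec("vitl")
print(kwargs["features"], checkpoint)
```

The result would then be used as `DepthAnythingV2(**kwargs)` followed by `load_state_dict(torch.load(checkpoint, map_location='cpu'))`, mirroring what `app.py` does.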
54+
55+
## Usage
56+
57+
### Installation
58+
59+
```bash
60+
git clone https://github.com/DepthAnything/Depth-Anything-V2
61+
cd Depth-Anything-V2
62+
pip install -r requirements.txt
63+
```
64+
65+
### Running
66+
67+
```bash
68+
python run.py --encoder <vits | vitb | vitl | vitg> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]
69+
```
70+
Options:
71+
- `--img-path`: You can point it to 1) a directory containing all images of interest, 2) a single image, or 3) a text file listing all image paths.
72+
- `--input-size` (optional): By default, we use input size `518` for model inference. **You can increase the size for even more fine-grained results.**
73+
- `--pred-only` (optional): Only save the predicted depth map, without the raw image.
74+
- `--grayscale` (optional): Save the grayscale depth map, without applying a color palette.
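
The three accepted forms of `--img-path` could be resolved into a flat list of files roughly as follows. This is a minimal sketch under stated assumptions, not necessarily what `run.py` does: the function name `resolve_image_paths` is hypothetical, a text file is detected by its `.txt` suffix, and the directory case returns every file it finds without filtering by image type.

```python
import glob
import os


def resolve_image_paths(img_path):
    """Expand an --img-path style argument into a list of file paths."""
    if os.path.isfile(img_path):
        if img_path.endswith(".txt"):
            # Case 3: a text file with one image path per line.
            with open(img_path, "r") as f:
                return [line.strip() for line in f if line.strip()]
        # Case 2: a single image.
        return [img_path]
    # Case 1: a directory, searched recursively.
    pattern = os.path.join(img_path, "**", "*")
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

For example, `resolve_image_paths('assets/examples')` would list the demo images added in this commit.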
75+
76+
For example:
77+
```bash
78+
python run.py --encoder vitg --img-path assets/examples --outdir depth_vis
79+
```
80+
81+
**If you want to use Depth Anything V2 on videos:**
82+
83+
```bash
84+
python run_video.py --encoder vitg --video-path assets/examples_video --outdir video_depth_vis
85+
```
86+
87+
*Please note that our larger models have better temporal consistency on videos.*
88+
89+
90+
### Gradio demo
91+
92+
To use our gradio demo locally:
93+
94+
```bash
95+
python app.py
96+
```
97+
98+
You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).
99+
100+
**Note:** Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)). In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use intermediate features instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
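
To make the difference concrete, the sketch below contrasts taking the last four transformer blocks with taking four evenly spaced intermediate blocks. It is purely illustrative: the 12-block encoder is a hypothetical example, both helper names are made up, and the indices the repository actually uses for each encoder are not shown in this README.

```python
def last_layers(num_blocks, k=4):
    """Indices of the final k blocks (the V1 behaviour described above)."""
    return list(range(num_blocks - k, num_blocks))


def evenly_spaced_layers(num_blocks, k=4):
    """Indices of k blocks spread evenly through the encoder, ending at the last one."""
    return [(i + 1) * num_blocks // k - 1 for i in range(k)]


# Hypothetical encoder with 12 transformer blocks.
print(last_layers(12))           # [8, 9, 10, 11]
print(evenly_spaced_layers(12))  # [2, 5, 8, 11]
```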
101+
102+
103+
104+
## Fine-tuned to Metric Depth Estimation
105+
106+
Please refer to [metric depth estimation](./metric_depth).
107+
108+
109+
## DA-2K Evaluation Benchmark
110+
111+
Please refer to [DA-2K benchmark](./DA-2K.md).
112+
113+
114+
## Citation
115+
116+
If you find this project useful, please consider citing:
117+
118+
```bibtex
119+
@article{depth_anything_v2,
120+
title={Depth Anything V2},
121+
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
122+
journal={arXiv preprint arXiv:},
123+
year={2024}
124+
}
125+
```

app.py

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
import glob
2+
import gradio as gr
3+
import matplotlib
4+
import numpy as np
5+
from PIL import Image
6+
import torch
7+
import tempfile
8+
from gradio_imageslider import ImageSlider
9+
10+
from depth_anything_v2.dpt import DepthAnythingV2
11+
12+
css = """
13+
#img-display-container {
14+
max-height: 100vh;
15+
}
16+
#img-display-input {
17+
max-height: 80vh;
18+
}
19+
#img-display-output {
20+
max-height: 80vh;
21+
}
22+
#download {
23+
height: 62px;
24+
}
25+
"""
26+
DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
27+
model_configs = {
28+
'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
29+
'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
30+
'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
31+
'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
32+
}
33+
encoder = 'vitl'
34+
model = DepthAnythingV2(**model_configs[encoder])
35+
state_dict = torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location="cpu")
36+
model.load_state_dict(state_dict)
37+
model = model.to(DEVICE).eval()
38+
39+
title = "# Depth Anything V2"
40+
description = """Official demo for **Depth Anything V2**.
41+
Please refer to our [paper](), [project page](https://depth-anything-v2.github.io), or [github](https://github.com/DepthAnything/Depth-Anything-V2) for more details."""
42+
43+
def predict_depth(image):
44+
    return model.infer_image(image)
45+
46+
with gr.Blocks(css=css) as demo:
47+
    gr.Markdown(title)
48+
    gr.Markdown(description)
49+
    gr.Markdown("### Depth Prediction demo")
50+
51+
    with gr.Row():
52+
        input_image = gr.Image(label="Input Image", type='numpy', elem_id='img-display-input')
53+
        depth_image_slider = ImageSlider(label="Depth Map with Slider View", elem_id='img-display-output', position=0.5)
54+
    submit = gr.Button(value="Compute Depth")
55+
    gray_depth_file = gr.File(label="Grayscale depth map", elem_id="download",)
56+
    raw_file = gr.File(label="16-bit raw output (can be considered as disparity)", elem_id="download",)
57+
58+
    cmap = matplotlib.colormaps.get_cmap('Spectral_r')
59+
60+
    def on_submit(image):
61+
        original_image = image.copy()
62+
63+
        h, w = image.shape[:2]
64+
65+
        depth = predict_depth(image[:, :, ::-1])
66+
67+
        raw_depth = Image.fromarray(depth.astype('uint16'))
68+
        tmp_raw_depth = tempfile.NamedTemporaryFile(suffix='.png', delete=False)
69+
        raw_depth.save(tmp_raw_depth.name)
70+
71+
        depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
72+
        depth = depth.astype(np.uint8)
73+
        colored_depth = (cmap(depth)[:, :, :3] * 255).astype(np.uint8)
74+
75+
        gray_depth = Image.fromarray(depth)
76+
        tmp_gray_depth = tempfile.NamedTemporaryFile(suffix='.png', delete=False)
77+
        gray_depth.save(tmp_gray_depth.name)
78+
79+
        return [(original_image, colored_depth), tmp_gray_depth.name, tmp_raw_depth.name]
80+
81+
    submit.click(on_submit, inputs=[input_image], outputs=[depth_image_slider, gray_depth_file, raw_file])
82+
83+
    example_files = glob.glob('assets/examples/*')
84+
    examples = gr.Examples(examples=example_files, inputs=[input_image], outputs=[depth_image_slider, gray_depth_file, raw_file], fn=on_submit)
85+
86+
87+
if __name__ == '__main__':
88+
    demo.queue().launch()
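
Review note: the min-max scaling in `on_submit` divides by `depth.max() - depth.min()`, which is zero when the predicted map is constant. A guarded variant might look like the sketch below. It is written on plain nested lists so that it runs without NumPy (the demo itself works on NumPy arrays), and `normalize_to_uint8` is a hypothetical name, not a function in this commit.

```python
def normalize_to_uint8(depth):
    """Min-max scale a 2D map to integers in [0, 255].

    Returns an all-zero map when every value is identical, instead of
    dividing by zero.
    """
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0 for _ in row] for row in depth]
    # int() truncates toward zero, like casting non-negative floats to uint8.
    return [[int((value - lo) / span * 255.0) for value in row] for row in depth]


print(normalize_to_uint8([[0.0, 5.0], [10.0, 2.5]]))  # [[0, 127], [255, 63]]
print(normalize_to_uint8([[3.0, 3.0]]))               # [[0, 0]]
```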

assets/DA-2K.png (1.13 MB)

assets/examples/demo01.jpg (477 KB)

assets/examples/demo02.jpg (499 KB)

assets/examples/demo03.jpg (454 KB)

assets/examples/demo04.jpg (293 KB)

assets/examples/demo05.jpg (345 KB)

assets/examples/demo06.jpg (764 KB)
