Is it possible to step further by using off-the-shelf mono-depth instead of the features only? #1
Hi, thank you for your insightful question. Indeed, we initially considered directly using monocular depth predictions from Depth Anything. However, the monodepth model predicts relative depth values with unknown scale and shift parameters. For our application in Gaussian splatting, we require multi-view consistent depths, which can be combined into a coherent global 3D representation. We found it challenging to convert the relative depth to scale-consistent depths. This issue becomes even more pronounced as the number of views increases. On the other hand, we explored an alternative approach of feature-level fusion, which we found worked surprisingly well. The method is also very simple, which avoids the complications associated with aligning relative depth scales. As a result, we opted for this design over relying on direct depth predictions. It's also worth noting a related observation: when fine-tuning a pre-trained relative depth model for metric depth predictions, a common strategy is to retain only the pre-trained encoder and introduce a new decoder to predict metric depth. Our design shares similarities with this approach. I hope this helps and we're happy to continue the discussion if you have further questions or insights.
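To illustrate the scale/shift ambiguity described above: a relative depth map can only be mapped to metric depth with a per-view scale `s` and shift `t`, typically solved by least squares against some metric reference. The sketch below (my own illustration, not code from this repository) shows why this is per-view: each image gets its own `(s, t)`, so nothing ties the aligned depths together across views.

```python
import numpy as np

def align_scale_shift(rel_depth, metric_depth):
    """Solve metric ≈ s * rel + t in the least-squares sense for ONE view.

    Each view gets its own (s, t), which is exactly why per-view alignment
    does not yield multi-view consistent depth by itself.
    """
    x = rel_depth.ravel()
    y = metric_depth.ravel()
    # Design matrix [rel, 1] for the affine fit
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Toy example: a relative prediction that differs from metric depth
# by an unknown affine transform (here metric = 2 * rel - 4 exactly).
rng = np.random.default_rng(0)
metric = rng.uniform(1.0, 10.0, size=(4, 4))
rel = 0.5 * metric + 2.0
s, t = align_scale_shift(rel, metric)
aligned = s * rel + t  # recovers metric depth for THIS view only
```

In practice the metric reference is unknown at test time, and estimated `(s, t)` values drift between views, which is the inconsistency the comment refers to.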
Thank you for your detailed and insightful answer!
Hi, thank you for the great work! I noticed that in the paper you mention the depth is regressed with a UNet from the concatenated monocular features and cost volumes. After roughly reading the code, I found that the depth is regressed with a DPT head. Am I misunderstanding this part?
Hi, the DPT head is mentioned in the last sentence of the section "Feature Fusion and Depth Regression" in our paper. Since the UNet regresses depth maps at the downsampled feature resolution, we use an additional DPT head to upsample depth to the full resolution.
Thanks for the answer! I wonder why you don't directly use a DPT head to regress a full-resolution depth map?
Hi, the low-resolution depth is predicted with a softmax layer by performing a weighted average of all the potential depth candidates, which is compatible with the cost volume representation (i.e., feature matching). The importance of the cost volume is ablated in Table 2, and we found it helps significantly. The combination of UNet and DPT head can be understood as "matching + regression", which we found works well, as similarly observed in our unimatch paper.
Thank you for this great work! I was impressed by the design even before you posted it on arXiv (I noticed it on OpenReview; I am not a reviewer).
Is it possible to directly employ the monocular depth results? In your design, only the features are used and the DPT head is dropped. But we know that Depth Anything 2 can produce high-quality depth, and it would be a pity for such prior information to be lost. Have you tried any related experiments?