I am trying to train the model on the NYU dataset. The paper reports about 21 minutes per epoch on 8 A100 GPUs.
I am using a single A100 GPU with a batch size of 32, and training seems to be stuck at the following step forever. Meanwhile, I can run the evaluation without issue. I don't know what the problem could be, and I would appreciate any help or hints!
    with torch.no_grad():
        # convert the input image to latent space and scale.
        latents = self.encoder_vq.encode(x).mode().detach() * self.config.model.params.scale_factor
P.S. The evaluation results match the paper well, except for sq_rel.
Hi @yongmayer, thanks for appreciating our work. We used a per-device batch size of 4, giving a total batch size of 32 across 8 GPUs. The speed issue is probably because you are using a per-device batch size of 32 instead of 4. Could you try once with a batch size of 4 (on a single GPU, i.e., your current setup) and let me know if it works?
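To make the batch-size arithmetic in the reply above concrete, here is a minimal sketch. The helper name `effective_batch_size` and its `grad_accum_steps` parameter are hypothetical and are not part of the EcoDepth code. Gradient accumulation is shown only as an assumption about how one might reach a total of 32 on one GPU; the thread does not say whether the training script supports it.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * num_devices * grad_accum_steps


# Setup described in the reply: 8 GPUs, 4 samples per GPU.
paper_setup = effective_batch_size(per_device=4, num_devices=8)

# Suggested test: a single GPU with 4 samples per step.
single_gpu_test = effective_batch_size(per_device=4, num_devices=1)

# Assumption (not from the thread): accumulate gradients over 8 steps
# to match the total of 32 while keeping only 4 samples in memory.
single_gpu_accum = effective_batch_size(per_device=4, num_devices=1, grad_accum_steps=8)

print(paper_setup, single_gpu_test, single_gpu_accum)
```

With a per-device batch size of 32, one GPU has to hold eight times as many samples per step as each GPU did in the 8-GPU setup, which is consistent with the slowdown suspected in the reply.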
I have another question, if you don't mind my asking. How should I understand the diffusion process in EcoDepth?
From line 96 in EcoDepth/depth/models/model.py (EcoDepthEncoder.forward), I see that it uses the UNet from Stable Diffusion, but I cannot find the forward diffusion process. Am I misunderstanding something? I am new to diffusion-based depth estimation, and I would greatly appreciate an explanation!
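For context on the term used in the question: in DDPM-style models, the forward diffusion process adds Gaussian noise to a clean sample according to x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below illustrates only this generic formula in plain Python. All function names and the linear schedule values (1e-4 to 0.02 over 1000 steps) are illustrative assumptions and are not taken from EcoDepth; whether EcoDepth applies such a step at all is exactly what the question asks and is not answered in this thread.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (illustrative values, not EcoDepth's)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

x0 = [0.5, -1.0, 2.0]                                # stand-in for a clean latent
x_early = forward_diffuse(x0, 0, alpha_bars, rng)    # almost unchanged
x_late = forward_diffuse(x0, 999, alpha_bars, rng)   # almost pure noise
```

At small t the sample stays close to x_0, while at the last step alpha_bar_t is near zero and the sample is dominated by noise. A model that only runs the UNet on encoded latents, without a step like `forward_diffuse`, would not be performing this noising.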