Hi, thanks for sharing this wonderful work. Since you use multi-frame, multi-view inputs during the pre-training stage, I would like to know whether you still use temporal multi-frame inputs during the fine-tuning stage.
If you did not use temporal multi-frame inputs in the downstream tasks, does that mean you discard the voxel decoder in the fine-tuning stage and only load the pre-trained voxel encoder?
Thank you for your interest in our work. Whether we used temporal multi-frame inputs during fine-tuning depended on whether the methods we were comparing against did so. Roughly speaking, think of it as us providing a pre-trained backbone (such as ResNet50); during fine-tuning, we adopted exactly the same training strategy as the baseline.
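To illustrate the "pre-trained backbone" idea in general terms: a common pattern is to keep only the backbone entries of a pre-training checkpoint and drop modules used solely for pre-training. The sketch below is not the authors' code. The function name `extract_backbone_weights` and the key prefixes `backbone.` and `decoder.` are hypothetical, and plain strings stand in for tensors so the example runs without any framework.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


def extract_backbone_weights(
    pretrain_state: Mapping[str, T],
    backbone_prefix: str = "backbone.",  # hypothetical prefix, not from the repo
) -> Dict[str, T]:
    """Keep only the backbone entries of a pre-training checkpoint.

    Every key outside `backbone_prefix` (e.g. a hypothetical "decoder."
    module used only during pre-training) is dropped, and the prefix is
    stripped so the names match a standalone backbone's parameters.
    """
    return {
        key[len(backbone_prefix):]: value
        for key, value in pretrain_state.items()
        if key.startswith(backbone_prefix)
    }


if __name__ == "__main__":
    # Toy checkpoint: strings stand in for tensors.
    ckpt = {
        "backbone.conv1.weight": "w0",
        "backbone.layer1.0.conv1.weight": "w1",
        "decoder.head.weight": "w2",
    }
    print(extract_backbone_weights(ckpt))
    # → {'conv1.weight': 'w0', 'layer1.0.conv1.weight': 'w1'}
```

In PyTorch, the filtered dict could then be passed to the downstream backbone's `load_state_dict`; with `strict=False`, the returned `missing_keys` and `unexpected_keys` show which parameters were not matched.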