Hi, thanks for the amazing work! I have a small conceptual question about how the proposed cross-view attention mechanism works in the transformer.
If I understand correctly, during cross-view attention each image patch attends only to the patches at the same spatial location (in image space) in the other images, and not to any other patches. I'm curious why this makes sense: across different views, image patches at the same spatial location do not necessarily correspond to the same part of the scene, so the information being exchanged might not be that useful for reconstruction. Is this a non-issue because of the shifted-window approach that Swin Transformers take, so that global information still gets passed around effectively? Or am I misunderstanding how cross-view attention works?
Any guidance would be appreciated!
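
To make the reading above concrete, here is a minimal sketch of attention restricted to the same patch index across views. It is not taken from the repository: the function name `per_location_cross_view_attention` is hypothetical, and it assumes a single head with identity query/key/value projections, which a real model would replace with learned ones. It only illustrates the concern that a patch never sees patches at other locations in other views.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def per_location_cross_view_attention(tokens):
    """tokens[v][n] is the feature vector of patch n in view v.

    Hypothetical sketch: patch n of view v attends only to patch n of
    every view (itself included), never to patches at other indices.
    Identity Q/K/V projections are an assumption made for brevity.
    """
    num_views = len(tokens)
    num_patches = len(tokens[0])
    dim = len(tokens[0][0])
    scale = 1.0 / math.sqrt(dim)
    out = [[None] * num_patches for _ in range(num_views)]
    for n in range(num_patches):
        # All features that share spatial index n, one per view.
        column = [tokens[v][n] for v in range(num_views)]
        for v in range(num_views):
            weights = softmax([dot(column[v], k) * scale for k in column])
            out[v][n] = [
                sum(w * k[d] for w, k in zip(weights, column))
                for d in range(dim)
            ]
    return out

# Two views, three patches each, two feature channels (made-up numbers).
tokens = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]],
]
baseline = per_location_cross_view_attention(tokens)

# Change only patch 2 of view 1 and recompute.
perturbed_tokens = [[list(p) for p in view] for view in tokens]
perturbed_tokens[1][2] = [9.0, -9.0]
perturbed = per_location_cross_view_attention(perturbed_tokens)

# Only outputs at patch index 2 can change; indices 0 and 1 are untouched.
changed = [n for n in range(3) if baseline[0][n] != perturbed[0][n]]
```

Under these assumptions `changed` contains only index 2, so within one such layer information from view 1 reaches view 0 only at the matching location. Whether interleaved shifted-window self-attention layers spread it further is exactly what the question asks.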