Conceptual question about cross-view attention #66

Open
hanyucc opened this issue Oct 7, 2024 · 0 comments

hanyucc commented Oct 7, 2024

Hi, thanks for the amazing work! I have a small conceptual question about how the proposed cross-view attention mechanism works in the transformer.

If I understand correctly, during cross-view attention each image patch attends only to the patches at the same spatial location (in image space) in the other views, and not to any other patches. I'm curious why this makes sense: across different views, patches at the same spatial location do not necessarily correspond to the same part of the scene, so the information passed around might not be that useful for reconstruction. Is this a non-issue because of the shifted-window approach that Swin Transformers take, so that global information still gets propagated effectively? Or am I misunderstanding how cross-view attention works?
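To make my reading concrete, here is a minimal sketch of how I currently picture the operation (this is just my own toy code, not the repository's implementation; the `CrossViewAttention` module name and the `(B, V, N, C)` token layout are my assumptions):

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """My rough sketch: attention restricted to tokens at the same spatial location across views."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, N, C) where N = H * W patch tokens per view
        B, V, N, C = x.shape
        # Fold the spatial position into the batch dimension, so the attention
        # sequence runs over the V views at one fixed (h, w) location.
        x = x.permute(0, 2, 1, 3).reshape(B * N, V, C)        # (B*N, V, C)
        out, _ = self.attn(x, x, x)                            # attend across views only
        return out.reshape(B, N, V, C).permute(0, 2, 1, 3)     # back to (B, V, N, C)

# Toy usage: 2 views, 4x4 patches, 64-dim tokens.
tokens = torch.randn(1, 2, 16, 64)
print(CrossViewAttention(64)(tokens).shape)  # torch.Size([1, 2, 16, 64])
```

If this is roughly what the paper does, then my question is essentially whether restricting the sequence to same-location tokens loses cross-view correspondence information, or whether the shifted windows recover it.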

Any guidance would be appreciated!
