Conceptual question about cross-view attention #66

Open
hanyucc opened this issue Oct 7, 2024 · 0 comments

hanyucc commented Oct 7, 2024

Hi, thanks for the amazing work! I have a small conceptual question about how the proposed cross-view attention mechanism works in the transformer.

If I understand correctly, during cross-view attention each image patch attends only to the patches at the same spatial location (in image space) in the other views, and not to any other patches. I'm curious why this makes sense: across different views, patches at the same spatial location do not necessarily correspond to the same part of the scene, so the information passed around might not be that useful for reconstruction. Is this a non-issue because of the shifted-window approach that Swin Transformers take, so that global information still gets propagated effectively? Or am I misunderstanding how cross-view attention works?
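To make my reading concrete, here is a minimal sketch of how I currently picture the operation (this is just my own toy code, not the repository's implementation; the `CrossViewAttention` module name and the `(B, V, N, C)` token layout are my assumptions):

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """My rough sketch: attention restricted to tokens at the same spatial location across views."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, N, C) where N = H * W patch tokens per view
        B, V, N, C = x.shape
        # Fold the spatial position into the batch dimension, so the attention
        # sequence runs over the V views at one fixed (h, w) location.
        x = x.permute(0, 2, 1, 3).reshape(B * N, V, C)        # (B*N, V, C)
        out, _ = self.attn(x, x, x)                            # attend across views only
        return out.reshape(B, N, V, C).permute(0, 2, 1, 3)     # back to (B, V, N, C)

# Toy usage: 2 views, 4x4 patches, 64-dim tokens.
tokens = torch.randn(1, 2, 16, 64)
print(CrossViewAttention(64)(tokens).shape)  # torch.Size([1, 2, 16, 64])
```

If this is roughly what the paper does, then my question is essentially whether restricting the sequence to same-location tokens loses cross-view correspondence information, or whether the shifted windows recover it.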

Any guidance would be appreciated!
