
[Dispatch] Make dynamic attention ineligible for collapse #19929

Closed

Conversation

IanWood1 (Contributor) commented on Feb 6, 2025:

For the same reasons we can't collapse dynamic linalg.generic ops, we also cannot collapse attention ops with dynamic dimensions. This moves the check for dynamic dims before the check for LinalgExt::AttentionOp. This was seen causing collapse/expand ops in dispatches in #19921.
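As a rough, self-contained sketch of the new ordering (toy types and helper logic, not IREE's actual pass code):

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for an op being considered by the collapse logic; not IREE's
// real data structures.
struct OpInfo {
  bool isAttention;                // stands in for isa<LinalgExt::AttentionOp>
  std::vector<int64_t> loopRanges; // -1 marks a dynamic dimension
};

// Returns true if the op may have its dimensions collapsed.
bool isEligibleForCollapse(const OpInfo &op) {
  // The dynamic-dimension check runs first, so an attention op with dynamic
  // dims is rejected for the same reason a dynamic linalg.generic is.
  for (int64_t range : op.loopRanges)
    if (range < 0)
      return false;
  // Only after the dynamic check does the attention special case apply.
  if (op.isAttention)
    return true;
  // ...remaining eligibility checks for linalg.generic ops would go here...
  return true;
}
```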

MaheshRavishankar (Contributor) commented:

> For the same reasons we can't collapse dynamic linalg.generic ops, we also cannot collapse attention ops with dynamic dimensions. This moves the check for dynamic dims before the check for LinalgExt::AttentionOp. This was seen causing collapse/expand ops in dispatches in #19921.

Why can't we collapse dynamic dimensions in the attention op? (I can't immediately remember why we can't collapse dynamic dimensions of linalg.generic. The only reason I can think of was that tensor.expand_shape didn't support dynamic dimensions, but now it does.)

IanWood1 (Contributor, Author) commented on Feb 7, 2025:

> Why can't we collapse dynamic dimensions in the attention op? (I can't immediately remember why we can't collapse dynamic dimensions of linalg.generic. The only reason I can think of was that tensor.expand_shape didn't support dynamic dimensions, but now it does.)

There are some changes to the pass needed (see #19654). But in this case that might be overkill, since the actual collapsed tensor is fully static even though the attention op has dynamic dims. The attention pattern just bails when any of the loops are dynamic.
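Roughly, a finer-grained check would only look at the dimensions actually being folded together rather than every loop of the op; a toy, self-contained sketch (not the pass's real code):

```cpp
#include <cstdint>
#include <vector>

// -1 marks a dynamic extent, matching the toy convention in the sketch above.
bool groupIsCollapsible(const std::vector<int64_t> &groupDims) {
  for (int64_t d : groupDims)
    if (d < 0)
      return false; // any dynamic dim inside the group blocks collapsing it
  return true;
}

int main() {
  // An attention-like case: the batch dim is dynamic (-1), but the dims being
  // folded together are static, so that group by itself is collapsible.
  std::vector<int64_t> collapsedGroup = {64, 128};
  std::vector<int64_t> batchGroup = {-1};
  return (groupIsCollapsible(collapsedGroup) && !groupIsCollapsible(batchGroup)) ? 0 : 1;
}
```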

MaheshRavishankar (Contributor) commented:

How does this work for LLaMa then? I thought that we previously had issues with the attention op not being collapsed back down. If this doesn't cause a regression on LLaMa (which isn't in CI), this looks fine to me.

IanWood1 (Contributor, Author) commented:

Not needed after #19654

IanWood1 closed this on Feb 14, 2025.