Support more than 1 shape/attention_params for DotProductAttention decision cache #1349

parthmannan · 2024-11-29T08:31:52Z

Currently, DotProductAttention caches decisions such as the result of get_attention_backend for a single set of attention_params, which reduces CPU overhead in each DotProductAttention call. However, in model architectures with more than one attention shape (for example, self-attention and cross-attention), the cache is ineffective because it resets every time the params change.
This is a feature request to support more than one attention_params entry in the cache. Ideally the number of entries would be configurable, since some models have more than two shapes; if it is not configurable, 4 may be a safe starting value.
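
To illustrate the request, here is a minimal sketch of a bounded, multi-entry decision cache with least-recently-used eviction. It is not Transformer Engine's implementation: `AttentionParamsKey`, `BackendDecisionCache`, `select_backend`, and `max_entries` are hypothetical names, and the key fields are only stand-ins for whatever the real attention_params contains.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class AttentionParamsKey:
    """Hypothetical hashable stand-in for the real attention_params."""
    q_shape: Tuple[int, ...]
    kv_shape: Tuple[int, ...]
    dtype: str
    attn_mask_type: str


class BackendDecisionCache:
    """Keeps up to max_entries backend decisions, evicting the least recently used."""

    def __init__(self, select_backend: Callable[[AttentionParamsKey], Any],
                 max_entries: int = 4) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._select_backend = select_backend  # the expensive selection routine
        self._max_entries = max_entries
        self._entries: "OrderedDict[AttentionParamsKey, Any]" = OrderedDict()
        self.misses = 0  # counts how often the expensive selection actually ran

    def get(self, key: AttentionParamsKey) -> Any:
        if key in self._entries:
            # Cache hit: mark as most recently used and reuse the decision.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: run the selection once and remember the result.
        self.misses += 1
        decision = self._select_backend(key)
        self._entries[key] = decision
        if len(self._entries) > self._max_entries:
            # Drop the least recently used entry to keep the cache bounded.
            self._entries.popitem(last=False)
        return decision


def fake_select_backend(key: AttentionParamsKey) -> str:
    """Placeholder for the real backend selection; returns a made-up label."""
    return "backend_for_" + "x".join(str(d) for d in key.q_shape)


self_attn = AttentionParamsKey((2, 16, 128, 64), (2, 16, 128, 64), "bf16", "causal")
cross_attn = AttentionParamsKey((2, 16, 128, 64), (2, 16, 512, 64), "bf16", "no_mask")

multi = BackendDecisionCache(fake_select_backend, max_entries=4)
single = BackendDecisionCache(fake_select_backend, max_entries=1)
for _ in range(3):  # alternate self- and cross-attention, as in a decoder layer
    for key in (self_attn, cross_attn):
        multi.get(key)
        single.get(key)
```

With two alternating shapes, the single-entry cache reruns the selection on every call, while the four-entry cache runs it once per distinct shape. Exposing `max_entries` as a parameter is one way to make the capacity configurable, as the request suggests.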


cyanguwa self-assigned this Nov 29, 2024

cyanguwa linked a pull request Dec 19, 2024 that will close this issue

[PyTorch] Add caching for attention backend selection results #1381

Draft

13 tasks
