Qualcomm AI Engine Direct - VIT Optimization #15696

winskuo-quic · 2025-11-10T09:09:36Z

Summary

QNN has little optimization for 5D permute, which causes ViT to run slower on QNN than on CPU.

Switched the pattern from unsqueeze -> 5D permute -> squeeze to a single 4D permute.
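The axis reordering can be checked without any tensor library. The sketch below is not code from this PR: it tracks axes as plain labels and assumes the input is the 4D projection laid out as (L, N, 3, E), as in the linked `torch/nn/functional.py` helper. The labels `qkv` and `one` and all helper functions are hypothetical names for illustration only.

```python
# Track axes symbolically as a list of labels; no tensor library needed.
# "L", "N", "E" follow the naming in torch/nn/functional.py. "qkv" (the size-3
# axis) and "one" (the temporary size-1 axis) are labels made up for this sketch.

def unsqueeze(axes, dim):
    """Insert a size-1 axis at position dim (non-negative dim only)."""
    return axes[:dim] + ["one"] + axes[dim:]

def transpose(axes, a, b):
    """Swap two axes; negative indices behave like Python list indices."""
    out = list(axes)
    out[a], out[b] = out[b], out[a]
    return out

def squeeze(axes, dim):
    """Drop the size-1 axis at position dim."""
    out = list(axes)
    assert out[dim] == "one", "can only squeeze a size-1 axis"
    del out[dim]
    return out

def permute(axes, order):
    """Reorder axes so that output axis i is input axis order[i]."""
    return [axes[i] for i in order]

start = ["L", "N", "qkv", "E"]      # assumed 4D input layout

# Original pattern: passes through a rank-5 intermediate.
step1 = unsqueeze(start, 0)         # ['one', 'L', 'N', 'qkv', 'E']
step2 = transpose(step1, 0, -2)     # ['qkv', 'L', 'N', 'one', 'E']
via_5d = squeeze(step2, -2)         # ['qkv', 'L', 'N', 'E']

# Rewritten pattern: a single rank-4 permute.
via_4d = permute(start, [2, 0, 1, 3])

max_rank_5d_route = max(len(a) for a in (start, step1, step2, via_5d))
print(via_5d, via_4d, max_rank_5d_route)
```

Both routes end with the size-3 axis in front, but only the original route ever produces a rank-5 tensor, which is the case QNN handles poorly according to the summary above.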

Improvements: 150ms/inference -> 4.2ms/inference.

Test plan

Passes the ViT unit test (UT).

pytorch-bot · 2025-11-10T09:09:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15696

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Cancelled Job, 4 Unrelated Failures

As of commit da35397 with merge base d07a49a:

NEW FAILURES - The following jobs have failed:

pull / unittest / linux / linux-job (gh)
backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe
pull / unittest / macos / macos-job (gh)
backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe
pull / unittest-editable / linux / linux-job (gh)
backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe
pull / unittest-editable / macos / macos-job (gh)
backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe

CANCELLED JOB - The following job was cancelled. Please retry:

pull / test-openvino-linux / linux-job (gh)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-binary-size-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-setup-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest / windows / windows-job (gh) (trunk failure)
backends/xnnpack/test/ops/test_conv1d.py::TestConv1d::test_qs8_conv1d_batchnorm_seq
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
backends/xnnpack/test/ops/test_conv1d.py::TestConv1d::test_qs8_conv1d_batchnorm_seq

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-11-10T09:10:22Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

winskuo-quic · 2025-11-10T09:15:58Z

Hi @cccclai,
This PR addresses the issue you reported about QNN running slower than CPU for ViT.
I tested this PR on SM8750, and inference time improves from 140ms/inference to 4.1ms/inference, which is around 35 times faster.
I applied a source-level transformation to the ViT model.
The bottleneck is the 5D permute pattern here: https://github.com/pytorch/pytorch/blob/3cfbf98ea9d937d23f3700168b22706c957308ce/torch/nn/functional.py#L5825.
I chose a source-level transformation because this pattern is quite specific and hard to generalize.
Please have a look and let me know if you have any questions.
Thanks

cccclai · 2025-11-11T20:18:45Z

examples/qualcomm/scripts/torchvision_vit.py


+# Copied from torch/nn/functional.py
+# QNN does not have 5D permute optimization. Fuse to a single 4D optimization
+# Changed unsqueeze(0).transpose(0, -2).squeeze(-2) to permute(2, 0, 1, 3)


Ideally this could be a pass, is that correct?
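For illustration only, here is a toy version of what such a pass would have to do. It is not from this PR and does not use any ExecuTorch or QNN API: the op list format, `PATTERN`, `REPLACEMENT`, `RANK_DELTA`, and `fuse_5d_permute` are all hypothetical. It also shows why the rewrite is specific: the replacement `permute(2, 0, 1, 3)` is only valid when the tensor entering the pattern is rank 4.

```python
# Hypothetical op representation: (name, args) tuples in execution order.
PATTERN = [("unsqueeze", (0,)), ("transpose", (0, -2)), ("squeeze", (-2,))]
REPLACEMENT = ("permute", (2, 0, 1, 3))

# Only the rank-changing ops used in this sketch are tracked.
RANK_DELTA = {"unsqueeze": 1, "squeeze": -1}

def fuse_5d_permute(ops, input_rank):
    """Replace the unsqueeze/transpose/squeeze pattern with one 4D permute.

    The rewrite is applied only when the tensor entering the pattern is
    rank 4, because the replacement permutation assumes exactly four axes.
    """
    fused, rank, i = [], input_rank, 0
    while i < len(ops):
        window = ops[i:i + len(PATTERN)]
        if window == PATTERN and rank == 4:
            fused.append(REPLACEMENT)
            i += len(PATTERN)          # rank is unchanged: +1 then -1
        else:
            name, _ = ops[i]
            rank += RANK_DELTA.get(name, 0)
            fused.append(ops[i])
            i += 1
    return fused

ops = [
    ("unsqueeze", (0,)),
    ("transpose", (0, -2)),
    ("squeeze", (-2,)),
    ("contiguous", ()),
]
fused_rank4 = fuse_5d_permute(ops, input_rank=4)
fused_rank3 = fuse_5d_permute(ops, input_rank=3)   # left untouched
print(fused_rank4)
print(fused_rank3)
```

A real pass would operate on the exported graph rather than a flat list and would need to confirm the exact arguments and input shape before rewriting.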

Qualcomm AI Engine Direct - VIT Optimization

da35397

winskuo-quic requested a review from cccclai as a code owner November 10, 2025 09:09

meta-cla bot added the CLA Signed label Nov 10, 2025

cccclai reviewed Nov 11, 2025
