
Conversation

xhcao (Contributor) commented Oct 29, 2025

Description

Motivation and Context

xhcao (Contributor, Author) commented Oct 29, 2025

@guschmue @fs-eire @qjia7 @jchen10 @Jiawei-Shao
Hi all, I want to discuss with you whether we could optimize the Gemm, MatMul and Conv operators for specific vendors and architectures, as this PR does.
Reasons:

  1. It is difficult to design an algorithm that benefits all vendors and all architectures.
  2. Even within a single vendor, different architectures make this difficult.
  3. Maintaining and reviewing the code is also difficult if vendor- and architecture-specific information is added to common files.
  4. There are not enough devices to test correctness and performance.
    ...

jchen10 (Contributor) commented Oct 29, 2025

@xhcao It seems you didn't include your test cases in this PR. What's your concern? BTW, you'd better provide some performance data for reference!

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Oct 29, 2025
@guschmue guschmue requested a review from Copilot October 29, 2025 15:54
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR adds Intel-specific optimizations for GEMM and MatMul operations in the WebGPU provider, targeting Intel Xe-2 GPU architectures (xe-2lpg and xe-2hpg) using subgroup operations.

  • Implements subgroup-based GEMM and MatMul kernels optimized for Intel GPUs
  • Adds vendor-specific code paths that are conditionally executed based on adapter info and matrix dimensions
  • Integrates Intel optimizations into existing GEMM and MatMul operators

Reviewed Changes

Copilot reviewed 2 out of 10 changed files in this pull request and generated no comments.

Summary per file:

  matmul_intel.h: Defines MatMulSubgroupProgram class and Intel MatMul entry points
  matmul_intel.cc: Implements Intel-optimized MatMul with subgroup support and batched optimizations
  gemm_utils_intel.h: Declares helper functions for reading/writing matrix data in Intel kernels
  gemm_utils_intel.cc: Implements shader generation helpers for Intel-specific GEMM operations
  gemm_subgroup.h: Defines constants and functions for subgroup-based matrix operations
  gemm_subgroup.cc: Implements vec4 and scalar variants of subgroup matrix multiplication
  gemm_intel.h: Defines GemmSubgroupProgram class and Intel GEMM entry points
  gemm_intel.cc: Implements Intel-optimized GEMM with subgroup support
  matmul.cc: Integrates the Intel MatMul optimization check into the existing MatMul operator
  gemm.cc: Integrates the Intel GEMM optimization check into the existing GEMM operator


xhcao (Contributor, Author) commented Oct 31, 2025

@xhcao It seems you didn't include your test cases in this PR. What's your concern? BTW, you'd better provide some performance data for reference!

First, I would like consent, especially from the Microsoft reviewers, to create a vendor directory and make optimizations for specific platforms.
If so, I will add the test cases in a follow-up PR and post performance data here.

jchen10 (Contributor) commented Oct 31, 2025

@xhcao It seems you didn't include your test cases in this PR. What's your concern? BTW, you'd better provide some performance data for reference!

First, I would like consent, especially from the Microsoft reviewers, to create a vendor directory and make optimizations for specific platforms. If so, I will add the test cases in a follow-up PR and post performance data here.

You don't need to depend on that. You can simply strengthen this PR with more test cases and better perf data, and make it easier to review.

fs-eire (Contributor) commented Nov 4, 2025

Could you please merge the latest main into this branch and push?

The pipeline failures should be unrelated, and may be fixed by rerun.

guschmue (Contributor) commented Nov 4, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines commented
Azure Pipelines successfully started running 4 pipeline(s).


Labels

ep:WebGPU ort-web webgpu provider

4 participants