[webgpu] Implement SubGroupMatrix based MatMulNBits for Metal #23729

Merged 15 commits on Feb 21, 2025

@sushraja-msft sushraja-msft commented Feb 17, 2025

Description

Recent progress on the SubGroupMatrix prototype in Dawn (https://issues.chromium.org/issues/348702031) exposes SIMD-Group Matrix Functions to WebGPU. This PR adds a MatMulNBits shader built on that primitive.
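For readers unfamiliar with the operator, the following is a minimal scalar sketch of what a 4-bit MatMulNBits-style computation does on the CPU. It is not the shader from this PR and does not use subgroup matrices. The packing (two 4-bit values per byte, low nibble first), the fixed zero point of 8, and the function name `MatMul4BitsReference` are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for a 4-bit block-quantized matmul:
//   Y[M x N] = A[M x K] * dequant(B)^T
// Assumptions (illustrative, not taken from the PR):
//   - b_packed stores N rows of K 4-bit values, two per byte, low nibble first.
//   - scales stores N * (K / block_size) floats, one per block of a row.
//   - the zero point is fixed at 8, so a stored value q dequantizes to (q - 8) * scale.
//   - K is a multiple of block_size, and block_size is even.
std::vector<float> MatMul4BitsReference(const std::vector<float>& a,
                                        const std::vector<uint8_t>& b_packed,
                                        const std::vector<float>& scales,
                                        size_t M, size_t N, size_t K,
                                        size_t block_size) {
  const size_t blocks_per_row = K / block_size;
  std::vector<float> y(M * N, 0.0f);
  for (size_t m = 0; m < M; ++m) {
    for (size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (size_t k = 0; k < K; ++k) {
        // Unpack the 4-bit weight for (n, k).
        const uint8_t byte = b_packed[(n * K + k) / 2];
        const int q = (k % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        // Each block of block_size weights shares one scale.
        const float scale = scales[n * blocks_per_row + k / block_size];
        acc += a[m * K + k] * static_cast<float>(q - 8) * scale;
      }
      y[m * N + n] = acc;
    }
  }
  return y;
}
```

A GPU version computes the same sums tile by tile; the subgroup matrix primitive lets each SIMD group multiply small tiles of A and dequantized B in hardware instead of looping over `k` one element at a time.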

Observed perf gains in LLM inference speed: prefill for Phi 3.5 with a 1K-token prompt is about 2.7x faster, 5.4 s down from about 14.5 s. Token generation speed is essentially unchanged (roughly 11 tokens/s in both runs).

With Changes

./model_benchmark -i ~/Phi-3.5-mini-instruct-onnx-web -l 1000
Batch size: 1, prompt tokens: 1001, tokens to generate: 128
Prompt processing (time to first token):
	avg (us):       5.42498e+06                    <<< SubGroupMatrix 5.4s
	avg (tokens/s): 184.517
	p50 (us):       5.41982e+06
	stddev (us):    12023.8
	n:              5 * 1001 token(s)
Token generation:
	avg (us):       91138.5
	avg (tokens/s): 10.9723
	p50 (us):       89488.5
	stddev (us):    35136.2
	n:              635 * 1 token(s)

Baseline

./model_benchmark -i ~/Phi-3.5-mini-instruct-onnx-web -l 1000
Batch size: 1, prompt tokens: 1001, tokens to generate: 128
Prompt processing (time to first token):
	avg (us):       1.45507e+07                     <<< Baseline 14.5s
	avg (tokens/s): 68.7938
	p50 (us):       1.45413e+07
	stddev (us):    22208.9
	n:              5 * 1001 token(s)
Token generation:
	avg (us):       94109.8
	avg (tokens/s): 10.6259
	p50 (us):       89660
	stddev (us):    61579
	n:              635 * 1 token(s)

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@sushraja-msft sushraja-msft marked this pull request as ready for review February 19, 2025 00:30
@sushraja-msft sushraja-msft force-pushed the user/sushraja/subgroupMatrix branch from e90b823 to 09e30be Compare February 19, 2025 20:37
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Feb 19, 2025
@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

@sushraja-msft sushraja-msft force-pushed the user/sushraja/subgroupMatrix branch from 92db1cf to f7ddbb0 Compare February 20, 2025 00:32

guschmue commented Feb 20, 2025

The ort-web pipeline compiles the WebGPU EP with Emscripten, which fails with:
shader_helper.cc:355:45: error: no member named 'ChromiumExperimentalSubgroupMatrix' in 'wgpu::FeatureName'
355 | if (device_.HasFeature(wgpu::FeatureName::ChromiumExperimentalSubgroupMatrix))

Possibly the header file that ships with Emscripten doesn't know that feature name yet.

Maybe use
#if !defined(__wasm__)
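
A minimal sketch of that guard, assuming a build where `__wasm__` is defined only for Emscripten/WebAssembly targets. `SketchFeature`, `SketchDevice`, and `EmitEnables` are hypothetical stand-ins for the real `wgpu` types and for the code in `shader_helper.cc`, and the WGSL `enable` strings are assumptions as well.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for wgpu::FeatureName. The experimental member is
// absent when compiling for wasm, mirroring an older header that lacks it.
enum class SketchFeature {
  ShaderF16,
#if !defined(__wasm__)
  ChromiumExperimentalSubgroupMatrix,
#endif
};

// Hypothetical stand-in for a device that reports its enabled features.
class SketchDevice {
 public:
  void Enable(SketchFeature f) { enabled_.insert(static_cast<int>(f)); }
  bool HasFeature(SketchFeature f) const {
    return enabled_.count(static_cast<int>(f)) > 0;
  }

 private:
  std::unordered_set<int> enabled_;
};

// Builds the shader preamble. The subgroup-matrix check is compiled out for
// wasm, so the missing enum member is never referenced in that build.
std::string EmitEnables(const SketchDevice& device) {
  std::string out;
  if (device.HasFeature(SketchFeature::ShaderF16)) {
    out += "enable f16;\n";
  }
#if !defined(__wasm__)
  if (device.HasFeature(SketchFeature::ChromiumExperimentalSubgroupMatrix)) {
    out += "enable chromium_experimental_subgroup_matrix;\n";
  }
#endif
  return out;
}
```

The guard trades a runtime feature check for a compile-time one: wasm builds simply never take the subgroup-matrix path until the Emscripten headers catch up.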

sushraja-msft commented

The ort-web pipeline compiles the WebGPU EP with Emscripten, which fails with: shader_helper.cc:355:45: error: no member named 'ChromiumExperimentalSubgroupMatrix' in 'wgpu::FeatureName' 355 | if (device_.HasFeature(wgpu::FeatureName::ChromiumExperimentalSubgroupMatrix))

Possibly the header file that ships with Emscripten doesn't know that feature name yet.

Maybe use #if !defined(__wasm__)

Done!


fs-eire commented Feb 21, 2025

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline

Azure Pipelines successfully started running 7 pipeline(s).

@guschmue guschmue merged commit 8eb5513 into main Feb 21, 2025
96 of 98 checks passed
@guschmue guschmue deleted the user/sushraja/subgroupMatrix branch February 21, 2025 17:23