[webgpu] revise implementation of buffer split support #26429

fs-eire · 2025-10-29T00:42:18Z

Description

This PR addresses a few concerns:

revert const ProgramBase& -> ProgramBase&: this itself is not doing something wrong but gives much more pressure for who reads the code to understand whether/where the program object is modified. It also can introduce further unexpected modifications to the program object (for example the indirect dispatch code)
change bool option "ep.webgpuexecutionprovider.smallStorageBufferBindingSizeForTesting" to "ep.webgpuexecutionprovider.maxStorageBufferBindingSize" so now it's possible to set any value in option. (setting to <128MB will cause an assert failure)
segments are optional in cache key (only present when not equals to 1, which is the common case)
avoid some unnecessary API calls, which is OK for native but may affect web perf.
clean up the code a little bit and add a few comments

Copilot

Pull Request Overview

This PR refactors the WebGPU provider to replace the testing-only boolean flag small_storage_buffer_binding_size_for_testing with a more flexible max_storage_buffer_binding_size parameter that allows users to explicitly set the maximum storage buffer binding size.

Key changes:

Replaced boolean testing flag with a configurable uint64_t parameter for controlling storage buffer binding size
Moved segment calculation from ProgramBase member storage to separate computed vectors
Made ProgramBase const-correct in Run operations to prevent unintended modifications

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
webgpu_provider_options.h	Renamed option key from testing-specific to production-ready name
webgpu_provider_factory.cc	Replaced boolean parsing with uint64_t parsing using std::from_chars
webgpu_context.h	Changed parameter type from bool to uint64_t with validation logic
webgpu_context.cc	Updated Run signature to const and moved indirect dispatch handling
shader_helper.h	Added spans for segments and removed FinalizeInputs method
shader_helper.cc	Updated to use segment spans instead of ProgramInput/Output members
program_manager.h	Changed CalculateSegmentsForInputsAndOutputs to return segments via out parameters
program_manager.cc	Implemented segment calculation into separate vectors
program_cache_key.h	Updated signature to accept segment spans
program_cache_key.cc	Updated to use segments from spans with optimization for default values
program.h	Removed segments members from ProgramInput/Output structs and related setters
program.cc	Moved indirect dispatch tensor registration to SetIndirectDispatchTensor
compute_context.h	Updated RunProgram to accept const reference

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

onnxruntime/core/providers/webgpu/program_manager.cc

onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc

Co-authored-by: Copilot <[email protected]>

onnxruntime/core/providers/webgpu/program.cc

onnxruntime/core/providers/webgpu/webgpu_context.cc

xiaofeihan1 · 2025-10-29T07:36:49Z

onnxruntime/core/providers/webgpu/webgpu_context.cc

      }
    } else {
-      bind_group_entries.push_back({nullptr, entry_index++, buffer, 0, std::min(kMaxBufferSize, buffer_size), nullptr, nullptr});
+      bind_group_entries.push_back({nullptr, entry_index++, buffer, 0, WGPU_WHOLE_SIZE, nullptr, nullptr});


I wonder that is there a specific reason to use WGPU_WHOLE_SIZE (UINT64_MAX)? Intuitively, using the actual buffer size be more appropriate?

the reason is I try to avoid perform one call to wgpuBufferGetSize, which is trivial in native but can be expensive in web because it involves an inter-op call.

Will it be better to change it only bind the normalized tensor size (align with 16) which is the actual size we need and no extra memory is bound and no wgpuBufferGetSize is called? Anyway, I think it's ok to leave it as WGPU_WHOLE_SIZE. I am just curious whether it helps the performance if we only bind the needed size instead of the buffer size which is the bucket size.

I will think about how to do this in another PR. Ideally, all necessary information for creating binding group can be determined earlier.

…revise

[webgpu] revise implementation of buffer split support

348cc6a

fs-eire requested review from Copilot, qjia7 and xiaofeihan1 October 29, 2025 00:42

Copilot AI reviewed Oct 29, 2025

View reviewed changes

onnxruntime/core/providers/webgpu/program_manager.cc Outdated Show resolved Hide resolved

onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc Outdated Show resolved Hide resolved

fs-eire and others added 3 commits October 28, 2025 17:45

Update onnxruntime/core/providers/webgpu/program_manager.cc

81f46b4

Co-authored-by: Copilot <[email protected]>

Update onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc

bd8bed7

Co-authored-by: Copilot <[email protected]>

revise message and format code

5eccc10

qjia7 reviewed Oct 29, 2025

View reviewed changes

onnxruntime/core/providers/webgpu/program.cc Show resolved Hide resolved

onnxruntime/core/providers/webgpu/webgpu_context.cc Show resolved Hide resolved

xiaofeihan1 reviewed Oct 29, 2025

View reviewed changes

guschmue added the ep:WebGPU ort-web webgpu provider label Oct 29, 2025

fs-eire added 3 commits October 29, 2025 14:30

fix build break

c0f1303

Merge remote-tracking branch 'origin/main' into fs-eire/split-buffer-…

034141d

…revise

resolve comments

11c11b6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[webgpu] revise implementation of buffer split support #26429

[webgpu] revise implementation of buffer split support #26429

fs-eire commented Oct 29, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

xiaofeihan1 Oct 29, 2025

Uh oh!

fs-eire Oct 29, 2025

Uh oh!

qjia7 Oct 30, 2025

Uh oh!

fs-eire Oct 31, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[webgpu] revise implementation of buffer split support #26429

Are you sure you want to change the base?

[webgpu] revise implementation of buffer split support #26429

Conversation

fs-eire commented Oct 29, 2025

Description

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

xiaofeihan1 Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

fs-eire Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

qjia7 Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

fs-eire Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

fs-eire Oct 31, 2025 •

edited

Loading