Skip to content

Conversation

@fs-eire
Copy link
Contributor

@fs-eire fs-eire commented Oct 29, 2025

Description

This PR addresses a few concerns:

  • revert const ProgramBase& -> ProgramBase&: this itself is not doing something wrong but gives much more pressure for who reads the code to understand whether/where the program object is modified. It also can introduce further unexpected modifications to the program object (for example the indirect dispatch code)
  • change bool option "ep.webgpuexecutionprovider.smallStorageBufferBindingSizeForTesting" to "ep.webgpuexecutionprovider.maxStorageBufferBindingSize" so now it's possible to set any value in option. (setting to <128MB will cause an assert failure)
  • segments are optional in cache key (only present when not equals to 1, which is the common case)
  • avoid some unnecessary API calls, which is OK for native but may affect web perf.
  • clean up the code a little bit and add a few comments

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the WebGPU provider to replace the testing-only boolean flag small_storage_buffer_binding_size_for_testing with a more flexible max_storage_buffer_binding_size parameter that allows users to explicitly set the maximum storage buffer binding size.

Key changes:

  • Replaced boolean testing flag with a configurable uint64_t parameter for controlling storage buffer binding size
  • Moved segment calculation from ProgramBase member storage to separate computed vectors
  • Made ProgramBase const-correct in Run operations to prevent unintended modifications

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
webgpu_provider_options.h Renamed option key from testing-specific to production-ready name
webgpu_provider_factory.cc Replaced boolean parsing with uint64_t parsing using std::from_chars
webgpu_context.h Changed parameter type from bool to uint64_t with validation logic
webgpu_context.cc Updated Run signature to const and moved indirect dispatch handling
shader_helper.h Added spans for segments and removed FinalizeInputs method
shader_helper.cc Updated to use segment spans instead of ProgramInput/Output members
program_manager.h Changed CalculateSegmentsForInputsAndOutputs to return segments via out parameters
program_manager.cc Implemented segment calculation into separate vectors
program_cache_key.h Updated signature to accept segment spans
program_cache_key.cc Updated to use segments from spans with optimization for default values
program.h Removed segments members from ProgramInput/Output structs and related setters
program.cc Moved indirect dispatch tensor registration to SetIndirectDispatchTensor
compute_context.h Updated RunProgram to accept const reference

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
} else {
bind_group_entries.push_back({nullptr, entry_index++, buffer, 0, std::min(kMaxBufferSize, buffer_size), nullptr, nullptr});
bind_group_entries.push_back({nullptr, entry_index++, buffer, 0, WGPU_WHOLE_SIZE, nullptr, nullptr});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder that is there a specific reason to use WGPU_WHOLE_SIZE (UINT64_MAX)? Intuitively, using the actual buffer size be more appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason is I try to avoid perform one call to wgpuBufferGetSize, which is trivial in native but can be expensive in web because it involves an inter-op call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be better to change it only bind the normalized tensor size (align with 16) which is the actual size we need and no extra memory is bound and no wgpuBufferGetSize is called? Anyway, I think it's ok to leave it as WGPU_WHOLE_SIZE. I am just curious whether it helps the performance if we only bind the needed size instead of the buffer size which is the bucket size.

Copy link
Contributor Author

@fs-eire fs-eire Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will think about how to do this in another PR. Ideally, all necessary information for creating binding group can be determined earlier.

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ep:WebGPU ort-web webgpu provider

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants