[ExecuTorch][WebGPU] GPU timestamp query profiling (general implementation) #20201
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20201
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV.
❌ 3 New Failures, 2 Pending as of commit cea2305 with merge base 5526971.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude review
Claude finished @JulianCloudNTH's task in 2m 2s
Code Review
Overall this is a clean, well-scoped re-port of the Vulkan
Correctness / robustness
1. `const double mean_us = kv.second.first / kv.second.second / 1000.0;`
2. Single global query pool is not concurrency-safe
3.
Design / minor
4. Heavy include pulled into a widely-used header
5. Redundant context lookup
6.
Questions
SS-JIA left a comment
Review automatically exported from Phabricator review in Meta.
…ation) Pull Request resolved: #20201 Backend-agnostic GPU-timestamp infrastructure, split out so the general implementation is foundational (below SDPA) while the SDPA-specific dispatch labeling stays above the SDPA op. Composed of: `WebGPUQueryPool`, a faithful re-port of Vulkan's `vkapi::QueryPool` (`backends/vulkan/runtime/vk_api/QueryPool.{h,cpp}`) — same `ShaderDuration` data model and ticks->ns conversion; three deviations are forced by the WebGPU API (per-dispatch bracketing via a compute-pass `timestampWrites` descriptor since there is no mid-encoder `writeTimestamp`; readback via `resolveQuerySet` + buffer map rather than host-side `vkGetQueryPoolResults`; the `TimestampQuery` capability requested as an explicit device feature, fail-open if the adapter lacks it). `WebGPUDevice` gains timestamp-feature detection, and `WebGPUGraph` gains a per-dispatch `kernel_name` label plus `execute()` bracketing of each compute pass when the pool is active. Opt-in via the `WEBGPU_TIMESTAMP_QUERY` env var; off by default, so the production `execute()` path is byte-identical. The SDPA per-kernel labeling lives in the companion "for SDPA" diff above the SDPA op. Co-authored with Claude. ghstack-source-id: 392956205 @exported-using-ghexport Differential Revision: [D108188287](https://our.internmc.facebook.com/intern/diff/D108188287/)
…ation) Pull Request resolved: #20201 Backend-agnostic GPU-timestamp infrastructure, split out so the general implementation is foundational (below SDPA) while the SDPA-specific dispatch labeling stays above the SDPA op. Composed of: `WebGPUQueryPool`, a faithful re-port of Vulkan's `vkapi::QueryPool` (`backends/vulkan/runtime/vk_api/QueryPool.{h,cpp}`) — same `ShaderDuration` data model and ticks->ns conversion; three deviations are forced by the WebGPU API (per-dispatch bracketing via a compute-pass `timestampWrites` descriptor since there is no mid-encoder `writeTimestamp`; readback via `resolveQuerySet` + buffer map rather than host-side `vkGetQueryPoolResults`; the `TimestampQuery` capability requested as an explicit device feature, fail-open if the adapter lacks it). `WebGPUDevice` gains timestamp-feature detection, and `WebGPUGraph` gains a per-dispatch `kernel_name` label plus `execute()` bracketing of each compute pass when the pool is active. Opt-in via the `WEBGPU_TIMESTAMP_QUERY` env var; off by default, so the production `execute()` path is byte-identical. The SDPA per-kernel labeling lives in the companion "for SDPA" diff above the SDPA op. Co-authored with Claude. ghstack-source-id: 392975889 @exported-using-ghexport Differential Revision: [D108188287](https://our.internmc.facebook.com/intern/diff/D108188287/)
Merged commit d595e01 into gh/JulianCloudNTH/22/base
…ation) (#20258)
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20201 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/22/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/22/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/22/orig
@diff-train-skip-merge
Co-authored-by: Julian Ng-Thow-Hing <juliannth@meta.com>
Stack from ghstack (oldest at bottom):
Backend-agnostic GPU-timestamp infrastructure, split out so the general implementation is foundational (below SDPA) while the SDPA-specific dispatch labeling stays above the SDPA op. Composed of:
`WebGPUQueryPool`, a faithful re-port of Vulkan's `vkapi::QueryPool` (`backends/vulkan/runtime/vk_api/QueryPool.{h,cpp}`), with the same `ShaderDuration` data model and ticks->ns conversion; three deviations are forced by the WebGPU API (per-dispatch bracketing via a compute-pass `timestampWrites` descriptor since there is no mid-encoder `writeTimestamp`; readback via `resolveQuerySet` + buffer map rather than host-side `vkGetQueryPoolResults`; the `TimestampQuery` capability requested as an explicit device feature, fail-open if the adapter lacks it).

`WebGPUDevice` gains timestamp-feature detection, and `WebGPUGraph` gains a per-dispatch `kernel_name` label plus `execute()` bracketing of each compute pass when the pool is active. Opt-in via the `WEBGPU_TIMESTAMP_QUERY` env var; off by default, so the production `execute()` path is byte-identical. The SDPA per-kernel labeling lives in the companion "for SDPA" diff above the SDPA op.

Co-authored with Claude.
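The opt-in check and the ticks->ns step described above can be sketched roughly as follows, using only the C++ standard library. This is not the PR's code. The struct `DurationSketch`, the helpers `timestamp_query_enabled` and `to_durations`, the "set and not `0`" reading of the env var, and the slot layout (dispatch `i` writes its begin timestamp to slot `2*i` and its end timestamp to slot `2*i+1`) are all assumptions made for illustration. The input vector stands in for the values read back from the mapped buffer after `resolveQuerySet`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-dispatch duration record.
struct DurationSketch {
  std::string kernel_name;
  uint64_t start_ns;
  uint64_t end_ns;
  uint64_t duration_ns;
};

// Assumption: profiling is on when WEBGPU_TIMESTAMP_QUERY is set to
// anything other than the empty string or "0". The PR may parse it differently.
inline bool timestamp_query_enabled() {
  const char* value = std::getenv("WEBGPU_TIMESTAMP_QUERY");
  return value != nullptr && value[0] != '\0' && std::string(value) != "0";
}

// Convert resolved begin/end tick pairs into durations.
// Assumed layout: ticks[2*i] is the begin and ticks[2*i+1] the end of dispatch i.
// ns_per_tick is a parameter here; the value to pass depends on the backend.
inline std::vector<DurationSketch> to_durations(
    const std::vector<uint64_t>& ticks,
    const std::vector<std::string>& kernel_names,
    double ns_per_tick) {
  std::vector<DurationSketch> out;
  for (std::size_t i = 0;
       i < kernel_names.size() && 2 * i + 1 < ticks.size();
       ++i) {
    const uint64_t begin = ticks[2 * i];
    const uint64_t end = ticks[2 * i + 1];
    if (end < begin) {
      continue; // skip inconsistent pairs instead of wrapping around
    }
    // Note: the double multiply loses precision for very large tick values.
    const auto start_ns = static_cast<uint64_t>(begin * ns_per_tick);
    const auto end_ns = static_cast<uint64_t>(end * ns_per_tick);
    out.push_back({kernel_names[i], start_ns, end_ns, end_ns - start_ns});
  }
  return out;
}
```

A caller would check `timestamp_query_enabled()` once at setup and, only when it returns true, allocate two query slots per dispatch and call `to_durations` after readback; with the variable unset, none of this code would run.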
@exported-using-ghexport
Differential Revision: D108188287