ROI pool CUDA: compare in acc_type to fix half/MSVC build#9399
ROI pool CUDA: compare in acc_type to fix half/MSVC build#9399kimchioverfit wants to merge 6 commits intopytorch:mainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9399
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 20 PendingAs of commit 1d9444a with merge base 4729419 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
Hi @kimchioverfit! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
Add <ATen/AccumulateType.h>
No functional changes. Only formatting adjustments required by clang-format (include order and spacing).
|
I noticed an ongoing regression where torchvision / PyTorch nightly Windows CUDA 12.6 and 12.8 jobs are failing with a similar cudaErrorSymbolNotFound issue. Since CUDA 13.0 passes and this appears to reproduce in nightly as well, this may not be directly caused by this PR. That said, I might be mistaken, so please let me know if there’s anything I should further investigate or adjust on my side. |
NicolasHug
left a comment
There was a problem hiding this comment.
Hi @kimchioverfit , thanks for the PR.
I'm not sure we'll be able to move forward with it, because the issue seems to be compiler specific and we only try to support the compiler that torchvision is built with for the official wheels (I'm not sure which one it is, but it should be findable through our github workflows in .github).
Since it seems that you have a compilation error, can you just deactivate that kind of error in the compiler you're using? Otherwise, I suggest you align your compiler with the one torchvision uses.
|
I noticed that the CUDA 12.6/12.8 failure is marked as a “NEW FAILURE” in this PR. However, Actions run #22314833507 already shows CUDA 12.6/12.8 failing with the same cudaErrorSymbolNotFound error. Based on that, it seems this might be a pre-existing trunk issue rather than something introduced by this change. Would it be possible to re-run the CI or double-check whether this failure is indeed unrelated to this PR? Thank you for your time. |
Fixes #9246
Summary
Fix MSVC + CUDA compilation failure in ROI Pool when
T=halfis dispatched.On Windows (MSVC) with CUDA, instantiating the kernel through
AT_DISPATCH_FLOATING_TYPES_AND_HALFcauses an ambiguousoperator>error when comparing twoat::Halfvalues.Root Cause
MSVC reports ambiguity for
operator>whenT=halfinsideroi_pool_forward_kernel_impl, due to multiple viable overloads.Change
Perform the comparison in
at::acc_type<T, true>instead ofTto avoid operator ambiguity while preserving numerical behavior.BC / Functional Impact
No functional change is expected.
The modification only affects the comparison type used during compilation.