test(rfdetr): expand Triton preprocess parity and validation coverage #33
Open
aseembits93 wants to merge 2 commits into
Adds four groups of tests to test_triton_preprocess.py:
- Input validation: rejects CPU tensors, wrong dtype, wrong shape,
mismatched out buffer shape/dtype.
- Buffer reuse: verifies the kernel writes into a provided out tensor
without reallocating (the real adapter relies on this).
- End-to-end parity against pre_process_network_input STRETCH_TO: the
Triton fast path must match the non-Triton wrapper the model calls
when RFDETR_USE_TRITON_PREPROC is off. The tolerance (2 LSB / min(std))
covers the divergence between cv2's 5-bit fixed-point and Triton's fp32 bilinear interpolation.
- Solid-color analytic parity: every output pixel exactly equals the
per-channel (x/255 - mean)/std on a uniform input.
13 tests total in the file (was 3), all passing on T4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
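The parity tolerance and the solid-color check described in this commit can be written out numerically. The following is a minimal NumPy sketch, not the PR's test code. It assumes the ImageNet mean/std constants (the commit does not say which constants RF-DETR uses), and `parity_tolerance`, `solid_color_reference`, and `check_solid_color` are hypothetical helper names. One uint8 step is 1/255 before normalization and 1/(255 * std) after it, so dividing by the smallest std gives a bound that covers every channel. Because resampling a uniform image reproduces the same value everywhere, the solid-color expectation is known in closed form.

```python
import numpy as np

# ASSUMPTION: ImageNet mean/std in RGB order; the PR does not state the
# exact normalization constants used by RF-DETR.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def parity_tolerance(std: np.ndarray, lsb: int = 2) -> float:
    """Hypothetical helper: express `lsb` uint8 steps in normalized units.

    One LSB is 1/255 before normalization; dividing by std scales it, and
    the smallest std scales it the most, so that bound covers all channels.
    """
    return (lsb / 255.0) / float(std.min())


def solid_color_reference(rgb, height, width, mean=MEAN, std=STD):
    """Hypothetical helper: expected CHW output for a uniform RGB image."""
    x = np.asarray(rgb, dtype=np.float32) / 255.0
    per_channel = (x - mean) / std  # shape (3,): (x/255 - mean) / std
    # Every pixel of channel c must equal per_channel[c].
    return np.broadcast_to(per_channel[:, None, None], (3, height, width))


def check_solid_color(output, rgb, mean=MEAN, std=STD, atol=1e-6):
    """Hypothetical helper: compare a CHW output against the analytic value."""
    expected = solid_color_reference(rgb, output.shape[1], output.shape[2], mean, std)
    return bool(np.allclose(output, expected, atol=atol))
```

With the assumed constants, `parity_tolerance(STD)` comes to about 0.035 in normalized units.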
…H_FOR_PREPROCESSING path
Adds tests/inference/unit_tests/models/rfdetr/test_triton_preprocess_vs_pytorch.py, comparing triton_preprocess_rfdetr (the kernel called from _try_triton_preprocess in inference/models/rfdetr/rfdetr.py) against the USE_PYTORCH_FOR_PREPROCESSING code path.
Findings (each pinned by a dedicated test):
1. The production pytorch path normalizes BGR-ordered channels using RGB-ordered means/stds, then swaps BGR->RGB at the end. Net effect: output channel 0 (R) holds R-pixel data normalized with B's mean/std, and vice versa for channel 2. Only G is self-consistent. See test_documented_pytorch_path_channel_mixup.
2. The production pytorch path pads the letterbox region with raw 114.0 in the already-normalized tensor, instead of the per-channel normalized grey the kernel emits. See test_documented_pytorch_path_pad_region.
3. When (target_dim - scaled_dim) is odd, the pytorch path uses asymmetric integer padding (top = floor(diff / 2), bottom = the remainder), while the kernel keeps pad_y as a float in its inverse bilinear mapping. This introduces a half-pixel vertical drift between the two outputs. See test_letterbox_half_pixel_case_diverges_predictably.
Content-region parity tests use source shapes where (target - scaled) is even on both axes to sidestep the half-pixel drift; on those shapes the kernel matches the "what the pytorch path was trying to express" reference to within ~1 LSB.
Also pins kernel determinism and the solid-color analytic case.
11 new tests, all passing on T4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
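The three findings can be reproduced in isolation with plain NumPy. This is a hedged sketch, not the code in test_triton_preprocess_vs_pytorch.py. It assumes ImageNet mean/std in RGB order, uint8 HWC BGR input as cv2 produces it, and a kernel that offsets content by the float value diff / 2. The names `intended_normalize`, `mixed_up_normalize`, `normalized_pad_value`, and `letterbox_offsets` are hypothetical.

```python
import numpy as np

# ASSUMPTION: ImageNet statistics in RGB order; not confirmed by the PR.
MEAN_RGB = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD_RGB = np.array([0.229, 0.224, 0.225], dtype=np.float32)
PAD_VALUE = 114.0  # letterbox grey, as named in finding 2


def intended_normalize(bgr_hwc):
    """Reference: swap BGR->RGB first, then normalize with RGB stats (CHW out)."""
    rgb = bgr_hwc[..., ::-1].astype(np.float32) / 255.0
    return ((rgb - MEAN_RGB) / STD_RGB).transpose(2, 0, 1)


def mixed_up_normalize(bgr_hwc):
    """Model of finding 1: normalize in BGR order with RGB stats, then swap."""
    bgr = bgr_hwc.astype(np.float32) / 255.0
    normalized = (bgr - MEAN_RGB) / STD_RGB  # B uses R's stats, R uses B's
    return normalized[..., ::-1].transpose(2, 0, 1)


def normalized_pad_value():
    """Finding 2: per-channel normalized grey, versus a raw 114.0 fill."""
    return (PAD_VALUE / 255.0 - MEAN_RGB) / STD_RGB


def letterbox_offsets(target_dim, scaled_dim):
    """Finding 3: integer top/bottom padding vs a float offset (ASSUMED diff/2).

    Returns (top, bottom, drift), where drift is the float offset minus the
    integer top padding, in output pixels.
    """
    diff = target_dim - scaled_dim
    top = diff // 2
    bottom = diff - top
    return top, bottom, diff / 2.0 - top
```

For a 427-row scaled image in a 640-row target, `letterbox_offsets(640, 427)` gives top 106, bottom 107, and a 0.5-pixel drift; with an even difference the drift is zero, which is why the content-region parity tests pick such shapes.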
Summary
Expands `test_triton_preprocess.py` from 3 tests (torch-reference parity only) to 13, grouped into input validation, buffer reuse, end-to-end STRETCH_TO parity against `pre_process_network_input`, and solid-color analytic parity. All 13 pass on T4.
Test plan
Base branch
Targets `perf/optimize-rfdetr-seg-plus-is-seg-dataclasses-copy` (#31) since the tests import `triton_preprocess_rfdetr_stretch`, which only exists on that branch.
🤖 Generated with Claude Code