test(rfdetr): expand Triton preprocess parity and validation coverage by aseembits93 · Pull Request #33 · aseembits93/inference

aseembits93 · 2026-05-03T21:38:41Z

Summary

Expands `test_triton_preprocess.py` from 3 tests (torch-reference parity only) to 13, grouped into:

- Input validation (5 tests): the kernel raises `ValueError` on a CPU tensor, a non-uint8 dtype, a CHW-instead-of-HWC shape, or an `out` buffer with mismatched shape or dtype.
- Buffer reuse (1 test): the kernel writes into a caller-provided `out` tensor without reallocating. The real adapter in `rfdetr_instance_segmentation_trt.py` relies on this to avoid per-frame allocations.
- End-to-end parity against `pre_process_network_input` STRETCH_TO (3 tests): feeds the same BGR frame through both paths (the non-Triton wrapper with `resize_mode=STRETCH_TO`, `color_mode=RGB`, `normalization=(means, stds)`, and the Triton kernel) and compares the outputs. The tolerance is `2 / 255 / min(std)`, because cv2's INTER_LINEAR uses 5-bit fixed-point coefficients that diverge by up to ~1.5 LSBs from Triton's fp32 pixel-center bilinear on large downscales.
- Solid-color analytic parity (1 test): every output pixel must equal the exact analytic normalization of the input color (no interpolation ambiguity), atol `1e-5`.
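The tolerance and the solid-color expectation above can be worked through by hand. The sketch below is illustrative only: the means and stds are assumed ImageNet-style values (the PR does not state the real ones), and the helper names `parity_tolerance` and `expected_solid_color` are hypothetical, not part of the test file. One uint8 LSB is `1/255` in `[0, 1]` units, and normalization divides by `std`, so the channel with the smallest std amplifies the error the most.

```python
# Assumption: ImageNet-style RGB means/stds; the PR does not list the real values.
MEANS = (0.485, 0.456, 0.406)
STDS = (0.229, 0.224, 0.225)


def parity_tolerance(stds, lsbs=2):
    """Absolute tolerance, in normalized units, for `lsbs` uint8 steps."""
    return lsbs / 255 / min(stds)


def expected_solid_color(bgr, means=MEANS, stds=STDS):
    """Analytic per-channel output, in RGB order, for a uniform BGR uint8 color."""
    b, g, r = bgr
    # Assumption: the output is RGB-ordered, matching color_mode=RGB in the summary.
    return tuple((v / 255 - m) / s for v, m, s in zip((r, g, b), means, stds))


tol = parity_tolerance(STDS)
expected = expected_solid_color((255, 0, 0))  # pure blue in BGR order
print(f"tolerance = {tol:.6f}, expected RGB = {expected}")
```

With these assumed stds the tolerance comes out to roughly 0.035 in normalized units, which is the 2-LSB budget scaled by the smallest std.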

Brings the file's total to 13 tests; all pass on a T4.

Test plan

- `pytest tests/unit_tests/models/rfdetr/test_triton_preprocess.py`: 13/13 pass
- `pytest tests/unit_tests/models/rfdetr/test_pre_processing.py`: existing 16/16 still pass
- A module-level skip keeps CPU CI green: the file is gated on `pytest.importorskip("triton")`, `cv2`, and `torch.cuda.is_available()`
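A minimal sketch of how such a gate could be structured, assuming nothing about the actual test file beyond the three conditions listed above. The helper name `triton_test_skip_reason` is hypothetical; it only uses the standard library to probe for the modules, and imports `torch` lazily so that it also runs where `torch` is absent.

```python
import importlib.util


def triton_test_skip_reason():
    """Return why the Triton tests cannot run here, or None if they can."""
    for module in ("triton", "cv2", "torch"):
        # find_spec returns None for a missing top-level module instead of raising.
        if importlib.util.find_spec(module) is None:
            return f"{module} is not installed"

    import torch  # imported lazily: only reached when torch is installed

    if not torch.cuda.is_available():
        return "CUDA is not available"
    return None


reason = triton_test_skip_reason()
print(reason or "all gates passed")
```

In a test module, a non-`None` reason could be passed to `pytest.skip(reason, allow_module_level=True)` at import time, which has the same effect as the `pytest.importorskip` gate the PR describes.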

Base branch

Targets `perf/optimize-rfdetr-seg-plus-is-seg-dataclasses-copy` (#31) since the tests import `triton_preprocess_rfdetr_stretch`, which only exists on that branch.

🤖 Generated with Claude Code

Adds four groups of tests to test_triton_preprocess.py:

- Input validation: rejects CPU tensors, wrong dtype, wrong shape, mismatched out buffer shape/dtype.
- Buffer reuse: verifies the kernel writes into a provided out tensor without reallocating (the real adapter relies on this).
- End-to-end parity against pre_process_network_input STRETCH_TO: the Triton fast path must match the non-Triton wrapper the model calls when RFDETR_USE_TRITON_PREPROC is off. Tolerance (2-LSB / min(std)) covers cv2 5-bit fixed-point vs Triton fp32 bilinear divergence.
- Solid-color analytic parity: every output pixel exactly equals the per-channel (x/255 - mean)/std on a uniform input.

13 tests total in the file (was 3), all passing on T4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…H_FOR_PREPROCESSING path

Adds tests/inference/unit_tests/models/rfdetr/test_triton_preprocess_vs_pytorch.py comparing triton_preprocess_rfdetr (the kernel called from _try_triton_preprocess in inference/models/rfdetr/rfdetr.py) against the USE_PYTORCH_FOR_PREPROCESSING code path.

Findings (each pinned by a dedicated test):

1. The production pytorch path normalizes BGR-ordered channels using RGB-ordered means/stds, then swaps BGR->RGB at the end. Net: output channel 0 (R) gets R-pixel data normalized with B's mean/std, and vice versa for channel 2. Only G is self-consistent. See test_documented_pytorch_path_channel_mixup.
2. The production pytorch path pads the letterbox region with raw 114.0 in the already-normalized tensor, instead of the per-channel normalized grey the kernel emits. See test_documented_pytorch_path_pad_region.
3. When (target_dim - scaled_dim) is odd, the pytorch path uses asymmetric integer padding (top=floor/2, bottom=remainder) while the kernel keeps pad_y as a float in its inverse bilinear. This introduces a half-pixel vertical drift between the two outputs. See test_letterbox_half_pixel_case_diverges_predictably.

Content-region parity tests use source shapes where (target-scaled) is even on both axes to sidestep the half-pixel drift; the kernel matches the "what the pytorch path was trying to express" reference to ~1 LSB. Also pins kernel determinism and the solid-color analytic case.

11 new tests, all passing on T4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
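The padding findings in the commit message above reduce to simple arithmetic. The sketch below is an illustration under stated assumptions, not code from the repository: the helper names are hypothetical, the 560 and 373 dimensions are made-up example sizes, and the mean/std pair is an assumed ImageNet-style value for the R channel.

```python
def integer_letterbox_pad(target_dim, scaled_dim):
    """Asymmetric integer padding: floor of half before, remainder after."""
    diff = target_dim - scaled_dim
    before = diff // 2
    return before, diff - before


def float_letterbox_pad(target_dim, scaled_dim):
    """Padding offset kept as a float, as the commit describes for the kernel."""
    return (target_dim - scaled_dim) / 2


def normalized_pad_value(mean, std, grey=114):
    """Grey pad value expressed in normalized units for one channel."""
    return (grey / 255 - mean) / std


# Hypothetical sizes: an odd difference (187) triggers the drift.
top, bottom = integer_letterbox_pad(560, 373)
drift = float_letterbox_pad(560, 373) - top

# An even difference (188) gives identical offsets, so no drift.
even_drift = float_letterbox_pad(560, 372) - integer_letterbox_pad(560, 372)[0]

# Assumed ImageNet-style mean/std for R; compare against the raw 114.0 pad.
pad_r = normalized_pad_value(mean=0.485, std=0.229)
print(top, bottom, drift, even_drift, pad_r)
```

Under these assumptions the odd case yields a half-pixel offset between the two paths, while the normalized grey lands near zero, far from the raw 114.0 that finding 2 describes.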

aseembits93 and others added 2 commits May 3, 2026 21:38


aseembits93 wants to merge 2 commits into `perf/optimize-rfdetr-seg-plus-is-seg-dataclasses-copy` from `test/rfdetr-triton-preprocess-parity`
