Add Triton Python packages to Jetson ONNX images by aseembits93 · Pull Request #42 · aseembits93/inference

aseembits93 · 2026-05-22T23:40:23Z

Summary

install the Triton Python package in the runtime images built from Dockerfile.onnx.jetson.6.0.0
install the Triton Python package in the runtime images built from Dockerfile.onnx.jetson.6.2.0
install the Triton Python package in the runtime images built from Dockerfile.onnx.jetson.7.1.0

Testing

git diff --check -- docker/dockerfiles/Dockerfile.onnx.jetson.6.0.0 docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0 docker/dockerfiles/Dockerfile.onnx.jetson.7.1.0
docker buildx build --check -f docker/dockerfiles/Dockerfile.onnx.jetson.6.0.0 .
docker buildx build --check -f docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0 .
docker buildx build --check -f docker/dockerfiles/Dockerfile.onnx.jetson.7.1.0 .
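
The four checks above follow one pattern per Dockerfile, so they can be scripted. The following is a minimal sketch, assuming `git` and Docker with buildx are on `PATH`; the helper names (`dockerfile_paths`, `check_commands`, `run_checks`) are hypothetical and not part of the repository.

```python
import subprocess
from typing import List

DOCKERFILE_DIR = "docker/dockerfiles"
JETSON_VERSIONS = ["6.0.0", "6.2.0", "7.1.0"]


def dockerfile_paths(versions: List[str]) -> List[str]:
    """Build the Dockerfile paths touched by this PR."""
    return [f"{DOCKERFILE_DIR}/Dockerfile.onnx.jetson.{version}" for version in versions]


def check_commands(versions: List[str]) -> List[List[str]]:
    """Return the git whitespace check plus one buildx lint check per Dockerfile."""
    paths = dockerfile_paths(versions)
    commands = [["git", "diff", "--check", "--", *paths]]
    for path in paths:
        commands.append(["docker", "buildx", "build", "--check", "-f", path, "."])
    return commands


def run_checks(versions: List[str]) -> List[int]:
    """Run every check and return the exit codes without interpreting them."""
    return [subprocess.run(command, check=False).returncode for command in check_commands(versions)]
```

Calling `run_checks(JETSON_VERSIONS)` from the repository root would run the same commands listed above; the exit codes are returned as-is because the Dockerfiles already produce warnings unrelated to this change.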

Notes

docker buildx build --check reports existing UndefinedVar warnings in these Dockerfiles; this Triton change did not add new ones.


roboflow#2375)

* fix: harden auth middleware against Starlette BadHost (CVE-2026-48710)

  The serverless and dedicated-deployment auth middlewares in http_api.py gated their public-path allowlist on request.url.path, which vulnerable Starlette (< 1.0.1) derives from the unvalidated Host header. A request with a crafted Host (e.g. `Host: x/docs?`) could make request.url.path read as an allowlisted path while ASGI routed to an authenticated handler, bypassing API-key auth.

  Defense in depth:
  - Bump fastapi to a release line that ships patched Starlette and add an explicit `starlette>=1.0.1` floor.
  - Read the path from `request.scope["path"]` (the raw ASGI path, untouched by Host header) inside both auth middlewares.

  Logging/telemetry uses of request.url.path are left as-is; they are informational and not authorization decisions.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(deps): bump fastapi floor to 0.133 so starlette>=1.0.1 can resolve

  FastAPI < 0.133 caps its starlette dependency below 1.0 (e.g. 0.119.x requires `starlette<0.49.0,>=0.40.0`), so `fastapi<0.120` together with the `starlette>=1.0.1` security floor was an unsatisfiable resolver problem and broke CI installs. FastAPI 0.133.0 dropped the starlette upper bound (`starlette>=0.40.0`, no cap), letting Starlette 1.0.1+ resolve cleanly.

  Verified locally: unit tests pass on both fastapi==0.133.0 + starlette==1.1.0 (the new floor) and fastapi==0.135.4 + starlette==1.1.0. on_event is still deprecated-but-functional in this range.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+fix: dedicated-deployment injection coverage; scope-path the ContextVar and auth span

  Addresses reviewer items #1-#3.

  Test: adds test_dedicated_deployment_auth_middleware_rejects_host_header_path_injection (plus a _build_dedicated_deployment_interface helper) so the check_authorization middleware has symmetric coverage with check_authorization_serverless. Verified under vulnerable Starlette 0.37.2: reverting just the dedicated middleware to request.url.path makes this test fail on the very first injection variant ("Host-injection bypass for header 'testserver/docs?': expected 401, got 200").

  Code: two more request.url.path → request.scope["path"] swaps in the same spirit as the auth middlewares:
  - set_request_path_context middleware: the current_request_path ContextVar flows into ModelManagerBase._model_request_paths and is reported back in model-info responses (base.py:644). Reviewer flagged this as the "leaks into model_load_info / request_model_ids" case.
  - check_authorization_serverless OTel span: the span records the auth decision itself ("serverless.authorization.check"), so its http.target attribute must not be Host-forgeable.

  The remaining informational request.url.path call sites (_log_serverless_authorization_denial L479, _log_serverless_request_received L501, the logging at L1120 / L1155) are deliberately left for a follow-up: they're outside the auth middleware and the original remediation scope explicitly excluded "logging usages". Worth a separate sweep that updates the helpers' signatures consistently rather than wedging the change here.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
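
The core of the fix described above is to base the allowlist decision on the raw ASGI path rather than on a URL reconstructed from headers. The following is a minimal, framework-free sketch of that idea; `PUBLIC_PATHS` and `is_public_request` are hypothetical names, and the allowlist contents are assumptions, not the values used in http_api.py.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical allowlist for illustration; the real one is not shown in this PR text.
PUBLIC_PATHS: FrozenSet[str] = frozenset({"/docs", "/openapi.json"})


def is_public_request(scope: Dict[str, Any]) -> bool:
    """Decide from the raw ASGI path only, ignoring anything derived from headers."""
    return scope["path"] in PUBLIC_PATHS


# ASGI scope of a request routed to an authenticated handler, carrying a forged Host header.
forged_scope = {
    "type": "http",
    "path": "/infer/object_detection",
    "headers": [(b"host", b"x/docs?")],
}

# ASGI scope of a genuine request for a public path.
docs_scope = {"type": "http", "path": "/docs", "headers": [(b"host", b"example.com")]}
```

Here `is_public_request(forged_scope)` is `False` because the Host header never enters the decision, while `is_public_request(docs_scope)` is `True`. In a Starlette middleware the same dictionary is available as `request.scope`.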


roboflow#2368)

* feat(ent-1188): Brenner camera_focus accepts mono/BGRA/dtypes safely

  Normalize to grayscale uint8 before Brenner; guard zero max focus matrix. Child of ENT-1168.

* style(ent-1188): black format camera_focus v1

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>

* new block
* fix: silence CodeQL weak-hash warning on workspace cache key

  The md5 hash here is a cache-key derivation, not security-sensitive. Pass usedforsecurity=False to make the intent explicit and satisfy CodeQL's py/weak-sensitive-data-hashing rule.

* refactor(edit_image_metadata): drop the unused images input

  The block never reads image bytes; the images selector existed only as a batch-dimensionality anchor. Use source_id as the anchor instead so workflow authors no longer have to wire a dummy image input.

* simplifications
* Case 1: this way, inline variables work
* CASE 2 fixed typing dicts for batches
* offloader
* style: apply black formatting to edit_image_metadata/v1.py
* fix: drop task_id from batch-update log to satisfy CodeQL

  CodeQL py/clear-text-logging-sensitive-data flagged task_id at the batch_update_image_metadata_at_roboflow log site, because its taint analysis traces flow from the api_key argument through the API call into the response and on to the log. The taskId is an async job identifier, not sensitive, but the log of it isn't load-bearing, so drop it to break the taint chain. ValueError on missing taskId also stops embedding the full response dict for the same reason.

* refactor(edit_image_metadata): replace fire_and_forget with injected offloader

  Per Pawel's feedback in the design thread: the BackgroundTasks / ThreadPoolExecutor branching wasn't the dependency-injection shape he meant. Replace it with a single typed Protocol, UpdateMetadataOffloader, accepting workspace_id, updates, api_key and returning the result dict. The block calls it unconditionally; production wiring decides what the callable actually does (inline API, background queue, log-only, etc.).

  Drops the fire_and_forget manifest field and the fastapi/concurrent imports. The default offloader is the existing inline single-or-batch endpoint dispatch, so behavior with no wiring is unchanged.

  Registry: adds update_metadata_offloader=None to REGISTERED_INITIALIZERS so workflows can compile with no executor wired.

* style: collapse typing import to single line for isort
* Bump EE version and update changelog
* fix and renaming
* fix single update
* fix isort ordering in loader.py

---------

Co-authored-by: Damian Kosowski <kosowski.d@gmail.com>
Co-authored-by: Paweł Pęczek <pawel@roboflow.com>
Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
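
The injected-offloader shape and the `usedforsecurity=False` hash described above can be sketched as follows. This is an illustration under assumptions: only the Protocol name and its three parameters come from the commit message; the type of `updates`, the `inline_offloader` stand-in, `run_block`, and `workspace_cache_key` are hypothetical. `usedforsecurity` requires Python 3.9 or newer.

```python
import hashlib
from typing import Any, Dict, List, Optional, Protocol


class UpdateMetadataOffloader(Protocol):
    """Callable shape named in the commit message; argument types are assumed."""

    def __call__(
        self, workspace_id: str, updates: List[Dict[str, Any]], api_key: str
    ) -> Dict[str, Any]: ...


def inline_offloader(
    workspace_id: str, updates: List[Dict[str, Any]], api_key: str
) -> Dict[str, Any]:
    # Stand-in for the default inline dispatch; a real one would call the API.
    return {"workspace_id": workspace_id, "updated": len(updates)}


def run_block(
    workspace_id: str,
    updates: List[Dict[str, Any]],
    api_key: str,
    offloader: Optional[UpdateMetadataOffloader] = None,
) -> Dict[str, Any]:
    # The block always calls the offloader; wiring decides what it does.
    chosen = offloader if offloader is not None else inline_offloader
    return chosen(workspace_id=workspace_id, updates=updates, api_key=api_key)


def workspace_cache_key(raw_key: str) -> str:
    # md5 used only to derive a cache key, so it is flagged as non-security use.
    return hashlib.md5(raw_key.encode("utf-8"), usedforsecurity=False).hexdigest()
```

Passing `offloader=None` mirrors the registry default mentioned above: the block still runs, falling back to the inline behavior.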

* feat: add dynamic blocks collection and application to workflow root

  This commit introduces a new module for collecting dynamic block definitions from workflows and nested inner workflows. It includes two functions: `collect_dynamic_blocks_definitions_from_workflow_definition` for gathering definitions and `apply_collected_dynamic_blocks_definitions_to_workflow_root` for hoisting them to the root of the workflow definition. The `compile_workflow_graph` function is updated to utilize these new functions, enhancing the dynamic block handling capabilities within the execution engine.

* feat: enhance workflow graph compilation with dynamic block handling

  This commit adds a new test suite for collecting dynamic block definitions from nested inner workflows, ensuring that the `compile_workflow_graph` function correctly processes these definitions. Additionally, a minor whitespace adjustment was made in the `core.py` file for improved readability.

* feat: add test for dynamic block equivalence in nested workflows

  This commit introduces a new test that verifies the equivalence of a child workflow with dynamic block definitions against a flat workflow with inlined definitions. The test ensures that both workflows produce the same output when executed, enhancing the validation of dynamic block handling in the execution engine.

* feat: enhance dynamic block collection with logging and validation

  This commit updates the `collect_dynamic_blocks_definitions_from_workflow_definition` function to log warnings for duplicate dynamic block definitions, ensuring that only the first occurrence is kept. Additionally, it allows malformed entries to pass through for downstream validation. Unit tests are added to verify the logging behavior for duplicates and the handling of non-dict entries, improving the robustness of dynamic block processing in workflows.

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
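
The collect-then-hoist behavior described above (first occurrence wins, duplicates are logged, malformed entries pass through) might look roughly like the sketch below. The dictionary keys (`dynamic_blocks_definitions`, `steps`, `workflow_definition`), the use of `manifest.block_type` as the identity, and both function names are assumptions made for illustration; the real functions may differ.

```python
import logging
from typing import Any, Dict, List, Optional, Set

logger = logging.getLogger(__name__)

# Assumed key names, chosen for this sketch only.
DEFINITIONS_KEY = "dynamic_blocks_definitions"
STEPS_KEY = "steps"
CHILD_KEY = "workflow_definition"


def _identity(entry: Any) -> Optional[str]:
    """Return the assumed identity of a definition, or None if it is malformed."""
    if not isinstance(entry, dict) or not isinstance(entry.get("manifest"), dict):
        return None
    block_type = entry["manifest"].get("block_type")
    return block_type if isinstance(block_type, str) else None


def collect_definitions(workflow: Dict[str, Any]) -> List[Any]:
    """Walk root first, then nested children, keeping the first copy of each block."""
    collected: List[Any] = []
    seen: Set[str] = set()

    def visit(definition: Dict[str, Any]) -> None:
        for entry in definition.get(DEFINITIONS_KEY, []):
            identity = _identity(entry)
            if identity is None:
                collected.append(entry)  # malformed: leave for downstream validation
            elif identity in seen:
                logger.warning("Duplicate dynamic block %r ignored", identity)
            else:
                seen.add(identity)
                collected.append(entry)
        for step in definition.get(STEPS_KEY, []):
            child = step.get(CHILD_KEY) if isinstance(step, dict) else None
            if isinstance(child, dict):
                visit(child)

    visit(workflow)
    return collected


def hoist_to_root(workflow: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of the root with all collected definitions attached."""
    hoisted = dict(workflow)
    hoisted[DEFINITIONS_KEY] = collect_definitions(workflow)
    return hoisted
```

Visiting the root before its children is what makes a root-level definition take precedence over a nested duplicate in this sketch.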

* draft: feat(workflows): add get_runtime_issues() with soft/hard severity on 45 blocks
* renamed from issue to restriction as per PR comment
* added next round of block restrictions
* remove background subtraction issues
* Update workflow runtime restrictions based on actual runtime behavior
* adapted for depth estimation
* improved more blocks
* yet more changes gating errors
* added step execution scope to run time restrictions
* Unify workflow block runtime restrictions

  Replace separate runtime and input-mode restriction APIs with a single RuntimeRestriction model that carries runtime, step-execution, and input-mode scopes. This makes still-image and remote-execution caveats composable across workflow blocks without duplicated manifest methods.

* added restrictions for remove background
* refactor to use constants instead of strings for runtime input mode and step execution mode
* trigger commit
* fixed bug get/set cache restrictions
* introspection: don't let a bad get_restrictions() crash describe_available_blocks; log + skip
* remove unused Dict / Runtime imports in workflow blocks

  Templating leftover from the get_restrictions() rollout: 29 blocks imported `Dict` from typing without using it, and 29 blocks (largely the same set) imported `Runtime` from prototypes.block without using it (they only reference the shared SOFT-restriction presets). ruff/isort would have caught it on a clean pass; no behavior change.

* restore SAM2 Video LOCAL-only guard

  The imperative `if step_execution_mode is not LOCAL: raise NotImplementedError(...)` in SegmentAnything2VideoBlockV1.__init__ was deleted alongside the get_restrictions() rollout. The engine does not yet enforce Severity.HARD restrictions, so removing the raise turned a fail-fast into a silent break (frames dispatched across worker processes would break the per-video SAM2 session that holds temporal memory). Restore the guard; the declarative restriction stays as UI-facing metadata.

* unify StepExecutionMode usage across restriction metadata

  Drop the parallel RuntimeStepExecutionMode enum that duplicated the existing core_steps.common.entities.StepExecutionMode (same LOCAL/REMOTE values). RuntimeRestriction.applies_to_step_execution_modes and the 25 consumer blocks now use the canonical StepExecutionMode directly. Also restores a missing `List` typing import in visualizations/heatmap/v1.py that pre-dated this PR.

* added test for restriction with error/ raising
* isorted
* narrow GPU-required HARD restriction to LOCAL step execution
* fix gaze issue
* changed seg preview for local exec message
* added comment for local file
* removed not factual comment for onvif
* move StepExecutionMode to prototypes/block to fix inverted dependency

  prototypes/block.py was importing StepExecutionMode from core_steps/common/entities.py, reversing the architectural arrow (core_steps depends on prototypes, not the other way around) and leaving a latent circular-import trap. Make prototypes/block.py the canonical home for StepExecutionMode alongside the other runtime/restriction enums (Severity, Runtime, RuntimeInputMode), delete core_steps/common/entities.py, and rewrite all ~175 import sites (blocks, tests, docs) to point at the new location. No shim is left behind.

* fix seg preview restriction: block self-hosted runtimes, not HOSTED_SERVERLESS
* onvif: split restriction into remote-step-exec and hosted-no-LAN
* fix failing test
* cache: drop runtime filter so REMOTE-mode restriction applies everywhere

  Cache Get/Set raise NotImplementedError whenever step_execution_mode is not LOCAL, regardless of runtime. Limiting applies_to_runtimes to HOSTED_SERVERLESS and DEDICATED_DEPLOYMENT hid the failure for self-hosted CPU/GPU + REMOTE. Removing the filter lets the restriction match every runtime, which is what the run() check actually enforces.

* Revert "move StepExecutionMode to prototypes/block to fix inverted dependency"

  This reverts commit 4aadf0b.

* prototypes: own StepExecutionMode, leave core_steps shim

  prototypes/block.py was importing StepExecutionMode from core_steps/common/entities.py, reversing the architectural arrow (core_steps depends on prototypes, not the other way around) and leaving a latent circular-import trap. Move the canonical definition of StepExecutionMode into prototypes/block.py alongside the other runtime/restriction enums (Severity, Runtime, RuntimeInputMode), and turn core_steps/common/entities.py into a thin re-export shim so the ~175 existing import sites keep working unchanged. The shim re-exports the same class object, so identity / isinstance / enum equality checks behave identically across both import paths.

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
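
The difference between a re-export shim and a parallel enum, both mentioned above, can be shown in a single file. This sketch simulates the two modules with `types.ModuleType`; the module names and the enum values `"local"` / `"remote"` are assumptions, and only the class names come from the commit messages.

```python
import types
from enum import Enum


class StepExecutionMode(Enum):
    # Values are assumed for illustration.
    LOCAL = "local"
    REMOTE = "remote"


class RuntimeStepExecutionMode(Enum):
    # A parallel enum with the same values, like the one the PR dropped.
    LOCAL = "local"
    REMOTE = "remote"


# Simulated canonical module (stands in for prototypes/block.py).
canonical = types.ModuleType("sketch_prototypes_block")
canonical.StepExecutionMode = StepExecutionMode

# Simulated shim (stands in for core_steps/common/entities.py); binding the same
# object is what `from <canonical module> import StepExecutionMode` would do.
shim = types.ModuleType("sketch_core_steps_entities")
shim.StepExecutionMode = canonical.StepExecutionMode

same_class = shim.StepExecutionMode is canonical.StepExecutionMode
members_equal = shim.StepExecutionMode.LOCAL == canonical.StepExecutionMode.LOCAL
parallel_equal = RuntimeStepExecutionMode.LOCAL == StepExecutionMode.LOCAL
```

With these definitions `same_class` and `members_equal` are `True`, while `parallel_equal` is `False`: members of two distinct `Enum` classes never compare equal even when their values match, which is why a duplicated enum is riskier than a shim.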

* Update Execution Engine to v1.10.1 with dynamic block enhancements

  This commit updates the Execution Engine version to `v1.10.1` and introduces support for dynamic blocks in nested inner workflows. The compiler now collects and deduplicates dynamic block definitions from both root and nested workflows, ensuring correct compilation and execution of child steps. Additionally, tests have been updated to reflect the new version in the response assertions. Changelog entry added for the new features and improvements.

* Add Execution Engine versioning and changelog guidelines

  This commit introduces a new markdown file that outlines the process for updating the Execution Engine version and changelog when changes are made to workflow compilation or execution logic. It specifies when version updates are required, the necessary steps for updating the version constant and changelog, and provides guidance on distinguishing between patch and minor version changes. This documentation aims to streamline the versioning process and ensure consistency across updates.

* Fix typo in changelog entry for Execution Engine dictionary recognition capability

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>


…ONNX + TorchScript + TRT) (roboflow#2372)

* feat(yolo26-sem): YOLO26 semantic segmentation via inference_models (ONNX + TorchScript)

  Adds public-pretrained YOLO26-Sem support (Ultralytics 8.4.52/53, Cityscapes pretrains, 1024x1024) to inference through inference_models. This PR covers the ONNX and TorchScript backends; the TRT backend ships in a follow-up once we can build and validate the engines.

  Pieces:
  1. Shared semantic-seg post-processing helper `post_process_semantic_segmentation_logits()` in `inference_models/models/common/roboflow/post_processing.py`. Handles the softmax -> argmax -> letterbox-crop -> resize pipeline. All three DeepLabV3+ backends (ONNX, Torch, TRT) refactored to delegate here, replacing 3 near-identical inline implementations.
  2. YOLO26ForSemanticSegmentation{Onnx,TorchScript} classes inheriting SemanticSegmentationModel and delegating post-processing to the shared helper. Reuses INFERENCE_MODELS_YOLO26_DEFAULT_CONFIDENCE (no new constant).
  3. Two registry tuples in models_registry.py for (yolo26, semantic-segmentation, {ONNX, TORCH_SCRIPT}). One dispatch entry in inference/models/utils.py routing ("semantic-segmentation", "yolo26") -> existing generic InferenceModelsSemanticSegmentationAdapter.

  Tests:
  - Unit: import smoke for ONNX + TorchScript, registry lookup for both backends, synthetic-tensor coverage for the shared post-processing helper (winning-class collapse + sub-threshold background fallback).
  - Integration ONNX path validated end-to-end against staging for all five sizes (yolo26{n,s,m,l,x}-sem-1024) with the bus.jpg test image through /infer/semantic_segmentation.

  Follow-up PR adds the TRT backend (class + registry tuple + integration test) once the trt-compiler image is rebuilt against this PR and the engine packages are available to test against.

* fix(yolo26-sem): require background class in semantic-seg packages

  Replace the silent `background_class_id = -1` fallback with a shared `resolve_background_class_id()` helper that raises CorruptedModelPackageError when `class_names.txt` has no `background` entry. A negative background id is never a valid output: it aliases a real class via negative indexing in downstream consumers (`class_names[-1]`, palette LUTs) and breaks the platform 0=background convention, silently corrupting the segmentation map. Failing loud at load time surfaces a misbuilt package instead. The conversion side (roboflow-model-conversion#92) guarantees `background` is prepended, so correctly-built packages are unaffected. Wires the helper into both YOLO26-sem classes and all three DeepLabV3+ backends, removing the duplicated try/except idiom.

* feat(yolo26-sem): TRT backend for YOLO26 semantic segmentation (roboflow#2379)
* fix(yolo26-sem): guard TorchScript load with torchscript_global_lock

  torch.jit.load shares a non-thread-safe process-global; wrap the load in torchscript_global_lock(torchscript_state_global_lock) and accept the lock in from_pretrained, matching the other YOLO26 TorchScript classes. Addresses review feedback on roboflow#2372.

* bump requirements inference_models==0.28.7

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
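
The fail-loud background lookup and the per-pixel "winning class with sub-threshold background fallback" described above can be sketched in plain Python. This is an illustration under assumptions: the error class here is a local stand-in for the library's, the helper's real signature is not shown in the PR text, and `classify_pixel` (a hypothetical name) assumes the confidence threshold is applied to the softmax maximum.

```python
import math
from typing import List


class CorruptedModelPackageError(Exception):
    """Local stand-in for the error named in the commit message."""


def resolve_background_class_id(class_names: List[str]) -> int:
    """Return the index of `background`, failing loudly when it is absent."""
    try:
        return class_names.index("background")
    except ValueError as error:
        raise CorruptedModelPackageError(
            "class names contain no 'background' entry"
        ) from error


def classify_pixel(logits: List[float], confidence: float, background_id: int) -> int:
    """Softmax then argmax for one pixel; fall back to background below threshold."""
    peak = max(logits)
    exponentials = [math.exp(value - peak) for value in logits]  # numerically stable
    total = sum(exponentials)
    probabilities = [value / total for value in exponentials]
    winner = max(range(len(probabilities)), key=probabilities.__getitem__)
    return winner if probabilities[winner] >= confidence else background_id


# Why -1 is dangerous as a fallback id: it silently selects the last real class.
names_without_background = ["road", "car"]
aliased_name = names_without_background[-1]
```

In this sketch `aliased_name` is `"car"`, which illustrates the aliasing problem the commit describes, while `resolve_background_class_id(names_without_background)` raises instead of returning a usable-looking index.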


… to 2.5-flash (roboflow#2395)

This commit cleans up the Gemini model definitions by removing references to outdated models (gemini-2.0-flash and gemini-1.5 series) across v1, v2, and v3 files. The default model version has been updated to "gemini-2.5-flash" in the relevant classes and tests to reflect the current standard. Additionally, integration tests have been adjusted to include the latest model versions, ensuring compatibility with the updated model list.

aseembits93 and others added 25 commits May 22, 2026 16:33

Add Triton to Jetson ONNX images

fa1279c

Add Triton wrapper image smoke test

dbb72a4

Revert "Add Triton wrapper image smoke test"

2705e15

This reverts commit dbb72a4.

Merge branch 'main' into jetson-triton-python-packages

8ee906b

Update changelog for Jetson Triton images

88c78f6

Update requirements on inference_models (roboflow#2370)

a11f1b5

* Update requirements on inference_models
* CI
* sam3 0.1.3 -> 0.1.4
* update uv.lock

fix: serialize TorchScript load/script behind a global lock (roboflow#2373)

b81e9c2

* optional lock
* test
* model manager
* Move lock to the class; pass lock only if USE_INFERENCE_MODELS is set
* inference_models 0.28.4 -> 0.28.5
* inference_models 0.28.5 -> 0.28.6
* changelog

fix: drop flat point sentinel in SAM3 visual_segment adapter (roboflow#2376)

83ef480

Merge branch 'main' into jetson-triton-python-packages

00b6fde

fix: emit RLE masks from instance segmentation v4 block (roboflow#2381)

3a114f3

Merge branch 'main' into jetson-triton-python-packages

ced9f4c

Add change to enforce dense representation of instance segmentation mask when used in inference models, enabled by default for old versions of IS block (roboflow#2384)

b872b73

Enforce transformers < 5.9 (roboflow#2385)

08aa4ff

Add configurable API proxy base URL (roboflow#2366)

615f919

fix(workflows): select v0 API for hosted semantic-segmentation remote execution (roboflow#2393)

ec42f97

fix(aliases): resolve public yolo26-sem model aliases for v0 clients (roboflow#2394)

345a6be

Merge branch 'main' into jetson-triton-python-packages

eca70aa

aseembits93 requested a review from dkosowski87 as a code owner June 1, 2026 22:16


aseembits93 wants to merge 25 commits into main from jetson-triton-python-packages


10 participants
