
Add Triton Python packages to Jetson ONNX images#42

Open
aseembits93 wants to merge 25 commits into main from jetson-triton-python-packages

Conversation

@aseembits93
Owner

@aseembits93 aseembits93 commented May 22, 2026

Summary

  • install the Triton Python package in the runtime images built from Dockerfile.onnx.jetson.6.0.0
  • install the Triton Python package in the runtime images built from Dockerfile.onnx.jetson.6.2.0
  • install the Triton Python package in the runtime images built from Dockerfile.onnx.jetson.7.1.0

Testing

  • git diff --check -- docker/dockerfiles/Dockerfile.onnx.jetson.6.0.0 docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0 docker/dockerfiles/Dockerfile.onnx.jetson.7.1.0
  • docker buildx build --check -f docker/dockerfiles/Dockerfile.onnx.jetson.6.0.0 .
  • docker buildx build --check -f docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0 .
  • docker buildx build --check -f docker/dockerfiles/Dockerfile.onnx.jetson.7.1.0 .

Notes

  • docker buildx build --check reports existing UndefinedVar warnings in these Dockerfiles; this Triton change did not add new ones.

aseembits93 and others added 25 commits May 22, 2026 16:33
* Update requirements on inference_models
* CI
* sam3 0.1.3 -> 0.1.4
* update uv.lock
…#2373)

* optional lock

* test

* model manager

* Move lock to the class; pass lock only if USE_INFERENCE_MODELS is set

* inference_models 0.28.4 -> 0.28.5

* inference_models 0.28.5 -> 0.28.6

* changelog
roboflow#2375)

* fix: harden auth middleware against Starlette BadHost (CVE-2026-48710)

The serverless and dedicated-deployment auth middlewares in http_api.py
gated their public-path allowlist on request.url.path, which vulnerable
Starlette (< 1.0.1) derives from the unvalidated Host header. A request
with a crafted Host (e.g. `Host: x/docs?`) could make request.url.path
read as an allowlisted path while ASGI routed to an authenticated
handler — bypassing API-key auth.

Defense in depth:
  - Bump fastapi to a release line that ships patched Starlette and add
    an explicit `starlette>=1.0.1` floor.
  - Read the path from `request.scope["path"]` (the raw ASGI path,
    untouched by Host header) inside both auth middlewares.

Logging/telemetry uses of request.url.path are left as-is; they are
informational and not authorization decisions.
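A minimal sketch of the idea behind the fix, written against a plain ASGI-style scope dict rather than Starlette itself. The allowlist contents, the function names, and the example route are hypothetical; only the use of `scope["path"]` in place of `request.url.path` comes from the commit message.

```python
from typing import Any, Dict

# Hypothetical allowlist; the real public paths are defined in http_api.py.
PUBLIC_PATHS = {"/docs", "/openapi.json", "/healthz"}


def is_public_request(scope: Dict[str, Any]) -> bool:
    """Decide on the raw ASGI path, which the Host header cannot influence."""
    return scope.get("path", "") in PUBLIC_PATHS


def authorize(scope: Dict[str, Any], api_key_valid: bool) -> int:
    """Return the HTTP status a simplified auth middleware would produce."""
    if is_public_request(scope):
        return 200
    return 200 if api_key_valid else 401


# A crafted Host header leaves scope["path"] untouched, so the protected
# route is still treated as protected.
forged_scope = {
    "type": "http",
    "path": "/infer/object_detection",  # illustrative route name
    "headers": [(b"host", b"x/docs?")],
}
status = authorize(forged_scope, api_key_valid=False)
```

In this sketch the forged request is rejected because the allowlist check never looks at anything derived from the headers.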

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(deps): bump fastapi floor to 0.133 so starlette>=1.0.1 can resolve

FastAPI < 0.133 caps its starlette dependency below 1.0 (e.g. 0.119.x
requires `starlette<0.49.0,>=0.40.0`), so `fastapi<0.120` together with
the `starlette>=1.0.1` security floor was an unsatisfiable resolver
problem and broke CI installs.

FastAPI 0.133.0 dropped the starlette upper bound (`starlette>=0.40.0`,
no cap), letting Starlette 1.0.1+ resolve cleanly. Verified locally:
unit tests pass on both fastapi==0.133.0 + starlette==1.1.0 (the new
floor) and fastapi==0.135.4 + starlette==1.1.0. on_event is still
deprecated-but-functional in this range.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test+fix: dedicated-deployment injection coverage; scope-path the ContextVar and auth span

Addresses reviewer items #1-#3.

Test: adds test_dedicated_deployment_auth_middleware_rejects_host_header_path_injection
(plus a _build_dedicated_deployment_interface helper) so the check_authorization
middleware has symmetric coverage with check_authorization_serverless. Verified
under vulnerable Starlette 0.37.2: reverting just the dedicated middleware to
request.url.path makes this test fail on the very first injection variant
("Host-injection bypass for header 'testserver/docs?': expected 401, got 200").

Code: two more request.url.path → request.scope["path"] swaps in the same
spirit as the auth middlewares:
  - set_request_path_context middleware: the current_request_path ContextVar
    flows into ModelManagerBase._model_request_paths and is reported back in
    model-info responses (base.py:644). Reviewer flagged this as the
    "leaks into model_load_info / request_model_ids" case.
  - check_authorization_serverless OTel span: the span records the auth
    decision itself ("serverless.authorization.check"), so its http.target
    attribute must not be Host-forgeable.

The remaining informational request.url.path call sites
(_log_serverless_authorization_denial L479, _log_serverless_request_received
L501, the logging at L1120 / L1155) are deliberately left for a follow-up:
they're outside the auth middleware and the original remediation scope
explicitly excluded "logging usages". Worth a separate sweep that updates the
helpers' signatures consistently rather than wedging the change here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
roboflow#2368)

* feat(ent-1188): Brenner camera_focus accepts mono/BGRA/dtypes safely

Normalize to grayscale uint8 before Brenner; guard zero max focus matrix.
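The zero-max guard can be illustrated with a small pure-Python sketch. It assumes the input is already a 2D grayscale grid and uses one common form of the Brenner measure (squared difference between pixels two columns apart); the function names are hypothetical and the actual block may compute the measure differently.

```python
from typing import List

Grid = List[List[float]]


def brenner_matrix(gray: Grid) -> Grid:
    """Squared horizontal difference between pixels two columns apart."""
    result = []
    for row in gray:
        width = len(row)
        result.append(
            [
                float((row[x + 2] - row[x]) ** 2) if x + 2 < width else 0.0
                for x in range(width)
            ]
        )
    return result


def normalize_focus(matrix: Grid) -> Grid:
    """Scale to [0, 1]; a perfectly flat image has max 0, so skip the division."""
    peak = max((value for row in matrix for value in row), default=0.0)
    if peak == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[value / peak for value in row] for row in matrix]


flat = normalize_focus(brenner_matrix([[7, 7, 7, 7]]))
edge = normalize_focus(brenner_matrix([[0, 0, 10, 10]]))
```

Without the guard, a uniform frame would trigger a division by zero during normalization.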

Child of ENT-1168.

* style(ent-1188): black format camera_focus v1

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
* new block

* fix: silence CodeQL weak-hash warning on workspace cache key

The md5 hash here is a cache-key derivation, not security-sensitive.
Pass usedforsecurity=False to make the intent explicit and satisfy
CodeQL's py/weak-sensitive-data-hashing rule.
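A short sketch of the pattern, assuming Python 3.9 or later (where `hashlib.md5` accepts `usedforsecurity`). The function name and key prefix are hypothetical.

```python
import hashlib


def workspace_cache_key(workspace_id: str) -> str:
    """Derive a cache key; the digest is an identifier, not a security control."""
    digest = hashlib.md5(workspace_id.encode("utf-8"), usedforsecurity=False)
    return f"workspace:{digest.hexdigest()}"


key = workspace_cache_key("my-workspace")
```

The flag does not change the digest value; it only documents the intent and keeps the call working on builds that restrict MD5 for security use.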

* refactor(edit_image_metadata): drop the unused images input

The block never reads image bytes; the images selector existed only as a
batch-dimensionality anchor. Use source_id as the anchor instead so
workflow authors no longer have to wire a dummy image input.

* simplifications

* Case 1: this way, inline variables work

* CASE 2 fixed typing dicts for batches

* offloader

* style: apply black formatting to edit_image_metadata/v1.py

* fix: drop task_id from batch-update log to satisfy CodeQL

CodeQL py/clear-text-logging-sensitive-data flagged task_id at the
batch_update_image_metadata_at_roboflow log site, because its taint
analysis traces flow from the api_key argument through the API call
into the response and on to the log. The taskId is an async job
identifier, not sensitive, but the log of it isn't load-bearing —
drop it to break the taint chain. ValueError on missing taskId also
stops embedding the full response dict for the same reason.

* refactor(edit_image_metadata): replace fire_and_forget with injected offloader

Per Pawel's feedback in the design thread: the BackgroundTasks /
ThreadPoolExecutor branching wasn't the dependency-injection shape he
meant. Replace it with a single typed Protocol — UpdateMetadataOffloader
— accepting workspace_id, updates, api_key and returning the result dict.
The block calls it unconditionally; production wiring decides what the
callable actually does (inline API, background queue, log-only, etc.).

Drops the fire_and_forget manifest field and the fastapi/concurrent
imports. The default offloader is the existing inline single-or-batch
endpoint dispatch, so behavior with no wiring is unchanged.

Registry: adds update_metadata_offloader=None to REGISTERED_INITIALIZERS
so workflows can compile with no executor wired.
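The injection shape described above could look roughly like the following. Only the `UpdateMetadataOffloader` name and its three parameters come from the commit message; the element type of `updates`, the stand-in offloaders, and `run_block` are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional, Protocol


class UpdateMetadataOffloader(Protocol):
    """Callable that delivers metadata updates and returns a result dict."""

    def __call__(
        self, workspace_id: str, updates: List[Dict[str, Any]], api_key: str
    ) -> Dict[str, Any]:
        ...


def inline_offloader(
    workspace_id: str, updates: List[Dict[str, Any]], api_key: str
) -> Dict[str, Any]:
    # Stand-in for the inline single-or-batch dispatch; no network call here.
    endpoint = "single" if len(updates) == 1 else "batch"
    return {"endpoint": endpoint, "updated": len(updates)}


def log_only_offloader(
    workspace_id: str, updates: List[Dict[str, Any]], api_key: str
) -> Dict[str, Any]:
    # Hypothetical alternative wiring that performs no update at all.
    return {"endpoint": "none", "updated": 0}


def run_block(
    workspace_id: str,
    updates: List[Dict[str, Any]],
    api_key: str,
    offloader: Optional[UpdateMetadataOffloader] = None,
) -> Dict[str, Any]:
    """Call the injected offloader, falling back to the inline default."""
    effective = offloader if offloader is not None else inline_offloader
    return effective(workspace_id=workspace_id, updates=updates, api_key=api_key)


default_result = run_block("ws", [{"image_id": "a"}], "key")
custom_result = run_block("ws", [{"image_id": "a"}], "key", log_only_offloader)
```

The block stays unaware of how delivery happens, which is the point of the Protocol: the wiring code chooses the callable.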

* style: collapse typing import to single line for isort

* Bump EE version and update changelog

* fix and renaming

* fix single update

* fix isort ordering in loader.py

---------

Co-authored-by: Damian Kosowski <kosowski.d@gmail.com>
Co-authored-by: Paweł Pęczek <pawel@roboflow.com>
Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
* feat: add dynamic blocks collection and application to workflow root

This commit introduces a new module for collecting dynamic block definitions from workflows and nested inner workflows. It includes two functions: `collect_dynamic_blocks_definitions_from_workflow_definition` for gathering definitions and `apply_collected_dynamic_blocks_definitions_to_workflow_root` for hoisting them to the root of the workflow definition. The `compile_workflow_graph` function is updated to utilize these new functions, enhancing the dynamic block handling capabilities within the execution engine.

* feat: enhance workflow graph compilation with dynamic block handling

This commit adds a new test suite for collecting dynamic block definitions from nested inner workflows, ensuring that the `compile_workflow_graph` function correctly processes these definitions. Additionally, a minor whitespace adjustment was made in the `core.py` file for improved readability.

* feat: add test for dynamic block equivalence in nested workflows

This commit introduces a new test that verifies the equivalence of a child workflow with dynamic block definitions against a flat workflow with inlined definitions. The test ensures that both workflows produce the same output when executed, enhancing the validation of dynamic block handling in the execution engine.

* feat: enhance dynamic block collection with logging and validation

This commit updates the `collect_dynamic_blocks_definitions_from_workflow_definition` function to log warnings for duplicate dynamic block definitions, ensuring that only the first occurrence is kept. Additionally, it allows malformed entries to pass through for downstream validation. Unit tests are added to verify the logging behavior for duplicates and the handling of non-dict entries, improving the robustness of dynamic block processing in workflows.
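A simplified sketch of the collect-and-hoist behavior described in these commits. The function names are shortened stand-ins for the real ones, and the definition layout (`dynamic_blocks_definitions`, `steps`, a nested `workflow_definition` on a step, and `manifest.block_type` as the duplicate key) is an assumption made for illustration.

```python
import logging
from typing import Any, Dict, List, Set

logger = logging.getLogger(__name__)


def collect_dynamic_blocks(definition: Dict[str, Any]) -> List[Any]:
    """Gather dynamic block definitions from the root and nested workflows."""
    collected: List[Any] = []
    seen: Set[str] = set()

    def visit(node: Dict[str, Any]) -> None:
        for entry in node.get("dynamic_blocks_definitions") or []:
            manifest = entry.get("manifest") if isinstance(entry, dict) else None
            key = manifest.get("block_type") if isinstance(manifest, dict) else None
            if key is None:
                # Malformed entry: pass it through for downstream validation.
                collected.append(entry)
                continue
            if key in seen:
                logger.warning("Duplicate dynamic block %r ignored", key)
                continue
            seen.add(key)
            collected.append(entry)
        for step in node.get("steps") or []:
            inner = step.get("workflow_definition") if isinstance(step, dict) else None
            if isinstance(inner, dict):
                visit(inner)

    visit(definition)
    return collected


def hoist_to_root(definition: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the root with all collected definitions attached."""
    result = dict(definition)
    result["dynamic_blocks_definitions"] = collect_dynamic_blocks(definition)
    return result
```

Root entries are visited before nested ones, so when a block type appears twice the root definition is the one kept.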

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
* draft: feat(workflows): add get_runtime_issues() with soft/hard severity on 45 blocks

* renamed from issue to restriction as per PR comment

* added next round of block restrictions

* remove background subtraction issues

* Update workflow runtime restrictions based on actual runtime behavior

* adapted for depth estimation

* improved more blocks

* yet more changes gating errors

* added step execution scope to run time restrictions

* Unify workflow block runtime restrictions

Replace separate runtime and input-mode restriction APIs with a single RuntimeRestriction model that carries runtime, step-execution, and input-mode scopes. This makes still-image and remote-execution caveats composable across workflow blocks without duplicated manifest methods.
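One way to picture a single restriction model with three composable scopes is sketched below. The names `RuntimeRestriction`, `Severity`, `Runtime`, `RuntimeInputMode`, `StepExecutionMode`, `applies_to_runtimes`, and `applies_to_step_execution_modes` appear in this PR's commit messages; the enum members other than HOSTED_SERVERLESS, DEDICATED_DEPLOYMENT, LOCAL, REMOTE, SOFT and HARD, the enum values, the `applies_to_input_modes` field, and the `matches` method are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Severity(Enum):
    SOFT = "soft"
    HARD = "hard"


class Runtime(Enum):
    HOSTED_SERVERLESS = "hosted_serverless"
    DEDICATED_DEPLOYMENT = "dedicated_deployment"
    SELF_HOSTED_CPU = "self_hosted_cpu"  # hypothetical member name
    SELF_HOSTED_GPU = "self_hosted_gpu"  # hypothetical member name


class StepExecutionMode(Enum):
    LOCAL = "local"
    REMOTE = "remote"


class RuntimeInputMode(Enum):
    STILL_IMAGE = "still_image"  # hypothetical member name
    VIDEO = "video"  # hypothetical member name


def _in_scope(value, scope) -> bool:
    # A scope of None means "no filter": the restriction applies everywhere.
    return scope is None or value in scope


@dataclass(frozen=True)
class RuntimeRestriction:
    severity: Severity
    message: str
    applies_to_runtimes: Optional[FrozenSet[Runtime]] = None
    applies_to_step_execution_modes: Optional[FrozenSet[StepExecutionMode]] = None
    applies_to_input_modes: Optional[FrozenSet[RuntimeInputMode]] = None

    def matches(self, runtime, step_execution_mode, input_mode) -> bool:
        """True when every configured scope includes the given context."""
        return (
            _in_scope(runtime, self.applies_to_runtimes)
            and _in_scope(step_execution_mode, self.applies_to_step_execution_modes)
            and _in_scope(input_mode, self.applies_to_input_modes)
        )


remote_only = RuntimeRestriction(
    severity=Severity.HARD,
    message="Not available with remote step execution.",
    applies_to_step_execution_modes=frozenset({StepExecutionMode.REMOTE}),
)
```

Leaving a scope unset makes the restriction apply across that dimension, so a REMOTE-only caveat like `remote_only` matches every runtime.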

* added restrictions for remove background

* refactor to use constants instead of strings for runtime input mode and step execution mode

* trigger commit

* fixed bug get/set cache restrictions

* introspection: don't let a bad get_restrictions() crash describe_available_blocks; log + skip

* remove unused Dict / Runtime imports in workflow blocks

Templating leftover from the get_restrictions() rollout: 29 blocks
imported `Dict` from typing without using it, and 29 blocks (largely
the same set) imported `Runtime` from prototypes.block without using
it (they only reference the shared SOFT-restriction presets).
ruff/isort would have caught it on a clean pass; no behavior change.

* restore SAM2 Video LOCAL-only guard

The imperative `if step_execution_mode is not LOCAL: raise
NotImplementedError(...)` in SegmentAnything2VideoBlockV1.__init__
was deleted alongside the get_restrictions() rollout. The engine
does not yet enforce Severity.HARD restrictions, so removing the
raise turned a fail-fast into a silent break (frames dispatched
across worker processes would break the per-video SAM2 session
that holds temporal memory). Restore the guard; the declarative
restriction stays as UI-facing metadata.
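The fail-fast guard being restored can be sketched as follows. The class is a hypothetical stand-in for `SegmentAnything2VideoBlockV1`, the enum values are illustrative, and the error message is invented for the example.

```python
from enum import Enum


class StepExecutionMode(Enum):
    LOCAL = "local"
    REMOTE = "remote"


class VideoSegmentationBlock:
    """Stand-in block that keeps per-video state and so must run locally."""

    def __init__(self, step_execution_mode: StepExecutionMode) -> None:
        if step_execution_mode is not StepExecutionMode.LOCAL:
            # Fail at construction time instead of breaking silently mid-video.
            raise NotImplementedError(
                "This block only supports LOCAL step execution."
            )
        self._step_execution_mode = step_execution_mode


local_block = VideoSegmentationBlock(StepExecutionMode.LOCAL)
```

Keeping the imperative check alongside the declarative restriction means the block still refuses to start even where HARD restrictions are not enforced.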

* unify StepExecutionMode usage across restriction metadata

Drop the parallel RuntimeStepExecutionMode enum that duplicated the
existing core_steps.common.entities.StepExecutionMode (same
LOCAL/REMOTE values). RuntimeRestriction.applies_to_step_execution_modes
and the 25 consumer blocks now use the canonical StepExecutionMode
directly. Also restores a missing `List` typing import in
visualizations/heatmap/v1.py that pre-dated this PR.

* added test for restriction with error/ raising

* isorted

* narrow GPU-required HARD restriction to LOCAL step execution

* fix gaze issue

* changed seg preview for local exec message

* added comment for local file

* removed non-factual comment for onvif

* move StepExecutionMode to prototypes/block to fix inverted dependency

prototypes/block.py was importing StepExecutionMode from
core_steps/common/entities.py, reversing the architectural arrow
(core_steps depends on prototypes, not the other way around) and
leaving a latent circular-import trap.

Make prototypes/block.py the canonical home for StepExecutionMode
alongside the other runtime/restriction enums (Severity, Runtime,
RuntimeInputMode), delete core_steps/common/entities.py, and rewrite
all ~175 import sites (blocks, tests, docs) to point at the new
location. No shim is left behind.

* fix seg preview restriction: block self-hosted runtimes, not HOSTED_SERVERLESS

* onvif: split restriction into remote-step-exec and hosted-no-LAN

* fix failing test

* cache: drop runtime filter so REMOTE-mode restriction applies everywhere

Cache Get/Set raise NotImplementedError whenever step_execution_mode is
not LOCAL, regardless of runtime. Limiting applies_to_runtimes to
HOSTED_SERVERLESS and DEDICATED_DEPLOYMENT hid the failure for
self-hosted CPU/GPU + REMOTE. Removing the filter lets the restriction
match every runtime, which is what the run() check actually enforces.

* Revert "move StepExecutionMode to prototypes/block to fix inverted dependency"

This reverts commit 4aadf0b.

* prototypes: own StepExecutionMode, leave core_steps shim

prototypes/block.py was importing StepExecutionMode from
core_steps/common/entities.py, reversing the architectural arrow
(core_steps depends on prototypes, not the other way around) and
leaving a latent circular-import trap.

Move the canonical definition of StepExecutionMode into
prototypes/block.py alongside the other runtime/restriction enums
(Severity, Runtime, RuntimeInputMode), and turn
core_steps/common/entities.py into a thin re-export shim so the
~175 existing import sites keep working unchanged. The shim
re-exports the same class object, so identity / isinstance / enum
equality checks behave identically across both import paths.
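The identity claim can be checked with a small simulation. The two modules below are in-memory stand-ins for `prototypes/block.py` and `core_steps/common/entities.py`; in a real shim the re-export would be a plain `from ... import StepExecutionMode` line, and the enum values here are illustrative.

```python
import types
from enum import Enum


class StepExecutionMode(Enum):
    LOCAL = "local"
    REMOTE = "remote"


# Stand-in for the canonical module that owns the definition.
canonical = types.ModuleType("prototypes_block")
canonical.StepExecutionMode = StepExecutionMode

# Stand-in for the shim: it binds the same class object under its own
# namespace and defines nothing new.
shim = types.ModuleType("core_steps_common_entities")
shim.StepExecutionMode = canonical.StepExecutionMode

same_class = shim.StepExecutionMode is canonical.StepExecutionMode
same_member = shim.StepExecutionMode.LOCAL is canonical.StepExecutionMode.LOCAL
```

Because both names refer to one class object, a value imported through either path passes `isinstance` and equality checks against the other.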

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
* Update Execution Engine to v1.10.1 with dynamic block enhancements

This commit updates the Execution Engine version to `v1.10.1` and introduces support for dynamic blocks in nested inner workflows. The compiler now collects and deduplicates dynamic block definitions from both root and nested workflows, ensuring correct compilation and execution of child steps. Additionally, tests have been updated to reflect the new version in the response assertions.

Changelog entry added for the new features and improvements.

* Add Execution Engine versioning and changelog guidelines

This commit introduces a new markdown file that outlines the process for updating the Execution Engine version and changelog when changes are made to workflow compilation or execution logic. It specifies when version updates are required, the necessary steps for updating the version constant and changelog, and provides guidance on distinguishing between patch and minor version changes. This documentation aims to streamline the versioning process and ensure consistency across updates.

* Fix typo in changelog entry for Execution Engine dictionary recognition capability

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
…ask when used in inference models, enabled by default for old versions of IS block (roboflow#2384)
…ONNX + TorchScript + TRT) (roboflow#2372)

* feat(yolo26-sem): YOLO26 semantic segmentation via inference_models (ONNX + TorchScript)

Adds public-pretrained YOLO26-Sem support (Ultralytics 8.4.52/53,
Cityscapes pretrains, 1024x1024) to inference through inference_models.
This PR covers the ONNX and TorchScript backends; the TRT backend ships
in a follow-up once we can build and validate the engines.

Pieces:

1. Shared semantic-seg post-processing helper
   `post_process_semantic_segmentation_logits()` in
   `inference_models/models/common/roboflow/post_processing.py`. Handles
   the softmax -> argmax -> letterbox-crop -> resize pipeline. All three
   DeepLabV3+ backends (ONNX, Torch, TRT) refactored to delegate here,
   replacing 3 near-identical inline implementations.

2. YOLO26ForSemanticSegmentation{Onnx,TorchScript} classes inheriting
   SemanticSegmentationModel and delegating post-processing to the
   shared helper. Reuses INFERENCE_MODELS_YOLO26_DEFAULT_CONFIDENCE
   (no new constant).

3. Two registry tuples in models_registry.py for
   (yolo26, semantic-segmentation, {ONNX, TORCH_SCRIPT}). One dispatch
   entry in inference/models/utils.py routing
   ("semantic-segmentation", "yolo26") -> existing generic
   InferenceModelsSemanticSegmentationAdapter.

Tests:
- Unit: import smoke for ONNX + TorchScript, registry lookup for both
  backends, synthetic-tensor coverage for the shared post-processing
  helper (winning-class collapse + sub-threshold background fallback).
- Integration ONNX path validated end-to-end against staging for all
  five sizes (yolo26{n,s,m,l,x}-sem-1024) with the bus.jpg test image
  through /infer/semantic_segmentation.

Follow-up PR adds the TRT backend (class + registry tuple + integration
test) once the trt-compiler image is rebuilt against this PR and the
engine packages are available to test against.
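The softmax and argmax stages of the shared helper, including the sub-threshold background fallback, can be sketched in plain Python. This is not the signature of `post_process_semantic_segmentation_logits()`: the function names, the nested-list layout (classes, then rows, then columns), and the parameters are assumptions, and the letterbox-crop and resize stages are left out.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(values)
    exps = [math.exp(value - peak) for value in values]
    total = sum(exps)
    return [value / total for value in exps]


def logits_to_class_map(
    logits: List[List[List[float]]],
    confidence: float,
    background_class_id: int = 0,
) -> List[List[int]]:
    """Pick the winning class per pixel, falling back to background."""
    num_classes = len(logits)
    height = len(logits[0])
    width = len(logits[0][0])
    class_map = []
    for y in range(height):
        row = []
        for x in range(width):
            probs = softmax([logits[c][y][x] for c in range(num_classes)])
            best = max(range(num_classes), key=probs.__getitem__)
            # Pixels whose winning class is not confident enough become background.
            row.append(best if probs[best] >= confidence else background_class_id)
        class_map.append(row)
    return class_map


# Three classes over a 1x2 image: a confident pixel and an ambiguous one.
example_logits = [
    [[0.0, 0.1]],
    [[5.0, 0.2]],
    [[0.0, 0.0]],
]
class_map = logits_to_class_map(example_logits, confidence=0.5)
```

The first pixel collapses to its winning class while the second falls back to background, mirroring the two cases the unit tests describe.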

* fix(yolo26-sem): require background class in semantic-seg packages

Replace the silent `background_class_id = -1` fallback with a shared
`resolve_background_class_id()` helper that raises CorruptedModelPackageError
when `class_names.txt` has no `background` entry.

A negative background id is never a valid output: it aliases a real class
via negative indexing in downstream consumers (`class_names[-1]`, palette
LUTs) and breaks the platform 0=background convention, silently corrupting
the segmentation map. Failing loud at load time surfaces a misbuilt package
instead. The conversion side (roboflow-model-conversion#92) guarantees
`background` is prepended, so correctly-built packages are unaffected.

Wires the helper into both YOLO26-sem classes and all three DeepLabV3+
backends, removing the duplicated try/except idiom.
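A sketch of the fail-loud lookup and of the aliasing problem it avoids. The exception class is a local stand-in for the library's `CorruptedModelPackageError`, and the helper's real signature is not shown in this PR, so the one below is an assumption.

```python
from typing import List


class CorruptedModelPackageError(Exception):
    """Local stand-in for the library's error type."""


def resolve_background_class_id(class_names: List[str]) -> int:
    """Return the index of `background` or raise at load time."""
    try:
        return class_names.index("background")
    except ValueError as error:
        raise CorruptedModelPackageError(
            "class_names.txt has no `background` entry"
        ) from error


good_names = ["background", "road", "car"]
bad_names = ["road", "car"]

background_id = resolve_background_class_id(good_names)

# With the old -1 fallback, negative indexing silently picks a real class.
aliased = bad_names[-1]
```

Here `aliased` is a genuine class name, which is exactly the silent corruption the error is meant to surface.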

* feat(yolo26-sem): TRT backend for YOLO26 semantic segmentation (roboflow#2379)

* fix(yolo26-sem): guard TorchScript load with torchscript_global_lock

torch.jit.load shares a non-thread-safe process-global; wrap the load in
torchscript_global_lock(torchscript_state_global_lock) and accept the lock
in from_pretrained, matching the other YOLO26 TorchScript classes.
Addresses review feedback on roboflow#2372.
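The locking pattern can be sketched without importing torch by passing the loader in as a callable. `load_with_lock` and `fake_loader` are hypothetical; in the real classes the guarded call is `torch.jit.load` and the lock is accepted by `from_pretrained`.

```python
import threading
from contextlib import nullcontext
from typing import Any, Callable, List, Optional


def load_with_lock(
    path: str,
    loader: Callable[[str], Any],
    lock: Optional[threading.Lock] = None,
) -> Any:
    """Run the loader while holding the lock, if one was provided."""
    guard = lock if lock is not None else nullcontext()
    with guard:
        return loader(path)


shared_lock = threading.Lock()
held_during_load: List[bool] = []


def fake_loader(path: str) -> str:
    # Record whether the shared lock is held while the load runs.
    held_during_load.append(shared_lock.locked())
    return f"model:{path}"


model = load_with_lock("weights.torchscript", fake_loader, shared_lock)
```

Making the lock optional keeps single-threaded callers unchanged while letting a server pass one shared lock to every loader.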

* bump requirements inference_models==0.28.7

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
… to 2.5-flash (roboflow#2395)

This commit cleans up the Gemini model definitions by removing references to outdated models (gemini-2.0-flash and gemini-1.5 series) across v1, v2, and v3 files. The default model version has been updated to "gemini-2.5-flash" in the relevant classes and tests to reflect the current standard. Additionally, integration tests have been adjusted to include the latest model versions, ensuring compatibility with the updated model list.
@aseembits93 aseembits93 requested a review from dkosowski87 as a code owner June 1, 2026 22:16

10 participants