Conversation

@featzhang (Member) commented:

What is the purpose of the change

This PR introduces a new optional Triton inference module under flink-models, enabling Flink to invoke an external NVIDIA Triton Inference Server for batch-oriented model inference.

The module implements a reusable runtime-level integration based on the existing model provider SPI, allowing users to define Triton-backed models via CREATE MODEL and execute inference through ML_PREDICT without modifying the Flink planner or SQL execution semantics.
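
As a sketch of the intended usage (the provider identifier and option keys below are assumptions for illustration, not the module's confirmed configuration):

```sql
-- Hypothetical example: declare a model served by an external Triton
-- server. The 'provider' value and the 'endpoint' option key are
-- illustrative assumptions, not confirmed option names.
CREATE MODEL sentiment_model
INPUT (user_review STRING)
OUTPUT (sentiment STRING)
WITH (
  'provider' = 'triton',
  'endpoint' = 'http://localhost:8000'
);
```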


Brief change log

  • Added a new flink-model-triton module under flink-models
  • Implemented a Triton model provider based on the existing model inference framework
  • Supported asynchronous and batched inference via HTTP/REST API
  • Added documentation for Triton model usage and configuration
  • Extended SQL documentation to list Triton as a supported model provider

Verifying this change

  • Verified module compilation and packaging
  • Added unit tests for the Triton model provider factory
  • Manually validated model invocation logic against a local Triton server

Does this pull request potentially affect one of the following parts?

  • API changes: No
  • Planner changes: No
  • Runtime changes: No
  • SQL semantics changes: No

Documentation

  • Added dedicated documentation under docs/connectors/models/triton.md
  • Updated SQL model inference documentation to include Triton as a supported provider

Related issues

@flinkbot (Collaborator) commented Jan 5, 2026:

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build


# Triton

The Triton Model Function allows Flink SQL to call [NVIDIA Triton Inference Server](https://github.com/triton-inference-server/server) for real-time model inference tasks.
A reviewer (Contributor) commented:

I have added some comments. Can you ask on the dev list whether this requires a FLIP, please? To me it seems big enough to warrant a FLIP.

@github-actions bot added the community-reviewed label (PR has been reviewed by the community) on Jan 5, 2026.
```sql
CREATE TEMPORARY VIEW movie_reviews(id, movie_name, user_review, actual_sentiment)
AS VALUES
  (1, 'Great Movie', 'This movie was absolutely fantastic! Great acting and storyline.', 'positive');
  -- additional sample rows elided in this excerpt
```
A reviewer (Contributor) commented:

nit: I wonder whether -1, 0 and +1 would be more intuitive values.
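
The documentation example would then query this view through ML_PREDICT; a minimal sketch, assuming a Triton-backed model named sentiment_model with a single STRING output column called sentiment has already been declared:

```sql
-- Sketch: run batch inference over the sample view. The model name
-- and output column are assumptions for illustration.
SELECT id, movie_name, sentiment, actual_sentiment
FROM ML_PREDICT(
  TABLE movie_reviews,
  MODEL sentiment_model,
  DESCRIPTOR(user_review)
);
```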


Here's an example `config.pbtxt` for a text classification model (the tensor names, data types, and dimensions below are illustrative assumptions; adapt them to your model):

```protobuf
# Illustrative config; adapt tensor names and types to your model.
name: "text-classification"
backend: "python"
max_batch_size: 8

input [
  {
    name: "INPUT_TEXT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

output [
  {
    name: "OUTPUT_LABEL"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
```
@davidradl (Contributor) commented Jan 7, 2026:

I suggest we explicitly say that this should be in the text-classification/ folder.

├── text-classification/
│   ├── config.pbtxt
│   └── 1/
│       └── model.py   # or model.onnx, model.plan, etc.
A reviewer (Contributor) commented:

In the following example, what file do we use for model.py?

@featzhang (Member, Author) replied:

Good question — this refers to the Triton Python backend model file.

In this example, model.py is the Python backend implementation located in the Triton model repository, specifically under:

text-classification/
├── config.pbtxt
└── 1/
    └── model.py

The exact contents of model.py are not relevant to Flink itself. Flink interacts with the model only via the Triton HTTP/gRPC inference API, and does not load or execute the model code directly.

To avoid ambiguity, I will update the documentation to explicitly state that this file resides in the text-classification/ model directory.

@featzhang changed the title from "[FLINK-38857][Model] Introduce a Triton inference module under flink-models for batch-oriented AI inference" to "[FLINK-38857][Model] Introduce a Triton inference module under flink-models" on Jan 18, 2026.