

@LukeAVanDrie
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR extends the Endpoint Picker (EPP) plugin system to support a Transient lifecycle, enabling the creation of stateful, per-flow plugin instances at runtime.

Context:
Previously, the EPP plugin system treated all plugins as Singletons instantiated once at startup. While this works well for the existing EPP extension points, it is insufficient for the Flow Control layer. Flow Control requires stateful components (e.g., Queues, Fairness/Ordering Policies) that must be instantiated uniquely for each Flow or Priority Band.

Changes:

  1. Registry & Lifecycle: Introduced LifecycleTransient to pkg/epp/plugins. Transient plugins are registered as "Blueprints" rather than active instances.
  2. Plugin Factory: Added PluginFactory and EPPPluginFactory to instantiate transient plugins on-demand using configuration from the Handle.
    • Supports instanceAlias to assign unique runtime identities (e.g., "tenant-a-queue") to instances created from the same blueprint.
  3. Handle Updates: Updated Handle to store configuration PluginSpecs (blueprints) alongside active plugin instances.
  4. Bootstrap Logic: Updated config/loader to skip the instantiation of Transient plugins during startup. They now remain as blueprints in the Handle, waiting to be hydrated by the Factory.
  5. Documentation: Added pkg/epp/plugins/doc.go providing a high-level architectural overview of the Registry, Factory, and the distinction between the Scheduling DAG (Singletons) and Flow Control (Transient).
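A minimal sketch of how the registry, lifecycle, and factory from the list above could fit together. Every type and method name here (`Lifecycle`, `Registry.Register`, `NewPluginFactory`, `NewInstance`, the `fifo-queue` plugin type) is a hypothetical stand-in rather than the PR's actual API. Only the concepts come from the description: transient types registered as blueprints, on-demand instantiation, and an `instanceAlias` that gives each instance its own runtime identity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Lifecycle distinguishes plugins built once at startup from plugins built
// on demand. Hypothetical names; the PR's actual types may differ.
type Lifecycle int

const (
	LifecycleSingleton Lifecycle = iota
	LifecycleTransient
)

// Plugin is a minimal stand-in for the EPP plugin interface.
type Plugin interface{ TypedName() string }

// PluginSpec is a "blueprint": a named, typed configuration entry.
type PluginSpec struct {
	Name       string
	Type       string
	Parameters json.RawMessage
}

// FactoryFunc builds one plugin instance with the given runtime name.
type FactoryFunc func(name string, params json.RawMessage) (Plugin, error)

type registration struct {
	lifecycle Lifecycle
	build     FactoryFunc
}

// Registry maps a plugin type to its constructor and lifecycle.
type Registry map[string]registration

func (r Registry) Register(typ string, lc Lifecycle, build FactoryFunc) {
	r[typ] = registration{lifecycle: lc, build: build}
}

// PluginFactory hydrates transient blueprints into fresh instances.
type PluginFactory struct {
	registry Registry
	specs    map[string]PluginSpec // blueprints keyed by spec name
}

func NewPluginFactory(reg Registry, specs []PluginSpec) *PluginFactory {
	f := &PluginFactory{registry: reg, specs: make(map[string]PluginSpec, len(specs))}
	for _, s := range specs {
		f.specs[s.Name] = s
	}
	return f
}

// NewInstance builds a new instance from blueprint specName. A non-empty
// instanceAlias (e.g. "tenant-a-queue") becomes the instance's runtime name;
// otherwise the spec name is used.
func (f *PluginFactory) NewInstance(specName, instanceAlias string) (Plugin, error) {
	spec, ok := f.specs[specName]
	if !ok {
		return nil, fmt.Errorf("no plugin spec named %q", specName)
	}
	reg, ok := f.registry[spec.Type]
	if !ok {
		return nil, fmt.Errorf("plugin type %q is not registered", spec.Type)
	}
	if reg.lifecycle != LifecycleTransient {
		return nil, fmt.Errorf("plugin type %q is not transient", spec.Type)
	}
	name := specName
	if instanceAlias != "" {
		name = instanceAlias
	}
	return reg.build(name, spec.Parameters)
}

// fifoQueue is a toy transient plugin; each instance owns its own items.
type fifoQueue struct {
	name  string
	items []string // per-instance state
}

func (q *fifoQueue) TypedName() string { return "fifo-queue/" + q.name }

// newFIFOQueue is the FactoryFunc for the toy queue type.
func newFIFOQueue(name string, _ json.RawMessage) (Plugin, error) {
	return &fifoQueue{name: name}, nil
}
```

With `fifo-queue` registered as transient, calling `NewInstance("default-queue", "tenant-a-queue")` and `NewInstance("default-queue", "tenant-b-queue")` yields two independent queues built from the same blueprint.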

Which issue(s) this PR fixes:
Tracks #1715

Does this PR introduce a user-facing change?:

NONE

This commit introduces the foundational support for "Transient" plugins,
plugins that are instantiated on-demand at runtime rather than as
singletons at startup.

- Adds `LifecycleTransient` to the `Registry`.
- Adds `PluginFactory` interface and `EPPPluginFactory` implementation
  for creating instances from blueprints.
- Adds `doc.go` to explain the new architecture.
- Adds comprehensive unit tests for `Registry` and `Factory`.

Updates the `plugins.Handle` interface to serve as a repository for
Plugin Blueprints (Specs). This is required for the Factory to look up
configuration when instantiating transient plugins.

- Adds `PluginSpec(name string)` to the `Handle` interface.
- Updates `NewEppHandle` to accept a list of `PluginSpecs`.
- Updates `test/utils/handle.go` (mock) to support specs.
- Adds unit tests for `Handle` immutability and lookup.

Integrates the new plugin lifecycle into the application startup flow.

- Updates `instantiatePlugins` in the config loader to skip
  instantiation of `Transient` plugins (leaving them as blueprints in
  the `Handle`).
- Updates `parseConfigurationPhaseTwo` in the runner to pass the raw
  plugin specs into the `Handle` constructor.
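A rough sketch of the two startup changes described in these commits: the loader skips transient types, and the `Handle` keeps its own copy of the specs so later changes to the caller's slice do not leak in. The signatures of `NewEppHandle`, `PluginSpec`, and `instantiatePlugins` below are simplified guesses for illustration (the real ones likely take more parameters), and `namedPlugin` is a hypothetical placeholder for real plugin construction.

```go
package main

import "fmt"

type Lifecycle int

const (
	LifecycleSingleton Lifecycle = iota
	LifecycleTransient
)

type PluginSpec struct {
	Name string
	Type string
}

type Plugin interface{ TypedName() string }

// Handle (simplified): active singleton instances plus blueprints.
type Handle struct {
	plugins map[string]Plugin
	specs   map[string]PluginSpec
}

// NewEppHandle copies each spec by value into its own map, so mutating the
// caller's slice afterwards does not change what the Handle serves.
func NewEppHandle(plugins map[string]Plugin, specs []PluginSpec) *Handle {
	h := &Handle{plugins: plugins, specs: make(map[string]PluginSpec, len(specs))}
	for _, s := range specs {
		h.specs[s.Name] = s
	}
	return h
}

// PluginSpec looks up a blueprint by name and returns it by value.
func (h *Handle) PluginSpec(name string) (PluginSpec, bool) {
	s, ok := h.specs[name]
	return s, ok
}

// namedPlugin is a placeholder for a real singleton plugin.
type namedPlugin string

func (p namedPlugin) TypedName() string { return string(p) }

// instantiatePlugins builds singletons only; transient specs are skipped and
// remain available as blueprints via the Handle.
func instantiatePlugins(specs []PluginSpec, lifecycles map[string]Lifecycle) (map[string]Plugin, error) {
	out := make(map[string]Plugin)
	for _, s := range specs {
		lc, ok := lifecycles[s.Type]
		if !ok {
			return nil, fmt.Errorf("unknown plugin type %q", s.Type)
		}
		if lc == LifecycleTransient {
			continue // leave as a blueprint
		}
		out[s.Name] = namedPlugin(s.Name)
	}
	return out, nil
}
```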
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 9, 2025
@netlify

netlify bot commented Dec 9, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit a0b9d0c
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6937703195676a0008f138d4
😎 Deploy Preview https://deploy-preview-1977--gateway-api-inference-extension.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LukeAVanDrie
Once this PR has been reviewed and has the lgtm label, please assign kfswain for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and shmuelk December 9, 2025 00:41
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 9, 2025
@LukeAVanDrie
Contributor Author

FYI @RyanRosario + those looking into #1797. This starts the process of making these policies easily injectable (and eventually configurable).

@LukeAVanDrie
Contributor Author

FYI @kfswain as we have discussed approaches for this offline before.

In subsequent PRs, I will pass the new EPPPluginFactory as a dependency to the FlowRegistry. Flows will then be configured with PluginRefs, which allow us to dynamically look up the configured plugin specs and instantiate them. I can then rip out the bespoke plugin registration code from the Flow Control module (the Flow Registry stays, though, as it is responsible for state management and binding).
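A minimal sketch of that follow-up, assuming a flow registry that holds a factory and resolves each flow's PluginRef into its own instance the first time the flow is seen. `FlowConfig`, `QueueRef`, `Register`, the `<flowID>-queue` alias convention, and the `fakeFactory` test double are all hypothetical; the real FlowRegistry's binding logic is not shown in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

type Plugin interface{ TypedName() string }

// PluginFactory is a simplified stand-in for the factory dependency.
type PluginFactory interface {
	NewInstance(specName, instanceAlias string) (Plugin, error)
}

// FlowConfig references a blueprint by name (a "PluginRef").
type FlowConfig struct {
	FlowID   string
	QueueRef string
}

// flowRegistry owns per-flow state and binds each flow to its own instance.
type flowRegistry struct {
	mu      sync.Mutex
	factory PluginFactory
	queues  map[string]Plugin
}

func newFlowRegistry(f PluginFactory) *flowRegistry {
	return &flowRegistry{factory: f, queues: make(map[string]Plugin)}
}

// Register instantiates the flow's queue once; later calls for the same
// flow ID return the existing instance.
func (r *flowRegistry) Register(cfg FlowConfig) (Plugin, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if q, ok := r.queues[cfg.FlowID]; ok {
		return q, nil
	}
	q, err := r.factory.NewInstance(cfg.QueueRef, cfg.FlowID+"-queue")
	if err != nil {
		return nil, fmt.Errorf("flow %q: %w", cfg.FlowID, err)
	}
	r.queues[cfg.FlowID] = q
	return q, nil
}

// queue and fakeFactory are test doubles: every call yields a new queue.
type queue struct{ name string }

func (q *queue) TypedName() string { return q.name }

type fakeFactory struct{ calls int }

func (f *fakeFactory) NewInstance(_ string, alias string) (Plugin, error) {
	f.calls++
	return &queue{name: alias}, nil
}
```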

@shmuelk
Contributor

shmuelk commented Dec 9, 2025

I think a better way to handle the need for stateful plugins in the FlowControl layer is to simply add a NewFlow() function to the FlowControl plugins. This call returns a struct (or a pointer to a struct) that is completely opaque to the FlowControl layer itself.
The signature might be something like this:

 `func NewFlow(ctx context.Context) interface{}`

The returned struct/pointer is passed to the plugin calls associated with the corresponding flow. The first thing the plugin does is cast it back to the appropriate type before doing whatever processing it needs.

This makes these plugins more like the rest of the plugins in the system and keeps our code base smaller and easier to understand.

@kfswain
Collaborator

kfswain commented Dec 9, 2025

> It is insufficient for the Flow Control layer. Flow Control requires stateful components (e.g., Queues, Fairness/Ordering Policies) that must be instantiated uniquely for each Flow or Priority Band.

This is rather hand-waved and requires a strong argument, imo, as we are making some pretty fundamental changes. I remember an offline discussion in which we found a way to use singleton plugins for Flow Control. We should first explore how we can fit into the current ecosystem and only then, with strong justification, make changes.

@LukeAVanDrie
Contributor Author


This is an interesting approach, thanks! We need to pick between two patterns:

  1. stateless singleton relying on state-passing (your suggestion)
  2. stateful transient relying on factory (this PR)

Both should work. I will draft a diagram of the Flow Control state model and provide an example implementation, with call-site snippets, for a stateful fairness policy (Round Robin) under both models.

I suspect your suggestion will be the best path forward, but let's lay out the pros and cons concretely.
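For comparison, a minimal sketch of the same Round Robin policy under option 2 (stateful transient): the cursor lives on the plugin instance itself, so each priority band needs its own instance from the factory. `newStatefulRoundRobin` and `Pick` are hypothetical names standing in for a factory call and the policy method.

```go
package main

// statefulRoundRobin carries its own cursor, so instances must not be shared
// across bands. Not safe for concurrent use; callers serialize per band.
type statefulRoundRobin struct {
	next int
}

// newStatefulRoundRobin stands in for a factory-built transient instance.
func newStatefulRoundRobin() *statefulRoundRobin { return &statefulRoundRobin{} }

// Pick returns the next queue to serve and advances the cursor.
func (p *statefulRoundRobin) Pick(queues []string) string {
	if len(queues) == 0 {
		return ""
	}
	q := queues[p.next%len(queues)]
	p.next = (p.next + 1) % len(queues)
	return q
}
```

The selection logic is identical; what differs is ownership. Here the framework must create and track one instance per band, whereas in the state-passing model the plugin is a singleton and the framework tracks one opaque state value per flow.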

@LukeAVanDrie
Contributor Author

I'm still looking into this. pkg/epp/flowcontrol/registry/... needs to undergo some changes for either proposal. Trying to find an efficient path to delivering @shmuelk's approach. Going to mark this as a draft PR until I have an update. Thanks for the feedback so far!

@LukeAVanDrie LukeAVanDrie marked this pull request as draft December 10, 2025 00:49
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2025
