@X1aoZEOuO commented Sep 28, 2025

What this PR does / why we need it

This PR adds support for serverless operation and model activation. It introduces new constants for tracking model activation state and for caching activator information.

Environment variables such as POD_IP are added so that networking settings can be configured dynamically at deploy time. The main function gains flags for enabling the serverless feature and for setting the pod IP, so the controllers can be configured for these operations.

RBAC rules are expanded to cover additional resource operations, including patching and updating endpoints. A new controller, ActivatorReconciler, handles model activation, service reconciliation, and traffic forwarding, which serverless activation depends on.

Finally, the service creation logic now adds model annotations so that services are configured correctly for activation.

Which issue(s) this PR fixes

Fixes #362

Special notes for your reviewer

Does this PR introduce a user-facing change?


cc @pacoxu @kerthcet

@InftyAI-Agent added the needs-triage, needs-priority, and do-not-merge/needs-kind labels on Sep 28, 2025
@X1aoZEOuO force-pushed the feat/0-1-activator branch 2 times, most recently from 5424dec to a7ae00f, on September 28, 2025 12:19
@X1aoZEOuO
Contributor Author

/kind feature

@InftyAI-Agent added the feature label and removed the do-not-merge/needs-kind label on Sep 28, 2025
@X1aoZEOuO force-pushed the feat/0-1-activator branch 2 times, most recently from 5d23e51 to 0a1d0fe, on September 28, 2025 15:40

pacoxu commented Oct 10, 2025

/cc @kerthcet
/assign

flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy")
flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")
flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager")

@pacoxu commented Oct 28, 2025

Is this something like --bind-address? Generally, we use 0.0.0.0 by default and check the exact pod IP at runtime.

If podIP is only used for serverless mode, we should add a comment.
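
One possible shape for these two suggestions, as a sketch only: the pod IP is looked up at runtime only when serverless mode is enabled, with the flag taking precedence over the POD_IP environment variable mentioned in the PR description. `resolvePodIP` is a hypothetical helper, and the assumption that POD_IP is injected into the container (for example from `status.podIP` via the Downward API) is not confirmed by the diff shown here.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"net"
	"os"
)

// resolvePodIP is a hypothetical helper (not from this PR). An explicit
// flag value wins; otherwise it falls back to the POD_IP environment
// variable. The result must parse as an IP address.
func resolvePodIP(flagValue string, getenv func(string) string) (string, error) {
	candidate := flagValue
	if candidate == "" {
		candidate = getenv("POD_IP")
	}
	if candidate == "" {
		return "", errors.New("pod IP not set: pass --pod-ip or set POD_IP")
	}
	if net.ParseIP(candidate) == nil {
		return "", fmt.Errorf("invalid pod IP %q", candidate)
	}
	return candidate, nil
}

func main() {
	var (
		enableServerless bool
		podIP            string
	)
	flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")
	// podIP is only consulted when --enable-serverless is set.
	flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager (serverless mode only)")
	flag.Parse()

	if !enableServerless {
		fmt.Println("serverless disabled; pod IP not needed")
		return
	}
	ip, err := resolvePodIP(podIP, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("activator pod IP:", ip)
}
```

Passing the environment lookup as a function keeps the helper easy to unit test without mutating the process environment.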

// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
ModelActivatorAnnoKey = "activator.llmaz.io/playground"
// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.
CachedModelActivatorAnnoKey = "cached.activator.llmaz.io"

The annotation naming style does not follow the annotations and labels above.


// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
ModelActivatorAnnoKey = "activator.llmaz.io/playground"
// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.

The comment here says CachedModelActivatorAnnotationKey, but the constant below is named CachedModelActivatorAnnoKey.

// Once either of them qualified, we'll expose this as a field in Model.
ModelPreheatAnnoKey = "llmaz.io/model-preheat"

// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.

The name in the comment does not match ModelActivatorAnnoKey.
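
For reference, a sketch of the two constants with doc comments that begin with the exact identifier they describe, which is the usual Go doc comment convention. The annotation values are copied unchanged from the diff; whether they should be renamed to match the existing keys is left to the author.

```go
package main

import "fmt"

const (
	// ModelActivatorAnnoKey is used to indicate whether the model is activated by the activator.
	ModelActivatorAnnoKey = "activator.llmaz.io/playground"
	// CachedModelActivatorAnnoKey is used to cache the activator info of the model.
	CachedModelActivatorAnnoKey = "cached.activator.llmaz.io"
)

func main() {
	// Print the keys so the file runs on its own.
	fmt.Println(ModelActivatorAnnoKey)
	fmt.Println(CachedModelActivatorAnnoKey)
}
```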

@@ -0,0 +1,564 @@
/*
Copyright 2024 The InftyAI Team.

nit: 2025

Labels

feature, needs-priority, needs-triage

Development

Successfully merging this pull request may close these issues.

[OSPP] KEDA-based Serverless Elastic Scaling for llmaz