feat(controller): support serverless serving with 0-1 activator. #498
Signed-off-by: X1aoZEOuO <[email protected]>
/kind feature
/cc @kerthcet
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy")
flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")
flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager")
Is this something like --bind-address? Generally, we use 0.0.0.0 by default and check the exact pod IP at runtime.
If podIP is only used for serverless mode, we should add a comment.
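One way to read the two comments above is sketched below: leave the flag empty by default and resolve the concrete pod IP at runtime. This is only an illustration, not the PR's code. The helper names resolvePodIP and podIPFromEnv are hypothetical, and the sketch assumes the POD_IP environment variable is populated for the pod (for example through the Kubernetes Downward API field status.podIP).

```go
package main

import (
	"net"
	"os"
)

// resolvePodIP picks the IP the controller manager should advertise.
// flagValue is the value of the --pod-ip flag (only needed in serverless mode);
// when it is empty, the POD_IP environment variable is consulted instead.
// The second return value is false when no usable, concrete IP was found.
func resolvePodIP(flagValue string, getenv func(string) string) (string, bool) {
	candidate := flagValue
	if candidate == "" {
		candidate = getenv("POD_IP")
	}
	ip := net.ParseIP(candidate)
	// Reject unparsable input and wildcard addresses such as 0.0.0.0:
	// those are fine for binding, but cannot be used as a forwarding target.
	if ip == nil || ip.IsUnspecified() {
		return "", false
	}
	return ip.String(), true
}

// podIPFromEnv is a convenience wrapper that reads the real process environment.
func podIPFromEnv(flagValue string) (string, bool) {
	return resolvePodIP(flagValue, os.Getenv)
}
```

With this shape, the manager could call podIPFromEnv(podIP) only when enableServerless is true and fail fast if the second return value is false.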
// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
ModelActivatorAnnoKey = "activator.llmaz.io/playground"
// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.
CachedModelActivatorAnnoKey = "cached.activator.llmaz.io"
The annotation naming style does not follow the annotations and labels above.
// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
ModelActivatorAnnoKey = "activator.llmaz.io/playground"
// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.
The comment here says CachedModelActivatorAnnotationKey, but the constant below is CachedModelActivatorAnnoKey.
// Once either of them qualified, we'll expose this as a field in Model.
ModelPreheatAnnoKey = "llmaz.io/model-preheat"

// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
The name in the comment is not the same as ModelActivatorAnnoKey.
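For reference, Go doc comments conventionally begin with the exact name of the identifier they describe. A minimal sketch of the two constants with comments that match the identifiers might look like the following. The annotation values are copied unchanged from the diff, so the separate remark about the key naming style is not addressed here.

```go
package main

const (
	// ModelActivatorAnnoKey is used to indicate whether the model is activated by the activator.
	ModelActivatorAnnoKey = "activator.llmaz.io/playground"
	// CachedModelActivatorAnnoKey is used to cache the activator info of the model.
	CachedModelActivatorAnnoKey = "cached.activator.llmaz.io"
)
```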
@@ -0,0 +1,564 @@
/*
Copyright 2024 The InftyAI Team.
nit: the copyright year should be 2025.
What this PR does / why we need it
In this update, several key improvements were made to support serverless operations and model activation. New constants were introduced to manage model activation states and cache information effectively.
Environment variables like POD_IP were added to dynamically configure networking settings, enhancing deployment flexibility. The main function was updated to include flags for enabling serverless features and configuring pod IPs, ensuring controllers can handle these operations smoothly.
RBAC rules were expanded to allow more comprehensive resource management, including patching and updating endpoints. A new controller, ActivatorReconciler, was implemented to manage model activation, service reconciliation, and traffic forwarding, crucial for serverless activations.
Lastly, service creation logic was updated to include model annotations, ensuring services are correctly configured for activation purposes. These changes collectively improve the system's ability to manage dynamic and scalable deployments.
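The last point, carrying the model's activator annotation over to the Service, can be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's implementation: the helper name propagateActivatorAnnotation is hypothetical, the annotation key is taken from the diff, and the sketch simply copies whatever value the model carries without interpreting it.

```go
package main

// modelActivatorAnnoKey mirrors ModelActivatorAnnoKey from the diff.
const modelActivatorAnnoKey = "activator.llmaz.io/playground"

// propagateActivatorAnnotation returns a copy of svcAnnos that also carries the
// activator annotation found on the model, if any. The inputs are not mutated,
// and nil maps are accepted.
func propagateActivatorAnnotation(modelAnnos, svcAnnos map[string]string) map[string]string {
	out := make(map[string]string, len(svcAnnos)+1)
	for k, v := range svcAnnos {
		out[k] = v
	}
	// Copy the value as-is; how the activator interprets it is outside this sketch.
	if v, ok := modelAnnos[modelActivatorAnnoKey]; ok {
		out[modelActivatorAnnoKey] = v
	}
	return out
}
```

Returning a fresh map keeps the helper safe to call on objects read from a shared cache, where mutating annotations in place would be risky.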
Which issue(s) this PR fixes
Fixes #362
Special notes for your reviewer
Does this PR introduce a user-facing change?
cc @pacoxu @kerthcet