Add capacity buffers scale up events #8621
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: abdelrahman882. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from 1b2a75d to 7dee2b6
cluster-autoscaler/processors/capacitybuffer/scale_up_status_processor.go
// relevant events for pods depending on their post scale-up status.
func (p *EventingScaleUpStatusProcessor) Process(context *context.AutoscalingContext, status *ScaleUpStatus) {
-	consideredNodeGroupsMap := nodeGroupListToMapById(status.ConsideredNodeGroups)
+	consideredNodeGroupsMap := NodeGroupListToMapById(status.ConsideredNodeGroups)
Merge conflict pro tip here: your change here didn't actually conflict with the autoscalingCtx *ca_context.AutoscalingContext change, but FYI.
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	v1alpha1 "k8s.io/autoscaler/cluster-autoscaler/apis/capacitybuffer/autoscaling.x-k8s.io/v1alpha1"
	"k8s.io/autoscaler/cluster-autoscaler/context"
	"k8s.io/autoscaler/cluster-autoscaler/processors/nodegroupset"
Merge conflict resolution pro tip: git doesn't know how to resolve the surrounding import additions on top of the alias change to the "k8s.io/autoscaler/cluster-autoscaler/context" import.
Force-pushed from 7dee2b6 to 469e2c9
bufferPodInjector := cbprocessor.NewCapacityBufferPodListProcessor(capacitybufferClient, []string{common.ActiveProvisioningStrategy}, buffersPodsRegistry)
podListProcessor = pods.NewCombinedPodListProcessor([]pods.PodListProcessor{bufferPodInjector, podListProcessor})
-opts.Processors.ScaleUpStatusProcessor = status.NewCombinedScaleUpStatusProcessor([]status.ScaleUpStatusProcessor{cbprocessor.NewFakePodsScaleUpStatusProcessor(), opts.Processors.ScaleUpStatusProcessor})
+opts.Processors.ScaleUpStatusProcessor = status.NewCombinedScaleUpStatusProcessor([]status.ScaleUpStatusProcessor{cbprocessor.NewFakePodsScaleUpStatusProcessor(buffersPodsRegistry), opts.Processors.ScaleUpStatusProcessor})
nit: Can you break up this line? It is very long.
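For what it's worth, one way the long line above could be wrapped (a sketch only; all names are taken from the diff above):

```go
// Sketch: pull the buffer processor into a named variable and wrap the slice literal.
bufferScaleUpProcessor := cbprocessor.NewFakePodsScaleUpStatusProcessor(buffersPodsRegistry)
opts.Processors.ScaleUpStatusProcessor = status.NewCombinedScaleUpStatusProcessor(
	[]status.ScaleUpStatusProcessor{
		bufferScaleUpProcessor,
		opts.Processors.ScaleUpStatusProcessor,
	},
)
```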
-func nodeGroupListToMapById(nodeGroups []cloudprovider.NodeGroup) map[string]cloudprovider.NodeGroup {
+// NodeGroupListToMapById returns a map of node group ID to node group.
+func NodeGroupListToMapById(nodeGroups []cloudprovider.NodeGroup) map[string]cloudprovider.NodeGroup {
There is nothing specific to the eventing processor about this function. WDYT about moving it next to the NodeGroup interface? I.e. here:
autoscaler/cluster-autoscaler/cloudprovider/cloud_provider.go, lines 262 to 263 in 34cfc53
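For reference, a minimal sketch of what the helper might look like if it lived in the cloudprovider package next to the NodeGroup interface (placement and body paraphrase the reviewer's suggestion; this is not code from the PR):

```go
// NodeGroupListToMapById returns a map of node group ID to node group.
func NodeGroupListToMapById(nodeGroups []NodeGroup) map[string]NodeGroup {
	result := make(map[string]NodeGroup, len(nodeGroups))
	for _, nodeGroup := range nodeGroups {
		result[nodeGroup.Id()] = nodeGroup
	}
	return result
}
```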
func (p *CapacityBufferPodListProcessor) updateCapacityBufferRegistry(fakePods []*apiv1.Pod, buffer *v1alpha1.CapacityBuffer) {
	p.buffersRegistry.fakePodsUIDToBuffer = make(map[string]*v1alpha1.CapacityBuffer, len(fakePods))
	for _, fakePod := range fakePods {
		if p.buffersRegistry != nil {
This check on every iteration doesn't do anything. If p.buffersRegistry == nil, you'll get a panic 2 lines earlier, trying to assign something to p.buffersRegistry.fakePodsUIDToBuffer.
If it is a valid state for CapacityBufferPodListProcessor to have nil there, it should be checked here once, when entering the function. If it is not a valid state, it should be checked in NewCapacityBufferPodListProcessor and an error should be returned there.
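A minimal sketch of the early-return variant the reviewer describes (the method signature and field names come from the diff above; the loop body is an assumption based on the field name):

```go
func (p *CapacityBufferPodListProcessor) updateCapacityBufferRegistry(fakePods []*apiv1.Pod, buffer *v1alpha1.CapacityBuffer) {
	// Guard once on entry; the map assignment below would panic on a nil registry.
	if p.buffersRegistry == nil {
		return
	}
	p.buffersRegistry.fakePodsUIDToBuffer = make(map[string]*v1alpha1.CapacityBuffer, len(fakePods))
	for _, fakePod := range fakePods {
		// Mapping logic assumed for illustration; the actual body lives in the PR.
		p.buffersRegistry.fakePodsUIDToBuffer[string(fakePod.UID)] = buffer
	}
}
```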
totalFakePods := []*apiv1.Pod{}
for _, buffer := range buffers {
	fakePods := p.provision(buffer)
	p.updateCapacityBufferRegistry(fakePods, buffer)
This is not a comment on this PR per se, but adding a note since you're modifying this code: why do we need to recalculate all of that on every loop? This is O(N) in the number of fake pods and requires costly deep copying, which will lead to performance issues at large scale. Ideally, CA would only watch for changes to buffers and templates and recalculate the set of fake pods when there is a need for that.
I'm not saying to refactor it all in this PR, but it should be fixed as a follow-up.
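To illustrate the follow-up idea, a rough sketch of one way to avoid re-provisioning on every loop (entirely hypothetical; it assumes CapacityBuffer embeds ObjectMeta so UID and Generation are available, and that k8s.io/apimachinery/pkg/types is imported):

```go
// Hypothetical cache (none of these names exist in the PR): fake pods are only
// regenerated when the owning buffer actually changes, instead of on every loop.
type fakePodCache struct {
	entries map[types.UID]cachedFakePods
}

type cachedFakePods struct {
	generation int64
	pods       []*apiv1.Pod
}

// podsFor returns cached fake pods for the buffer, re-provisioning only when
// the buffer's generation has moved since the last calculation.
func (c *fakePodCache) podsFor(buffer *v1alpha1.CapacityBuffer, provision func(*v1alpha1.CapacityBuffer) []*apiv1.Pod) []*apiv1.Pod {
	if cached, ok := c.entries[buffer.UID]; ok && cached.generation == buffer.Generation {
		return cached.pods
	}
	pods := provision(buffer)
	c.entries[buffer.UID] = cachedFakePods{generation: buffer.Generation, pods: pods}
	return pods
}
```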
	filteredPodsSources := make([]T, 0)
	removedPods := make([]*apiv1.Pod, 0)
// filterOutFakeUnschedulablePods filters out NoScaleUpInfo for capacity buffers fake pods
func filterOutFakeUnschedulablePods(noScaleUpInfo []status.NoScaleUpInfo) ([]status.NoScaleUpInfo, []status.NoScaleUpInfo) {
nit: Why duplicate the existing generics based logic?
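For illustration, the kind of shared generic helper the reviewer is hinting at might look like this (a sketch; the name and predicate signature are assumptions, not the existing generics-based function):

```go
// partitionFake splits items into those kept and those filtered out as fake,
// so the pod and NoScaleUpInfo call sites could share one implementation.
func partitionFake[T any](items []T, isFake func(T) bool) (kept []T, fake []T) {
	for _, item := range items {
		if isFake(item) {
			fake = append(fake, item)
		} else {
			kept = append(kept, item)
		}
	}
	return kept, fake
}
```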
cluster-autoscaler/processors/capacitybuffer/scale_up_status_processor.go
}

for _, bufferInfo := range buffersInfo {
	context.Recorder.Event(bufferInfo.buffer, apiv1.EventTypeNormal, "NotTriggerScaleUp",
You can use Eventf instead of Event to avoid calling Sprintf directly.
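For example (a sketch based on the snippet above; the message text and the reason variable are illustrative):

```go
// Eventf formats the message itself, so the explicit fmt.Sprintf call can be dropped.
context.Recorder.Eventf(bufferInfo.buffer, apiv1.EventTypeNormal, "NotTriggerScaleUp",
	"capacity buffer didn't trigger scale-up: %v", reason)
```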
},
PodsAwaitEvaluation: []*apiv1.Pod{},
},
expectedTriggeredScaleUp: 2,
WDYT about changing this from int to a slice of error strings so that it is possible to validate the message content as well?
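For illustration, the expectation might become something like this (field and type names are hypothetical):

```go
// Hypothetical shape of the test table: assert on recorded event messages
// (or substrings of them) instead of only counting triggered scale-ups.
type testCase struct {
	// ... existing fields ...
	expectedScaleUpEvents []string // expected substrings of events recorded on the buffers
}
```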
What type of PR is this?
/kind feature
What this PR does / why we need it:
Add events to capacity buffer objects when a buffer causes a scale-up or fails to do so.
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
This change mainly caches the mapping from fake pods to buffer objects in step 1; then in step 2 we use that mapping to get the buffer object for each fake pod and create scale-up events for the buffers.
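Roughly, the two steps look like this (a sketch; only fakePodsUIDToBuffer and the NotTriggerScaleUp reason appear in the diff, the surrounding names and message are illustrative):

```go
// Step 1: while injecting fake pods, remember which buffer produced each one.
registry.fakePodsUIDToBuffer[string(fakePod.UID)] = buffer

// Step 2: when processing the scale-up status, resolve each fake pod back to its
// buffer and record an event on the buffer object instead of on the fake pod.
if buffer, ok := registry.fakePodsUIDToBuffer[string(pod.UID)]; ok {
	recorder.Eventf(buffer, apiv1.EventTypeNormal, "NotTriggerScaleUp",
		"capacity buffer pod %s didn't trigger scale-up", pod.Name)
}
```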
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
NONE