Skip to content
This repository has been archived by the owner on Oct 10, 2024. It is now read-only.

Change the formula cat-gate uses to schedule pods #20

Merged
merged 8 commits into from
May 24, 2024

Conversation

zeroalphat
Copy link
Contributor

@zeroalphat zeroalphat commented May 15, 2024

  • Change the formula cat-gate uses to schedule pods
    • Change 'Number of nodes with pods on which the container status is Running or Terminated' to 'Use the number of nodes with container images'.
  • Modify the implementation to avoid requeueing when the scheduling gate is successfully removed.
  • Add a process to prevent removing the scheduling gate for other Pods immediately after removing the scheduling gate, in order to prevent race conditions.

@zeroalphat zeroalphat self-assigned this May 15, 2024
@zeroalphat zeroalphat marked this pull request as ready for review May 21, 2024 01:03
@zeroalphat zeroalphat requested a review from masa213f May 21, 2024 01:03
internal/runners/garbage_collector.go Outdated Show resolved Hide resolved
internal/controller/pod_controller.go Show resolved Hide resolved
pods := &corev1.PodList{}
err = r.List(ctx, pods, client.MatchingFields{constants.ImageHashAnnotationField: reqImagesHash})
if err != nil {
logger.Error(err, "failed to list pods")
return ctrl.Result{}, err
}

nodeSet := make(map[string]struct{})

numSchedulablePods := 0
numImagePulledPods := 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the calculation method of numImagePullingPods.
I think we need to count the pods with the following conditions.

(no cat-gate.cybozu.io/gate scheduling gate) AND (pod.status.phase==Pending) AND (pod.spec.nodeName is not contained in nodeSet)

IMO, we need not check the status of individual containers. The pod phase is sufficient for our usage.

Copy link
Contributor Author

@zeroalphat zeroalphat May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masa213f
I agree that it is better to use Pod phase.
I didn't really understand why you included whether nodeset is included in the counting method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed my mind. We do not need to care about "nodeSet".

I thought pods scheduled to "nodeSet" nodes would not pull images from upstream repositories.
But if the "imagePullPolicy" is "Always", the kubelets may pull the new images.

Copy link
Contributor Author

@zeroalphat zeroalphat May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change the conditional expression of numImagePullingPods with f3455c6.

internal/controller/pod_controller.go Outdated Show resolved Hide resolved
internal/controller/pod_controller.go Outdated Show resolved Hide resolved
internal/controller/pod_controller_test.go Show resolved Hide resolved
@zeroalphat zeroalphat requested a review from masa213f May 23, 2024 04:31
@zeroalphat zeroalphat requested a review from masa213f May 24, 2024 01:44
Copy link
Contributor

@masa213f masa213f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeroalphat zeroalphat merged commit 77529fb into main May 24, 2024
3 checks passed
@zeroalphat zeroalphat deleted the fix-schedule-formula branch May 24, 2024 05:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants