
Adding support for daemonSet and statefulSet resources in charts #405

Merged: 6 commits into redhat-certification:main on Sep 1, 2023

Conversation

@dcurran90 (Contributor) commented Aug 10, 2023

Update waitForDeployments to waitForWorkloads and add the ability to wait for DaemonSets and StatefulSets.
Add tests for this new feature and increase code coverage.

Closes #286
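
For orientation, this is the bookkeeping type implied by the diff excerpts quoted in the review below; the field names come from those excerpts, while the field types are assumptions based on the Kubernetes replica counters they hold.

// workloadNotReady records a workload that still has unavailable pods.
type workloadNotReady struct {
	Name         string
	ResourceType string // "Deployment", "DaemonSet", or "StatefulSet"
	Unavailable  int32  // assumed int32 to match the Kubernetes status counters
}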

@mgoerens (Contributor) left a comment:

Looks good overall, thanks! I've added some small suggestions.

Comment on lines 132 to 133
// Just after rollout, pods from the previous deployment revision may still be in a
// terminating state.
Contributor:

I don't really get this comment. Here we check whether the pods have become available, and I don't really see what this has to do with a previous deployment. WDYT? Should we remove/update it?

Contributor Author:

Good call. I'll double-check all the comments and make them more relevant.

}
}
}

if len(getDeploymentsError) > 0 {
errorMsg := fmt.Sprintf("Time out retrying after %s", getDeploymentsError)
if len(getWorkloadResourceError) > 0 {
Contributor:

Suggested change:
-	if len(getWorkloadResourceError) > 0 {
+	if getWorkloadResourceError != "" {

or, longer but might show the intention a bit better:

Suggested change:
-	if len(getWorkloadResourceError) > 0 {
+	if errDeployment != nil || errDaemonSets != nil || errStatefulSets != nil

internal/tool/kubectl_test.go (resolved comment thread)
internal/tool/kubectl.go (resolved comment thread)
unavailableWorkloadResources = []workloadNotReady{{Name: "none", ResourceType: "Deployment", Unavailable: 1}}
getWorkloadResourceError = fmt.Sprintf("error getting deployments from namespace %s : %v", namespace, errDeployments)
utils.LogWarning(getWorkloadResourceError)
time.Sleep(time.Second)
Contributor:

What's the purpose of sleeping here?

Contributor Author:

This was actually left over from what the function previously did.
I believe the idea is to avoid excessive calls and rate limiting, since it will immediately hit the API again after these failures. Happy to remove it if this is a non-issue, but I'm trying to find the default rate limit info first.
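
For context, a minimal sketch of the polling pattern this thread is discussing: retry the check on a fixed interval until everything is ready or the deadline passes. checkWorkloads here is a hypothetical stand-in for the real listing and readiness logic, not a function from this PR.

import (
	"errors"
	"time"
)

// waitUntilReady polls checkWorkloads once per second until it reports ready
// or the deadline passes; errors from checkWorkloads are simply retried.
// The one-second pause is the back-off discussed above: it keeps a failing
// call from immediately hitting the API server again.
func waitUntilReady(deadline time.Time, checkWorkloads func() (bool, error)) error {
	for time.Now().Before(deadline) {
		ready, err := checkWorkloads()
		if err == nil && ready {
			return nil
		}
		time.Sleep(time.Second)
	}
	return errors.New("timed out waiting for workload resources to become ready")
}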

Comment on lines 116 to 130
if errDeployments != nil {
unavailableWorkloadResources = []workloadNotReady{{Name: "none", ResourceType: "Deployment", Unavailable: 1}}
getWorkloadResourceError = fmt.Sprintf("error getting deployments from namespace %s : %v", namespace, errDeployments)
utils.LogWarning(getWorkloadResourceError)
time.Sleep(time.Second)
} else if errDaemonSets != nil {
unavailableWorkloadResources = []workloadNotReady{{Name: "none", ResourceType: "DaemonSet", Unavailable: 1}}
getWorkloadResourceError = fmt.Sprintf("error getting daemon sets from namespace %s : %v", namespace, errDaemonSets)
utils.LogWarning(getWorkloadResourceError)
time.Sleep(time.Second)
} else if errStatefulSets != nil {
unavailableWorkloadResources = []workloadNotReady{{Name: "none", ResourceType: "StatefulSet", Unavailable: 1}}
getWorkloadResourceError = fmt.Sprintf("error getting stateful sets from namespace %s : %v", namespace, errStatefulSets)
utils.LogWarning(getWorkloadResourceError)
time.Sleep(time.Second)
Contributor:

A few things for this block.

  1. This feels like we should drop the else if chaining; running through each if block separately should work here.

  2. There's a good deal of repetition in what's being executed, which suggests we could refactor this to remove it and, in turn, be more concise.

Contributor Author:

Refactored to reduce repetition.
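
For illustration, one way the repetition could be factored out; workloadNotReady and utils.LogWarning come from the code under review, while the helper name, parameters, and message wording are assumptions rather than what the final commit necessarily does (it also assumes the fmt import).

// workloadListError builds the placeholder entry and warning message emitted
// when listing one workload kind fails, replacing three near-identical branches.
func workloadListError(resourceType, plural, namespace string, err error) (workloadNotReady, string) {
	msg := fmt.Sprintf("error getting %s from namespace %s : %v", plural, namespace, err)
	utils.LogWarning(msg)
	return workloadNotReady{Name: "none", ResourceType: resourceType, Unavailable: 1}, msg
}

Each branch then reduces to a single call such as workloadListError("DaemonSet", "daemon sets", namespace, errDaemonSets).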

Comment on lines 132 to 161
getWorkloadResourceError = ""
// Check the number of unavailable replicas for each workload type
for _, deployment := range deployments {
// Just after rollout, pods from the previous deployment revision may still be in a
// terminating state.
if deployment.Status.UnavailableReplicas > 0 {
unavailableDeployments = append(unavailableDeployments, deploymentNotReady{Name: deployment.Name, Unavailable: deployment.Status.UnavailableReplicas})
unavailableWorkloadResources = append(unavailableWorkloadResources, workloadNotReady{Name: deployment.Name, ResourceType: "Deployment", Unavailable: deployment.Status.UnavailableReplicas})
}
}
for _, daemonSet := range daemonSets {
if daemonSet.Status.NumberUnavailable > 0 {
unavailableWorkloadResources = append(unavailableWorkloadResources, workloadNotReady{Name: daemonSet.Name, ResourceType: "DaemonSet", Unavailable: daemonSet.Status.NumberUnavailable})
}
}
for _, statefulSet := range statefulSets {
// StatefulSet doesn't report unavailable replicas so it is calculated here
unavailableReplicas := statefulSet.Status.Replicas - statefulSet.Status.AvailableReplicas
if unavailableReplicas > 0 {
unavailableWorkloadResources = append(unavailableWorkloadResources, workloadNotReady{Name: statefulSet.Name, ResourceType: "StatefulSet", Unavailable: unavailableReplicas})
}
}
if len(unavailableDeployments) > 0 {
utils.LogInfo(fmt.Sprintf("Wait for %d deployments:", len(unavailableDeployments)))
for _, unavailableDeployment := range unavailableDeployments {
utils.LogInfo(fmt.Sprintf(" - %s with %d unavailable replicas", unavailableDeployment.Name, unavailableDeployment.Unavailable))

// If any pods are unavailable, report it and sleep until the next loop
// If everything is available, exit the loop
if len(unavailableWorkloadResources) > 0 {
utils.LogInfo(fmt.Sprintf("Wait for %d workload resources:", len(unavailableWorkloadResources)))
for _, unavailableWorkloadResource := range unavailableWorkloadResources {
utils.LogInfo(fmt.Sprintf(" - %s %s with %d unavailable pods", unavailableWorkloadResource.ResourceType, unavailableWorkloadResource.Name, unavailableWorkloadResource.Unavailable))
}
time.Sleep(time.Second)
} else {
utils.LogInfo(fmt.Sprintf("Finish wait for deployments, --timeout time left %s", time.Until(deadline).String()))
utils.LogInfo(fmt.Sprintf("Finish wait for workload resources, --timeout time left %s", time.Until(deadline).String()))
Contributor:

  1. The structure here feels like we could abstract the parsing of these into separate functions and call them.
  2. Perhaps there's an opportunity to use goroutines here to let the parsing of StatefulSets, Deployments, and DaemonSets work independently of each other? WDYT? I think it could do away with the sleeps I'm seeing.
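
A rough sketch of the goroutine idea, using golang.org/x/sync/errgroup to run the three listing calls concurrently. listDeployments, listDaemonSets, and listStatefulSets are the helpers quoted in this PR; the Kubectl parameter type, the appsv1 element types, and the wrapper function itself are assumptions for illustration only.

import (
	appsv1 "k8s.io/api/apps/v1"
	"golang.org/x/sync/errgroup"
)

// listAllWorkloads lists the three workload kinds in parallel and returns the
// first error encountered, if any.
func listAllWorkloads(k Kubectl, context, namespace, selector string) ([]appsv1.Deployment, []appsv1.DaemonSet, []appsv1.StatefulSet, error) {
	var (
		deployments  []appsv1.Deployment
		daemonSets   []appsv1.DaemonSet
		statefulSets []appsv1.StatefulSet
		g            errgroup.Group
	)
	g.Go(func() error {
		var err error
		deployments, err = listDeployments(k, context, namespace, selector)
		return err
	})
	g.Go(func() error {
		var err error
		daemonSets, err = listDaemonSets(k, context, namespace, selector)
		return err
	})
	g.Go(func() error {
		var err error
		statefulSets, err = listStatefulSets(k, context, namespace, selector)
		return err
	})
	// Wait before reading the slices so the goroutines have finished writing them.
	err := g.Wait()
	return deployments, daemonSets, statefulSets, err
}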

Comment on lines +110 to +112
deployments, errDeployments := listDeployments(k, context, namespace, selector)
daemonSets, errDaemonSets := listDaemonSets(k, context, namespace, selector)
statefulSets, errStatefulSets := listStatefulSets(k, context, namespace, selector)
Contributor:

So I'm understanding that we check for these three resources regardless of whether the deployed chart uses them? The idea being that charts that don't use statefulsets/daemonsets would just auto-pass the "readiness" check because none of those workloads exist. Is that a correct understanding?

That's fine, I just want to make sure. Fundamentally, this check seems to be flawed because charts can deploy all sorts of resources. But at the end of the day, this is a better check than nothing.

Contributor Author:

Yes, charts that don't try to deploy certain resources will just auto-pass since we're not expecting them to be there.

As far as checking for other resources, I could at least add Jobs, since that seems to be the only one missing from the list of default workload resources (https://kubernetes.io/docs/concepts/workloads/), but I didn't know if I should since it wasn't requested and I didn't want to go overboard. Happy to make another PR for that if it's needed.
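
For reference, one plausible shape for listing helpers like the ones quoted above, going straight through client-go; the function name, parameter list, and clientset wiring here are assumptions and may not match how the PR's listStatefulSets is actually written.

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listStatefulSetItems returns the StatefulSets in a namespace that match the
// given label selector.
func listStatefulSetItems(ctx context.Context, client kubernetes.Interface, namespace, selector string) ([]appsv1.StatefulSet, error) {
	list, err := client.AppsV1().StatefulSets(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return nil, err
	}
	return list.Items, nil
}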

@komish (Contributor) left a comment:

@dcurran90 Thanks for your work on this.

A few comments left. I think we can be a little more concise here, and doing so would preserve the readability of these bits. Let me know if I can help with anything.

@komish (Contributor) commented Aug 29, 2023

@dcurran90 just a heads up that tests are failing here; it should be reproducible with make test.

@komish (Contributor) left a comment:

/lgtm

@mgoerens when you're happy with this, go ahead and merge away.

@mgoerens (Contributor) commented Sep 1, 2023

/lgtm

@mgoerens merged commit ab27c64 into redhat-certification:main on Sep 1, 2023
5 checks passed
Linked issue: Add support for testing daemon and statefulsets (#286)