-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test/e2e: Add debug of failed pods #2216
base: main
Are you sure you want to change the base?
test/e2e: Add debug of failed pods #2216
Conversation
91ed61e
to
a807a54
Compare
Typically - when I want tests to fail they all pass! Re-running... |
692fef8
to
b25dcff
Compare
https://github.com/confidential-containers/cloud-api-adaptor/actions/runs/12415908135/job/34663884546?pr=2216 shows an example (with deliberately failing tests of it selecting the correct bit of the CAA log related to the pod that fails. |
b25dcff
to
73c63ff
Compare
https://github.com/confidential-containers/cloud-api-adaptor/actions/runs/12417178259/job/34668109916?pr=2216 shows that the CAA log is now trimmed to only show the messages related to the pod that failed. |
73c63ff
to
34d04a6
Compare
can you maybe screen shot an example? I fail to see spot anything in the debug step output, but maybe I'm looking in the wrong place |
I guess the logs can be written to file in a ./fail folder to be printed out in the debug step, but having them in the test output also works |
func GetPodsFromJob(ctx context.Context, t *testing.T, client klient.Client, job *batchv1.Job) (*v1.PodList, error) { | ||
clientset, err := kubernetes.NewForConfig(client.RESTConfig()) | ||
if err != nil { | ||
return nil, fmt.Errorf("GetPodFromJob: get Kubernetes clientSet failed: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @stevenhorsman !
s/GetPodFromJob/GetPodsFromJob/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns a PodList, so thats why I called it GetPods as it could return multiple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to say that the function's name being printed should be GetPodsFromJob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I didn't look at the line properly!
|
||
pods, err := clientset.CoreV1().Pods(job.Namespace).List(context.TODO(), metav1.ListOptions{LabelSelector: "job-name=" + job.Name}) | ||
if err != nil { | ||
return nil, fmt.Errorf("GetPodFromJob: get pod list failed: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed
} | ||
|
||
// Find the logs starting with the pod | ||
pod_matcher := regexp.MustCompile(`[0-9]{4}\/[0-9]{2}\/[0-9]{2} ([0-1]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9] \[adaptor\/cloud\] create a sandbox [0-9|a-f]* for pod ` + pod.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex is pure evil :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'll split it up to be clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. It just caught my attention, I used to enjoy these crazy regex :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've provided an example of what we are trying to match against and broken out the date and time matcher, so I think it should be more understandable now. Just re-testing with a deliberate error again as regex is prone to failures and I don't test my manual testing as I'm on a run of mistakes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @stevenhorsman !
If a test fails, log debug info for the pod/job - The log of the pod - The extract of CAA log related to that pod - The warning/error events for that pod Signed-off-by: stevenhorsman <[email protected]>
34d04a6
to
fa47a24
Compare
https://github.com/confidential-containers/cloud-api-adaptor/actions/runs/12434706844/job/34719400000?pr=2216 shows that the CAA pod specific logging is still working, so my regex looks okay |
fa47a24
to
fec1d1e
Compare
If we just hit timeout whilst waiting for pods to get to into the expected state, or during testing/clean-up of resources, then `Error` instead of `Fatal` as we don't want to skip the logging and cleanup. Signed-off-by: stevenhorsman <[email protected]>
57891d9
to
7f48ac3
Compare
Print logs of failed pods and CAA in case of failures to help with debugging in the case of failures