Log cronjob pod diagnostics on failure/hang in CTST by delthas · Pull Request #2456 · scality/Zenko

delthas · 2026-06-30T09:37:46Z

What

Add best-effort pod diagnostics + a heartbeat to createJobAndWaitForCompletion (tests/functional/ctst/steps/utils/kubernetes.ts), the shared helper that triggers and waits for cronjobs (count-items) in quota / utilization / storage-usage CTST scenarios.

Why

When the triggered job fails or hangs, the step previously logged only "Job failed" with the job object — never why (no pod logs, exit code, or BackoffLimitExceeded reason). The count-items pod logs lived only in the kind-logs artifact, which on a cancelled/hung run is frequently truncated or never uploaded, leaving the failure undiagnosable from the run itself. The watch also had no heartbeat, so a stalled job produced dead air until the 20-min Before-hook timeout.
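Much of the missing "why" is already recorded on the Job object. When retries are exhausted, Kubernetes sets a `Failed` condition with reason `BackoffLimitExceeded`. The sketch below turns a Job status into a one-line, log-friendly summary. It assumes simplified local `JobStatus`/`JobCondition` types that mirror a subset of the `batch/v1` Job status fields; they are not the client library classes. `summarizeJobStatus` is a hypothetical name, not the helper in this PR.

```typescript
// Simplified subset of the Kubernetes batch/v1 JobStatus shape
// (assumption: local types, not the @kubernetes/client-node classes).
interface JobCondition {
  type: string;     // e.g. "Complete" or "Failed"
  status: string;   // "True" | "False" | "Unknown"
  reason?: string;  // e.g. "BackoffLimitExceeded"
  message?: string;
}

interface JobStatus {
  active?: number;
  succeeded?: number;
  failed?: number;
  conditions?: JobCondition[];
}

// Hypothetical helper: pod counters plus the type/reason/message of every
// condition currently set to "True", on a single line.
function summarizeJobStatus(status: JobStatus | undefined): string {
  const s = status ?? {};
  const counters = `active=${s.active ?? 0} succeeded=${s.succeeded ?? 0} failed=${s.failed ?? 0}`;
  const conditions = (s.conditions ?? [])
    .filter(c => c.status === 'True')
    .map(c => `${c.type}(${c.reason ?? 'no reason'}${c.message ? `: ${c.message}` : ''})`);
  return conditions.length > 0 ? `${counters} conditions=${conditions.join(',')}` : counters;
}

// Usage: a Job that exhausted its retries (sample data).
console.log(summarizeJobStatus({
  failed: 3,
  conditions: [{ type: 'Failed', status: 'True', reason: 'BackoffLimitExceeded',
    message: 'Job has reached the specified backoff limit' }],
}));
// prints: active=0 succeeded=0 failed=3 conditions=Failed(BackoffLimitExceeded: Job has reached the specified backoff limit)
```

The Job-level reason only says *that* retries ran out. The exit code and crash output still have to come from the pods.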

Changes (all best-effort, wrapped so logging never breaks the test)

Heartbeat — every 30 s while waiting, re-read the Job and log its real status (active/succeeded/failed, conditions) + pod state, so a silent stall is visible instead of dead air.
Pod diagnostics on failure — on job failure (and once on first detected container crash/restart), list the job's pods and log their status + container logs (current + previous), so the crash reason lands directly in the GitHub step log.
Lock visibility — log lock acquisition / cross-worker contention at info/warn.
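The heartbeat and the "wrapped so logging never breaks the test" guarantee can be sketched as two small helpers. This is an illustration under assumptions, not the code in `kubernetes.ts`. The `Logger` type, `bestEffort`, and `startHeartbeat` are hypothetical names, and the real helper logs through the CTST logger and a Kubernetes client, which are not reproduced here.

```typescript
// Hypothetical logger signature (assumption: the real helper uses the CTST logger).
type Logger = (msg: string, meta?: Record<string, unknown>) => void;

// Runs a diagnostic action so that a failure in it never reaches the caller:
// any error thrown (or promise rejected) by `fn` is logged and swallowed.
// Assumes the logger itself does not throw.
async function bestEffort(label: string, log: Logger, fn: () => Promise<void>): Promise<void> {
  try {
    await fn();
  } catch (err) {
    log(`best-effort "${label}" failed (ignored)`, { error: String(err) });
  }
}

// Calls `tick` every `intervalMs` until the returned stop function is called.
// Ticks never overlap: if the previous one is still in flight, the interval is skipped.
function startHeartbeat(intervalMs: number, log: Logger, tick: () => Promise<void>): () => void {
  let inFlight = false;
  const timer = setInterval(() => {
    if (inFlight) {
      return;
    }
    inFlight = true;
    void (async () => {
      try {
        await bestEffort('heartbeat', log, tick);
      } finally {
        inFlight = false;
      }
    })();
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In a wait helper, `startHeartbeat(30_000, log, readJobAndPods)` (with `readJobAndPods` a hypothetical re-read of the Job and its pods) would be started before waiting, and its stop function would be called in a `finally` block so the timer never outlives the step.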

CTST already runs at debug log level in CI, so these surface in the step log with no artifact dependency.
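The "once on first detected container crash/restart" trigger needs a little state so that the pod dump happens a single time, not on every poll. The sketch below shows one way to express it. It assumes simplified local types mirroring a subset of a pod's `status.containerStatuses` entries in the core/v1 API. `isCrashed` and `makeFirstCrashDetector` are hypothetical names, and the crash criteria (any restart, `CrashLoopBackOff`, or non-zero exit) are an assumption rather than the PR's exact rule.

```typescript
// Simplified subset of a pod's status.containerStatuses entries in the
// Kubernetes core/v1 API (assumption: local types, not client-node classes).
interface ContainerStatus {
  name: string;
  restartCount: number;
  state?: {
    waiting?: { reason?: string };                      // e.g. "CrashLoopBackOff"
    terminated?: { exitCode: number; reason?: string }; // e.g. "Error"
  };
}

interface PodSnapshot {
  name: string;
  containerStatuses?: ContainerStatus[];
}

// A container is treated as crashed if it has restarted at least once,
// is waiting in CrashLoopBackOff, or terminated with a non-zero exit code.
function isCrashed(c: ContainerStatus): boolean {
  return c.restartCount > 0
    || c.state?.waiting?.reason === 'CrashLoopBackOff'
    || (c.state?.terminated?.exitCode ?? 0) !== 0;
}

// Returns a checker that reports "pod/container" names the first time any
// crash is seen, then always returns [] so the diagnostics dump happens once.
function makeFirstCrashDetector(): (pods: PodSnapshot[]) => string[] {
  let reported = false;
  return (pods: PodSnapshot[]): string[] => {
    if (reported) {
      return [];
    }
    const crashed: string[] = [];
    for (const pod of pods) {
      for (const c of pod.containerStatuses ?? []) {
        if (isCrashed(c)) {
          crashed.push(`${pod.name}/${c.name}`);
        }
      }
    }
    if (crashed.length > 0) {
      reported = true;
    }
    return crashed;
  };
}
```

Once the checker returns a non-empty list, the diagnostics would fetch both the current and the previous container logs (the equivalent of `kubectl logs` and `kubectl logs --previous`). After a restart, the crash output lives in the previous container's log.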

tsc --build and eslint pass on the changed file.

Issue: ZENKO-5308

bert-e · 2026-06-30T09:37:50Z

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

name	description	privileged	authored
`/after_pull_request`	Wait for the given pull request id to be merged before continuing with the current one.
`/bypass_author_approval`	Bypass the pull request author's approval	⭐
`/bypass_build_status`	Bypass the build and test status	⭐
`/bypass_commit_size`	Bypass the check on the size of the changeset `TBA`	⭐
`/bypass_incompatible_branch`	Bypass the check on the source branch prefix	⭐
`/bypass_jira_check`	Bypass the Jira issue check	⭐
`/bypass_peer_approval`	Bypass the pull request peers' approval	⭐
`/bypass_leader_approval`	Bypass the pull request leaders' approval	⭐
`/approve`	Instruct Bert-E that the author has approved the pull request.		✍️
`/create_pull_requests`	Allow the creation of integration pull requests.
`/create_integration_branches`	Allow the creation of integration branches.
`/no_octopus`	Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
`/unanimity`	Change review acceptance criteria from `one reviewer at least` to `all reviewers`
`/wait`	Instruct Bert-E not to run until further notice.

Available commands

name	description	privileged
`/help`	Print Bert-E's manual in the pull request.
`/status`	Print Bert-E's current status in the pull request `TBA`
`/clear`	Remove all comments from Bert-E from the history `TBA`
`/retry`	Re-start a fresh build `TBA`
`/build`	Re-start a fresh build `TBA`
`/force_reset`	Delete integration branches & pull requests, and restart merge process from the beginning.
`/reset`	Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e · 2026-06-30T09:40:13Z

Waiting for approval

The following approvals are needed before I can proceed with the merge:

the author
2 peers

delthas · 2026-06-30T14:47:27Z

This additional debug logging helped find the s3utils issue.

bert-e · 2026-07-01T09:18:55Z

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

When a triggered cronjob (e.g. count-items) fails or hangs, the step only logged that the job failed, never why: the pod logs lived only in artifacts that were often truncated or missing on cancelled runs.

Add, all best-effort and wrapped so logging never breaks the test:
- a 30s heartbeat that re-reads the job status + pod state while waiting, so a silent stall is visible instead of dead air until the hook timeout;
- on failure (and once on first detected crash/restart), a dump of the pod status and pod logs (current + previous), so the failure cause lands in the GitHub step log itself;
- lock-acquisition visibility so cross-worker contention is logged.

Issue: ZENKO-5308

delthas marked this pull request as draft June 30, 2026 09:38

scality deleted a comment from bert-e Jun 30, 2026

delthas marked this pull request as ready for review June 30, 2026 14:46

delthas requested review from a team, SylvainSenechal and benzekrimaha and removed request for benzekrimaha June 30, 2026 14:47

SylvainSenechal reviewed Jul 1, 2026

Comment thread tests/functional/ctst/steps/utils/kubernetes.ts Outdated

scality deleted a comment from bert-e Jul 1, 2026

SylvainSenechal reviewed Jul 1, 2026

Comment thread tests/functional/ctst/steps/utils/kubernetes.ts Outdated

SylvainSenechal reviewed Jul 1, 2026

Comment thread tests/functional/ctst/steps/utils/kubernetes.ts Outdated

SylvainSenechal approved these changes Jul 1, 2026

delthas force-pushed the improvement/ZENKO-5308/cronjob-pod-diagnostics branch from bd86223 to d61d105 July 1, 2026 09:57

delthas wants to merge 1 commit into development/2.15 from improvement/ZENKO-5308/cronjob-pod-diagnostics

3 participants

Uh oh!

Conversation

delthas commented Jun 30, 2026

What

Why

Changes (all best-effort, wrapped so logging never breaks the test)

Uh oh!

bert-e commented Jun 30, 2026

Hello delthas,

Uh oh!

bert-e commented Jun 30, 2026

Waiting for approval

Uh oh!

delthas commented Jun 30, 2026

Uh oh!

Uh oh!

bert-e commented Jul 1, 2026

Request integration branches

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants