Stop CTST gracefully at a cucumber-level time budget#2457
Conversation
The GitHub step timeout hard-kills the process, so Cucumber never writes its reports (empty report.xml). Add an in-process deadline instead: the runner script exports a deadline (default 180m) and a marker path; an early Before hook skips any scenario starting past the deadline (so the expensive @Quotas/count-items hooks are short-circuited) and drops the marker. Cucumber then finishes normally, writing report.xml/html/ndjson and running teardown; the script fails the run if the marker is present. The 190m step timeout (ZENKO-5306) remains as a hard backstop. Issue: ZENKO-5309
Hello delthas,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
|
| # Before hook), so Cucumber finishes normally and writes its reports. The hook | ||
| # drops a marker file; we fail the run if it is present. | ||
| CTST_MAX_RUNTIME_MIN=${CTST_MAX_RUNTIME_MIN:-180} | ||
| export CTST_DEADLINE_EPOCH_MS=$(( $(date +%s%3N) + CTST_MAX_RUNTIME_MIN * 60 * 1000 )) |
There was a problem hiding this comment.
best to minimize content of wrapper script : should do this computation in javascript
| export CTST_DEADLINE_EPOCH_MS=$(( $(date +%s%3N) + CTST_MAX_RUNTIME_MIN * 60 * 1000 )) | |
| export CTST_MAX_RUNTIME_MIN=${CTST_MAX_RUNTIME_MIN:-180} |
There was a problem hiding this comment.
even CTST_MAX_RUNTIME_MIN default value could be set directly in the javascript code?
| --format message:ctst/reports/report.ndjson || rc=$? | ||
|
|
||
| if [ -f "$CTST_TIMEOUT_MARKER" ]; then | ||
| echo "::error::CTST exceeded its ${CTST_MAX_RUNTIME_MIN}-minute time budget; remaining scenarios were skipped." |
There was a problem hiding this comment.
do we need to display a log here ?
we could display the same from within CTST (first time -or everytime- we hit the deadline...), and avoid the whole "marker" handling: which seems overly complex
There was a problem hiding this comment.
no need imo, I would display a log in the hook, for each test skipped, we can access the name of the scenario and just write "scenario xxx skipped because of global timeout"
| fs.writeFileSync(marker, 'timeout'); | ||
| } catch { /* best-effort: marker is advisory */ } | ||
| } | ||
| return 'skipped'; |
There was a problem hiding this comment.
what does this "skipped" do?
these (skipped) tests need to be marked as somewhat failed, to ensure we don't pass CI unexpectedly: maybe "skipped" does that, but it must a skip like "I could not execute it" (=potentially failed) rather than "test was intentionally disabled/skipped" (=ignore)
There was a problem hiding this comment.
I think it's an official cucumber keyword, but yeah this should probably use fail instead 🤔
| if [ -f "$CTST_TIMEOUT_MARKER" ]; then | ||
| echo "::error::CTST exceeded its ${CTST_MAX_RUNTIME_MIN}-minute time budget; remaining scenarios were skipped." | ||
| if [ "$rc" -eq 0 ]; then | ||
| rc=1 |
There was a problem hiding this comment.
instead of wrapper script, there is already some code to "tweak" the return code (and make it pass always), c.f. tests/functional/ctst/cucumber.config.cjs :
if (process.env.CI_PASS_ON_TEST_FAILURE === 'true') {
const _exit = process.exit;
process.exit = function exit(code) {
_exit(code === 1 ? 0 : code);
};
process.on('beforeExit', () => {
if (process.exitCode === 1) {
process.exitCode = 0;
}
});
}
→ instead of this, you could thus tweak the result to actually return something other than 0 if we want to skip further analysis (e.g. test results merging...)
→ however, why do we need to return error 1 in this case? Timeout is likely an issue in the tests, shouldn't we handle this like all other failed tests (i.e. ignore the result and let followup steps check the report for failure)
Request integration branchesWaiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command: Alternatively, the |
| // Time budget: scenarios that start after the deadline are skipped so Cucumber | ||
| // finishes normally (writing its reports) before the GitHub step timeout would | ||
| // hard-kill the process. The marker file lets the runner script fail the run. | ||
| Before(() => { |
There was a problem hiding this comment.
I would simplify this :
- hardcode here max duration allowed for all ctst tests
- send as env variable a timestamp of when we started running ctst (+ verify if maybe this variable is already exposed by cucumber)
- just verify time.now < timestampStart + maxDurationAllowed
What
Add an in-process time budget so CTST stops gracefully before the GitHub step timeout would hard-kill it. Touches only
tests/functional/ctst/common/hooks.tsand.github/scripts/end2end/run-e2e-ctst.sh.Why
The only stop for a hung
ctst-end2end-shardedrun is the steptimeout-minutes, which hard-kills the process. cucumber-js v13 installs no signal handler and has no whole-run timeout, so on a kill it never reachestestRunFinished—report.xmlis written empty (Error parsing report.xml: no element found), teardown is skipped, and the archive step has nothing to publish.How
run-e2e-ctst.shexportsCTST_DEADLINE_EPOCH_MS(default 180 min, viaCTST_MAX_RUNTIME_MIN) and a marker path, and clears any stale marker.Beforehook (registered before the expensive@Quotas/count-items hooks) skips any scenario that starts past the deadline — returning'skipped'short-circuits the remaining hooks/steps — and drops the marker file.report.xml/report.html/report.ndjsonand runs teardown. The runner script exits 1 if the marker is present, so a timed-out run is still red while keeping a clean report.When
CTST_DEADLINE_EPOCH_MSis unset (local runs), the hook is a no-op.Relationship to ZENKO-5306
Companion to the step-level timeout. ZENKO-5306 lowers the step
timeout-minutesto 190 as a hard backstop: 180 min (graceful, this PR) < 190 min (hard backstop) < the old 360-min job cap.tsc --build,eslint, andbash -npass.Issue: ZENKO-5309