Fix batch child jobs created as 'processing' instead of 'pending' #860
Merged
Child jobs were immediately set to 'processing' at creation time, before Action Scheduler picked them up. This caused recover-stuck to mark them as timed out after 2 hours even though they were just waiting in the AS queue.

Fix: move the pending→processing transition to the actual moment of execution:

- PipelineBatchScheduler: remove premature start_job() from createChildJob()
- ExecuteStepAbility: add start_job() at execute() entry (no-op for parent jobs already in processing, real transition for child jobs)
- TaskScheduler: same pattern (remove from scheduleTask(), add to handleTask())

Now recover-stuck only catches genuinely stuck jobs (ones that started processing but never finished), not jobs waiting in the AS queue.

Closes #858
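The lifecycle change described above can be modelled in a few lines. This is a minimal sketch in Python rather than the plugin's own code: the `Job` class, the `timed_out` status label, the `started_at` field, and the way `recover_stuck` measures age are all assumptions made for illustration. Only the function names `create_job`, `start_job`, and `execute`, the `pending` and `processing` statuses, and the 2-hour threshold come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

STUCK_AFTER = timedelta(hours=2)  # threshold quoted in the PR description


@dataclass
class Job:
    # Hypothetical in-memory stand-in for a job record.
    id: int
    status: str = "pending"
    started_at: Optional[datetime] = None


def create_job(job_id: int) -> Job:
    """Create a job in 'pending'; no start_job() call at creation time."""
    return Job(id=job_id)


def start_job(job: Job, now: datetime) -> bool:
    """Move pending -> processing. Returns False (no-op) for any other status."""
    if job.status != "pending":
        return False
    job.status = "processing"
    job.started_at = now
    return True


def execute(job: Job, now: datetime) -> None:
    """Entry point the queue runner would call when the action fires."""
    # Real transition for child jobs, no-op for parents already processing.
    start_job(job, now)
    # ... the step's actual work would run here ...


def recover_stuck(jobs: List[Job], now: datetime) -> List[Job]:
    """Time out jobs that started processing at least STUCK_AFTER ago."""
    recovered = []
    for job in jobs:
        if job.status != "processing" or job.started_at is None:
            continue  # jobs still waiting in the queue are left alone
        if now - job.started_at >= STUCK_AFTER:
            job.status = "timed_out"  # assumed status label
            recovered.append(job)
    return recovered


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    child = create_job(1)
    # Three hours in the queue: still pending, so nothing is recovered.
    assert recover_stuck([child], t0 + timedelta(hours=3)) == []
    execute(child, t0 + timedelta(hours=3))
    assert child.status == "processing"
```

In this model the guard inside `start_job` is what makes the extra call at `execute()` entry safe for parent jobs. Whether the real `start_job()` is idempotent in the same way is not shown in the description.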
Summary
Fixes #858: batch fan-out child jobs were immediately set to `processing` at creation time, before Action Scheduler picked them up. After 2 hours, `recover-stuck` would mark them as timed out even though they were just waiting in the AS queue.

Root cause

Same pattern in `TaskScheduler::scheduleTask()`.

Fix

Move the `pending→processing` transition to the actual moment of execution:

- `PipelineBatchScheduler::createChildJob()`: the `start_job()` call after `create_job()` is removed, so the job stays `pending`.
- `ExecuteStepAbility::execute()`: `start_job()` is called at entry and transitions the job to `processing` when AS fires.
- `TaskScheduler::scheduleTask()`: the `start_job()` call after `create_job()` is removed, so the job stays `pending`.
- `TaskScheduler::handleTask()`: `start_job()` is called at entry and transitions the job to `processing` when AS fires.

For parent jobs, the `start_job()` in `ExecuteStepAbility` is a no-op (already `processing` via `RunFlowAbility`). For child jobs, it's the real transition.

Result

- `recover-stuck` only catches genuinely stuck jobs (started processing but never finished)
- Jobs waiting in the AS queue stay `pending` and won't be incorrectly timed out
- … `processing` status

Testing