feat: Added number of spot interruptions to Tower/Platform telemetry #6606
Signed-off-by: munishchouhan <[email protected]>
@claude review please
Claude finished @pditommaso's task. Reviewing PR: Added number of reclamations in TraceRecord
Updated PR summary
Resolved review threads (outdated):
- plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchTaskHandler.groovy
- plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy (two threads)
This reverts commit dcb8465.
This reverts commit 6c6f153.
Resolved review thread: plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy
- Use guard clauses in AWS Batch handler for cleaner flow
- Add clarifying comment in Google Batch handler
Signed-off-by: Paolo Di Tommaso <[email protected]>

Summary
This PR adds tracking and reporting of spot/preemptible instance interruptions for cloud batch executors (AWS Batch and Google Batch). When tasks are retried due to spot instance interruptions, the number of interruptions is now captured and exposed via the numSpotInterruptions field in trace records.
Motivation
Spot/preemptible instances can be reclaimed by cloud providers at any time, causing tasks to retry on new instances. Understanding how often this happens is important for:
Changes
Core Framework
TraceRecord (modules/nextflow/src/main/groovy/nextflow/trace/TraceRecord.groovy)
- numSpotInterruptions transient field with getter/setter methods
AWS Batch Plugin (nf-amazon)
AwsBatchTaskHandler.groovy
- getNumSpotInterruptions(String jobId) method that examines job attempts for spot interruption patterns
- statusReason starts with "Host EC2"
- getTraceRecord() to populate numSpotInterruptions field
Tests (AwsBatchTaskHandlerTest.groovy)
- getNumSpotInterruptions() with various scenarios:
Google Batch Plugin (nf-google)
GoogleBatchTaskHandler.groovy
- getNumSpotInterruptions(String jobId) method that examines task status events
- getTraceRecord() to populate numSpotInterruptions field
- maxSpotAttempts() helper using FusionConfig defaults when fusion snapshots enabled
Tests (GoogleBatchTaskHandlerTest.groovy)
- getNumSpotInterruptions() covering multiple scenarios
Technical Details
Detection Mechanisms
AWS Batch:
- JobDetail.attempts() list
- attempt.statusReason() starts with "Host EC2"
- "Host EC2 (instance i-xxx) terminated."
Google Batch:
- TaskStatus.statusEventsList()
- exitCode == 50001 in task execution events
Implementation Approach
The numSpotInterruptions field is:
- transient (… .command.trace files)
- populated when getTraceRecord() is called
This approach queries the cloud provider's job/task status to detect spot interruptions based on provider-specific indicators:
The field will be available to trace observers that consume TraceRecord objects, allowing workflows to track and report spot interruption rates.
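
The two rules listed under Detection Mechanisms can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class name SpotInterruptionSketch and both method names are hypothetical, and the inputs are plain lists of status reasons and exit codes rather than the AWS or Google SDK objects that the handlers actually inspect.

```java
import java.util.List;

/**
 * Illustrative sketch only (hypothetical names): applies the two detection
 * rules from the PR description to plain values instead of SDK objects.
 */
final class SpotInterruptionSketch {

    // Prefix of the statusReason that the description associates with a reclaimed host
    static final String AWS_SPOT_PREFIX = "Host EC2";

    // Exit code that the description associates with a spot interruption on Google Batch
    static final int GOOGLE_SPOT_EXIT_CODE = 50001;

    /** Counts job attempts whose status reason starts with "Host EC2". */
    static int countAwsInterruptions(List<String> statusReasons) {
        if (statusReasons == null) {
            return 0; // no attempts reported
        }
        int count = 0;
        for (String reason : statusReasons) {
            if (reason != null && reason.startsWith(AWS_SPOT_PREFIX)) {
                count++;
            }
        }
        return count;
    }

    /** Counts task execution events whose exit code equals 50001. */
    static int countGoogleInterruptions(List<Integer> exitCodes) {
        if (exitCodes == null) {
            return 0; // no status events reported
        }
        int count = 0;
        for (Integer code : exitCodes) {
            if (code != null && code == GOOGLE_SPOT_EXIT_CODE) {
                count++;
            }
        }
        return count;
    }
}
```

Under these assumptions, a handler would extract the status reasons or exit codes from the provider response, pass them to a counter like this one, and store the result in the trace record when getTraceRecord() runs.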
Testing