Releases: broadinstitute/cromwell
91
91 Release Notes
Removal of Google LifeSciences backend code
Code related to Google's Cloud LifeSciences API (Papiv2 or v2Beta) has been removed following Google's shutdown of the service in July 2025.
Google Batch (batch) is now the supported GCP backend.
GCP Batch
- Task log files are now included in the group of files copied for call cache hits.
- Tweak automatic retry of transient errors: retry if task is SCHEDULED but not RUNNING. This should result in more retries, reducing the number of workflows that fail due to transient Batch issues.
- Fixed an issue that caused WDL tasks to fail when invoking gcloud or gsutil. Affected tasks returned an error message referencing python3: not found.
- Fixed an issue that could cause a valid WDL using an Int? value to fail with an error mentioning bootDiskSizeGb.
- Increased timeout for logging runnables in response to a low rate of sporadic timeout errors.
- Job IDs will be derived from workflow and call details with a hash generated using call name. This will allow for better grouping of jobs in the Batch UI and ensure deterministic job IDs to prevent duplicates upon Cromwell restart. Example of job ID: job-e21cbbd3-scatterworkflowmytask-2-1-175f647b.
- Jobs that fail with exit code 50002 before even getting to RUNNING state will now be eligible for automatic transient retries.
- Set a timeout of 24 hours for many runnables in Batch jobs. This prevents excess spend when localization or other setup steps hang. User command runnables are not affected.
- Updated cost estimation documentation to make it explicit that the Cloud Billing API must be enabled.
- Added support for cancelling jobs: aborted Batch jobs will now be marked as Cancelled instead of being deleted. This allows users to view job details even after a job is aborted.
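The deterministic job ID scheme described in this list can be illustrated with a minimal Python sketch. These notes do not state which fields Cromwell combines or which hash it uses, so the layout below (workflow ID prefix, sanitized call name, shard index, attempt, and a short hash of the call name) is an assumption inferred only from the example ID, and `make_job_id` is a hypothetical helper rather than Cromwell's code.

```python
import hashlib
import re


def make_job_id(workflow_id: str, call_fqn: str, shard: int, attempt: int) -> str:
    """Build a deterministic, Batch-style job ID (illustrative layout only).

    ASSUMPTION: the field order and the use of SHA-256 are guesses made for
    illustration; Cromwell's real derivation may differ.
    """
    # A short prefix of the workflow UUID keeps jobs of one workflow grouped.
    wf_prefix = workflow_id.replace("-", "")[:8]
    # Keep only lowercase letters and digits of the call name.
    name_part = re.sub(r"[^a-z0-9]", "", call_fqn.lower())
    # A short hash of the original call name separates calls whose
    # sanitized names would otherwise collide.
    call_hash = hashlib.sha256(call_fqn.encode("utf-8")).hexdigest()[:8]
    return f"job-{wf_prefix}-{name_part}-{shard}-{attempt}-{call_hash}"
```

Because the same inputs always produce the same ID, a restarted server in this sketch would regenerate an identical ID instead of creating a second job for the same call.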
AWS Batch
- Pulled in AWS improvements, features, and fixes from henriqueribeiro/cromwell
- Added support for specifying an IAM role for AWS Batch job containers via the aws_batch_job_role_arn workflow option. This allows containers to access AWS resources based on the permissions granted to the specified role.
- ECR pull-through caches can now be used to access Docker images. See ReadTheDocs for details.
Other changes
- Removed unused code related to Azure cloud services.
90
90 Release Notes
GCP Batch
- Cromwell now supports automatic use of the GAR Dockerhub mirror, see ReadTheDocs for details.
- VM initialization time is now included in estimated cost calculation for jobs.
Bug fixes
- Fixed a concurrency bug that in rare cases caused tasks to never start.
Security fix
- Fixed a vulnerability in the repository's GitHub Actions.
- We found no evidence of compromise to the source code, so the Cromwell product itself was not impacted.
- Forked Cromwell repositories should update immediately from develop.
- Thank you to Stefano Chierici, Alberto Pellitteri, and Lorenzo Susini for the report.
89
89 Release Notes
Improvements
- Cromwell can now provide estimated costs incurred by a workflow run on GCP. Read more in CostEstimation in ReadTheDocs.
GCP Batch Updates
- Add 30 GB default VM boot disk size to user-requested boot disk size; this ensures the VM has room for large user command Docker images.
- Fix a bug that caused Cromwell to treat immediate preemptions as failures.
- Automatically retry tasks that fail with transient Batch errors before the VM has started running (that is, before the task has cost the user money). These retries do not count against maxRetries.
- Symlink to /cromwell_root: In LifeSciences, the Cromwell root directory that user scripts are run from is located at /cromwell_root, but in the Batch backend it has moved to /mnt/disk/cromwell_root. To ensure WDLs that rely on the original path don't break when run on Batch, and to maintain forward compatibility, we have created a symlink between /mnt/disk/cromwell_root and /cromwell_root.
- Fixed a bug that caused Cromwell to overestimate the workflow cost for Batch jobs that used preemptible machines.
- Allocated more memory to the shared memory filesystem (/dev/shm) proportional to the machine size.
Other Changes
- Removes a database index METADATA_WORKFLOW_IDX that is now redundant since the introduction of IX_METADATA_ENTRY_WEU_MK.
- The latest tag will now point to the most recent numerical Cromwell release rather than following the develop branch. This means that the latest tag will not be updated for pre-release versions of Cromwell.
88
88 Release Notes
Important Upgrade Note: Database Schema Change
Cromwell 88 includes a number of database schema changes to support new functionality and improve performance. Users should expect a longer-than-usual database migration due primarily to the IX_METADATA_ENTRY_WEU_MK index added to METADATA_ENTRY. In pre-release testing, this migration proceeded at about 3 million rows per minute. Please plan downtime accordingly.
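For downtime planning, the quoted rate can be turned into a rough estimate. The sketch below simply divides the METADATA_ENTRY row count by the roughly 3 million rows per minute observed in pre-release testing. The function name is hypothetical, and the real rate on a given database will depend on its hardware and load, so treat the result as an order-of-magnitude guide only.

```python
def estimate_migration_minutes(row_count: int, rows_per_minute: float = 3_000_000) -> float:
    """Estimate index migration time from a row count.

    ASSUMPTION: the default rate is the pre-release test figure from the
    release notes; measure on your own database before relying on it.
    """
    if rows_per_minute <= 0:
        raise ValueError("rows_per_minute must be positive")
    return row_count / rows_per_minute


# Example: a METADATA_ENTRY table with 1.5 billion rows.
minutes = estimate_migration_minutes(1_500_000_000)
print(f"about {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
```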
GCP Batch Updates
- The genomics configuration entry was renamed to batch, see ReadTheDocs for more information.
- Fixed a bug with not being able to recover jobs on Cromwell restart.
- Fixed machine type selection to match the Google Cloud Life Sciences backend, including default n1 non shared-core machine types and correct handling of cpuPlatform to select n2 or n2d machine types as appropriate.
- Fixed preemption and maxRetries behavior. In particular, once a task has exhausted its allowed preemptible attempts, the task will be scheduled again on a non-preemptible VM.
- Fixed error message reporting for failed jobs.
- Fixed the "retry with more memory" feature.
- Fixed the reference disk feature.
- Fixed pulling Docker image metadata from private GCR repositories.
- Fixed google_project and google_compute_service_account workflow options not taking effect when using the GCP Batch backend.
- Added a way to use a custom LogsPolicy for job execution. Setting backend.providers.batch.config.batch.logs-policy to "CLOUD_LOGGING" (default) keeps the current behavior; setting it to "PATH" streams the logs to Google Cloud Storage.
- When "CLOUD_LOGGING" is used, many more Cromwell / WDL labels for workflow, root workflow, call, shard etc. are now assigned to GCP Batch log entries.
- Fixed subnet selection for networks that use custom subnet creation
- Updated runtime attributes documentation to clarify that the nvidiaDriverVersion key is ignored on GCP Batch.
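The corrected preemption behavior in this list (fall back to a non-preemptible VM once the allowed preemptible attempts are used up) can be pictured with a small Python sketch. The function and parameter names are hypothetical, and the sketch models only the choice of VM type, not Cromwell's interaction with maxRetries.

```python
def next_attempt_vm(preemptions_so_far: int, preemptible_limit: int) -> str:
    """Pick the VM type for the next attempt of a task.

    ASSUMPTION: a simplified model in which only the number of earlier
    preemptions matters; names are illustrative, not Cromwell's.
    """
    if preemptions_so_far < preemptible_limit:
        return "preemptible"
    # Preemptible budget exhausted: schedule on a non-preemptible VM.
    return "non-preemptible"


# With two allowed preemptible attempts, the third attempt is non-preemptible.
print([next_attempt_vm(n, 2) for n in range(3)])
```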
Improvements
- A new optional feature prevents Cromwell from starting new jobs in a group that is currently experiencing cloud quota exhaustion. Jobs will be started once the group's quota becomes available. To enable this feature, set quota-exhaustion-job-start-control.enabled to true.
- Users can now configure which algorithm is used to hash files for call caching purposes. See Configuring page in ReadTheDocs for details. Default behavior is unchanged.
- Cromwell now allows opting into configured soft links on shared file systems such as HPC environments. More details can be found here.
- Users reported cases where Life Sciences jobs failed due to insufficient quota, instead of queueing and waiting until quota is available (which is the expected behavior). Cromwell will now retry under these conditions, which present with errors such as "PAPI error code 9", "no available zones", and/or "quota too low".
- If Cromwell can't determine the size of the user command Docker image, it will increase Lifesciences API boot disk size by 30GB rather than 0. This should reduce incidence of tasks failing due to boot disk filling up.
- Resolved a hotspot in Cromwell to make the size() engine function perform much faster on file arrays. Common examples of file arrays could include globs or scatter-gather results.
- The IX_WORKFLOW_STORE_ENTRY_WS index is removed from WORKFLOW_STORE_ENTRY. The index had low cardinality and workflow pickup is faster without it.
- When Cromwell restarts during a workflow that is failing, it no longer reports pending tasks as a reason for that failure.
- As outlined in the WDL Spec, concatenating a string with an empty optional now correctly evaluates to the empty string.
Other Changes
- As of this version, a distribution of Java 17 is required to run Cromwell. Cromwell is developed, tested, and containerized using Eclipse Temurin.
- RESTAPI.md docs have been discontinued. Due to deprecation of the underlying library, Markdown docs will no longer be generated from the Cromwell API Swagger. The recommended alternative is starting a server and viewing the Swagger directly.
- Removed obsolete health checks
- Docker Hub: Cromwell's healthcheck requests to Docker Hub were not authenticated, and thus became subject to rate limiting. To eliminate these false alarms, this functionality has been removed. The config key services.HealthMonitor.config.check-dockerhub is therefore obsolete.
- GCS: Cromwell's health check of GCS has been removed. GCS does not have availability issues of note, and in typical configurations the check does not meaningfully test Cromwell's permissions. The config keys services.HealthMonitor.config.check-gcs and .gcs-bucket-to-check are therefore obsolete.
- Code relating to the Google Genomics API (aka v1Alpha) has been removed since Google has entirely disabled that service. Cloud Life Sciences (aka v2Beta, deprecated) and Google Batch (aka batch, recommended) remain the two viable GCP backends. Cloud Life Sciences is expected to be unavailable starting in July 2025 and v2Beta support will be removed in a future Cromwell release.
- Removed support for Nvidia K80 "Kepler" GPUs, which were discontinued by GCP in May 2024.
- Default GPU on Life Sciences is now Nvidia P100
- Default GPU on GCP Batch is now Nvidia T4
87
87 Release Notes
GCP Batch
- Added Nvidia driver install (default 418) (#7235)
- Fixed Docker mounting volumes with extra colon (#7240)
- Fixed issue with multiple zones defined in config (#7240)
- Fixed Batch label regex (#7355)
Progress toward WDL 1.1 Support
WDL 1.1 support is in progress. Users that would like to try out the current partial support can do so by using
WDL version development-1.1. As of Cromwell 87, development-1.1 includes:
- Engine functions:
- Struct literals can be included in WDLs (#7391) (#7402)
- Added returnCodes runtime attribute (#7389)
upgrade command removed from Womtool
Womtool previously supported a womtool upgrade command for upgrading draft-2 WDLs to 1.0. With WDL 1.1 soon to
become the latest supported version, this functionality is retiring. (#7382)
Replacement of gsutil with gcloud storage
In this release (#7359), all localization functionality on the GCP backend migrates to use the more modern and performant gcloud storage. With sufficiently powerful worker VMs, Cromwell can now localize at up to 1200 MB/s [0][1][2].
In a future release, delocalization will also migrate to gcloud storage. As part of that upcoming change, we are considering turning on parallel composite uploads by default to maximize performance. Delocalized composite objects will no longer have an md5 checksum in their metadata; refer to the matrix below [3]. If you have compatibility concerns for your workflow, please submit an issue.
| Delocalization Strategy | Performance | crc32c | md5 |
|---|---|---|---|
| Classic | Baseline/slow | ✅ | ✅ |
| Parallel Composite | Fast | ✅ | ❌ |
[0] Tested with Intel Ice Lake CPU platform, 16 vCPU, 32 GB RAM, 2500 GB SSD
[1] Throughput scales with vCPU count with a plateau at 16 vCPUs.
[2] Throughput scales with disk size and type, with a plateau at 2.5 TB SSD. Worked example: 1200 MB/s ÷ 0.48 MB/s per GB = 2500 GB.
[3] Cromwell itself uses crc32c hashes for call caching and is not affected
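The worked example in footnote [2] can be expressed as a short Python sketch for sizing worker disks. It assumes throughput grows linearly with SSD size at 0.48 MB/s per GB until the 2500 GB plateau; the footnotes only say that throughput "scales", so the linear model, the constant names, and the function are assumptions, and the vCPU limit from footnote [1] is not modeled.

```python
MB_PER_S_PER_GB = 0.48  # rate taken from the worked example in footnote [2]
PLATEAU_GB = 2500       # disk size beyond which throughput stops improving


def disk_limited_throughput_mb_s(disk_size_gb: float) -> float:
    """Estimate the disk-limited localization throughput in MB/s.

    ASSUMPTION: linear scaling up to the plateau, SSD disks only.
    """
    if disk_size_gb < 0:
        raise ValueError("disk_size_gb must not be negative")
    return min(disk_size_gb, PLATEAU_GB) * MB_PER_S_PER_GB


for size_gb in (500, 1000, 2500, 5000):
    print(f"{size_gb} GB SSD: about {disk_limited_throughput_mb_s(size_gb):.0f} MB/s")
```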
Other Improvements
- In certain cases DRS downloads have been found to hang forever. Cromwell will now time these out. (#7416)
- Increased default Akka client.parsing.max-response-reason-length to 1024 (#7406)
- Workflow Completion Callback bodies now include fully-qualified output names (#7234)
- Improved workflow abort error handling (#7245)
- Improved logging for troubleshooting (#7246) (#7253) (#7388)
- Support for Intel Ice Lake chips in Life Sciences backend (#7252)
- Fix workflows getting stuck in Aborting when WDL has a type error (#7385)
- Updates to dependencies to fix security vulnerabilities.
86
86 Release Notes
GCP Batch
Cromwell now supports the GCP Batch backend for running workflows. See Backend in ReadTheDocs for more information.
Workflow Completion Callback
Cromwell can be configured to send a POST request to a specified URL when a workflow completes. The request body includes the workflow ID, terminal state,
and (if applicable) final outputs or error message. See WorkflowCallback in ReadTheDocs for more information.
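A receiving service for such a callback might look like the Python sketch below, which uses only the standard library. The JSON key names ("workflowId", "state", "outputs", "failures") are hypothetical placeholders chosen for illustration; the actual request schema is documented in WorkflowCallback in ReadTheDocs and should be checked before use.

```python
import json
from http.server import BaseHTTPRequestHandler


def summarize_callback(body: bytes) -> str:
    """Turn a callback request body into a one-line summary.

    ASSUMPTION: the keys "workflowId", "state", "outputs", and "failures"
    are hypothetical; consult the WorkflowCallback docs for the real schema.
    """
    payload = json.loads(body.decode("utf-8"))
    workflow_id = payload.get("workflowId", "<unknown>")
    state = payload.get("state", "<unknown>")
    if payload.get("outputs"):
        detail = f"{len(payload['outputs'])} output(s)"
    elif payload.get("failures"):
        detail = f"{len(payload['failures'])} failure message(s)"
    else:
        detail = "no outputs or failures"
    return f"{workflow_id} finished as {state} with {detail}"


class CallbackHandler(BaseHTTPRequestHandler):
    """Minimal POST receiver that logs each completion callback."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_callback(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()

# To try it locally (blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), CallbackHandler).serve_forever()
```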
Other Improvements
- Cromwell will now parallelize the downloads of DRS files that resolve to signed URLs. This significantly reduces the time localization takes in certain situations.
- WDL size engine function now works for HTTP files
- Improved Cromwell's handling of docker manifests. Additional logging information is emitted, and Cromwell will fall back to using OCI manifests if it encounters an error with a Docker Image Manifest V2.
85
85 Release Notes
Migration of PKs to BIGINT
The primary keys (PKs) of the tables below will be migrated from INT to BIGINT. Because ROOT_WORKFLOW_ID in SUB_WORKFLOW_STORE_ENTRY is a foreign key (FK) to WORKFLOW_STORE_ENTRY_ID in WORKFLOW_STORE_ENTRY,
it is also being migrated from INT to BIGINT.
- DOCKER_HASH_STORE_ENTRY
- WORKFLOW_STORE_ENTRY
- SUB_WORKFLOW_STORE_ENTRY
Improvement to "retry with more memory" behavior
Cromwell will now retry a task with more memory after it fails with return code 137, provided all
the other requirements for retrying with more memory are met.
DRS Improvements
Support for invoking CromwellDRSLocalizer with manifest file
CromwellDRSLocalizer can now handle multiple file localizations in a single invocation. Users can provide a
manifest file containing multiple (DRS id, local container path) pairs in CSV format, and they will be localized in
sequence, with the program exiting if any fail.
java -jar /path/to/localizer.jar [options] -m /local/path/to/manifest/file.txt
The previous method of passing in a single DRS file and container destination using positional arguments is still
supported.
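A manifest of this kind could be generated with a few lines of Python, as sketched below. The notes specify only that the file holds (DRS id, local container path) pairs in CSV format, so the absence of a header row, the example DRS URI, and the `build_manifest` helper are assumptions made for illustration.

```python
import csv
import io


def build_manifest(pairs) -> str:
    """Render (DRS id, local container path) pairs as CSV text.

    ASSUMPTION: one pair per row, DRS id first, no header row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for drs_id, container_path in pairs:
        writer.writerow([drs_id, container_path])
    return buffer.getvalue()


# Hypothetical inputs; write the result to the file passed with -m.
manifest_text = build_manifest([
    ("drs://example.org/object-1", "/cromwell_root/inputs/sample1.bam"),
    ("drs://example.org/object-2", "/cromwell_root/inputs/sample2.bam"),
])
print(manifest_text, end="")
```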
Improvement to DRS localization in GCP papiv2beta backend
All DRS inputs to a task are now localized in a single PAPI action, which should improve speed and resolve
failures observed when attempting to localize a large number of DRS files.
Allow list for HTTP WDL resolution
Administrators can now configure Cromwell with an allow list that limits the domains from which WDLs can be resolved and imported.
Default behavior is unchanged (Cromwell attempts to resolve WDL files from any URI). Example configuration:
languages {
  WDL {
    http-allow-list {
      enabled: true
      allowed-http-hosts: [
        "my.wdl.repo.org",
        "raw.githubusercontent.com"
      ]
    }
  }
}
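The effect of an allow list like this can be pictured with the Python sketch below, which accepts an import URL only when its host appears in the configured list. It is an illustration of the idea, not Cromwell's implementation: exact, case-insensitive host matching and the `is_import_allowed` helper are assumptions.

```python
from urllib.parse import urlparse


def is_import_allowed(url: str, allowed_hosts, enabled: bool = True) -> bool:
    """Decide whether a WDL import URL passes a host allow list.

    ASSUMPTION: hosts must match exactly (ignoring case); Cromwell's
    actual matching rules may differ.
    """
    if not enabled:
        # Default behavior: resolve WDL files from any URI.
        return True
    host = urlparse(url).hostname
    if host is None:
        return False
    return host.lower() in {h.lower() for h in allowed_hosts}


hosts = ["my.wdl.repo.org", "raw.githubusercontent.com"]
print(is_import_allowed("https://raw.githubusercontent.com/org/repo/main/task.wdl", hosts))
print(is_import_allowed("https://unlisted.example.com/task.wdl", hosts))
```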
CWL implementation removed
This release removes the cwl top-level artifact. Some nonfunctional references may remain, and will be addressed over time.
For more information, see the Cromwell 79 release notes.
TES Improvements
- TES system errors are now reported in Cromwell execution logs when the TES backend returns a task error.
- Cromwell now attempts to translate disks attributes written for GCP into valid disk attributes for TES. For information on supported conversions, refer to the TES documentation.
Bug Fixes
- Reference disks are only mounted if configured in the workflow options.
- Recent Docker images of Ubuntu use a new manifest format. Cromwell now ensures that these newer image versions can be pulled from Docker Registry without issue.
- When converting ValueStore objects to strings for logging, we truncate long values to limit memory usage.
Security Patching
Updates to dependencies to fix security vulnerabilities.
84
84 Release Notes
CromIAM enabled user checks
For Cromwell instances utilizing the optional CromIAM identity and access management component, the following endpoints now verify that the calling user is enabled before forwarding the request.
- /api/workflows/v1/backends
- /api/womtool/v1/describe
This change makes the above endpoints consistent with the existing behavior of all the other endpoints in the /api/ path of CromIAM.
83
83 Release Notes
- Changes the type of several primary key columns in call caching tables from int to bigint. The database migration may be lengthy if your database contains a large amount of call caching data.
82
82 Release Notes
- Restored missing example configuration file
- Upgraded to latest version of the Google Cloud Storage NIO library (0.124.8)
- Cromwell will now finitely retry the following Google Cloud Storage I/O error.
- Response code 400 bad request, message User project specified in the request is invalid
- The default retry count is 5 and may be customized with system.io.number-of-attempts.
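The idea of a finite retry can be sketched in Python as follows. This is a generic illustration rather than Cromwell's I/O code: `run_with_retries` is a hypothetical helper, and treating the configured value of 5 as the total number of attempts (rather than retries after the first try) is an assumption.

```python
def run_with_retries(operation, is_retryable, number_of_attempts: int = 5):
    """Call `operation` until it succeeds or the attempts run out.

    ASSUMPTION: `number_of_attempts` counts every try, including the
    first one; non-retryable errors are raised immediately.
    """
    if number_of_attempts < 1:
        raise ValueError("number_of_attempts must be at least 1")
    for attempt in range(1, number_of_attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Give up on the last attempt or on errors we should not retry.
            if attempt == number_of_attempts or not is_retryable(error):
                raise


def is_invalid_user_project(error: Exception) -> bool:
    """Match the retryable message quoted in the release notes."""
    return "User project specified in the request is invalid" in str(error)
```

Bounding the number of attempts means a persistent failure still surfaces to the caller instead of looping forever.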