refactor: change workflow evaluation to use `tokio::spawn`. #320

peterhuene · 2025-01-30T17:53:11Z

This refactoring primarily affects wdl-engine and how workflows are evaluated. Previously, workflows were evaluated without spawning any tokio tasks. This meant that subgraph evaluation was not concurrent, although WDL tasks were spawned concurrently.

With the upcoming support for integrating Crankshaft, the Backend trait was refactored such that the backend itself can maintain state and notify callers of when a WDL task is spawned and when it completes. This allows for the local execution backend to be rewritten in a way that enables it to respect the cpu and memory requirements of a WDL task.

To support making the evaluator futures Send, we now utilize rowan::AstPtr to send AST nodes between threads performing evaluation. The evaluation graphs are mapped to be Send so that evaluation can keep a shared reference to the graph across concurrently executing tokio tasks.

Also included with these changes:

Use tracing-indicatif to prevent mixing of tracing output and progress bars; the progress bar now consistently shows at the end of tracing output and can also be quieted.
Made wdl_analysis::Document trivially clonable.
Minor changes to internal names in wdl-analysis.
Fixed a missing constraint on the basename function definition for an overload that's only required for WDL 1.2.
Added a Config type to wdl-engine that controls how evaluation is performed; in the future this can be used to support evaluation configuration files.
Expanded the ProgressKind enumeration to allow for tracking how many tasks are "ready" (i.e. waiting to be spawned by the backend) vs. "executing" (i.e. currently running).
Removed the incorrect mapping of host paths to guest paths; this will soon be reimplemented as part of the initial support for a Docker backend.
Added retry logic to task execution.

Before submitting this PR, please make sure:

You have added a few sentences describing the PR here.
You have added yourself or the appropriate individual as the assignee.
You have added at least one relevant code reviewer to the PR.
Your code builds clean without any errors or warnings.
You have added tests (when appropriate).
You have updated the README or other documentation to account for these
changes (when appropriate).
You have added an entry to the relevant CHANGELOG.md (see
"keep a changelog" for more information).
Your commit messages follow the conventional commit style.

This refactoring primarily affects `wdl-engine` and how workflows are evaluated. Previously, workflows were evaluated without spawning any tokio tasks. This meant that subgraph evaluation was not concurrent, although WDL tasks were spawned concurrently. With the upcoming support for integrating Crankshaft, the `Backend` trait was refactored such that the backend itself can maintain state and notify callers of when a WDL task is spawned and when it completes. This allows for the local execution backend to be rewritten in a way that enables it to respect the `cpu` and `memory` requirements of a WDL task. To support making the evaluator futures `Send`, we now utilize `rowan::AstPtr` to send AST nodes between threads performing evaluation. The evaluation graphs are mapped to be `Send` so that evaluation can keep a shared reference to the graph across concurrently executing tokio tasks. Also included with these changes are: * Use `tracing-indicatif` to prevent mixing of tracing output and progress bars; the progress bar now consistently shows at the end of tracing output. * Made `wdl_analysis::Document` trivially clonable. * Minor changes to internal names in `wdl-analysis`. * Fixed a missing constraint on the `basename` function definition for an overload that's only required for WDL 1.2. * Added a `Config` type to `wdl-engine` that controls how evaluation is performed; in the future this can be used to support evaluation configuration files. * Expanded the `ProgressKind` enumeration to allow for tracking how many tasks are "ready" (i.e. waiting to be spawned by the backend) vs. "executing" (i.e. currently running). * Removed the incorrect mapping of host paths to guest paths; this will soon be reimplemented as part of the initial support for a Docker backend. * Added retry logic to task execution.

wdl-engine/src/backend/local.rs

wdl-engine/src/config.rs

wdl-engine/src/eval/v1/task.rs

Co-authored-by: Andrew Frantz <[email protected]>

wdl-engine/src/backend/local.rs

wdl-engine/src/config.rs

wdl-engine/src/eval/v1/workflow.rs

Changed: * Ensured a dependency from the command section to every input in a task evaluation graph; this will be necessary to properly map input files that aren't referenced by the command by otherwise treated as "used" (e.g. a `bai` file input). This is relevant for future backends that need to map files into containers. * Made the working directory attempt-specific so that if a task fails and a retry is made, a new working directory is used while preserving the previous attempt's working directory. * Expanded upon the comment for the scatter `concurrency` setting. * Changed the task `retries` configuration option to mean the "default" number of retries to use if a task does not specify its own max retries requirement, which is also now respected in task execution. * Added `attempt` to the `ProgressKind::TaskExecutionStarted` variant.

a-frantz

Q: what about the temp dir on retries? And stderr/stdout files? Are those overwritten or preserved?

wdl-engine/src/eval/v1/task.rs

This commit fixes the retry logic in task evaluation to re-evaluate the command section before retrying. The change enables a command to see an updated `task` variable in 1.2 and can therefore know when it has failed in a previous attempt. Additionally, this makes it so that we also re-evaluate the hints and requirements sections upon retry; the hope is that in the near future the spec will be updated to allow those sections to have task scope. A test was added to retry a task.

peterhuene · 2025-02-01T01:47:14Z

Looks like a new nightly was released that changed formatting, so need to bump my local nightly and reformat.

peterhuene · 2025-02-01T01:51:08Z

Q: what about the temp dir on retries? And stderr/stdout files? Are those overwritten or preserved?

There are now two temp directories in the directory layout as described in a previous comment: one is where files are created before and after we evaluate the command and one is for the evaluation of the current attempt's command.

So effectively they are preserved from attempt to attempt along with the command/stdout/stderr files.

peterhuene requested review from adthrasher, claymcleod and a-frantz January 30, 2025 17:53

peterhuene self-assigned this Jan 30, 2025

peterhuene force-pushed the refactoring branch from 7883586 to f51246c Compare January 30, 2025 18:29

a-frantz reviewed Jan 30, 2025

View reviewed changes

Update wdl-engine/src/eval/v1/task.rs

c1f45d6

Co-authored-by: Andrew Frantz <[email protected]>

claymcleod reviewed Jan 30, 2025

View reviewed changes

wdl-engine/src/backend/local.rs Show resolved Hide resolved

claymcleod reviewed Jan 30, 2025

View reviewed changes

wdl-engine/src/config.rs Show resolved Hide resolved

claymcleod reviewed Jan 30, 2025

View reviewed changes

wdl-engine/src/config.rs Show resolved Hide resolved

claymcleod reviewed Jan 30, 2025

View reviewed changes

wdl-engine/src/eval/v1/workflow.rs Show resolved Hide resolved

claymcleod approved these changes Jan 30, 2025

View reviewed changes

peterhuene added 3 commits January 31, 2025 13:40

chore: add unit test for Config::validate.

7348e66

chore: update CHANGELOGs.

bbd17c5

peterhuene requested review from a-frantz and claymcleod January 31, 2025 19:04

a-frantz reviewed Jan 31, 2025

View reviewed changes

wdl-engine/src/eval/v1/task.rs Show resolved Hide resolved

peterhuene requested a review from a-frantz February 1, 2025 01:47

chore: update code formatting with new nightly.

78e2326

claymcleod approved these changes Feb 1, 2025

View reviewed changes

peterhuene merged commit aea7e02 into stjude-rust-labs:main Feb 1, 2025
16 checks passed

peterhuene deleted the refactoring branch February 1, 2025 03:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: change workflow evaluation to use `tokio::spawn`. #320

refactor: change workflow evaluation to use `tokio::spawn`. #320

peterhuene commented Jan 30, 2025 •

edited

Loading

a-frantz left a comment

peterhuene commented Feb 1, 2025

peterhuene commented Feb 1, 2025 •

edited

Loading

refactor: change workflow evaluation to use tokio::spawn. #320

refactor: change workflow evaluation to use tokio::spawn. #320

Conversation

peterhuene commented Jan 30, 2025 • edited Loading

a-frantz left a comment

Choose a reason for hiding this comment

peterhuene commented Feb 1, 2025

peterhuene commented Feb 1, 2025 • edited Loading

refactor: change workflow evaluation to use `tokio::spawn`. #320

refactor: change workflow evaluation to use `tokio::spawn`. #320

peterhuene commented Jan 30, 2025 •

edited

Loading

peterhuene commented Feb 1, 2025 •

edited

Loading