Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: change workflow evaluation to use tokio::spawn. #320

Merged
merged 7 commits into from
Feb 1, 2025

Conversation

peterhuene
Copy link
Collaborator

@peterhuene peterhuene commented Jan 30, 2025

This refactoring primarily affects wdl-engine and how workflows are evaluated. Previously, workflows were evaluated without spawning any tokio tasks. This meant that subgraph evaluation was not concurrent, although WDL tasks were spawned concurrently.

With the upcoming support for integrating Crankshaft, the Backend trait was refactored such that the backend itself can maintain state and notify callers of when a WDL task is spawned and when it completes. This allows for the local execution backend to be rewritten in a way that enables it to respect the cpu and memory requirements of a WDL task.

To support making the evaluator futures Send, we now utilize rowan::AstPtr to send AST nodes between threads performing evaluation. The evaluation graphs are mapped to be Send so that evaluation can keep a shared reference to the graph across concurrently executing tokio tasks.

Also included with these changes:

  • Use tracing-indicatif to prevent mixing of tracing output and progress bars; the progress bar now consistently shows at the end of tracing output and can also be quieted.
  • Made wdl_analysis::Document trivially clonable.
  • Minor changes to internal names in wdl-analysis.
  • Fixed a missing constraint on the basename function definition for an overload that's only required for WDL 1.2.
  • Added a Config type to wdl-engine that controls how evaluation is performed; in the future this can be used to support evaluation configuration files.
  • Expanded the ProgressKind enumeration to allow for tracking how many tasks are "ready" (i.e. waiting to be spawned by the backend) vs. "executing" (i.e. currently running).
  • Removed the incorrect mapping of host paths to guest paths; this will soon be reimplemented as part of the initial support for a Docker backend.
  • Added retry logic to task execution.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have added yourself or the appropriate individual as the assignee.
  • You have added at least one relevant code reviewer to the PR.
  • Your code builds clean without any errors or warnings.
  • You have added tests (when appropriate).
  • You have updated the README or other documentation to account for these
    changes (when appropriate).
  • You have added an entry to the relevant CHANGELOG.md (see
    "keep a changelog" for more information).
  • Your commit messages follow the conventional commit style.

This refactoring primarily affects `wdl-engine` and how workflows are
evaluated. Previously, workflows were evaluated without spawning any tokio
tasks. This meant that subgraph evaluation was not concurrent, although WDL
tasks were spawned concurrently.

With the upcoming support for integrating Crankshaft, the `Backend` trait was
refactored such that the backend itself can maintain state and notify callers
of when a WDL task is spawned and when it completes. This allows for the local
execution backend to be rewritten in a way that enables it to respect the `cpu`
and `memory` requirements of a WDL task.

To support making the evaluator futures `Send`, we now utilize `rowan::AstPtr`
to send AST nodes between threads performing evaluation. The evaluation graphs
are mapped to be `Send` so that evaluation can keep a shared reference to the
graph across concurrently executing tokio tasks.

Also included with these changes are:

* Use `tracing-indicatif` to prevent mixing of tracing output and progress
  bars; the progress bar now consistently shows at the end of tracing output.
* Made `wdl_analysis::Document` trivially clonable.
* Minor changes to internal names in `wdl-analysis`.
* Fixed a missing constraint on the `basename` function definition for an
  overload that's only required for WDL 1.2.
* Added a `Config` type to `wdl-engine` that controls how evaluation is
  performed; in the future this can be used to support evaluation configuration
  files.
* Expanded the `ProgressKind` enumeration to allow for tracking how many tasks
  are "ready" (i.e. waiting to be spawned by the backend) vs. "executing" (i.e.
  currently running).
* Removed the incorrect mapping of host paths to guest paths; this will soon be
  reimplemented as part of the initial support for a Docker backend.
* Added retry logic to task execution.
wdl-engine/src/backend/local.rs Show resolved Hide resolved
wdl-engine/src/backend/local.rs Outdated Show resolved Hide resolved
wdl-engine/src/backend/local.rs Outdated Show resolved Hide resolved
wdl-engine/src/config.rs Show resolved Hide resolved
wdl-engine/src/config.rs Show resolved Hide resolved
wdl-engine/src/eval/v1/task.rs Outdated Show resolved Hide resolved
wdl-engine/src/eval/v1/task.rs Outdated Show resolved Hide resolved
Changed:

* Ensured a dependency from the command section to every input in a task
  evaluation graph; this will be necessary to properly map input files that
  aren't referenced by the command by otherwise treated as "used" (e.g. a `bai`
  file input). This is relevant for future backends that need to map files into
  containers.
* Made the working directory attempt-specific so that if a task fails and a
  retry is made, a new working directory is used while preserving the previous
  attempt's working directory.
* Expanded upon the comment for the scatter `concurrency` setting.
* Changed the task `retries` configuration option to mean the "default" number
  of retries to use if a task does not specify its own max retries requirement,
  which is also now respected in task execution.
* Added `attempt` to the `ProgressKind::TaskExecutionStarted` variant.
Copy link
Member

@a-frantz a-frantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: what about the temp dir on retries? And stderr/stdout files? Are those overwritten or preserved?

wdl-engine/src/eval/v1/task.rs Show resolved Hide resolved
This commit fixes the retry logic in task evaluation to re-evaluate the command
section before retrying.

The change enables a command to see an updated `task` variable in 1.2 and can
therefore know when it has failed in a previous attempt.

Additionally, this makes it so that we also re-evaluate the hints and
requirements sections upon retry; the hope is that in the near future the spec
will be updated to allow those sections to have task scope.

A test was added to retry a task.
@peterhuene
Copy link
Collaborator Author

Looks like a new nightly was released that changed formatting, so need to bump my local nightly and reformat.

@peterhuene peterhuene requested a review from a-frantz February 1, 2025 01:47
@peterhuene
Copy link
Collaborator Author

peterhuene commented Feb 1, 2025

Q: what about the temp dir on retries? And stderr/stdout files? Are those overwritten or preserved?

There are now two temp directories in the directory layout as described in a previous comment: one is where files are created before and after we evaluate the command and one is for the evaluation of the current attempt's command.

So effectively they are preserved from attempt to attempt along with the command/stdout/stderr files.

@peterhuene peterhuene merged commit aea7e02 into stjude-rust-labs:main Feb 1, 2025
16 checks passed
@peterhuene peterhuene deleted the refactoring branch February 1, 2025 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants