
Refactor Task system to retain ProtocolDAG, upload ProtocolUnitResults and ResultFiles as they complete #180

Open
dotsdl opened this issue Sep 19, 2023 · 0 comments · May be fixed by #104


dotsdl commented Sep 19, 2023

Currently, when a compute service executes a Task, it generates a new ProtocolDAG locally, executes it, and pushes the (successful or failed) ProtocolDAGResult back to the server. This adds the serialized ProtocolDAGResult to the object store, and a ProtocolDAGResultRef to the state store. A Task can have any number of failed ProtocolDAGResultRefs, and (typically) a single successful ProtocolDAGResultRef.
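The current flow can be sketched roughly as follows. This is a minimal illustration only; the class shapes and function names here (`execute_task`, `result_refs`, `ok`) are hypothetical stand-ins, not the actual gufe/alchemiscale API:

```python
from dataclasses import dataclass, field


@dataclass
class ProtocolDAGResult:
    """Stand-in for the real gufe ProtocolDAGResult; `ok` marks success."""
    ok: bool


@dataclass
class Task:
    """Stand-in for a Task; the state store keeps one ProtocolDAGResultRef
    per pushed ProtocolDAGResult (any number failed, typically one success)."""
    result_refs: list = field(default_factory=list)


def execute_task(task: Task) -> ProtocolDAGResult:
    """Current behavior: build a fresh ProtocolDAG locally, execute it in
    full, then push the whole (successful or failed) result to the server,
    which stores the serialized result in the object store and a
    ProtocolDAGResultRef in the state store."""
    dag_result = ProtocolDAGResult(ok=True)  # pretend execution succeeded
    task.result_refs.append(dag_result)      # ref recorded server-side
    return dag_result
```

Note that nothing is persisted until the whole DAG finishes, which is exactly what prevents checkpointing and incremental ResultFile upload.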

This approach does not currently support ResultFile upload (files produced by ProtocolUnits that are desired for permanent storage, available on-demand to users later), nor does it allow for a ProtocolDAG that successfully executes some ProtocolUnits to be started again from where it left off on another compute service (checkpointing). Our aim is to support both of these in alchemiscale.

This proposal should accomplish both:

  • instead of storing results in the object store keyed by ProtocolDAGResult, we should store them keyed by Task/ProtocolDAG
    • this fits with the idea that the same file storage system can be used to enable partial restarts
  • a Task gets a ProtocolDAGRef in state store upon creation, serialized ProtocolDAG in object store
  • as ProtocolDAG is executed on compute service, ProtocolUnitResults and ResultFiles shipped to object store
  • on success, a complete ProtocolDAGResult shipped to object store, ProtocolDAGResultRef added to state store; same retrieval pattern as before
  • on failure, same as above but for a failed ProtocolDAGResult
  • when another compute service picks up a Task, it checks for existence of a ProtocolDAGRef; if present, pulls ProtocolDAG and its associated ProtocolUnitResults from object store
    • it then finds the ProtocolUnits in the ProtocolDAG that have not been executed successfully (whether failed or never run), identifies their dependency ProtocolUnitResults, pulls those dependencies' ResultFiles if included in their outputs, and proceeds with DAG execution

This has some nice properties:

  • a Task only ever has a single ProtocolDAGRef, which may have any number of failed ProtocolDAGResultRefs and at most one successful ProtocolDAGResultRef
  • we don't have to do odd workarounds to utilize gufe storage system for ResultFiles (see gufe#186 and gufe#234 for current state as of this writing)
  • we get architectural support for checkpointing for ProtocolDAGs, reducing waste and time to results
  • still mostly the same system in terms of execution, status model, Task claiming, result retrieval, etc.
  • gives what is needed to support ResultFile retrieval on the user side
  • gives what is needed to support extends on the compute side, where one or more ResultFiles may be needed to extend a ProtocolDAG from a previous ProtocolDAGResult