This document loosely follows the ARC42 template for architecture documentation.
It describes the architecture in the context of the Destination Earth platform.
The FAIR Workflow Platform is designed to allow the execution and sharing of machine-actionable workflows. It adheres to the FAIR principles (Findable, Accessible, Interoperable, Reusable) and follows an RO-Crate-centric approach: workflows are submitted as RO-Crates, and their outputs are made available as RO-Crates as well. By leveraging FAIR Signposting, the platform allows the resulting datasets to be integrated into the greater linked data ecosystem.
- A user-facing frontend for workflow submission and viewing/retrieving datasets.
- Automatic tracking of workflow execution provenance.
- Datasets are available in a standardized format and described with rich metadata, allowing them to be shared across data spaces and research domains.
The platform integrates the following external services:
- ORCID: For user authentication.
- Linked data context providers: Provide JSON-LD contexts for handling metadata.
- S3 Storage: Object storage for digital objects and workflow data.
- External Data Repositories: Workflows can access data from external providers like GBIF or the Copernicus Climate Data Store.
- DEDL HDA Bridge: Workflows can access earth observation data and the Destination Earth Digital Twins through the Harmonized Data Access API.
- External code/container repositories: Workflows can use code or the execution environment description (Docker container) stored in external repositories like GitHub or Docker Hub.
Human users interact with the platform via the frontend, which handles authentication, workflow submission, and dataset retrieval. Users can also access the Digital Object repository directly.
Machine agents can retrieve datasets in a machine-readable format through the frontend and the Digital Object repository. FAIR Signposting links human-readable frontpages to the machine-readable datasets.
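In practice, Signposting boils down to HTTP `Link` headers (or HTML `<link>` elements) on the human-readable landing page. An illustrative example with hypothetical URLs:

```http
Link: <https://example.org/datasets/1234/ro-crate-metadata.json>; rel="describedby"; type="application/ld+json"
Link: <https://example.org/datasets/1234.zip>; rel="item"; type="application/zip"
Link: <https://orcid.org/0000-0000-0000-0000>; rel="author"
```

A machine agent following `rel="describedby"` lands on the JSON-LD metadata without ever parsing the HTML page.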
The solution is built with a microservice architecture. This should ensure extensibility and allow individual building blocks to be replaced.
- Frontend: Manages user interactions and submissions. Provides FAIR signposting to machine-readable content.
- Digital Object repository: Stores datasets.
- Submission Service: Acts as a bridge between the frontend, Digital Object repository and the workflow engine.
- Workflow Engine: Executes workflows.
- Allows users to log in with ORCID.
- Provides a GUI for submitting workflows as RO-Crates and monitoring their progress.
- Displays datasets and their provenance.
- Dynamically builds RO-Crates for export and allows users to download them or retrieve them as a "detached" RO-Crate.
- Provides an admin interface for managing registered users.
- Submits workflows and metadata to the submission service.
- Queries the Digital Object repository for digital objects.
- Django frontend with small sprinkles of JavaScript.
- Uses ro-crate-py to build RO-Crates.
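At its core, an RO-Crate is a JSON-LD document named `ro-crate-metadata.json`. The frontend delegates its construction to ro-crate-py; the hand-built sketch below (with a hypothetical dataset and file) only illustrates the minimal structure such a crate carries:

```python
import json

# Minimal RO-Crate metadata skeleton (dataset and file names are hypothetical).
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            # The metadata descriptor: points at the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # The root dataset: lists its parts.
            "@id": "./",
            "@type": "Dataset",
            "name": "Example workflow run",
            "hasPart": [{"@id": "results.csv"}],
        },
        {"@id": "results.csv", "@type": "File", "name": "Workflow output"},
    ],
}

metadata = json.dumps(crate, indent=2)
```

ro-crate-py produces the same shape via its `ROCrate` API, with entity classes instead of raw dictionaries.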
- Stores digital objects for data, workflows and respective metadata.
- Provides data to the frontend.
- Receives data from the submission service.
- A Cordra instance.
- The Cordra schema closely resembles the Workflow Run RO-Crate profile.
- Cordra hooks written in Java/Kotlin ensure that all metadata documents are valid JSON-LD.
- Backed by an S3 Bucket for storage.
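The actual validation hooks live inside Cordra and are written in Java/Kotlin; the Python sketch below merely illustrates the kind of invariant they enforce, using a hypothetical `validate_jsonld` helper:

```python
import json

def validate_jsonld(document: str) -> dict:
    """Reject metadata that is not parseable JSON carrying an @context.

    Sketch of the invariant the Cordra hooks enforce; the real
    validation (in Java/Kotlin) is more thorough.
    """
    try:
        data = json.loads(document)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or "@context" not in data:
        raise ValueError("missing @context: not JSON-LD")
    return data
```

Because every write passes through such a hook, every object in the repository is guaranteed to participate cleanly in the linked data graph.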
- Orchestrates workflow submission and ingestion of results.
- Retrieves workflow submissions from the frontend and queues them on the Workflow Engine.
- Retrieves workflow results and metadata from the Workflow Engine upon completion.
- Ingests workflows and artifacts into the Digital Object repository.
- Built with FastAPI.
- Annotates workflow submissions with metadata for provenance.
- Reads workflow annotations and stored workflow archives from the engine and ingests them into Cordra.
- Executes workflows.
- Stores workflow artifacts for the Submission Service.
- Notifies the Submission Service about finished workflows.
- Retrieves workflows from the Submission Service and notifies it on workflow completion.
- Stores workflow artifacts.
- Supports secure key management via Kubernetes secrets.
- Automatically adds exit handlers to submitted workflows to notify the Submission Service.
- Archives workflow artifacts into an S3 bucket.
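The exit handler amounts to a small HTTP callback: tell the Submission Service which run finished and with what status. A sketch of how such a notification could be built (the endpoint path and payload fields are assumptions, not the platform's actual API):

```python
import json
import urllib.request

def notify_submission_service(base_url: str, workflow_id: str, status: str) -> urllib.request.Request:
    """Build the completion callback the exit handler would send."""
    payload = json.dumps({"workflow_id": workflow_id, "status": status}).encode()
    return urllib.request.Request(
        f"{base_url}/workflows/{workflow_id}/completion",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The exit handler would then fire it off, e.g.:
# urllib.request.urlopen(notify_submission_service("http://submission:8000", "wf-123", "Succeeded"))
```

Keeping the callback this thin means the engine needs no knowledge of the ingestion logic; the Submission Service pulls results itself once notified.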
Workflow submission consists of multiple phases:
- User authentication: This is a standard OAuth 2.0 / OpenID Connect flow with ORCID as the identity provider.
- Workflow upload and validation: The user uploads a Workflow RO-Crate to the frontend. The frontend extracts the workflow file and checks its validity against the Workflow Submission Service. The result is reported back to the user.
- Workflow submission and execution: When the user submits the validated workflow, it is sent to the Workflow Submission Service together with additional provenance data (e.g. the user name). The Workflow Submission Service adds the provenance data as annotations to the workflow and queues the updated workflow on the Workflow Engine. The workflow steps are executed according to the workflow definition. Finally, on successful completion, an exit handler runs and notifies the Workflow Submission Service about completion.
- Ingestion of workflow results: Upon receiving the notification from the exit handler, the Workflow Submission Service is responsible for creating a dataset representing this run in the Digital Object Repository. It first retrieves the workflow information, which includes a description of the workflow and the provenance data as annotations. The Workflow Submission Service then builds Digital Objects for this data and adds them to the Digital Object Repository. For each artifact of the workflow, a Digital Object representing that artifact is created. Finally, a Digital Object for the whole dataset is written, which makes the dataset available for retrieval.
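The ordering of the ingestion phase matters: the dataset object is written last, because it is what makes the run retrievable. A minimal sketch of that sequence, with a hypothetical in-memory stand-in for the repository client:

```python
class InMemoryRepo:
    """Stand-in for the Cordra client (illustrative only)."""

    def __init__(self):
        self.objects = []  # (id, type, body) in creation order

    def create(self, obj_type, body):
        obj_id = f"do/{len(self.objects)}"
        self.objects.append((obj_id, obj_type, body))
        return obj_id


def ingest_run(repo, workflow_doc, artifacts):
    """Ingest a finished run: workflow and artifacts first, dataset last."""
    workflow_id = repo.create("Workflow", workflow_doc)
    artifact_ids = [repo.create("File", a) for a in artifacts]
    # The dataset object links everything together and is written last,
    # so the run only becomes retrievable once all its parts exist.
    return repo.create("Dataset", {
        "mainEntity": workflow_id,
        "hasPart": artifact_ids,
    })
```

Writing the dataset object last gives a crude form of atomicity: a half-ingested run is invisible to retrieval until its final object lands.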
Data retrieval involves two primary modes: retrieval of a detached RO-Crate, and retrieval of a zipped RO-Crate containing all data. Both crates are valid Workflow Run RO-Crates.
The user requests an RO-Crate in a detached format. A detached RO-Crate is hosted on the web and links to the content files of the RO-Crate instead of bundling them.
The process involves the following steps:
- The user sends a request to the frontend for a detached RO-Crate.
- The frontend queries the Digital Object Repository (Cordra) for the relevant Digital Objects.
- Cordra constructs the object graph from the stored digital objects and returns it to the frontend.
- The frontend converts the object graph into a detached RO-Crate and delivers it to the user as JSON.
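Converting the object graph into a detached crate essentially rewrites each file entity so that its `@id` points to a resolvable URL in the repository instead of a bundled file. A sketch (the URL scheme is hypothetical):

```python
def to_detached_crate(graph, base_url):
    """Build a detached RO-Crate: files are linked by URL, not bundled."""
    detached = []
    for entity in graph:
        entity = dict(entity)  # don't mutate the stored graph
        if entity.get("@type") == "File":
            # Point the entity at the repository instead of a local path.
            entity["@id"] = f"{base_url}/objects/{entity['@id']}"
        detached.append(entity)
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": detached,
    }
```

Because only the identifiers change, the same object graph backs both the detached and the zipped representation.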
The user requests an RO-Crate as a zipped package containing both metadata and associated files. The process follows similar steps to 6.2.1 to retrieve the object graph, but also streams the files of the RO-Crate into a zipped HTTP response.
- The user sends a request to the frontend for a zipped RO-Crate.
- The frontend queries the Digital Object Repository (Cordra) for the relevant Digital Objects.
- Cordra constructs the object graph from the stored digital objects and returns it to the frontend.
- The frontend converts the object graph into the RO-Crate format.
- The frontend starts streaming a zip file as HTTP response to the user. The stream includes the RO-Crate metadata as file according to the RO-Crate specification.
- For each digital object in the graph that represents a file, the frontend retrieves the file from Cordra and adds it to the zip archive. The files are added "on the fly", meaning they are streamed directly from Cordra into the zip stream. Therefore the files never need to fit into memory or onto disk on the frontend.