| 1 | +# Understanding and Running the DSC Workflow |
| 2 | + |
| 3 | +_This documentation describes the DSC workflow and how to run the application._ |
| 4 | + |
| 5 | +**DISCLAIMER**: While the CLI application is runnable on its own, the DSO Step Function offers a simplified user interface for running the full ETL pipeline. |
| 6 | + |
| 7 | +# The DSC Workflow |
| 8 | + |
| 9 | +The DSC workflow consists of the following key steps: |
| 10 | + |
| 11 | +1. Create a batch |
| 12 | +2. Queue a batch for ingest |
| 13 | +3. Ingest items into DSpace |
| 14 | +4. Inspect ingest results |
| 15 | + |
| 16 | +Note that DSC does not ingest items into DSpace itself; that step is handled by _DSS_. The DSC CLI provides commands for all other steps in the DSC workflow. |
| 17 | + |
| 18 | +What the step function does with each key step.... |
| 19 | + |
| 20 | +# Create a batch |
| 21 | +DSC processes deposits in "batches": collections of item submissions grouped by a unique identifier (the batch ID). DSC requires that the item submission assets (metadata and bitstream files) be uploaded to a "folder" in S3 named after the batch ID. Some requestors upload the submission assets to S3 themselves; in other cases, the files must be retrieved (via API requests) and uploaded during the batch creation step. |
| 22 | + |
| 23 | +At the end of this step: |
| 24 | +* If all item submission assets are complete: |
| 25 | + - A batch folder with complete item submission assets exists in the DSO S3 bucket |
| 26 | + - Each item submission in the batch is recorded in DynamoDB (with `status="batch_created"`) |
| 27 | + - [OPTIONAL] An email is sent reporting the number of created item submissions. The email |
| 28 | + includes a CSV file with the batch records from DynamoDB. |
| 29 | +* If any item submission assets were invalid (missing metadata and/or bitstreams): |
| 30 | + - A batch folder with incomplete item submission assets exists in the DSO S3 bucket |
| 31 | + - [OPTIONAL] An email is sent reporting that zero item submissions were created. The email |
| 32 | + includes a CSV file indicating the failed item submissions and the corresponding error message. |
| 33 | + |
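The two outcomes listed above can be sketched in a few lines of Python. This is a minimal illustration and not DSC's actual implementation: `BatchResult`, `create_batch`, the `assets` mapping (a stand-in for a listing of the S3 batch folder), and the record field names other than `status="batch_created"` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    """Outcome of a (hypothetical) batch creation attempt."""

    batch_id: str
    records: list[dict] = field(default_factory=list)  # rows that would go to DynamoDB
    errors: list[dict] = field(default_factory=list)  # rows for the failure CSV


def create_batch(batch_id: str, assets: dict[str, dict[str, list[str]]]) -> BatchResult:
    """Validate each item's assets; create records only if every item is complete.

    `assets` maps an item identifier to {"metadata": [...], "bitstreams": [...]},
    standing in for the contents of the S3 folder named after the batch ID.
    """
    result = BatchResult(batch_id)

    # Collect an error row for every item missing metadata and/or bitstreams.
    for item_id, files in sorted(assets.items()):
        missing = [kind for kind in ("metadata", "bitstreams") if not files.get(kind)]
        if missing:
            result.errors.append(
                {"item_identifier": item_id, "error": "missing " + " and ".join(missing)}
            )

    # All-or-nothing: any invalid item means zero item submissions are created.
    if not result.errors:
        result.records = [
            {"batch_id": batch_id, "item_identifier": item_id, "status": "batch_created"}
            for item_id in sorted(assets)
        ]
    return result
```

In this sketch, a batch with one incomplete item yields an empty `records` list and one entry in `errors`, mirroring the "zero item submissions were created" case described above.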
| 34 | +# Queue a batch for ingest |
| 35 | +# Finalize |
| 36 | + |
| 37 | +---- |
| 38 | +Each step is explained in greater detail in the sections below. |
| 39 | +CLI commands are named to mirror the workflow steps (with the exception of running DSS). |
| 40 | + |
| 41 | +prepare items for DSpace... |
| 42 | +submit items into DSpace...... |
| 43 | +creating submission packages..... |
| 44 | +DSS ingests the SIPs... |
| 45 | + |
| 46 | +While the CLI is the main entry point for DSC, the workflow modules handle the core functionality invoked by the CLI. |