Commit f2d3b8b

[wip]
1 parent 7d53c54 commit f2d3b8b

1 file changed

+32
-9
lines changed

docs/how_to_run.md

Lines changed: 32 additions & 9 deletions
@@ -2,9 +2,9 @@
22

33
_This documentation describes the DSC workflow and how to run the application._
44

5-
**DISCLAIMER**: While the CLI application is runnable on its own, the DSO Step Function offers a simplified user interface for running the full ETL pipeline.
5+
**DISCLAIMER**: While the CLI application is runnable on its own, the DSO Step Function offers a simplified user interface for running the full ETL pipeline. For more details on the DSO Step Function and how to use it, see https://mitlibraries.atlassian.net/wiki/spaces/IN/pages/4690542593/DSpace+Submission+Orchestrator+DSO.
66

7-
# The DSC Workflow
7+
## The DSC Workflow
88

99
The DSC workflow consists of the following key steps:
1010

@@ -17,22 +17,45 @@ It's important to note that DSC is not responsible for ingesting items into DSpa
1717

1818
What the step function does with each key step....
1919

20-
# Create a batch
20+
### Create a batch
2121
DSC processes deposits in "batches", a collection of item submissions grouped by a unique identifier. DSC requires that the item submission assets (metadata and bitstream files) are uploaded to a "folder" in S3, named after the batch ID. While some requestors may upload the submission assets to S3 themselves, in other cases, these files need to be retrieved (via API requests) and uploaded during the batch creation step.
2222

2323
At the end of this step:
2424
* If all item submission assets are complete:
2525
- A batch folder with complete item submission assets exists in the DSO S3 bucket
2626
- Each item submission in the batch is recorded in DynamoDB (with `status="batch_created"`)
27-
- [OPTIONAL] An email is sent reporting the number of created item submissions. The email
28-
includes a CSV file with the batch records from DynamoDB.
27+
- **[OPTIONAL]** An email is sent reporting the number of created item submissions. The email includes a CSV file with the batch records from DynamoDB.
2928
* If any item submission assets were invalid (missing metadata and/or bitstreams):
3029
- A batch folder with incomplete item submission assets exists in the DSO S3 bucket
31-
- [OPTIONAL] An email is sent reporting that zero item submissions were created. The email
32-
includes a CSV file indicating the failed item submissions and the corresponding error message.
30+
- **[OPTIONAL]** An email is sent reporting that zero item submissions were created. The email
31+
includes a CSV file describing the failing item submissions with the corresponding error message.
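
The all-or-nothing outcome described above can be sketched in Python. This is only an illustration, not DSC's actual code: `ItemAssets`, `validate_item`, `create_batch_records`, and the record field names other than `status` are hypothetical, and the error strings are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ItemAssets:
    """Hypothetical view of one item submission's assets in the batch folder."""
    item_identifier: str
    has_metadata: bool
    bitstreams: list[str] = field(default_factory=list)


def validate_item(item: ItemAssets) -> list[str]:
    """Return the problems found for one item (an empty list means complete)."""
    errors = []
    if not item.has_metadata:
        errors.append("missing metadata")
    if not item.bitstreams:
        errors.append("missing bitstreams")
    return errors


def create_batch_records(batch_id: str, items: list[ItemAssets]):
    """Return (records, failures).

    Records are created only when every item is complete; if any item is
    invalid, zero records are created and the failures are reported instead.
    """
    failures = {}
    for item in items:
        errors = validate_item(item)
        if errors:
            failures[item.item_identifier] = "; ".join(errors)
    if failures:
        return [], failures
    records = [
        {
            "batch_id": batch_id,
            "item_identifier": item.item_identifier,
            "status": "batch_created",
        }
        for item in items
    ]
    return records, {}
```

In this sketch the `failures` mapping plays the role of the CSV attached to the optional email: one entry per failing item submission, with its error message.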
3332

34-
# Queue a batch for ingest
35-
# Finalize
33+
### Queue a batch for ingest
34+
DSC retrieves the batch records from DynamoDB, and for each item submission, it performs the following steps:
35+
* Determine whether the item submission should be sent to the DSS input queue
36+
* Map/transform the source metadata to follow the Dublin Core schema
37+
* Create and upload a metadata JSON file in the batch folder (under `dspace_metadata/`)
38+
* Send a message to the DSS input queue
39+
40+
Note: The message is structured in accordance with the [Submission Message Specification](https://github.com/MITLibraries/dspace-submission-service/blob/main/docs/specifications/submission-message-specification.md).
41+
42+
At the end of this step:
43+
* Batch records in DynamoDB are updated. Updates are made to the following fields:
44+
- `status`: Indicates submit status
45+
- `status_details`: Set to error messages (if message failed to send)
46+
- `last_run_date`: Set to current run date
47+
- `submit_attempts`: Increments by 1
48+
* **[OPTIONAL]** An email is sent reporting the counts for each submission status. The email includes a CSV file with the batch records from DynamoDB, reflecting the latest information.
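
As a rough sketch of the record updates listed above, the following Python models one batch record in memory and applies the four field changes after a send attempt. It is not DSC's implementation: the `ItemSubmissionRecord` class, the `record_submit_attempt` helper, and the status values `"submit_success"` and `"submit_failed"` are assumptions made for illustration, and no DynamoDB or SQS calls are made.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ItemSubmissionRecord:
    """Hypothetical in-memory stand-in for a batch record stored in DynamoDB."""
    batch_id: str
    item_identifier: str
    status: str = "batch_created"
    status_details: Optional[str] = None
    last_run_date: Optional[str] = None
    submit_attempts: int = 0


def record_submit_attempt(
    record: ItemSubmissionRecord,
    run_date: datetime,
    error: Optional[str] = None,
) -> ItemSubmissionRecord:
    """Apply the four field updates after trying to send a message."""
    # Status names are assumed; the real values may differ.
    record.status = "submit_failed" if error else "submit_success"
    # Error message only if the message failed to send.
    record.status_details = error
    record.last_run_date = run_date.isoformat()
    # Every attempt counts, whether or not it succeeded.
    record.submit_attempts += 1
    return record
```

For example, calling `record_submit_attempt(record, datetime.now(timezone.utc), error="queue unavailable")` would mark the record as failed, store the error text in `status_details`, and still increment `submit_attempts`.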
49+
50+
### Run DSS
51+
DSS consumes the submission messages from the input queue in SQS. DSS uses a client to interact with DSpace. For each item submission, DSS reads the metadata JSON file and bitstreams from S3, using the information provided in the message, and creates an item with bitstreams in DSpace.
52+
53+
At the end of this step:
54+
* Result messages are written to the output queue for DSC (`dss-output-dsc`).
55+
56+
Note: The message is structured in accordance with the [Result Message Specification](https://github.com/MITLibraries/dspace-submission-service/blob/main/docs/specifications/result-message-specification.md).
57+
58+
### Inspect ingest results
3659

3760
----
3861
Each step is explained in greater detail in the sections below.
