docs/how_to_run.md
32 additions & 9 deletions
@@ -2,9 +2,9 @@
2
2
3
3
_This documentation describes the DSC workflow and how to run the application._
4
4
5
-
**DISCLAIMER**: While the CLI application is runnable on its own, the DSO Step Function offers a simplified user interface for running the full ETL pipeline.
5
+
**DISCLAIMER**: While the CLI application is runnable on its own, the DSO Step Function offers a simplified user interface for running the full ETL pipeline. For more details on the DSO Step Function and how to use it, see https://mitlibraries.atlassian.net/wiki/spaces/IN/pages/4690542593/DSpace+Submission+Orchestrator+DSO.
6
6
7
-
# The DSC Workflow
7
+
## The DSC Workflow
8
8
9
9
The DSC workflow consists of the following key steps:
10
10
@@ -17,22 +17,45 @@ It's important to note that DSC is not responsible for ingesting items into DSpa
17
17
18
18
What the step function does with each key step....
19
19
20
-
# Create a batch
20
+
### Create a batch
21
21
DSC processes deposits in "batches": collections of item submissions, each grouped under a unique batch ID. DSC requires that the item submission assets (metadata and bitstream files) are uploaded to a "folder" in S3 named after the batch ID. Some requestors upload the submission assets to S3 themselves; in other cases, the files must be retrieved (via API requests) and uploaded during the batch creation step.
22
22
23
23
At the end of this step:
24
24
* If all item submission assets are complete:
25
25
- A batch folder with complete item submission assets exists in the DSO S3 bucket
26
26
- Each item submission in the batch is recorded in DynamoDB (with `status="batch_created"`)
27
-
- [OPTIONAL] An email is sent reporting the number of created item submissions. The email
28
-
includes a CSV file with the batch records from DynamoDB.
27
+
- **[OPTIONAL]** An email is sent reporting the number of created item submissions. The email includes a CSV file with the batch records from DynamoDB.
29
28
* If any item submission assets were invalid (missing metadata and/or bitstreams):
30
29
- A batch folder with incomplete item submission assets exists in the DSO S3 bucket
31
-
- [OPTIONAL] An email is sent reporting that zero item submissions were created. The email
32
-
includes a CSV file indicating the failed item submissions and the corresponding error message.
30
+
- **[OPTIONAL]** An email is sent reporting that zero item submissions were created. The email
31
+
includes a CSV file describing the failing item submissions with the corresponding error message.
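
The all-or-nothing outcome described above can be sketched roughly as follows. This is only an illustration, not DSC's actual code: `ItemAssets`, `create_batch`, and the record field names other than `status` are hypothetical, and the real application reads this information from S3 and writes records to DynamoDB.

```python
from dataclasses import dataclass


@dataclass
class ItemAssets:
    """Hypothetical summary of what was found in S3 for one item submission."""

    item_identifier: str
    has_metadata: bool
    bitstream_count: int


def create_batch(batch_id: str, items: list[ItemAssets]) -> tuple[list[dict], dict[str, str]]:
    """Return (records, errors) for a batch.

    If any item submission is missing metadata and/or bitstreams, no records
    are created and the errors describe the failing item submissions.
    """
    errors: dict[str, str] = {}
    for item in items:
        problems = []
        if not item.has_metadata:
            problems.append("missing metadata")
        if item.bitstream_count == 0:
            problems.append("missing bitstreams")
        if problems:
            errors[item.item_identifier] = "; ".join(problems)

    if errors:
        # Incomplete assets: zero item submissions are created.
        return [], errors

    records = [
        {
            "batch_id": batch_id,  # assumed field name
            "item_identifier": item.item_identifier,  # assumed field name
            "status": "batch_created",
        }
        for item in items
    ]
    return records, errors
```

In this sketch, the `errors` mapping corresponds to the rows of the CSV file attached to the optional failure email.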
33
32
34
-
# Queue a batch for ingest
35
-
# Finalize
33
+
### Queue a batch for ingest
34
+
DSC retrieves the batch records from DynamoDB, and for each item submission, it performs the following steps:
35
+
* Determine whether the item submission should be sent to the DSS input queue
36
+
* Map/transform the source metadata to follow the Dublin Core schema
37
+
* Create and upload a metadata JSON file in the batch folder (under `dspace_metadata/`)
38
+
* Send a message to the DSS input queue
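
As a rough illustration of the mapping and metadata-file steps listed above, the sketch below maps source fields to Dublin Core and builds the S3 key under `dspace_metadata/`. The field mapping, the JSON layout, the file-naming pattern, and the function names are all assumptions made for this example; the actual transformation is workflow-specific and the actual file format is defined by DSS.

```python
import json

# Hypothetical mapping from source metadata fields to Dublin Core fields.
FIELD_MAPPING = {
    "title": "dc.title",
    "author": "dc.contributor.author",
    "date": "dc.date.issued",
}


def to_dublin_core(source_metadata: dict, field_mapping: dict = FIELD_MAPPING) -> dict:
    """Map source metadata to a list of Dublin Core key/value entries."""
    entries = []
    for source_field, dc_field in field_mapping.items():
        value = source_metadata.get(source_field)
        if value is None:
            continue  # skip fields the source record does not provide
        values = value if isinstance(value, list) else [value]
        entries.extend({"key": dc_field, "value": v} for v in values)
    return {"metadata": entries}  # assumed JSON layout


def metadata_key(batch_id: str, item_identifier: str) -> str:
    """Build the S3 key for the metadata JSON file (assumed naming pattern)."""
    return f"{batch_id}/dspace_metadata/{item_identifier}_metadata.json"


def render_metadata_json(source_metadata: dict) -> str:
    """Serialize the transformed metadata, ready to upload to the batch folder."""
    return json.dumps(to_dublin_core(source_metadata), indent=2)
```

Source fields without an entry in the mapping are dropped in this sketch, and multi-valued fields produce one entry per value.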
39
+
40
+
Note: The message is structured in accordance with the [Submission Message Specification](https://github.com/MITLibraries/dspace-submission-service/blob/main/docs/specifications/submission-message-specification.md).
41
+
42
+
At the end of this step:
43
+
* Batch records in DynamoDB are updated. Updates are made to the following fields:
44
+
- `status`: Indicates submit status
45
+
- `status_details`: Set to error messages (if message failed to send)
46
+
- `last_run_date`: Set to current run date
47
+
- `submit_attempts`: Increments by 1
48
+
* **[OPTIONAL]** An email is sent reporting the counts for each submission status. The email includes a CSV file with the batch records from DynamoDB, reflecting the latest information.
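
The field updates listed above could look something like the sketch below. It operates on a plain dictionary rather than on DynamoDB, and the function name and the status values `"submit_success"` and `"submit_failed"` are placeholders chosen for this example, not values confirmed by the documentation.

```python
from typing import Optional


def apply_submit_result(record: dict, run_date: str, error: Optional[str] = None) -> dict:
    """Return a copy of a batch record updated after a submit attempt.

    `error` is the error message if the message failed to send, else None.
    """
    updated = dict(record)
    # Placeholder status values; the real ones are defined by DSC.
    updated["status"] = "submit_failed" if error else "submit_success"
    updated["status_details"] = error
    updated["last_run_date"] = run_date
    updated["submit_attempts"] = record.get("submit_attempts", 0) + 1
    return updated
```

Because `submit_attempts` increments on every run, a record that fails and is later retried keeps a count of how many times DSC tried to queue it.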
49
+
50
+
### Run DSS
51
+
DSS consumes the submission messages from the input queue in SQS. DSS uses a client to interact with DSpace. For each item submission, DSS reads the metadata JSON file and bitstreams from S3, using the information provided in the message, and creates an item with bitstreams in DSpace.
52
+
53
+
At the end of this step:
54
+
* Result messages are written to the output queue for DSC (`dss-output-dsc`).
55
+
56
+
Note: The message is structured in accordance with the [Result Message Specification](https://github.com/MITLibraries/dspace-submission-service/blob/main/docs/specifications/result-message-specification.md).
57
+
58
+
### Inspect ingest results
36
59
37
60
----
38
61
Each step is explained in greater detail in the sections below.