
Commit 7d53c54

[wip]
1 parent 3b133c3 commit 7d53c54


2 files changed (+50, -3 lines changed)

README.md

Lines changed: 4 additions & 3 deletions
@@ -1,9 +1,10 @@
 # dspace-submission-composer
-An application for creating messages for the [DSpace Submission Service application](https://github.com/MITLibraries/dspace-submission-service).
+DSpace Submission Composer (DSC) is a Python CLI application that prepares items for ingest into DSpace.

-# Application Description
+DSC is a component of the DSpace Submission Orchestrator (DSO), a collection of microservices that form a data pipeline for ingesting items into DSpace repositories. The application's name highlights a key step of the DSC workflow, in which it "composes" a message and sends it to an SQS queue. These messages follow the specification set by the [DSpace Submission Service (DSS)](https://github.com/MITLibraries/dspace-submission-service), another component of DSO. Together, DSC and DSS follow a message-driven architecture, communicating through message queues in SQS.

-Description of the app
+See additional documentation in the following files:
+* [Understanding and Running the DSC Workflow](docs/how_to_run.md)

 ## Development

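The README describes DSC "composing" a message and sending it to an SQS queue for DSS to consume. The handoff can be illustrated with a minimal Python sketch. It assumes a boto3-style SQS client (boto3's `send_message` accepts `QueueUrl`, `MessageBody`, and `MessageAttributes`); the client is passed in so the sketch runs without AWS credentials. The attribute and body field names (`ItemIdentifier`, `BatchID`, `metadata_location`, `bitstream_locations`) are hypothetical placeholders, not the DSS message specification.

```python
import json
from typing import Any, Protocol


class SQSClient(Protocol):
    """The one method this sketch needs from a boto3 SQS client."""

    def send_message(self, **kwargs: Any) -> dict[str, Any]: ...


def compose_message(
    batch_id: str,
    item_identifier: str,
    metadata_location: str,
    bitstream_locations: list[str],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (message_attributes, message_body) for one item submission.

    All field names here are HYPOTHETICAL; DSS defines the real schema.
    """
    # SQS message attributes must declare a DataType for each value.
    attributes = {
        "ItemIdentifier": {"DataType": "String", "StringValue": item_identifier},
        "BatchID": {"DataType": "String", "StringValue": batch_id},
    }
    body = {
        "metadata_location": metadata_location,
        "bitstream_locations": bitstream_locations,
    }
    return attributes, body


def send_submission_message(
    client: SQSClient,
    queue_url: str,
    attributes: dict[str, Any],
    body: dict[str, Any],
) -> str:
    """Send one composed message and return the SQS-assigned message ID."""
    response = client.send_message(
        QueueUrl=queue_url,
        MessageAttributes=attributes,
        MessageBody=json.dumps(body),  # SQS message bodies are strings
    )
    return response["MessageId"]
```

With boto3, `send_submission_message(boto3.client("sqs"), queue_url, *compose_message(...))` would send a real message; the actual field names must come from the DSS specification.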
docs/how_to_run.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# Understanding and Running the DSC Workflow

_This document describes the DSC workflow and how to run the application._

**DISCLAIMER**: While the CLI application can be run on its own, the DSO Step Function offers a simplified user interface for running the full ETL pipeline.

## The DSC Workflow

The DSC workflow consists of the following key steps:

1. Create a batch
2. Queue a batch for ingest
3. Ingest items into DSpace
4. Inspect ingest results

DSC does not ingest items into DSpace; that step is handled by _DSS_. The DSC CLI provides commands for all other steps in the workflow.
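The division of labor between DSC and DSS can be expressed as a small data structure. In this sketch, the `Component` and `WorkflowStep` enums are hypothetical names (not taken from the DSC codebase); they pair each key step with the service that performs it, so the steps covered by the DSC CLI fall out of a simple filter.

```python
from enum import Enum


class Component(Enum):
    DSC = "DSpace Submission Composer"
    DSS = "DSpace Submission Service"


class WorkflowStep(Enum):
    """The four key steps, in order, each paired with the component that runs it."""

    CREATE_BATCH = ("Create a batch", Component.DSC)
    QUEUE_BATCH = ("Queue a batch for ingest", Component.DSC)
    INGEST_ITEMS = ("Ingest items into DSpace", Component.DSS)
    INSPECT_RESULTS = ("Inspect ingest results", Component.DSC)

    def __init__(self, label: str, component: Component) -> None:
        # Enum unpacks each tuple value into these arguments.
        self.label = label
        self.component = component


def steps_run_by(component: Component) -> list[str]:
    """List, in workflow order, the labels of the steps a component performs."""
    return [step.label for step in WorkflowStep if step.component is component]
```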

What the Step Function does with each key step....

## Create a batch

DSC processes deposits in "batches": collections of item submissions grouped under a unique batch ID. DSC requires the item submission assets (metadata and bitstream files) to be uploaded to an S3 "folder" (a key prefix) named after the batch ID. Some requestors upload the submission assets to S3 themselves; in other cases, these files must be retrieved (via API requests) and uploaded during the batch creation step.
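Because an S3 "folder" is just a shared key prefix, locating a batch's assets amounts to listing the keys under `<batch_id>/` and grouping them by item. The sketch below assumes a hypothetical naming convention in which each file is named `<item_identifier>_<suffix>`; DSC's real conventions may differ by workflow. With boto3, the keys could come from the `list_objects_v2` paginator called with `Prefix=f"{batch_id}/"`.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_batch_keys(batch_id: str, keys: Iterable[str]) -> dict[str, list[str]]:
    """Group S3 object keys in the ``<batch_id>/`` folder by item identifier.

    Assumes the HYPOTHETICAL convention ``<batch_id>/<item_identifier>_<suffix>``.
    Keys outside the batch folder, and the folder placeholder itself, are skipped.
    """
    prefix = f"{batch_id}/"
    items: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue
        filename = key[len(prefix):]
        item_identifier = filename.split("_", 1)[0]
        items[item_identifier].append(key)
    # Sort for deterministic output regardless of S3 listing order.
    return {item: sorted(paths) for item, paths in sorted(items.items())}
```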

At the end of this step:

* If all item submission assets are complete:
  - A batch folder with complete item submission assets exists in the DSO S3 bucket.
  - Each item submission in the batch is recorded in DynamoDB (with `status="batch_created"`).
  - [OPTIONAL] An email is sent reporting the number of created item submissions. The email includes a CSV file with the batch records from DynamoDB.
* If any item submission assets are invalid (missing metadata and/or bitstreams):
  - A batch folder with incomplete item submission assets exists in the DSO S3 bucket.
  - [OPTIONAL] An email is sent reporting that zero item submissions were created. The email includes a CSV file listing the failed item submissions and their corresponding error messages.
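The two outcomes of the batch creation step follow an all-or-nothing rule: either every item submission is recorded with `status="batch_created"`, or none is and the report lists the failures. A minimal sketch of that decision follows, taking the per-item groups of S3 keys as input. It assumes a hypothetical convention that metadata files end in `_metadata.json` and every other file is a bitstream; the record fields other than `status` are illustrative, and writing to DynamoDB and sending the email are left out.

```python
import csv
import io
from typing import Optional

METADATA_SUFFIX = "_metadata.json"  # HYPOTHETICAL naming convention


def validate_item(keys: list[str]) -> Optional[str]:
    """Return an error message if an item lacks metadata or bitstreams, else None."""
    errors = []
    if not any(key.endswith(METADATA_SUFFIX) for key in keys):
        errors.append("missing metadata")
    if not any(not key.endswith(METADATA_SUFFIX) for key in keys):
        errors.append("missing bitstreams")
    return "; ".join(errors) or None


def create_batch_records(
    batch_id: str, items: dict[str, list[str]]
) -> tuple[list[dict[str, str]], str]:
    """Return (records to store, CSV report) for a batch.

    If any item is invalid, no records are returned and the CSV lists
    each failed item with its error message.
    """
    errors = {}
    for item_identifier, keys in items.items():
        error = validate_item(keys)
        if error is not None:
            errors[item_identifier] = error

    report = io.StringIO()
    if errors:
        writer = csv.DictWriter(report, fieldnames=["item_identifier", "error"])
        writer.writeheader()
        for item_identifier, error in errors.items():
            writer.writerow({"item_identifier": item_identifier, "error": error})
        return [], report.getvalue()

    records = [
        {"batch_id": batch_id, "item_identifier": item_identifier, "status": "batch_created"}
        for item_identifier in items
    ]
    writer = csv.DictWriter(report, fieldnames=["batch_id", "item_identifier", "status"])
    writer.writeheader()
    writer.writerows(records)
    return records, report.getvalue()
```

Keeping validation separate from record creation means the failure report and the success report come from the same pass over the batch.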

## Queue a batch for ingest

## Finalize

----

CLI commands are named after the workflow steps they run (with the exception of the DSS ingest step). Each step is explained in greater detail in the sections below.

prepare items for DSpace...
submit items into DSpace......
creating submission packages.....
DSS ingests the SIPs...

While the CLI is the main entry point for DSC, the workflow modules handle the core functionality invoked by the CLI.
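The split between a thin CLI, whose command names mirror the workflow steps, and workflow modules that hold the core logic can be sketched with the standard library's `argparse`. The command names (`create`, `queue`, `finalize`), the `--batch-id` option, and the `Workflow` class are hypothetical illustrations, not DSC's actual CLI or module API; the methods only return a status string.

```python
import argparse
from typing import Optional


class Workflow:
    """HYPOTHETICAL stand-in for a DSC workflow module holding the core logic."""

    def __init__(self, batch_id: str) -> None:
        self.batch_id = batch_id

    def create_batch(self) -> str:
        return f"created batch {self.batch_id}"

    def queue_batch(self) -> str:
        return f"queued batch {self.batch_id} for ingest"

    def finalize(self) -> str:
        return f"finalized batch {self.batch_id}"


# Each CLI command name mirrors a workflow step and maps to one Workflow method.
COMMANDS = {
    "create": Workflow.create_batch,
    "queue": Workflow.queue_batch,
    "finalize": Workflow.finalize,
}


def main(argv: Optional[list[str]] = None) -> str:
    parser = argparse.ArgumentParser(prog="dsc")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        subparser = subparsers.add_parser(name)
        subparser.add_argument("--batch-id", required=True)
    args = parser.parse_args(argv)
    # The CLI only parses arguments; the workflow object does the work.
    workflow = Workflow(args.batch_id)
    result = COMMANDS[args.command](workflow)
    print(result)
    return result
```

For example, `main(["create", "--batch-id", "batch-1"])` prints and returns `created batch batch-1`. Keeping the CLI this thin means the workflow methods can be reused or tested without going through argument parsing.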
