Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
974a719
commit 7e64de3
Showing 6 changed files with 123 additions and 59 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
DESCRIPTION | ||
=========== | ||
|
||
Service for ingesting datasets into Dataverse via the API. | ||
|
||
Deposit directories | ||
------------------- | ||
|
||
The datasets are prepared as deposit directories in the ingest area. There are the following types of deposit directories: | ||
|
||
### `simple` | ||
|
||
A directory with the following structure: | ||
|
||
```text | ||
<uuid>/ | ||
├── deposit.properties | ||
├── dataset.json | ||
├── files/ | ||
│ ├── file1.txt | ||
│ ├── file2.txt | ||
│ └── subdirectory/ | ||
│ └── file3.txt | ||
``` | ||
|
||
The name of the deposit directory must be a UUID. The deposit directory contains the following files: | ||
|
||
| File | Description | | ||
|----------------------|---------------------------------------------------------------------------------------------------------| | ||
| `deposit.properties` | Contains instructions for `dd-dataverse-ingest` on how to ingest the dataset. | | ||
| `dataset.json` | Contains metadata for the dataset in the Native API format that Dataverse<br> expects. | | ||
| `files/` | Contains the files that are part of the dataset; subdirectories are translated<br>into directoryLabels. | | ||
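
The layout rules above can be checked mechanically. The following is a minimal Python sketch and not part of `dd-dataverse-ingest`: the function names `check_simple_deposit` and `directory_labels` are hypothetical, and the sketch assumes that the `directoryLabel` of a file is its parent path relative to `files/` (empty for files placed directly in `files/`).

```python
import uuid
from pathlib import Path


def check_simple_deposit(deposit_dir: Path) -> list:
    """Return a list of layout problems; an empty list means the layout looks valid."""
    problems = []
    name = deposit_dir.name
    try:
        # uuid.UUID also accepts forms without hyphens, so compare with the canonical form.
        if str(uuid.UUID(name)) != name.lower():
            raise ValueError(name)
    except ValueError:
        problems.append(f"directory name is not a UUID: {name}")
    for required in ("deposit.properties", "dataset.json"):
        if not (deposit_dir / required).is_file():
            problems.append(f"missing file: {required}")
    if not (deposit_dir / "files").is_dir():
        problems.append("missing directory: files/")
    return problems


def directory_labels(deposit_dir: Path) -> dict:
    """Map each file path (relative to files/) to its assumed directoryLabel."""
    files_dir = deposit_dir / "files"
    labels = {}
    for path in sorted(files_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(files_dir)
            parent = relative.parent.as_posix()
            # Files directly under files/ get an empty label (assumption).
            labels[relative.as_posix()] = "" if parent == "." else parent
    return labels
```

For the example tree above, `directory_labels` would map `subdirectory/file3.txt` to the label `subdirectory`.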
|
||
### `dans-bag`: TODO | ||
|
||
Processing | ||
---------- | ||
The deposit area is a directory with the following structure: | ||
|
||
```text | ||
imports | ||
├── inbox | ||
│ └── path | ||
│ └── to | ||
│ ├── batch1 | ||
│ │ ├── 0223914e-c053-4ee8-99d8-a9135fa4db4a | ||
│ │ ├── 1b5c1b24-de40-4a40-9c58-d4409672229e | ||
│ │ └── 9a47c5be-58c0-4295-8409-8156bd9ed9e1 | ||
│ └── batch2 | ||
│ ├── 5e42a936-4b90-4cac-b3c1-798b0b5eeb0b | ||
│ └── 9c2ce5a5-b836-468a-89d4-880efb071d9d | ||
└── outbox | ||
└── path | ||
└── to | ||
└── batch1 | ||
├── failed | ||
├── processed | ||
│ └── 7660539b-6ddb-4719-aa31-a3d1c978081b | ||
└── rejected | ||
``` | ||
|
||
The deposits to be processed are to be placed under `inbox`. All the files in it must be readable and writable by the service. | ||
When the service is requested to process a batch, it will do the following for each deposit: | ||
|
||
1. Create a dataset in Dataverse using the metadata in `dataset.json`. | ||
2. Upload the files in `files/` to the dataset. | ||
3. Publish the dataset. | ||
4. Wait for the dataset to be published. | ||
5. Move the deposit to `outbox/path/to/batch/processed` if the dataset was published successfully, to | ||
`outbox/path/to/batch/rejected` if the dataset was not valid, or to `outbox/path/to/batch/failed` if some | ||
other error occurred. | ||
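
The steps above can be sketched as a single function that returns the outbox status for a deposit. This is an illustration only, not the service's actual implementation: the `client` object and its methods (`create_dataset`, `upload_file`, `publish_dataset`, `wait_until_published`) are a hypothetical interface, as is the `InvalidDataset` exception used to signal that Dataverse considered the dataset invalid.

```python
import json
from pathlib import Path


class InvalidDataset(Exception):
    """Hypothetical error: the client raises this when the dataset is not valid."""


def process_deposit(deposit_dir: Path, client) -> str:
    """Run steps 1-4 for one deposit and return the status directory for step 5."""
    try:
        metadata = json.loads((deposit_dir / "dataset.json").read_text(encoding="utf-8"))
        dataset_id = client.create_dataset(metadata)  # step 1
        files_dir = deposit_dir / "files"
        for path in sorted(files_dir.rglob("*")):
            if path.is_file():
                # step 2: the relative path carries the subdirectory information
                client.upload_file(dataset_id, path, path.relative_to(files_dir))
        client.publish_dataset(dataset_id)  # step 3
        client.wait_until_published(dataset_id)  # step 4
    except InvalidDataset:
        return "rejected"
    except Exception:
        # Any other error (I/O, network, malformed JSON, ...) counts as a failure here.
        return "failed"
    return "processed"
```

Passing the client in as a parameter keeps the sketch independent of any particular Dataverse client library and makes the three outcomes easy to exercise with a stub.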
|
||
Note that the relative path of a processed deposit in `outbox` is the same as in `inbox`, except for an extra directory | ||
level that records the status of the deposit (`processed`, `rejected` or `failed`). | ||
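
The mapping from an inbox path to its outbox destination can be illustrated with a short Python sketch. The function name `outbox_target` and its parameters are hypothetical; only the path layout is taken from the description above.

```python
from pathlib import Path

STATUSES = ("processed", "rejected", "failed")


def outbox_target(deposit_dir: Path, inbox: Path, outbox: Path, status: str) -> Path:
    """Return the outbox location for a deposit, given its status."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    # Path of the deposit relative to the inbox, e.g. path/to/batch1/<uuid>.
    # relative_to raises ValueError if the deposit is not under the inbox.
    relative = deposit_dir.relative_to(inbox)
    # Same relative path in the outbox, with the status inserted above the deposit.
    return outbox / relative.parent / status / relative.name
```

For example, a deposit at `imports/inbox/path/to/batch1/<uuid>` with status `processed` maps to `imports/outbox/path/to/batch1/processed/<uuid>`.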
|
@@ -1,69 +1,14 @@ | ||
dd-dataverse-ingest | ||
=========== | ||
|
||
<!-- Remove this comment and extend the descriptions below --> | ||
=================== | ||
|
||
Service for ingesting datasets into Dataverse via the API. | ||
|
||
SYNOPSIS | ||
-------- | ||
|
||
dd-dataverse-ingest { server | check } | ||
|
||
|
||
DESCRIPTION | ||
----------- | ||
|
||
Ingest datasets into Dataverse via the API | ||
|
||
|
||
ARGUMENTS | ||
--------- | ||
|
||
positional arguments: | ||
{server,check} available commands | ||
named arguments: | ||
-h, --help show this help message and exit | ||
-v, --version show the application version and exit | ||
|
||
EXAMPLES | ||
-------- | ||
|
||
<!-- Add examples of invoking this module from the command line or via HTTP other interfaces --> | ||
|
||
|
||
INSTALLATION AND CONFIGURATION | ||
------------------------------ | ||
Currently this project is built as an RPM package for RHEL7/CentOS7 and later. The RPM will install the binaries to | ||
`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`. | ||
|
||
For installation on systems that do not support RPM and/or systemd: | ||
|
||
1. Build the tarball (see next section). | ||
2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`. | ||
3. Start the service with the following command: | ||
``` | ||
/opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml | ||
``` | ||
|
||
BUILDING FROM SOURCE | ||
-------------------- | ||
Prerequisites: | ||
sudo systemctl {start|stop|restart|status} dd-dataverse-ingest | ||
|
||
* Java 11 or higher | ||
* Maven 3.3.3 or higher | ||
* RPM | ||
|
||
Steps: | ||
|
||
git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git | ||
cd dd-dataverse-ingest | ||
mvn clean install | ||
|
||
If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM | ||
packaging will be activated. If `rpm` is available, but at a different path, then activate it by using | ||
Maven's `-P` switch: `mvn -Prpm install`. | ||
|
||
Alternatively, to build the tarball execute: | ||
|
||
mvn clean install assembly:single |
@@ -0,0 +1,36 @@ | ||
INSTALLATION AND CONFIGURATION | ||
============================== | ||
|
||
Currently, this project is built as an RPM package for RHEL8/Rocky8 and later. The RPM will install the binaries to | ||
`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`. | ||
|
||
For installation on systems that do not support RPM and/or systemd: | ||
|
||
1. Build the tarball (see next section). | ||
2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`. | ||
3. Start the service with the following command: | ||
``` | ||
/opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml | ||
``` | ||
|
||
BUILDING FROM SOURCE | ||
==================== | ||
Prerequisites: | ||
|
||
* Java 17 or higher | ||
* Maven 3.3.3 or higher | ||
* RPM | ||
|
||
Steps: | ||
|
||
git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git | ||
cd dd-dataverse-ingest | ||
mvn clean install | ||
|
||
If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM | ||
packaging will be activated. If `rpm` is available, but at a different path, then activate it by using | ||
Maven's `-P` switch: `mvn -Prpm install`. | ||
|
||
Alternatively, to build the tarball execute: | ||
|
||
mvn clean install assembly:single |