From 7e64de3218683ecd9c9e92a74cb08ef84cc0edb7 Mon Sep 17 00:00:00 2001
From: Jan van Mansum
Date: Sat, 9 Nov 2024 09:44:52 +0100
Subject: [PATCH] Added some starter documentation.

---
 docs/arch.md        |  0
 docs/description.md | 75 +++++++++++++++++++++++++++++++++++++++++++++
 docs/dev.md         |  0
 docs/index.md       | 61 ++----------------------------------
 docs/install.md     | 36 ++++++++++++++++++++++
 mkdocs.yml          | 10 +++++++++-
 6 files changed, 123 insertions(+), 59 deletions(-)
 create mode 100644 docs/arch.md
 create mode 100644 docs/description.md
 create mode 100644 docs/dev.md
 create mode 100644 docs/install.md

diff --git a/docs/arch.md b/docs/arch.md
new file mode 100644
index 0000000..e69de29
diff --git a/docs/description.md b/docs/description.md
new file mode 100644
index 0000000..c295498
--- /dev/null
+++ b/docs/description.md
@@ -0,0 +1,75 @@
+DESCRIPTION
+===========
+
+Service for ingesting datasets into Dataverse via the API.
+
+Deposit directories
+-------------------
+
+The datasets are prepared as deposit directories in the ingest area. There are the following types of deposit directories:
+
+### `simple`
+
+A directory with the following structure:
+
+```text
+/
+ ├── deposit.properties
+ ├── dataset.json
+ ├── files/
+ │   ├── file1.txt
+ │   ├── file2.txt
+ │   └── subdirectory/
+ │       └── file3.txt
+```
+
+The name of the deposit directory must be a UUID. The deposit directory contains the following files:
+
+| File                 | Description                                                                                             |
+|----------------------|---------------------------------------------------------------------------------------------------------|
+| `deposit.properties` | Contains instructions for `dd-dataverse-ingest` on how to ingest the dataset.                          |
+| `dataset.json`       | Contains metadata for the dataset in the Native API format that Dataverse expects.                     |
+| `files/`             | Contains the files that are part of the dataset; subdirectories are translated into directoryLabels.   |
+
+### `dans-bag`: TODO
+
+Processing
+----------
+The deposit area is a directory with the following structure:
+
+```text
+imports
+├── inbox
+│   └── path
+│       └── to
+│           ├── batch1
+│           │   ├── 0223914e-c053-4ee8-99d8-a9135fa4db4a
+│           │   ├── 1b5c1b24-de40-4a40-9c58-d4409672229e
+│           │   └── 9a47c5be-58c0-4295-8409-8156bd9ed9e1
+│           └── batch2
+│               ├── 5e42a936-4b90-4cac-b3c1-798b0b5eeb0b
+│               └── 9c2ce5a5-b836-468a-89d4-880efb071d9d
+└── outbox
+    └── path
+        └── to
+            └── batch1
+                ├── failed
+                ├── processed
+                │   └── 7660539b-6ddb-4719-aa31-a3d1c978081b
+                └── rejected
+```
+
+Deposits to be processed must be placed under `inbox`. All the files in it must be readable and writable by the service.
+When the service is requested to process a batch, it does the following for each deposit:
+
+1. Create a dataset in Dataverse using the metadata in `dataset.json`.
+2. Upload the files in `files/` to the dataset.
+3. Publish the dataset.
+4. Wait for the dataset to be published.
+5. Move the deposit to `outbox/path/to/batch/processed` if the dataset was published successfully, to
+   `outbox/path/to/batch/rejected` if the dataset was not valid, or to `outbox/path/to/batch/failed` if some
+   other error occurred.
+
+Note that the relative path of the processed deposits in `outbox` is the same as in `inbox`, except for an extra
+directory level that indicates the status of the deposit.
+
diff --git a/docs/dev.md b/docs/dev.md
new file mode 100644
index 0000000..e69de29
diff --git a/docs/index.md b/docs/index.md
index e1a359f..0f39de4 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,69 +1,14 @@
 dd-dataverse-ingest
-===========
-
-
+===================
+Service for ingesting datasets into Dataverse via the API.
 
 SYNOPSIS
 --------
 
-    dd-dataverse-ingest { server | check }
-
-
-DESCRIPTION
------------
-
-Ingest datasets into Dataverse via the API
-
-
-ARGUMENTS
---------
-
-    positional arguments:
-    {server,check}         available commands
-
-    named arguments:
-    -h, --help             show this help message and exit
-    -v, --version          show the application version and exit
-
-EXAMPLES
---------
-
-
-
-
-INSTALLATION AND CONFIGURATION
--------------------------------
-Currently this project is built as an RPM package for RHEL7/CentOS7 and later. The RPM will install the binaries to
-`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`.
-
-For installation on systems that do no support RPM and/or systemd:
-
-1. Build the tarball (see next section).
-2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`.
-3. Start the service with the following command
-   ```
-   /opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
-   ```
-
-BUILDING FROM SOURCE
----------------------
-Prerequisites:
+    sudo systemctl {start|stop|restart|status} dd-dataverse-ingest
 
-* Java 11 or higher
-* Maven 3.3.3 or higher
-* RPM
 
-Steps:
-
-    git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git
-    cd dd-dataverse-ingest
-    mvn clean install
 
-If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM
-packaging will be activated. If `rpm` is available, but at a different path, then activate it by using
-Maven's `-P` switch: `mvn -Pprm install`.
 
-Alternatively, to build the tarball execute:
 
-    mvn clean install assembly:single
 
diff --git a/docs/install.md b/docs/install.md
new file mode 100644
index 0000000..dd310bb
--- /dev/null
+++ b/docs/install.md
@@ -0,0 +1,36 @@
+INSTALLATION AND CONFIGURATION
+==============================
+
+Currently, this project is built as an RPM package for RHEL8/Rocky8 and later. The RPM will install the binaries to
+`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`.
+
+For installation on systems that do not support RPM and/or systemd:
+
+1. Build the tarball (see next section).
+2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`.
+3. Start the service with the following command:
+   ```
+   /opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
+   ```
+
+BUILDING FROM SOURCE
+====================
+Prerequisites:
+
+* Java 17 or higher
+* Maven 3.3.3 or higher
+* RPM
+
+Steps:
+
+    git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git
+    cd dd-dataverse-ingest
+    mvn clean install
+
+If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM
+packaging is activated. If `rpm` is available but at a different path, activate the profile with
+Maven's `-P` switch: `mvn -Prpm install`.
+
+Alternatively, to build the tarball, execute:
+
+    mvn clean install assembly:single
diff --git a/mkdocs.yml b/mkdocs.yml
index 0466e60..0546727 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -22,7 +22,15 @@ repo_name: DANS-KNAW/dd-dataverse-ingest
 repo_url: https://github.com/DANS-KNAW/dd-dataverse-ingest
 
 nav:
-  - Manual: index.md
+  - Manual:
+      - Introduction: index.md
+      - Description: description.md
+      # - Examples: examples.md
+      - Installation: install.md
+  - Development:
+      - Overview: dev.md
+      - Context: arch.md
+
 plugins:
   - markdownextradata
 
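The `simple` deposit layout and the inbox-to-outbox mapping described in `docs/description.md` can be illustrated with a short Java sketch. It is not code from dd-dataverse-ingest: the class `DepositLayout`, its methods and the `Status` enum are hypothetical names. The rule that a file's directoryLabel is the path of its parent directory relative to `files/` is an assumption based on the sentence "subdirectories are translated into directoryLabels". The status directory names (`processed`, `rejected`, `failed`) and the extra directory level are taken from the description.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical illustration of docs/description.md; not part of dd-dataverse-ingest.
public class DepositLayout {

    // Outcome of processing a deposit; becomes an extra directory level in the outbox.
    public enum Status { PROCESSED, REJECTED, FAILED }

    // Canonical 8-4-4-4-12 hexadecimal UUID form.
    private static final Pattern UUID_PATTERN = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // True if the directory is named by a UUID and contains
    // deposit.properties, dataset.json and a files/ subdirectory.
    public static boolean isSimpleDeposit(Path depositDir) {
        Path name = depositDir.getFileName();
        if (name == null || !UUID_PATTERN.matcher(name.toString()).matches()) {
            return false;
        }
        return Files.isRegularFile(depositDir.resolve("deposit.properties"))
            && Files.isRegularFile(depositDir.resolve("dataset.json"))
            && Files.isDirectory(depositDir.resolve("files"));
    }

    // Maps inbox/{batch path}/{uuid} to outbox/{batch path}/{status}/{uuid}:
    // the same relative path plus one directory level for the status.
    public static Path outboxTarget(Path inbox, Path outbox, Path depositDir, Status status) {
        Path batchRelative = inbox.relativize(depositDir.getParent()); // e.g. path/to/batch1
        return outbox.resolve(batchRelative)
            .resolve(status.name().toLowerCase(Locale.ROOT))
            .resolve(depositDir.getFileName());
    }

    // Assumed mapping: the label is the parent directory of the file relative to files/;
    // files directly under files/ get no label.
    public static Optional<String> directoryLabel(Path filesDir, Path file) {
        Path parent = filesDir.relativize(file).getParent();
        // Normalize the separator so the label looks the same on every OS.
        return parent == null ? Optional.empty() : Optional.of(parent.toString().replace('\\', '/'));
    }

    public static void main(String[] args) {
        Path inbox = Path.of("imports", "inbox");
        Path outbox = Path.of("imports", "outbox");
        Path deposit = inbox.resolve(Path.of("path", "to", "batch1", "0223914e-c053-4ee8-99d8-a9135fa4db4a"));

        // On a Unix-like system this prints imports/outbox/path/to/batch1/processed/0223914e-c053-4ee8-99d8-a9135fa4db4a
        System.out.println(outboxTarget(inbox, outbox, deposit, Status.PROCESSED));

        Path files = deposit.resolve("files");
        System.out.println(directoryLabel(files, files.resolve(Path.of("subdirectory", "file3.txt"))).orElse("(none)"));
        System.out.println(directoryLabel(files, files.resolve("file1.txt")).orElse("(none)"));
        // Prints false unless such a deposit exists relative to the working directory.
        System.out.println(isSimpleDeposit(deposit));
    }
}
```

The regular-expression check is deliberately stricter than `java.util.UUID.fromString`, which also accepts non-canonical strings such as `1-1-1-1-1`. The documentation does not say which form the service itself enforces.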