Added some starter documentation.
janvanmansum committed Nov 9, 2024
1 parent 974a719 commit 7e64de3
Showing 6 changed files with 123 additions and 59 deletions.
Empty file added docs/arch.md
Empty file.
75 changes: 75 additions & 0 deletions docs/description.md
@@ -0,0 +1,75 @@
DESCRIPTION
===========

Service for ingesting datasets into Dataverse via the API.

Deposit directories
-------------------

The datasets are prepared as deposit directories in the ingest area. There are the following types of deposit directories:

### `simple`

A directory with the following structure:

```text
<uuid>/
├── deposit.properties
├── dataset.json
├── files/
│   ├── file1.txt
│   ├── file2.txt
│   └── subdirectory/
│       └── file3.txt
```

The name of the deposit directory must be a UUID. The deposit directory contains the following files:

| File | Description |
|----------------------|---------------------------------------------------------------------------------------------------------|
| `deposit.properties` | Contains instructions for `dd-dataverse-ingest` on how to ingest the dataset. |
| `dataset.json` | Contains metadata for the dataset in the Native API format that Dataverse<br> expects. |
| `files/` | Contains the files that are part of the dataset; subdirectories are translated<br>into directoryLabels. |
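
The layout rules above can be sketched as a small check. This is only an illustration, not the service's actual implementation: the function names `is_simple_deposit` and `directory_label` are hypothetical, and the sketch assumes that all three entries in the table are required and that a file directly under `files/` gets an empty directoryLabel.

```python
import uuid
from pathlib import Path, PurePosixPath


def is_simple_deposit(deposit_dir: Path) -> bool:
    """Return True if deposit_dir looks like a `simple` deposit (hypothetical helper)."""
    name = deposit_dir.name
    try:
        # The directory name must be a UUID in its canonical hyphenated form.
        if str(uuid.UUID(name)) != name.lower():
            return False
    except ValueError:
        return False
    # Assumption: all three entries from the table must be present.
    return (
        (deposit_dir / "deposit.properties").is_file()
        and (deposit_dir / "dataset.json").is_file()
        and (deposit_dir / "files").is_dir()
    )


def directory_label(relative_path: str) -> str:
    """Derive the directoryLabel for a path given relative to `files/`."""
    parent = PurePosixPath(relative_path).parent
    # Assumption: top-level files have an empty directoryLabel.
    return "" if str(parent) == "." else str(parent)
```

With the example tree above, `directory_label("subdirectory/file3.txt")` yields `subdirectory`, while `file1.txt` and `file2.txt` get an empty label.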

### `dans-bag`: TODO

Processing
----------
The deposit area is a directory with the following structure:

```text
imports
├── inbox
│   └── path
│       └── to
│           ├── batch1
│           │   ├── 0223914e-c053-4ee8-99d8-a9135fa4db4a
│           │   ├── 1b5c1b24-de40-4a40-9c58-d4409672229e
│           │   └── 9a47c5be-58c0-4295-8409-8156bd9ed9e1
│           └── batch2
│               ├── 5e42a936-4b90-4cac-b3c1-798b0b5eeb0b
│               └── 9c2ce5a5-b836-468a-89d4-880efb071d9d
└── outbox
    └── path
        └── to
            └── batch1
                ├── failed
                ├── processed
                │   └── 7660539b-6ddb-4719-aa31-a3d1c978081b
                └── rejected
```

The deposits to be processed must be placed under `inbox`. All files under `inbox` must be readable and writable by the service.
When the service is requested to process a batch, it does the following for each deposit:

1. Create a dataset in Dataverse using the metadata in `dataset.json`.
2. Upload the files in `files/` to the dataset.
3. Publish the dataset.
4. Wait for the dataset to be published.
5. Move the deposit to `outbox/path/to/batch/processed` if the dataset was published successfully, to
   `outbox/path/to/batch/rejected` if the dataset was not valid, or to `outbox/path/to/batch/failed` if some
   other error occurred.
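
The per-deposit flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's code: the four callables (`create`, `upload`, `publish`, `wait_published`) and the `RejectedError` exception are hypothetical stand-ins for the Dataverse API interactions, and only the mapping from outcome to status directory is taken from the steps above.

```python
class RejectedError(Exception):
    """Hypothetical error signalling that the deposit itself is not valid."""


def process_deposit(deposit, create, upload, publish, wait_published) -> str:
    """Run steps 1-4 and return the status directory name used in step 5."""
    try:
        dataset_id = create(deposit)   # step 1: create dataset from dataset.json
        upload(deposit, dataset_id)    # step 2: upload the contents of files/
        publish(dataset_id)            # step 3: request publication
        wait_published(dataset_id)     # step 4: block until published
    except RejectedError:
        return "rejected"              # the dataset was not valid
    except Exception:
        return "failed"                # some other error occurred
    return "processed"                 # published successfully
```

Passing the steps in as callables keeps the sketch runnable without a Dataverse instance; a real implementation would call the Dataverse API at each step.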

Note that the relative path of the processed files in outbox is the same as in the inbox, except for an extra level
of directories for the status of the deposit.
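
The path mapping described in this note can be sketched as a small function. The name `outbox_target` and its parameters are hypothetical; the sketch assumes the `imports` layout shown above and only computes the target path, without moving anything.

```python
from pathlib import Path

STATUSES = {"processed", "rejected", "failed"}


def outbox_target(imports_root: Path, deposit: Path, status: str) -> Path:
    """Compute where a deposit under inbox ends up under outbox (hypothetical helper)."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    # Path of the deposit relative to inbox, e.g. path/to/batch1/<uuid>.
    relative = deposit.relative_to(imports_root / "inbox")
    # Same relative path under outbox, with the status inserted above the deposit.
    return imports_root / "outbox" / relative.parent / status / relative.name
```

For example, `imports/inbox/path/to/batch1/<uuid>` with status `processed` maps to `imports/outbox/path/to/batch1/processed/<uuid>`.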

Empty file added docs/dev.md
Empty file.
61 changes: 3 additions & 58 deletions docs/index.md
@@ -1,69 +1,14 @@
dd-dataverse-ingest
===========

<!-- Remove this comment and extend the descriptions below -->
===================

Service for ingesting datasets into Dataverse via the API.

SYNOPSIS
--------

dd-dataverse-ingest { server | check }


DESCRIPTION
-----------

Ingest datasets into Dataverse via the API


ARGUMENTS
---------

positional arguments:
{server,check} available commands
named arguments:
-h, --help show this help message and exit
-v, --version show the application version and exit

EXAMPLES
--------

<!-- Add examples of invoking this module from the command line or via HTTP other interfaces -->


INSTALLATION AND CONFIGURATION
------------------------------
Currently this project is built as an RPM package for RHEL7/CentOS7 and later. The RPM will install the binaries to
`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`.

For installation on systems that do not support RPM and/or systemd:

1. Build the tarball (see next section).
2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`.
3. Start the service with the following command
```
/opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
```

BUILDING FROM SOURCE
--------------------
Prerequisites:
sudo systemctl {start|stop|restart|status} dd-dataverse-ingest

* Java 11 or higher
* Maven 3.3.3 or higher
* RPM

Steps:

git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git
cd dd-dataverse-ingest
mvn clean install

If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM
packaging will be activated. If `rpm` is available, but at a different path, then activate it by using
Maven's `-P` switch: `mvn -Prpm install`.

Alternatively, to build the tarball execute:

mvn clean install assembly:single
36 changes: 36 additions & 0 deletions docs/install.md
@@ -0,0 +1,36 @@
INSTALLATION AND CONFIGURATION
==============================

Currently, this project is built as an RPM package for RHEL8/Rocky8 and later. The RPM will install the binaries to
`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`.

For installation on systems that do not support RPM and/or systemd:

1. Build the tarball (see next section).
2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`.
3. Start the service with the following command:
   ```
   /opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
   ```

BUILDING FROM SOURCE
====================
Prerequisites:

* Java 17 or higher
* Maven 3.3.3 or higher
* RPM

Steps:

    git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git
    cd dd-dataverse-ingest
    mvn clean install

If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM
packaging will be activated. If `rpm` is available, but at a different path, then activate it by using
Maven's `-P` switch: `mvn -Prpm install`.

Alternatively, to build the tarball execute:

    mvn clean install assembly:single
10 changes: 9 additions & 1 deletion mkdocs.yml
@@ -22,7 +22,15 @@ repo_name: DANS-KNAW/dd-dataverse-ingest
 repo_url: https://github.com/DANS-KNAW/dd-dataverse-ingest
 
 nav:
-  - Manual: index.md
+  - Manual:
+      - Introduction: index.md
+      - Description: description.md
+      # - Examples: examples.md
+      - Installation: install.md
+  - Development:
+      - Overview: dev.md
+      - Context: arch.md
+
 
 plugins:
   - markdownextradata
