Implementing a simple type of deposit for performance testing #1

Closed
wants to merge 16 commits
3 changes: 3 additions & 0 deletions debug-init-env.sh
@@ -17,5 +17,8 @@

echo -n "Pre-creating log..."
TEMPDIR=data
mkdir -p $TEMPDIR/imports/inbox
mkdir -p $TEMPDIR/imports/outbox
mkdir -p $TEMPDIR/temp
touch $TEMPDIR/dd-dataverse-ingest.log
echo "OK"
Empty file added docs/arch.md
Empty file.
75 changes: 75 additions & 0 deletions docs/description.md
@@ -0,0 +1,75 @@
DESCRIPTION
===========

Service for ingesting datasets into Dataverse via the API.

Deposit directories
-------------------

The datasets are prepared as deposit directories in the ingest area. The following types of deposit directories are supported:

### `simple`

A directory with the following structure:

```text
<uuid>/
├── deposit.properties
├── dataset.yml
└── files/
    ├── file1.txt
    ├── file2.txt
    └── subdirectory/
        └── file3.txt
```

The name of the deposit directory must be a UUID. The deposit directory contains the following files:

| File | Description |
|----------------------|-------------------------------------------------------------------------------------------------------------|
| `deposit.properties` | Contains instructions for `dd-dataverse-ingest` on how to ingest the dataset. |
| `dataset.yml`        | Contains metadata for the dataset in YAML, compatible with the Native API format that Dataverse<br> expects. |
| `files/` | Contains the files that are part of the dataset; subdirectories are translated<br>into directoryLabels. |
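As an illustration, a `dataset.yml` might look like the following. This is a hypothetical sketch: it assumes a YAML rendering of the JSON body that the Dataverse Native API accepts for dataset creation, with fields from the standard citation metadata block; the exact layout expected by `dd-dataverse-ingest` is not specified here.

```yaml
# Hypothetical sketch of dataset.yml, mirroring the Dataverse Native API
# dataset-creation JSON (citation metadata block); layout is an assumption.
datasetVersion:
  metadataBlocks:
    citation:
      fields:
        - typeName: title
          typeClass: primitive
          multiple: false
          value: Performance test dataset
        - typeName: author
          typeClass: compound
          multiple: true
          value:
            - authorName:
                typeName: authorName
                typeClass: primitive
                multiple: false
                value: "Doe, John"
```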

### `dans-bag`: TODO
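The rule above that a deposit directory name must be a UUID can be checked with the standard `java.util.UUID` parser. The class and method names below are illustrative, not part of the service:

```java
import java.util.UUID;

public class DepositName {

    // Returns true if the directory name parses as a UUID, as required for
    // simple deposit directories; UUID.fromString throws on anything else.
    public static boolean isValidDepositName(String dirName) {
        try {
            UUID.fromString(dirName);
            return true;
        }
        catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```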

Processing
----------
The deposit area is a directory with the following structure:

```text
imports
├── inbox
│   └── path
│       └── to
│           ├── batch1
│           │   ├── 0223914e-c053-4ee8-99d8-a9135fa4db4a
│           │   ├── 1b5c1b24-de40-4a40-9c58-d4409672229e
│           │   └── 9a47c5be-58c0-4295-8409-8156bd9ed9e1
│           └── batch2
│               ├── 5e42a936-4b90-4cac-b3c1-798b0b5eeb0b
│               └── 9c2ce5a5-b836-468a-89d4-880efb071d9d
└── outbox
    └── path
        └── to
            └── batch1
                ├── failed
                ├── processed
                │   └── 7660539b-6ddb-4719-aa31-a3d1c978081b
                └── rejected
```

The deposits to be processed are placed under `inbox`. All the files in it must be readable and writable by the service.
When the service is requested to process a batch, it will do the following for each deposit:

1. Create a dataset in Dataverse using the metadata in `dataset.yml`.
2. Upload the files in `files/` to the dataset.
3. Publish the dataset.
4. Wait for the dataset to be published.
5. Move the deposit to `outbox/path/to/batch/processed` if the dataset was published successfully, to
`outbox/path/to/batch/rejected` if the dataset was not valid, or to `outbox/path/to/batch/failed` if some
other error occurred.

Note that the relative path of a processed deposit in the outbox is the same as in the inbox, except for an extra level
of directories indicating the status of the deposit.
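Step 5 above, moving a deposit into the outbox while preserving its relative batch path, can be sketched with `java.nio.file`; the class and method names are illustrative, not the service's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DepositMover {

    // Moves a finished deposit from the inbox to the outbox. The relative
    // batch path (e.g. path/to/batch1) is preserved, and a status directory
    // (processed, rejected or failed) is inserted above the deposit itself.
    public static Path moveToOutbox(Path inbox, Path outbox, Path deposit, String status) throws IOException {
        Path batchRelative = inbox.relativize(deposit.getParent());
        Path target = outbox.resolve(batchRelative).resolve(status).resolve(deposit.getFileName());
        Files.createDirectories(target.getParent());
        return Files.move(deposit, target);
    }
}
```

With this sketch, `inbox/path/to/batch1/<uuid>` ends up at `outbox/path/to/batch1/processed/<uuid>`.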

Empty file added docs/dev.md
Empty file.
61 changes: 3 additions & 58 deletions docs/index.md
@@ -1,69 +1,14 @@
dd-dataverse-ingest
===========

<!-- Remove this comment and extend the descriptions below -->
===================

Service for ingesting datasets into Dataverse via the API.

SYNOPSIS
--------

dd-dataverse-ingest { server | check }


DESCRIPTION
-----------

Ingest datasets into Dataverse via the API


ARGUMENTS
---------

positional arguments:
{server,check} available commands

named arguments:
-h, --help show this help message and exit
-v, --version show the application version and exit

EXAMPLES
--------

<!-- Add examples of invoking this module from the command line or via HTTP other interfaces -->


INSTALLATION AND CONFIGURATION
------------------------------
Currently this project is built as an RPM package for RHEL7/CentOS7 and later. The RPM will install the binaries to
`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`.

For installation on systems that do not support RPM and/or systemd:

1. Build the tarball (see next section).
2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`.
3. Start the service with the following command:
```
/opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
```

BUILDING FROM SOURCE
--------------------
Prerequisites:
sudo systemctl {start|stop|restart|status} dd-dataverse-ingest

* Java 11 or higher
* Maven 3.3.3 or higher
* RPM

Steps:

git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git
cd dd-dataverse-ingest
mvn clean install

If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM
packaging will be activated. If `rpm` is available, but at a different path, then activate it by using
Maven's `-P` switch: `mvn -Prpm install`.

Alternatively, to build the tarball execute:

mvn clean install assembly:single
36 changes: 36 additions & 0 deletions docs/install.md
@@ -0,0 +1,36 @@
INSTALLATION AND CONFIGURATION
==============================

Currently, this project is built as an RPM package for RHEL8/Rocky8 and later. The RPM will install the binaries to
`/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`.

For installation on systems that do not support RPM and/or systemd:

1. Build the tarball (see next section).
2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`.
3. Start the service with the following command:
```
/opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
```

BUILDING FROM SOURCE
====================
Prerequisites:

* Java 17 or higher
* Maven 3.3.3 or higher
* RPM

Steps:

git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git
cd dd-dataverse-ingest
mvn clean install

If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM
packaging will be activated. If `rpm` is available, but at a different path, then activate it by using
Maven's `-P` switch: `mvn -Prpm install`.

Alternatively, to build the tarball execute:

mvn clean install assembly:single
10 changes: 9 additions & 1 deletion mkdocs.yml
@@ -22,7 +22,15 @@ repo_name: DANS-KNAW/dd-dataverse-ingest
repo_url: https://github.com/DANS-KNAW/dd-dataverse-ingest

nav:
- Manual: index.md
- Manual:
- Introduction: index.md
- Description: description.md
# - Examples: examples.md
- Installation: install.md
- Development:
- Overview: dev.md
- Context: arch.md


plugins:
- markdownextradata
127 changes: 124 additions & 3 deletions pom.xml
@@ -16,8 +16,7 @@
limitations under the License.

-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
@@ -27,7 +26,7 @@
</parent>

<artifactId>dd-dataverse-ingest</artifactId>
<version>0.1.0-SNAPSHOT</version>
<version>0.2.1-SNAPSHOT</version>

<name>Dd Dataverse Ingest</name>
<url>https://github.com/DANS-KNAW/dd-dataverse-ingest</url>
@@ -36,6 +35,7 @@

<properties>
<main-class>nl.knaw.dans.dvingest.DdDataverseIngestApplication</main-class>
<dd-dataverse-ingest-api.version>0.1.0-SNAPSHOT</dd-dataverse-ingest-api.version>
</properties>

<scm>
@@ -48,6 +48,72 @@
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-core</artifactId>
</dependency>
<dependency>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-client</artifactId>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</dependency>
<dependency>
<groupId>nl.knaw.dans</groupId>
<artifactId>dans-dataverse-client-lib</artifactId>
</dependency>
<dependency>
<groupId>nl.knaw.dans</groupId>
<artifactId>dans-java-utils</artifactId>
</dependency>
<dependency>
<groupId>nl.knaw.dans</groupId>
<artifactId>dans-validation-lib</artifactId>
</dependency>
<dependency>
<groupId>org.hsqldb</groupId>
<artifactId>hsqldb</artifactId>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
</dependency>


<dependency>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-testing</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<scope>test</scope>
</dependency>

</dependencies>

<build>
@@ -56,6 +122,61 @@
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration combine.children="override">
<annotationProcessorPaths>
<path>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>${lombok.version}</version>
</path>
</annotationProcessorPaths>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<phase>initialize</phase>
<goals>
<goal>unpack</goal>
</goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>nl.knaw.dans</groupId>
<artifactId>dd-dataverse-ingest-api</artifactId>
<version>${dd-dataverse-ingest-api.version}</version>
<outputDirectory>${project.build.directory}/openapi</outputDirectory>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.openapitools</groupId>
<artifactId>openapi-generator-maven-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>generate</goal>
</goals>
<configuration combine.children="override">
<generatorName>jaxrs-spec</generatorName>
<inputSpec>${project.build.directory}/openapi/dd-dataverse-ingest-api.yml</inputSpec>
<apiPackage>nl.knaw.dans.dvingest.resources</apiPackage>
<modelPackage>nl.knaw.dans.dvingest.api</modelPackage>
<invokerPackage>nl.knaw.dans.dvingest.resources</invokerPackage>
<templateDirectory>${project.basedir}/src/main/resources/openapi-generator-templates</templateDirectory>
</configuration>
</execution>
</executions>
</plugin>

</plugins>
</build>
