
Commit 35d2775

Adjust to arche-lib-ingest v3 and allow passing parameters from the command line
1 parent 4a86112 commit 35d2775

13 files changed, 367 additions and 157 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
11
/composer.lock
22
/vendor
33
/config.yaml
4+
/nbproject

README.md

Lines changed: 63 additions & 33 deletions
@@ -1,43 +1,82 @@
11
# A collection of ARCHE ingestion script templates
22

3-
## Usage
3+
The REST API provided by ARCHE is quite low-level from the point of view of real-world data ingestions.
4+
To make ingestions simpler, the [arche-lib-ingest](https://github.com/acdh-oeaw/arche-lib-ingest) library has been developed.
5+
While it provides a convenient high-level data ingestion API, it's still only a library which requires you to write your own ingestion script.
46

5-
* Clone this repository.
6-
* Run `composer update` in the repository root directory.
7-
* Fetch the ARCHE instance configuration by downloading `{ARCHE instance base URL}/describe` (e.g. https://arche.acdh.oeaw.ac.at/api/describe) and save it as `config.yaml`.
8-
* Open and adjust the top section of a `*_sample.php` file of your choice:
9-
* set `$configLocation = './config.yaml';` and `$composerLocation = './';`
10-
* you can also set `$runComposerUpdate = false;` (as you have just done that by hand)
11-
* adjust other options according to your preferences
12-
* Run the file, e.g. `php -f import_metadata_sample.php`.
13-
* Every script will ask you for credentials - you should get them from the ARCHE instance admin.
14-
* If you need to create yourself a user account please take a look at https://github.com/acdh-oeaw/arche-docker-config/blob/arche/initScripts/users.yaml.sample
7+
This repository is aimed at closing this gap - it provides a set of sample data ingestion scripts (using the arche-lib-ingest)
8+
which can be used by people with almost no programming skills.
159

16-
### Sample scripts provided
10+
## Sample scripts provided
1711

1812
* `add_metadata_sample.php` adds metadata triples specified in the ttl file preserving all existing metadata of repository resources
1913
* `delete_metadata_sample.php` removes metadata triples specified in the ttl file (but doesn't remove repository resources)
20-
* `delete_resource_sample.php` removes repository resources
14+
* `delete_resource_sample.php` removes a given repository resource
2115
* `import_binary_sample.php` imports binary data from the disk
22-
* `import_metadata_sample.php` imports metadata from a ttl file
16+
* `import_metadata_sample.php` imports metadata from an RDF file
2317
* `reimport_single_binary.php` reingests a single resource's binary content (to be used when file name and/or location changed)
2418

25-
## Instructions for the [email protected]
19+
## Reporting errors
2620

27-
Skip the instructions above.
21+
Create a subtask of the Redmine issue [#17641](https://redmine.acdh.oeaw.ac.at/issues/17641).
22+
23+
* Provide the exact location of the ingestion script (including the script file itself) and any other information which may be required to replicate the problem.
24+
* Assign Mateusz and Norbert as watchers.
25+
26+
## Usage
27+
28+
There are two usage scenarios:
29+
30+
1. When you want to preserve the settings inside a file (e.g. as a documentation of the ingestion process).
31+
2. When you want to pass settings from the command line while running a given script (e.g. when you run it inside a CI/CD workflow or interactively).
32+
33+
In the first case:
34+
35+
* Clone this repository.
36+
* Run `composer update` in the repository root directory.
37+
* Optionally prepare a configuration file with a list of repositories you want to ingest against (see the `config-sample.yaml` file).
38+
* Choose the `*_sample.php` script you want to use, open it and adjust configuration settings in its top section.
39+
* Run the script with `php -f scriptOfYourChoice`, e.g. `php -f import_metadata_sample.php`.
2840

29-
Copy a current template from this directory into your collection import scripts directory.
30-
It will assure your ingestion script will be correct and up to date.
41+
In the second case:
3142

32-
Then adjust the settings at the top of a file (leave `$configLocation` and `$composerLocation` as they are) and run the file.
43+
* Run `composer require acdh-oeaw/arche-ingest`.
44+
* Choose the script you want to use out of
45+
`bin/arche-import-metadata` (a wrapper for `import_metadata_sample.php`),
46+
`bin/arche-import-binary` (a wrapper for `import_binary_sample.php`) and
47+
`bin/arche-delete-resource` (a wrapper for `delete_resource_sample.php`)
48+
and run it with `vendor/bin/scriptOfYourChoice -- parameters go here`, e.g.
49+
`vendor/bin/arche-import-metadata --concurrency 4 myRdf.ttl https://arche.acdh.oeaw.ac.at/api myLogin myPassword`
50+
* You can check required and optional parameters by running the script with the `-h` parameter, e.g.
51+
`vendor/bin/arche-import-metadata -h`
3352

34-
### Reporting errors
53+
### Running inside GitHub Actions
3554

36-
* Create a subtask of the Redmine issue [#17641](https://redmine.acdh.oeaw.ac.at/issues/17641).
37-
* Provide the exact location of the ingestion script (including the script file itself) and any other information which may be required to replicate the problem.
38-
* Assign Mateusz and Norbert as watchers.
55+
Follow the second scenario described above.
3956

40-
### Running long tasks
57+
Do not store your ARCHE credentials in the workflow configuration file. Use repository secrets instead (see example below).
58+
59+
A fragment of your workflow's YAML config may look like this:
60+
61+
```yaml
62+
- name: ingestion dependencies
63+
  run: |
64+
    composer require "acdh-oeaw/arche-ingest"
65+
- name: ingest arche
66+
  run: |
67+
    vendor/bin/arche-import-metadata myRdfFile.ttl https://arche-dev.acdh-dev.oeaw.ac.at/api ${{secrets.ARCHE_LOGIN}} ${{secrets.ARCHE_PASSWORD}}
68+
```
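
Before the ingestion step runs, it may help to check that the credentials actually reached the job. The following is only a minimal sketch and not part of this repository: it assumes the repository secrets were mapped to environment variables named `ARCHE_LOGIN` and `ARCHE_PASSWORD` (assumed names), and `have_credentials` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when both credentials are present in the environment.
# ARCHE_LOGIN and ARCHE_PASSWORD are assumed variable names (e.g. mapped from repository secrets).
have_credentials() {
    [ -n "${ARCHE_LOGIN:-}" ] && [ -n "${ARCHE_PASSWORD:-}" ]
}

if have_credentials; then
    echo "credentials found, ingestion can start"
    # The actual ingestion call would follow here, e.g.:
    # vendor/bin/arche-import-metadata myRdfFile.ttl https://arche-dev.acdh-dev.oeaw.ac.at/api "$ARCHE_LOGIN" "$ARCHE_PASSWORD"
else
    echo "ARCHE_LOGIN and/or ARCHE_PASSWORD is not set" >&2
fi
```

Checking the variables first keeps a missing secret from turning into a confusing authentication error later in the run.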
69+
70+
### Running under [email protected]
71+
72+
Skip the instructions above.
73+
74+
Copy a current template from this directory into your collection import scripts directory
75+
and follow the instructions for the first usage scenario (settings preserved inside the script file).
76+
77+
When adjusting settings at the top of a file leave `$configLocation` and `$composerLocation` as they are.
78+
79+
#### If the ingestion takes long
4180

4281
* Prepare a file with input data.
4382
In points below it's assumed this file name is `input` and it's stored in the same directory as the import script.
@@ -67,12 +106,3 @@ Then adjust the settings at the top of a file (leave `$configLocation` and `$com
67106
* Leave the `screen` session by hitting `CTRL+a` followed by `d`.
68107
You are now in the host shell and you can track the script execution progress with `tail -f log_file` (in the script's directory).
69108
* To go back to the shell where your script is running, run `screen -r yourSessionName`.
70-
71-
## More info
72-
73-
The REST API provided by ARCHE is quite low-level from the point of view of real-world data ingestions.
74-
To make ingestions simpler, the [arche-lib-ingest](https://github.com/acdh-oeaw/arche-lib-ingest) library has been developed.
75-
While it provides a convenient high-level data ingestion API, it's still only a library which requires you to write your own ingestion script.
76-
77-
This repository is aimed at closing this gap - it provides a set of sample data ingestion scripts (using the arche-lib-ingest) which can be used by people with almost no programming skills.
78-

add_metadata_sample.php

File mode changed: 100644 → 100755
Lines changed: 14 additions & 14 deletions
@@ -1,46 +1,46 @@
1+
#!/usr/bin/php
12
<?php
2-
// This script adds metadata triples to resources preserving all already existing triples
33

4+
// This script adds metadata triples to resources preserving all already existing triples
45
// config
56
$ttlFile = 'add_metadata_sample.ttl';
67

78
// advanced config (generally shouldn't need adjustments)
89
$configLocation = '/ARCHE/config.yaml';
9-
$composerLocation = '/ARCHE'; // directory where you run "composer update"
10+
$composerLocation = '/ARCHE'; // directory where you run "composer update"; if it doesn't exist, the script's directory will be used instead
1011
$runComposerUpdate = true; // should `composer update` be run in $composerLocation dir (makes ingestion initialization longer but releases us from remembering about running `composer update` by hand)
11-
1212
// NO CHANGES NEEDED BELOW THIS LINE
1313

14-
if ($runComposerUpdate) {
14+
$composerLocation = file_exists($composerLocation) ? $composerLocation : __DIR__;
15+
if ($runComposerUpdate && count($argv) < 2) {
1516
echo "\n######################################################\nUpdating libraries\n######################################################\n";
16-
system('cd ' . escapeshellarg($composerLocation) . ' && composer update --no-dev');
17+
exec('cd ' . escapeshellarg($composerLocation) . ' && composer update --no-dev');
1718
echo "\n######################################################\nUpdate ended\n######################################################\n\n";
1819
}
1920

20-
use \EasyRdf\Graph;
21-
use \acdhOeaw\arche\lib\Repo;
22-
use \acdhOeaw\arche\lib\RepoResource;
23-
require_once $composerLocation . '/vendor/autoload.php';
21+
use EasyRdf\Graph;
22+
use acdhOeaw\arche\lib\Repo;
23+
use acdhOeaw\arche\lib\RepoResource;
24+
require_once "$composerLocation/vendor/autoload.php";
2425

25-
$cfg = json_decode(json_encode(yaml_parse_file($configLocation)));
26-
$graph = new Graph();
26+
$graph = new Graph();
2727
$graph->parseFile($ttlFile);
28-
$repo = Repo::factoryInteractive($configLocation);
28+
$repo = Repo::factoryInteractive(empty($configLocation) ? null : $configLocation);
2929

3030
foreach ($graph->resources() as $r) {
3131
if (count($r->propertyUris()) > 0) {
3232
echo "Adding metadata to " . $r->getUri() . "\n";
3333
$repo->begin();
3434
try {
35-
$res = $repo->getResourceById($r->getUri());
35+
$res = $repo->getResourceById($r->getUri());
3636
$meta = $res->getMetadata();
3737
foreach ($r->propertyUris() as $p) {
3838
foreach ($r->all($p) as $v) {
3939
$meta->add($p, $v);
4040
}
4141
}
4242
$res->setMetadata($meta);
43-
$res->updateMetadata(RepoResource::UPDATE_OVERWRITE);
43+
$res->updateMetadata(RepoResource::UPDATE_OVERWRITE);
4444

4545
$repo->commit();
4646
} catch (Exception $e) {

bin/arche-delete-resource

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
#!/bin/bash
2+
BDIR=`dirname "$0"`
3+
BDIR=`realpath "$BDIR"`
4+
5+
if [ "$#" == "0" ] ; then
6+
php -f "$BDIR/../delete_resource_sample.php" -- -h
7+
else
8+
php -f "$BDIR/../delete_resource_sample.php" -- "$@"
9+
fi

bin/arche-import-binary

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
#!/bin/bash
2+
BDIR=`dirname "$0"`
3+
BDIR=`realpath "$BDIR"`
4+
5+
if [ "$#" == "0" ] ; then
6+
php -f "$BDIR/../import_binary_sample.php" -- -h
7+
else
8+
php -f "$BDIR/../import_binary_sample.php" -- "$@"
9+
fi

bin/arche-import-metadata

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
#!/bin/bash
2+
BDIR=`dirname "$0"`
3+
BDIR=`realpath "$BDIR"`
4+
5+
if [ "$#" == "0" ] ; then
6+
php -f "$BDIR/../import_metadata_sample.php" -- -h
7+
else
8+
php -f "$BDIR/../import_metadata_sample.php" -- "$@"
9+
fi
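
All three wrappers follow the same pattern: resolve their own directory, print the help when called without parameters, and otherwise forward the parameters to the matching `*_sample.php` script. The sketch below is not part of the commit; it uses hypothetical helper functions (`count_args`, `forward_unquoted`, `forward_quoted`) to illustrate how quoting affects the forwarding of parameters that contain spaces, such as a file name.

```shell
#!/bin/bash
# Hypothetical stand-in for the PHP script: prints how many parameters it received.
count_args() {
    echo "$#"
}

# Forwards parameters unquoted, so the shell splits them again on whitespace.
forward_unquoted() {
    count_args $@
}

# Forwards parameters quoted, so every parameter arrives unchanged.
forward_quoted() {
    count_args "$@"
}

forward_unquoted "my file.ttl" "https://example.org/api"   # prints 3
forward_quoted "my file.ttl" "https://example.org/api"     # prints 2
```

A parameter such as `my file.ttl` reaches the script as a single value only in the quoted form.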

composer.json

Lines changed: 20 additions & 3 deletions
@@ -1,9 +1,26 @@
11
{
2+
"name": "acdh-oeaw/arche-ingest",
3+
"type": "library",
4+
"description": "A set of sample ARCHE ingestion scripts",
5+
"keywords": [],
6+
"homepage": "https://github.com/acdh-oeaw/arche-ingest",
7+
"license": "MIT",
8+
"authors": [
9+
{
10+
"name": "Mateusz Żółtak",
11+
"email": "[email protected]"
12+
}
13+
],
214
"require": {
3-
"acdh-oeaw/arche-lib-ingest": "^2.0.0-alpha",
4-
"acdh-oeaw/arche-lib": "^2.0.0-alpha"
15+
"acdh-oeaw/arche-lib-ingest": "^3.1",
16+
"zozlak/argparse": "^1"
517
},
618
"require-dev": {
719
"phpstan/phpstan": "*"
8-
}
20+
},
21+
"bin": [
22+
"bin/arche-import-metadata",
23+
"bin/arche-import-binary",
24+
"bin/arche-delete-resource"
25+
]
926
}

config-sample.yaml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
repositories:
2+
- urlBase: https://arche-dev.acdh-dev.oeaw.ac.at
3+
  pathBase: /api/
4+
- urlBase: https://arche-curation.acdh-dev.oeaw.ac.at
5+
  pathBase: /api/
6+
- urlBase: https://arche.acdh.oeaw.ac.at
7+
  pathBase: /api/

delete_metadata_sample.php

File mode changed: 100644 → 100755
Lines changed: 16 additions & 16 deletions
@@ -1,55 +1,55 @@
1+
#!/usr/bin/php
12
<?php
2-
// This script removes metadata properties without deleting repository resources
33

4+
// This script removes metadata properties without deleting repository resources
45
// config
56
$ttlFile = 'delete_metadata_sample.ttl';
67

78
// advanced config (generally shouldn't need adjustments)
89
$configLocation = '/ARCHE/config.yaml';
9-
$composerLocation = '/ARCHE'; // directory where you run "composer update"
10+
$composerLocation = '/ARCHE'; // directory where you run "composer update"; if it doesn't exist, the script's directory will be used instead
1011
$runComposerUpdate = true; // should `composer update` be run in $composerLocation dir (makes ingestion initialization longer but releases us from remembering about running `composer update` by hand)
11-
1212
// NO CHANGES NEEDED BELOW THIS LINE
1313

14-
if ($runComposerUpdate) {
14+
$composerLocation = file_exists($composerLocation) ? $composerLocation : __DIR__;
15+
if ($runComposerUpdate && count($argv) < 2) {
1516
echo "\n######################################################\nUpdating libraries\n######################################################\n";
16-
system('cd ' . escapeshellarg($composerLocation) . ' && composer update --no-dev');
17+
exec('cd ' . escapeshellarg($composerLocation) . ' && composer update --no-dev');
1718
echo "\n######################################################\nUpdate ended\n######################################################\n\n";
1819
}
1920

20-
use \EasyRdf\Graph;
21-
use \acdhOeaw\arche\lib\Repo;
22-
use \acdhOeaw\arche\lib\RepoResource;
23-
require_once $composerLocation . '/vendor/autoload.php';
21+
use EasyRdf\Graph;
22+
use acdhOeaw\arche\lib\Repo;
23+
use acdhOeaw\arche\lib\RepoResource;
24+
require_once "$composerLocation/vendor/autoload.php";
2425

25-
$cfg = json_decode(json_encode(yaml_parse_file($configLocation)));
26-
$graph = new Graph();
26+
$graph = new Graph();
2727
$graph->parseFile($ttlFile);
28-
$repo = Repo::factoryInteractive($configLocation);
28+
$repo = Repo::factoryInteractive(empty($configLocation) ? null : $configLocation);
2929

3030
foreach ($graph->resources() as $r) {
3131
if (count($r->propertyUris()) > 0) {
3232
echo "Removing metadata from " . $r->getUri() . "\n";
3333
$repo->begin();
3434
try {
35-
$res = $repo->getResourceById($r->getUri());
35+
$res = $repo->getResourceById($r->getUri());
3636
$meta = $res->getMetadata();
3737
foreach ($r->propertyUris() as $p) {
3838
foreach ($r->all($p) as $v) {
3939
$dtLang = '';
4040
if ($v instanceof \EasyRdf\Literal) {
4141
$dtLang = !empty($v->getLang()) ? '@' . $v->getLang() : '';
4242
$dtLang .= !empty($v->getDatatype()) ? '^^' . $v->getDatatype() : '';
43-
} elseif($p !== $repo->getSchema()->id) {
43+
} elseif ($p !== $repo->getSchema()->id) {
4444
$dr = $repo->getResourceById($v->getUri());
45-
$v = $meta->getGraph()->resource($dr->getUri());
45+
$v = $meta->getGraph()->resource($dr->getUri());
4646
}
4747
echo "\tremoving $p " . (string) $v . $dtLang . "\n";
4848
$meta->delete($p, $v);
4949
}
5050
}
5151
$res->setMetadata($meta);
52-
$res->updateMetadata(RepoResource::UPDATE_OVERWRITE);
52+
$res->updateMetadata(RepoResource::UPDATE_OVERWRITE);
5353

5454
$repo->commit();
5555
} catch (Exception $e) {
