|
1 | 1 | # A collection of ARCHE ingestion script templates
|
2 | 2 |
|
3 |
| -## Usage |
| 3 | +The REST API provided by ARCHE is quite low-level from the point of view of real-world data ingestions. |
| 4 | +To make ingestions simpler, the [arche-lib-ingest](https://github.com/acdh-oeaw/arche-lib-ingest) library has been developed. |
| 5 | +While it provides a convenient high-level data ingestion API, it's still only a library which requires you to write your own ingestion script. |
4 | 6 |
|
5 |
| -* Clone this repository. |
6 |
| -* Run `composer update` in the repository root directory. |
7 |
| -* Fetch the ARCHE instance configuration by downloading `{ARCHE instance base URL}/describe` (e.g. https://arche.acdh.oeaw.ac.at/api/describe) and save it as `config.yaml`. |
8 |
| -* Open and adjust the top section of a `*_sample.php` file of your choice: |
9 |
| - * set `$configLocation = './config.yaml';` and `$composerLocation = './';` |
10 |
| - * you can also set `$runComposerUpdate = false;` (as you have just done it) |
11 |
| - * adjust other options according to your preferences |
12 |
| -* Run the file, e.g. `php -f import_metadata_sample.php`. |
13 |
| - * Every script will ask you for credentials - you should get them from the ARCHE instance admin. |
14 |
| - * If you need to create yourself a user account please take a look at https://github.com/acdh-oeaw/arche-docker-config/blob/arche/initScripts/users.yaml.sample |
| 7 | +This repository is aimed at closing this gap - it provides a set of sample data ingestion scripts (using the arche-lib-ingest) |
| 8 | +which can be used by people with almost no programming skills. |
15 | 9 |
|
16 |
| -### Sample scripts provided |
| 10 | +## Sample scripts provided |
17 | 11 |
|
18 | 12 | * `add_metadata_sample.php` adds metadata triples specified in the ttl file preserving all existing metadata of repository resources
|
19 | 13 | * `delete_metadata_sample.php` removes metadata triples specified in the ttl file (but doesn't remove repository resources)
|
20 |
| -* `delete_resource_sample.php` removes repository resources |
| 14 | +* `delete_resource_sample.php` removes a given repository resource |
21 | 15 | * `import_binary_sample.php` imports binary data from the disk
|
22 |
| -* `import_metadata_sample.php` imports metadata from a ttl file |
| 16 | +* `import_metadata_sample.php` imports metadata from an RDF file |
23 | 17 | * `reimport_single_binary.php` reingests a single resource's binary content (to be used when file name and/or location changed)
|
24 | 18 |
|
25 |
| -## Instructions for the [email protected] |
| 19 | +## Reporting errors |
26 | 20 |
|
27 |
| -Skip the instructions above. |
| 21 | +Create a subtask of the Redmine issue [#17641](https://redmine.acdh.oeaw.ac.at/issues/17641). |
| 22 | + |
| 23 | +* Provide the exact location of the ingestion script (including the script file itself) and any other information which may be required to replicate the problem. |
| 24 | +* Assign Mateusz and Norbert as watchers. |
| 25 | + |
| 26 | +## Usage |
| 27 | + |
| 28 | +There are two usage scenarios: |
| 29 | + |
| 30 | +1. When you want to preserve the settings inside a file (e.g. as a documentation of the ingestion process). |
| 31 | +2. When you want to pass settings from the command line while running a given script (e.g. when you run it inside a CI/CD workflow or interactively). |
| 32 | + |
| 33 | +In the first case: |
| 34 | + |
| 35 | +* Clone this repository. |
| 36 | +* Run `composer update` in the repository root directory. |
| 37 | +* Optionally prepare a configuration file with a list of repositories you want to ingest against (see the `config-sample.yaml` file). |
| 38 | +* Choose the `*_sample.php` script you want to use, open it and adjust configuration settings in its top section. |
| 39 | +* Run the script with `php -f scriptOfYourChoice`, e.g. `php -f import_metadata_sample.php`. |
28 | 40 |
|
29 |
| -Copy a current template from this directory into your collection import scripts directory. |
30 |
| -It will assure your ingestion script will be correct and up to date. |
| 41 | +In the second case: |
31 | 42 |
|
32 |
| -Then adjust the settings at the top of a file (leave `$configLocation` and `$composerLocation` as they are) and run the file. |
| 43 | +* Run `composer require acdh-oeaw/arche-ingest`. |
| 44 | +* Choose the script you want to use out of |
| 45 | + `bin/arche-import-metadata` (a wrapper for `import_metadata_sample.php`), |
| 46 | + `bin/arche-import-binary` (a wrapper for `import_binary_sample.php`) and |
| 47 | + `bin/arche-delete-resource` (a wrapper for `delete_resource_sample.php`) |
| 48 | + and run it with `vendor/bin/scriptOfYourChoice -- parameters go here`, e.g. |
| 49 | + `vendor/bin/arche-import-metadata --concurrency 4 myRdf.ttl https://arche.acdh.oeaw.ac.at/api myLogin myPassword` |
| 50 | + * You can check required and optional parameters by running the script with the `-h` parameter, e.g. |
| 51 | + `vendor/bin/arche-import-metadata -h` |
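
The second scenario can be wrapped in a small shell function so that credentials come from the environment instead of being typed on the command line. This is only a sketch: the variable names `ARCHE_URL`, `ARCHE_LOGIN`, `ARCHE_PASSWORD` and `INGEST_BIN` as well as the function name `ingest_metadata` are assumptions made for this example and are not defined by arche-ingest. The argument order follows the `arche-import-metadata` example given above.

```shell
#!/usr/bin/env bash
# Sketch: run arche-import-metadata with credentials read from the environment.
# ARCHE_URL, ARCHE_LOGIN, ARCHE_PASSWORD and INGEST_BIN are hypothetical names
# chosen for this example; they are not part of arche-ingest.

ingest_metadata() {
    local rdf_file="$1"
    local api_url="${ARCHE_URL:-https://arche.acdh.oeaw.ac.at/api}"

    # Fail early with a clear message when the credentials are missing.
    if [ -z "${ARCHE_LOGIN:-}" ] || [ -z "${ARCHE_PASSWORD:-}" ]; then
        echo "ARCHE_LOGIN and ARCHE_PASSWORD must be set" >&2
        return 1
    fi

    # Refuse to start an ingestion without an input file.
    if [ ! -f "$rdf_file" ]; then
        echo "RDF file not found: $rdf_file" >&2
        return 2
    fi

    # INGEST_BIN may be overridden, e.g. with `echo`, to preview the command.
    "${INGEST_BIN:-vendor/bin/arche-import-metadata}" --concurrency 4 \
        "$rdf_file" "$api_url" "$ARCHE_LOGIN" "$ARCHE_PASSWORD"
}
```

After sourcing the function and exporting the credentials, `ingest_metadata myRdf.ttl` runs the ingestion, while `INGEST_BIN=echo ingest_metadata myRdf.ttl` only prints the arguments that would be passed.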
33 | 52 |
|
34 |
| -### Reporting errors |
| 53 | +### Running inside GitHub Actions |
35 | 54 |
|
36 |
| -* Create a subtask of the Redmine issue [#17641](https://redmine.acdh.oeaw.ac.at/issues/17641). |
37 |
| - * Provide the exact location of the ingestion script (including the script file itself) and any other information which may be required to replicate the problem. |
38 |
| - * Assign Mateusz and Norbert as watchers. |
| 55 | +Follow the second scenario described above. |
39 | 56 |
|
40 |
| -### Running long tasks |
| 57 | +Do not store your ARCHE credentials in the workflow configuration file. Use repository secrets instead (see example below). |
| 58 | + |
| 59 | +A fragment of your workflow's YAML config may look like this: |
| 60 | + |
| 61 | +```yaml |
| 62 | + - name: ingestion dependencies |
| 63 | + run: | |
| 64 | + composer require "acdh-oeaw/arche-ingest" |
| 65 | + - name: ingest arche |
| 66 | + run: | |
| 67 | + vendor/bin/arche-import-metadata myRdfFile.ttl https://arche-dev.acdh-dev.oeaw.ac.at/api ${{secrets.ARCHE_LOGIN}} ${{secrets.ARCHE_PASSWORD}} |
| 68 | +``` |
| 69 | +
|
| 70 | +### Running under [email protected]
| 71 | +
|
| 72 | +Skip the instructions above. |
| 73 | +
|
| 74 | +Copy a current template from this directory into your collection import scripts directory
| 75 | +and follow the instructions for the first usage scenario (preserving the settings inside the file).
| 76 | +
|
| 77 | +When adjusting the settings at the top of the file, leave `$configLocation` and `$composerLocation` as they are. |
| 78 | + |
| 79 | +#### If the ingestion takes long |
41 | 80 |
|
42 | 81 | * Prepare a file with input data.
|
43 | 82 | The points below assume this file is named `input` and is stored in the same directory as the import script.
|
@@ -67,12 +106,3 @@ Then adjust the settings at the top of a file (leave `$configLocation` and `$com
|
67 | 106 | * Leave the `screen` session by hitting `CTRL+a` followed by `d`.
|
68 | 107 | You are now in the host shell and you can track the script execution progress with `tail -f log_file` (in the script's directory).
|
69 | 108 | * To go back to the shell where your script is running, run `screen -r yourSessionName`.
|
70 |
| - |
71 |
| -## More info |
72 |
| - |
73 |
| -The REST API provided by ARCHE is quite low-level from the point of view of real-world data ingestions. |
74 |
| -To make ingestions simpler, the [arche-lib-ingest](https://github.com/acdh-oeaw/arche-lib-ingest) library has been developed. |
75 |
| -While it provides a convenient high-level data ingestion API, it's still only a library which requires you to write your own ingestion script. |
76 |
| - |
77 |
| -This repository is aimed at closing this gap - it provides a set of sample data ingestion scripts (using the arche-lib-ingest) which can be used by people with almost no programming skills. |
78 |
| - |
|