Add directory crawler populator to yield STAC Collections and Items + CLI utility #31
…+ working STAC Collection/Items dir iter loading
I'm trying to test this PR locally but having problems running the host server.
`make` complains about an existing `pyessv-archive` directory:
git clone "https://github.com/ES-DOC/pyessv-archive" ~/.esdoc/pyessv-archive
fatal: destination path '/home/david/.esdoc/pyessv-archive' already exists and is not an empty directory.
make: *** [Makefile:21: setup-pyessv-archive] Error 128
and if I go into the `docker` directory and do `docker compose up`, I'm getting errors like:
(stac) david@it-282:~/src/stac-populator/docker$ docker compose up
[+] Running 2/1
✔ Network docker_default Created 0.1s
✔ Volume "docker_stac-db" Created 0.0s
⠋ Container stac-populator-test-db Creating 0.0s
Error response from daemon: Conflict. The container name "/stac-populator-test-db" is already in use by container "f701152d5d735273fc26e499b75a2bf00de839e08d18ef8f3a12c85efb5b7fda". You have to remove (or rename) that container to be able to reuse that name
I tried to remove that container, but then I get the same error with another hash.
@huard If you delete the directory (recursive) before |
Does adding |
Co-authored-by: David Huard <[email protected]>
I think I'd prefer something like this (not sure it's actually working):
Not sure I understand where to put the --rm. |
I think something similar could work.
My bad. I confused with |
I think what was needed was |
I was able to run the ingestion of the eurosat entries.
## Overview

Provide a way to host local data that STAC API can refer to for use/download.

Currently, any STAC Asset that is referenced within responses by STAC-API Collections/Items must either be already hosted by another service of the stack (e.g., CMIP6 netCDF in THREDDS), or point at some other external resource not on the server. Instead of having a custom config and mount point for each node, this optional component defines a standard way to define it.

## Changes

**Non-breaking changes**

- `optional-components/stac-data-proxy`: add a new feature to allow hosting of local STAC assets.

  The new component defines variables `STAC_DATA_PROXY_DIR_PATH` (default `${DATA_PERSIST_ROOT}/stac-data`) and `STAC_DATA_PROXY_URL_PATH` (default `/data/stac`) that are aliased (mapped) under `nginx` to provide a URL where locally hosted STAC assets can be downloaded from. This allows a server node to be a proper data provider, where its STAC-API can return Catalog, Collection and Item definitions that point at these local assets available through the `STAC_DATA_PROXY_URL_PATH` endpoint.

  When enabled, this component can be combined with `optional-components/secure-data-proxy` to allow per-resource access control of the contents under `STAC_DATA_PROXY_DIR_PATH` by setting relevant Magpie permissions under service `secure-data-proxy` for child resources that correspond to `STAC_DATA_PROXY_URL_PATH`. Otherwise, the path and all of its contents are publicly available, in the same fashion that WPS outputs are managed without `optional-components/secure-data-proxy`.

  More details are provided in https://github.com/bird-house/birdhouse-deploy/blob/stac-data-proxy/birdhouse/optional-components/README.rst#provide-a-proxy-for-local-stac-asset-hosting

**Breaking changes**

- n/a

## Related Issue / Discussion

- Relates to crim-ca/stac-populator#31
- Relates to contents in https://github.com/ai-extensions/stac-data-loader/tree/main/data/EuroSAT/stac
- Relates to https://github.com/ai-extensions/stac-data-loader/blob/main/notebooks/stac_eurosat.ipynb

STAC metadata generated from the above notebook (see subset for example) will be able to use a location such as `https://${PAVICS_FQDN_PUBLIC}${STAC_DATA_PROXY_URL_PATH}/EuroSAT/...` instead of the temporary raw-GitHub content URLs. The STAC populator (with the `DirectoryLoading` implementation) will be able to push the STAC Collection/Items toward that instance. The STAC Assets that they refer to will be placed under `${STAC_DATA_PROXY_DIR_PATH}/EuroSAT` to make them accessible externally.
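To make the directory-to-URL aliasing described above concrete, here is a minimal Python sketch of how a file placed under `STAC_DATA_PROXY_DIR_PATH` would translate into the public `href` of a STAC Asset. The helper name `asset_href`, the host `example.com`, and the directory `/data/stac-data` are hypothetical placeholders chosen for illustration. The actual mapping is done by the `nginx` configuration of the component, not by Python code.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def asset_href(local_path, dir_path, url_path="/data/stac", fqdn="example.com"):
    """Map a file under the proxy directory to the URL it would be served from.

    Hypothetical helper: `dir_path` stands in for STAC_DATA_PROXY_DIR_PATH,
    `url_path` for STAC_DATA_PROXY_URL_PATH and `fqdn` for PAVICS_FQDN_PUBLIC.
    """
    # relative_to() raises ValueError when the file is outside the proxy directory.
    relative = PurePosixPath(local_path).relative_to(dir_path)
    url = PurePosixPath(url_path).joinpath(*relative.parts)
    # Percent-encode characters that are not URL-safe, keeping the "/" separators.
    return f"https://{fqdn}{quote(str(url))}"


# Assumed layout: DATA_PERSIST_ROOT=/data, so the proxy directory is /data/stac-data.
href = asset_href("/data/stac-data/EuroSAT/sample.tif", "/data/stac-data")
print(href)
```

A file outside the proxy directory is rejected rather than mapped, which mirrors the fact that only contents under `STAC_DATA_PROXY_DIR_PATH` are exposed through the URL alias.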
Changes
- Add `session` keyword to all request-related functions and populator methods to allow sharing a common set of settings (`auth`, SSL `verify`, `cert`) across requests toward the STAC Catalog.
- Add `DirectoryLoader` that allows populating a STAC Catalog with Collections and Items loaded from a crawled directory hierarchy that contains `collection.json` files and other `.json`/`.geojson` items.
- Add CLI `stac-populator` that can be called to run populator implementations directly using command `stac-populator run <implementation> [impl-args]`.
- Remove the `verify=False` previously passed to requests calls. If needed for testing purposes, users should use a custom `requests.sessions.Session` with `verify=False` passed to the populator, or alternatively, employ the CLI argument `--no-verify` that will accomplish the same behavior.

Testing
Run the following commands:
Results for reference:
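The following is not output from the test run. It is a minimal sketch of the directory crawling idea listed under Changes, given for readers who want to see what "a crawled directory hierarchy that contains `collection.json` files and other `.json`/`.geojson` items" could look like in code. The function name `crawl_stac_directory` is hypothetical and this is not the `DirectoryLoader` implementation from the PR. As a simplifying assumption, every other `.json`/`.geojson` file found below a collection's directory is treated as one of its items, and nested collections are not handled specially.

```python
import json
from pathlib import Path

ITEM_SUFFIXES = {".json", ".geojson"}


def crawl_stac_directory(root):
    """Yield (collection, items) pairs found under `root`.

    Hypothetical helper: each directory holding a `collection.json` is taken
    as one STAC Collection, and the other JSON/GeoJSON files below that
    directory are taken as its STAC Items.
    """
    for collection_file in sorted(Path(root).rglob("collection.json")):
        collection = json.loads(collection_file.read_text(encoding="utf-8"))
        items = []
        for path in sorted(collection_file.parent.rglob("*")):
            if not path.is_file() or path.name == "collection.json":
                continue
            if path.suffix in ITEM_SUFFIXES:
                items.append(json.loads(path.read_text(encoding="utf-8")))
        yield collection, items
```

A populator built on such a generator would post each collection first and then its items, reusing one shared session for all requests.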