
Merge NPT into GPT; Release GPT. #5

chbrandt opened this issue May 11, 2023 · 1 comment

chbrandt commented May 11, 2023

NPT was developed in parallel and has diverged from GPT. It has a cleaner interface for ODE and is stable in the data reduction pipeline. The specific code blocks to be merged are not clear yet.

Tasks:

  • test npt
  • select the code blocks
    • isissh/sh
    • mosaic
chbrandt self-assigned this May 12, 2023
chbrandt added this to the Release v1 milestone May 12, 2023

chbrandt commented May 12, 2023

Refactor API

The idea is to move to a more OO interface focused on data stores, managing data products and datasets in and out of those stores.

The primary functionality of a data store is a "search" function. Ultimately we want to get data products so we can analyse them however necessary. To get products "X, Y, Z", we first need to know about their existence; hence, the "search" function.
Data stores will organize their products in datasets; datasets don't have to be searched, they can be listed directly.

Back in the day, this library was developed starting from the store and then going down to the (data) products.
Today, let's start from the product and then move up to the store.
The reason is to give more attention to products' functionalities.

Data Store

A (spatial) data product is composed of at least one data file, besides the metadata.
Among the metadata attributes, geometry is always present; the geometry may be a simple "Point", a "Multi-Polygon", or anything in between.

The structure of the data product -- i.e., the type and quantity of files and the metadata schema -- varies from dataset to dataset.
The structure definition and the links to ancillary data are given by the metadata fields.
Data products in the same dataset are expected to share the same structure.
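As a rough picture -- the field names below are hypothetical, not a fixed schema -- the metadata record of a single product could look like:

product_metadata = {
    'id': 'ABC123',                    # hypothetical product identifier
    'dataset': 'mro/ctx/edr',          # dataset the product belongs to
    'geometry': 'POLYGON((...))',      # footprint geometry (always present)
    'files': {
        'image': 'ABC123.IMG',         # main data file
        'browse': 'ABC123.JPG',        # preview/browse image
    },
    # ...plus any dataset-specific attributes (instrument, acquisition time, etc.)
}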

How to handle the metadata/data set is the task of the data store, which defines methods/actions to manage the product(s).

In terms of implementation, we have a Data Store, with one or more Datasets, each with one or more Data Products (a minimal sketch follows the list below).

  • Data stores are responsible for CRUD operations, although not all operations are available to every store; some stores are read-only (eg, USGS ODE). The concept of update per se is different: you don't update a data product, you transform it into another product in another dataset.
  • Datasets are defined under data stores; we can copy datasets between data stores. This is what you do when downloading data from a remote data store (eg, USGS ODE) to the local filesystem -- through the local data store.
  • Data products are composed of a set of metadata information and typically one or more ancillary data files. In some cases, there are no data files or databases and the whole of the data fits in the "metadata" table (eg, many EPN-VESPA resources). There is always, though, spatial information (geometry, bounding box, coordinates) associated with each product (either directly in the metadata or in an accompanying file, eg, a shapefile).
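A minimal sketch of that containment, assuming plain Python dataclasses (names are illustrative, not the final API):

from dataclasses import dataclass, field
from typing import List

@dataclass
class DataProduct:
    metadata: dict                                 # includes the geometry and references to the files
    files: List[str] = field(default_factory=list)

@dataclass
class Dataset:
    name: str                                      # eg, 'mro/ctx/edr'
    products: List[DataProduct] = field(default_factory=list)

@dataclass
class DataStore:
    name: str                                      # eg, 'ode' or a local path
    read_only: bool = False                        # eg, True for USGS ODE
    datasets: List[Dataset] = field(default_factory=list)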

Methods data stores should implement:

  • download: download data from a web server to a local store (filesystem).
  • load: read from a local resource (disk, nfs, webdav).
  • transform: process the data/metadata into another product (eg, data reduction, feature extraction).
  • write: write to a local resource.

The writing method is implicit in any data move, for instance, when downloading or transforming data.
We can call those methods "actions".
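A sketch of how those actions could sit on a store handler, assuming Python abstract methods (signatures are placeholders, not a settled API):

from abc import ABC, abstractmethod

class BaseDataStore(ABC):
    """Common interface for data stores; read-only stores (eg, ODE)
    would simply not implement the writing actions."""

    @abstractmethod
    def download(self, products, dest_store):
        """Download data from a web server into a local store (filesystem)."""

    @abstractmethod
    def load(self, product_id):
        """Read a product from a local resource (disk, nfs, webdav)."""

    @abstractmethod
    def transform(self, product, recipe):
        """Process data/metadata into another product (eg, data reduction)."""

    @abstractmethod
    def write(self, product, dataset):
        """Write a product (data + metadata) to a local resource."""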

Let's go through a typical workflow when handling images from NASA planetary remote-sensing missions.

Let's consider the Mars Reconnaissance Orbiter (MRO) Context camera's (CTX) Experiment Data Record (EDR) dataset.
NASA Planetary Data System (PDS) provides the Orbital Data Explorer (ODE) interface to access data from different planets and satellites (eg, Mars, the Moon).
ODE provides a REST interface for programmatic access, which is an example of a read-only data store.

ODE data products are usually composed of multiple ancillary files: images, shapefiles, other/more metadata.

  1. Search the ODE REST API for Mars images in the MRO/CTX/EDR dataset. If successful, we receive a JSON payload of results with metadata about the dataset and about each data product found.
  2. Download the browse (.JPG) and data (.IMG) files associated with each data product.
    • Save the metadata of each data product next to the related files (as a JSON document).
  3. (For each product) Transform the .IMG image into a light/space calibrated GeoTIFF image.
    • Save the new (.cog) image in a new directory, part of a new "Science-Ready" dataset.
    • Save a new preview (browse) image from the new data image.
    • Save the new metadata set -- an updated version of "EDR" -- next to the related files.

In this workflow we are working with two data stores, "ODE" and "Local", and two datasets, "EDR" and "Science-Ready".
The communication with the data stores -- and the actions taken upon the data products -- is done through handlers.

Data store interface

import api

List of available data stores:

stores = api.datastores.list()    # list of available data stores
print(stores)
['ode']

Connect to a data store:

ode_ds = api.datastores.connect('ode')
ode_ds.info()
# (information about ODE)

List datasets:

datasets = ode_ds.datasets.list('mars')    # list of available datasets. ODE demands a target body
print(datasets)
[...
mro/ctx/edr
mro/crism/trdr
...]

Create a handler for CTX (EDR):

ctx_edr = ode_ds.dataset('mars', 'mro/ctx/edr')
ctx_edr.info()
# (information about mro/ctx/edr)

Search CTX data products:

products = ctx_edr.search(*args, **kwargs)    # return table (geodataframe) with matching results/products
print(products)
# print products metadata

Download CTX products:

# Create a local data store
local_ds = api.datastores.connect('./data')

ctx_images = products.ds.download(local_ds, assets='image')

At this point, "images" from the CTX/EDR dataset are downloaded to the local data store at ./data.
The local data store will write the downloaded images, together with any mandatory asset/ancillary file, under ode/mro/ctx/edr (under ./data).
The metadata associated with each product -- updated accordingly to point to the local filesystem instead of the remote (ode) data store -- is written next to the image.
The same metadata set, containing all data products of the respective dataset (ie, MRO/CTX/EDR), is merged into the dataset's global table, which serves as the dataset index.
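Just to picture it -- the columns here are assumptions, not a defined schema -- the dataset index could be a flat table (index.csv) with one row per product:

id,dataset,geometry,image_file,browse_file

Each row would mirror the per-product JSON document written next to the data files.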

Suppose that when we did the search we got a single product as a result, product "XYZ", with an associated image "XYZ.IMG".
The structure of the local data store at this point is:

./data/
  `- ode/
    `- ctx/
      |- index.csv
      `- products/
        |- XYZ.IMG
        `- XYZ.json
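To complete the workflow (step 3 above), the transform action could be exercised in the same style as the calls above; the method names and arguments below are assumptions for illustration, not a settled API:

# Open the local copy of the dataset through the local data store
ctx_local = local_ds.dataset('mars', 'ode/mro/ctx/edr')
products_local = ctx_local.list()    # no search needed; local datasets are listed directly

# Transform each IMG into a calibrated GeoTIFF (COG), written into a new
# "Science-Ready" dataset in the same local store (hypothetical call)
sci_ready = products_local.ds.transform('calibrate', dataset='mro/ctx/science-ready')

The result would populate ./data with the new .cog images, their browse previews, and the updated metadata/index of the "Science-Ready" dataset, as described in the workflow.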

Summarizing the actions and what they return:

  • api :: "list available data stores" --> List (list of set/available data stores)
  • Data-Store-X :: "list datasets" --> List (list of datasets' unique ids)
  • Data-Store-X :: Dataset-Y :: "describe" --> JSON/Series (dataset metadata)
  • Data-Store-X :: Dataset-Y :: "search data products" --> JSON-Array/Table (data products metadata)
