Expose band resolution metadata at dataset level #1683

Open
robbibt opened this issue Dec 9, 2024 · 2 comments

robbibt commented Dec 9, 2024

Issue

It would be extremely useful to be able to easily obtain the resolution of each satellite band/measurement in a datacube dataset, particularly for products like Sentinel-2 whose bands come at several resolutions (e.g. 10 m, 20 m, 60 m).

However, this information is currently difficult to obtain. To identify the resolution of a measurement, a user has to cross-reference the grid named in that measurement's entry in dss.measurements against the "grids" section of the dataset metadata (dss.metadata_doc["grids"]), and handle the case where a measurement names no grid and therefore uses the default grid. This is excessively complex.

Suggested feature

Add an automatically calculated resolution or gsd key to the dictionary returned by dss.measurements. For example, instead of:

>>> dss.measurements

{'oa_fmask': {'grid': 'g20m', 'path': ...},
'nbart_red': {'path': ...}, 
'oa_s2cloudless_prob': {'grid': 'g60m', 'path': ...}}

Do this:

{'oa_fmask': {'resolution': 20, 'grid': 'g20m', 'path': ...},
'nbart_red': {'resolution': 10, 'path': ...},
'oa_s2cloudless_prob': {'resolution': 60, 'grid': 'g60m', 'path': ...}}
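
As a rough illustration of the proposed behaviour (not an existing datacube API), the lookup could be sketched on plain dictionaries. The function name add_resolution, the sample grid transforms, and the placeholder paths are all assumptions made for this example; real values would come from dss.measurements and dss.metadata_doc["grids"].

```python
def add_resolution(measurements, grids):
    """Return a copy of `measurements` with a 'resolution' key added per band.

    Hypothetical helper: it assumes each grid has a "transform" list whose
    first element is the pixel width, as in EO3 dataset documents.
    """
    # Map each grid name to its pixel width
    grid_res = {name: abs(grid["transform"][0]) for name, grid in grids.items()}
    # Bands that name no grid fall back to the "default" grid
    return {
        band: {"resolution": grid_res[m.get("grid", "default")], **m}
        for band, m in measurements.items()
    }


# Made-up stand-ins for dss.metadata_doc["grids"] and dss.measurements
grids = {
    "default": {"transform": [10.0, 0.0, 600000.0, 0.0, -10.0, 6100000.0]},
    "g20m": {"transform": [20.0, 0.0, 600000.0, 0.0, -20.0, 6100000.0]},
    "g60m": {"transform": [60.0, 0.0, 600000.0, 0.0, -60.0, 6100000.0]},
}
measurements = {
    "oa_fmask": {"grid": "g20m", "path": "..."},
    "nbart_red": {"path": "..."},
    "oa_s2cloudless_prob": {"grid": "g60m", "path": "..."},
}

result = add_resolution(measurements, grids)
```

The original dictionaries are left untouched, so the extra key would not leak into anything that expects the existing structure of dss.measurements.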

robbibt commented Dec 9, 2024

Some example code that might be helpful:

import datacube

# Connect to the datacube and load a single dataset
dc = datacube.Datacube()
dss = dc.find_datasets(product="ga_s2am_ard_3", limit=1)[0]

# Extract grids used across dataset, and resolution from grid transform
# (the first element of the transform is the pixel width)
grid_dict = {k: int(v["transform"][0]) for k, v in dss.metadata_doc["grids"].items()}

# For each band, cross-reference to the grid dict, using the "default" grid
# if the measurement does not name one
band_resolutions = []
for band_name, measurement in dss.measurements.items():
    grid_name = measurement.get("grid", "default")
    band_resolutions.append(grid_dict[grid_name])
band_resolutions

Kirill888 commented

Just wanted to add that:

  1. only EO3 datasets have that information
  2. this should really be a Product-level concern, but the data model doesn't require all datasets to have the same resolution for the same band, so the best you can get is load hints, and these are not defined per band.
  3. Python side of metadata classes is lacking in usability
    • no HTML repr for working in a notebook
    • awkward constructor API for the Dataset class (product as the first parameter, even though it CAN be auto-detected from the dataset metadata, at least in EO3 format)
    • lack of sanity checks beyond basic JSON schema validation
    • lack of reasonable interrogation methods (for example, "give me the URL for band X")
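
To make the "URL for band X" idea concrete, here is a minimal sketch on plain values rather than on the Dataset class. The function name band_url, the example URI, and the file name are hypothetical; it only assumes that a measurement's "path" is either absolute or relative to the dataset's metadata document.

```python
from urllib.parse import urljoin


def band_url(dataset_uri, measurements, band):
    """Resolve a band's path against the dataset's metadata document URI.

    Hypothetical helper, not part of datacube. Note that urljoin only
    resolves relative paths for schemes it recognises (for example http,
    https, file), so other schemes would need extra handling.
    """
    return urljoin(dataset_uri, measurements[band]["path"])


# Made-up values standing in for a dataset URI and dss.measurements
dataset_uri = "https://example.com/scene/metadata.yaml"
measurements = {"nbart_red": {"path": "band_red.tif"}}

url = band_url(dataset_uri, measurements, "nbart_red")
```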

And the thing is, we already HAVE an HTML representation of a dataset in the explorer. But to be honest, with STAC this becomes less and less useful, and less and less likely to be implemented.
