
Added link to the THREDDS Catalog to the Collection object. #23

Closed
wants to merge 5 commits into from


huard
Collaborator

@huard commented Sep 28, 2023

No description provided.

@huard requested a review from Nazim-crim September 28, 2023 20:09
Collaborator

@fmigneault left a comment

Do we want to include the following extension as well?

STACpopulator/stac_utils.py (outdated)
@fmigneault
Collaborator

fmigneault commented Sep 29, 2023

@huard
Looks good.
Another issue I ran into while trying to port changes into https://github.com/crim-ca/ncml2stac is that https://github.com/Ouranosinc/stac-populator/tree/collection_link/implementations is not under the source directory, so it cannot be imported by other utilities.


```python
link = pystac.Link(
    rel="source",
    target=url,
    media_type="text/html",
    # ... (remaining arguments not shown in this review excerpt)
)
```
Collaborator

For the STAC Item that represents a NetCDF file, shouldn't `source` refer to `application/x-netcdf` with the URL of the actual file? That URL should be available at the top level of the NCML file as `@location` when using `Dataset.to_cf_dict`.

If this change is made, the code must look for `/dodsC/` in the URL instead of `/catalog/`.

This code can still be reused, though, to inject a `source` link into the STAC Collection, whose target would be the XML THREDDS catalog URL.
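A minimal sketch of the Collection-level `source` link suggested above, written with plain STAC link dictionaries instead of `pystac.Link` so it runs without dependencies. The helper name `collection_source_link` is hypothetical, and it assumes THREDDS serves the XML form of a catalog at the same path as the HTML page, with `.xml` in place of `.html`.

```python
def collection_source_link(catalog_url: str) -> dict:
    """Hypothetical helper: Collection ``source`` link to the XML THREDDS catalog."""
    # Assumption: THREDDS exposes catalog.xml alongside catalog.html at the same path.
    if catalog_url.endswith(".html"):
        catalog_url = catalog_url[: -len(".html")] + ".xml"
    return {"rel": "source", "href": catalog_url, "type": "application/xml"}


link = collection_source_link(
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/"
    "birdhouse/testdata/xclim/cmip6/catalog.html"
)
```

URLs that already point at the XML catalog are passed through unchanged.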

Collaborator

I also found that the NCML has an `id` attribute in `THREDDSMetadata` with the value `birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc`.
This seems more reliable than parsing the URL ourselves.
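A sketch of reading that `id` directly from the NCML instead of parsing the URL, using only the standard library. The NcML layout below (a `THREDDSMetadata` group holding an `attribute` named `id`) is an assumed, trimmed excerpt based on the description above, not a verbatim THREDDS response; `thredds_dataset_id` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# NcML 2.2 namespace used by THREDDS-generated NcML documents.
NCML_NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}

# Assumed, trimmed NcML excerpt (the real document contains many more elements).
SAMPLE_NCML = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <group name="THREDDSMetadata">
    <attribute name="id"
      value="birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc"/>
  </group>
</netcdf>"""


def thredds_dataset_id(ncml_text: str) -> Optional[str]:
    """Return the ``id`` attribute value of the THREDDSMetadata group, or None if absent."""
    root = ET.fromstring(ncml_text)
    attr = root.find(
        "nc:group[@name='THREDDSMetadata']/nc:attribute[@name='id']", NCML_NS
    )
    return None if attr is None else attr.get("value")


dataset_id = thredds_dataset_id(SAMPLE_NCML)
```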

Collaborator Author

  1. The `href` is not the URL of the actual netCDF file but of its catalog entry, so I'm not sure the media type should be `application/netcdf`. The netCDF file link is listed among the assets.
  2. Yes, I'm already using that ID.

Collaborator

When loading with `xncml.Dataset`, the `@location` attribute returned directly at the root is the NetCDF file URL.

This is the same `location` attribute found directly at the top of:
https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncml/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc?catalog=https%3A%2F%2Fpavics.ouranos.ca%2Ftwitcher%2Fows%2Fproxy%2Fthredds%2Fcatalog%2Fbirdhouse%2Ftestdata%2Fxclim%2Fcmip6%2Fcatalog.html&dataset=birdhouse%2Ftestdata%2Fxclim%2Fcmip6%2Fsic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc

The link actually returned uses the `dodsC` service, so the response I get from THREDDS (https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc) is in fact an HTTP 400 HTML page 😞

We could use the `fileServer` endpoint instead, which would serve the real source as `application/x-netcdf` (https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc).
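A minimal sketch of that approach: read `location` from the NcML root element, then swap the `dodsC` service segment for `fileServer`. The sample NcML is a hypothetical one-element excerpt, the helper names are illustrative, and the plain string rewrite assumes the default THREDDS service paths used in the PAVICS URLs above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed NcML root: the dataset URL sits in the ``location`` attribute.
SAMPLE_NCML = (
    '<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" '
    'location="https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/'
    'birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc"/>'
)


def ncml_location(ncml_text: str) -> str:
    """Return the ``location`` attribute of the NcML root ``<netcdf>`` element."""
    return ET.fromstring(ncml_text).attrib["location"]


def dodsc_to_fileserver(url: str) -> str:
    """Rewrite a THREDDS OPeNDAP (dodsC) URL into the fileServer download URL."""
    if "/dodsC/" not in url:
        raise ValueError(f"not a dodsC URL: {url}")
    # Only the service segment changes; the dataset path is kept as-is.
    return url.replace("/dodsC/", "/fileServer/", 1)


source_href = dodsc_to_fileserver(ncml_location(SAMPLE_NCML))
source_link = {"rel": "source", "href": source_href, "type": "application/x-netcdf"}
```

A server configured with non-default service base paths would need the service URLs looked up from the catalog instead of this string replacement.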

I think the NCML XML href and the THREDDS Catalog HTML href would be relevant as links with `rel=describedby` and `rel=alternate`, respectively. Actually, the NCML href could also use `rel=convertedfrom`!

The `rel=collection` href could also be inserted, since you already have it.
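A sketch of the link set proposed above, expressed as plain STAC link dictionaries (each could be turned into a `pystac.Link` with `pystac.Link.from_dict`). The function and parameter names are hypothetical, and the media types (`application/xml` for the NCML, `application/json` for the Collection) are assumptions; the alternative `convertedfrom` relation for the NCML is not shown.

```python
from typing import Dict, List


def extra_item_links(
    ncml_url: str, catalog_html_url: str, collection_url: str
) -> List[Dict[str, str]]:
    """Hypothetical helper: additional links for a NetCDF-backed STAC Item."""
    return [
        # NCML description of the dataset.
        {"rel": "describedby", "href": ncml_url, "type": "application/xml"},
        # Human-readable THREDDS catalog page for the same dataset.
        {"rel": "alternate", "href": catalog_html_url, "type": "text/html"},
        # Parent STAC Collection.
        {"rel": "collection", "href": collection_url, "type": "application/json"},
    ]


links = extra_item_links(
    "https://example.org/thredds/ncml/a.nc",
    "https://example.org/thredds/catalog/catalog.html",
    "https://example.org/stac/collections/example",
)
```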

Collaborator

https://github.com/crim-ca/ncml2stac/blob/main/notebooks/ncml2stac.ipynb contains the updated `item_link` implementation using the `@location` attribute.

Comment on lines 106 to 107
```python
attrs["access_urls"] = ds.access_urls
attrs["catalog_url"] = self.catalog_head.catalog_url
```
Collaborator

@huard

For some reason, `ds.access_urls` raises an `AttributeError` on my side. Do you have an idea why?

When I skip this step to avoid the error, my resulting STAC Item JSON (see the bottom of the output in https://github.com/crim-ca/ncml2stac/pull/3/files#diff-09442021b34927015b7a3703f6ed65611059a807c957feb4dfe2cd9955ccd68e) has the asset references, but the names are different:

```
"assets": {
  "httpserver_service": { ... },
  "opendap_service": { ... },
  ...
}
```

This seems to be caused by this logic: https://github.com/crim-ca/stac-populator/pull/23/files#diff-ea68b4da6eb5897aa6022852045603d912245daf3f0b044cb9abbd176e758455R196-R199
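The linked logic is not reproduced here. As a hypothetical illustration only: the observed keys match a `name.lower() + "_service"` pattern applied to THREDDS service names such as `HTTPServer` and `OPENDAP`. The sketch below assumes `access_urls` is a mapping of service name to URL (as with siphon's `Dataset.access_urls`), shows both keying choices, and adds a defensive accessor for the `AttributeError` without diagnosing its cause.

```python
from typing import Any, Dict


def asset_keys(access_urls: Dict[str, str], normalize: bool = False) -> Dict[str, str]:
    """Hypothetical helper: map THREDDS service names to STAC asset keys.

    normalize=False keeps service names as-is (e.g. "HTTPServer");
    normalize=True yields lower-case "<name>_service" keys like those observed.
    """
    if normalize:
        return {f"{name.lower()}_service": url for name, url in access_urls.items()}
    return dict(access_urls)


def safe_access_urls(ds: Any) -> Dict[str, str]:
    """Return ``ds.access_urls`` as a dict, or an empty dict if the attribute is missing."""
    # Trade-off: this hides the AttributeError rather than explaining it.
    return dict(getattr(ds, "access_urls", None) or {})


urls = {"HTTPServer": "https://example.org/fileServer/a.nc", "OPENDAP": "https://example.org/dodsC/a.nc"}
```

Keeping the raw service names avoids a second naming scheme that consumers of the Items would have to learn.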

@fmigneault
Collaborator

Should (?) be fixed via #25.

@fmigneault closed this Nov 10, 2023

2 participants