Added link to the THREDDS Catalog to the Collection object. #23
Do we want to include the following extension as well?
@huard
link = pystac.Link(rel="source",
                   target=url,
                   media_type="text/html",
In the case of the STAC Item that represents a NetCDF file, source should probably refer to application/x-netcdf with a URL of the actual file? This URL should be indicated at the top level of the NCML file as @location when using Dataset.to_cf_dict.
If this change is made, the code must look for /dodsC/ in the URL instead of /catalog/.
This code can be reused, though, to inject source in the STAC Collection, which would be an XML THREDDS catalog URL.
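To make the /catalog/ versus /dodsC/ distinction concrete, here is a minimal sketch that detects which THREDDS service a URL points to by inspecting its path segments. The function name thredds_service and the list of service names are assumptions for illustration; this is not the code from the PR.

```python
from urllib.parse import urlsplit

# Service path segments mentioned in this thread (assumed to be exhaustive here).
THREDDS_SERVICES = ("catalog", "dodsC", "fileServer", "ncml")


def thredds_service(url):
    """Return the first THREDDS service segment found in the URL path, or None."""
    # Only the path is inspected, so a "catalog=" query parameter is ignored.
    for segment in urlsplit(url).path.split("/"):
        if segment in THREDDS_SERVICES:
            return segment
    return None
```

Checking whole path segments rather than substrings avoids a false match on a file name such as catalog.html.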
I also found that the NCML has an id attribute in THREDDSMetadata with value birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc.
This seems more reliable than parsing the URL ourselves.
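As an illustration of preferring the id attribute over URL parsing, here is a minimal sketch. It assumes the attribute dictionary nests the THREDDS metadata under groups, then THREDDSMetadata, then attributes; that layout, the function name dataset_id, and the URL fallback are all assumptions, not the project's code.

```python
from urllib.parse import urlsplit


def dataset_id(attrs, url=None):
    """Prefer the THREDDSMetadata id; fall back to parsing the URL path."""
    # Assumed layout: attrs["groups"]["THREDDSMetadata"]["attributes"]["id"]
    meta = attrs.get("groups", {}).get("THREDDSMetadata", {})
    found = meta.get("attributes", {}).get("id")
    if found:
        return found
    if url is None:
        return None
    # Fallback (less reliable): keep whatever follows the service segment.
    path = urlsplit(url).path
    for marker in ("/dodsC/", "/fileServer/", "/ncml/"):
        if marker in path:
            return path.split(marker, 1)[1]
    return None
```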
- The href is not the URL of the actual netCDF file but of its catalog entry, so I'm not sure the MIME type should be application/netcdf. The netCDF link is listed among the Assets.
- Yes, I'm using that ID already.
When loading with xncml.Dataset, the "@location" in the returned attributes, directly at the root, is the NetCDF file URL. This is the same location attribute found at the top of:
https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/ncml/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc?catalog=https%3A%2F%2Fpavics.ouranos.ca%2Ftwitcher%2Fows%2Fproxy%2Fthredds%2Fcatalog%2Fbirdhouse%2Ftestdata%2Fxclim%2Fcmip6%2Fcatalog.html&dataset=birdhouse%2Ftestdata%2Fxclim%2Fcmip6%2Fsic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc
The link actually returned uses the dodsC service, so the response from THREDDS (https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc) that I get is in fact a 400 HTML 😞
What we could do is employ the fileServer endpoint instead, where that would be the real source in application/x-netcdf (https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/testdata/xclim/cmip6/sic_SImon_CCCma-CanESM5_ssp245_r13i1p2f1_2020.nc).
I think the NCML XML href and the THREDDS Catalog HTML href would be relevant in links as rel=describedby and rel=alternate respectively. The NCML href could also use convertedfrom actually!
The rel=collection href could also be inserted since you have it.
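The suggestion above could be sketched as follows, using plain dictionaries shaped like STAC link objects (rel, href, type) rather than pystac objects. The helper names swap_service and item_links, the assumption that @location contains a /dodsC/ segment, and the media types chosen for the NCML and collection links are illustrative assumptions only.

```python
def swap_service(url, old, new):
    """Replace the first '/old/' path segment with '/new/'."""
    marker = f"/{old}/"
    if marker not in url:
        raise ValueError(f"no {marker} segment in {url}")
    return url.replace(marker, f"/{new}/", 1)


def item_links(location, ncml_url, catalog_url, collection_url=None):
    """Build STAC-style link dicts for a NetCDF item."""
    links = [
        # Real file download through fileServer instead of dodsC.
        {"rel": "source",
         "href": swap_service(location, "dodsC", "fileServer"),
         "type": "application/x-netcdf"},
        # NCML description of the file (media type is an assumption).
        {"rel": "describedby", "href": ncml_url, "type": "application/xml"},
        # Human-readable THREDDS catalog page.
        {"rel": "alternate", "href": catalog_url, "type": "text/html"},
    ]
    if collection_url:
        # Media type is an assumption.
        links.append({"rel": "collection", "href": collection_url,
                      "type": "application/json"})
    return links
```

Raising on a missing /dodsC/ segment, rather than returning the URL unchanged, keeps a wrong source href from being written silently.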
https://github.com/crim-ca/ncml2stac/blob/main/notebooks/ncml2stac.ipynb contains the updated item_link implementation using the @location attribute.
STACpopulator/input.py
Outdated
attrs["access_urls"] = ds.access_urls

attrs["catalog_url"] = self.catalog_head.catalog_url
For some reason, ds.access_urls causes an AttributeError on my side. Do you have an idea why that could be?
When I skip this step to avoid the error, my resulting STAC Item JSON (see bottom of output in https://github.com/crim-ca/ncml2stac/pull/3/files#diff-09442021b34927015b7a3703f6ed65611059a807c957feb4dfe2cd9955ccd68e) has the assets references, but the names are different:
"assets": {
  "httpserver_service": { ... },
  "opendap_service": { ... },
  ...
}
Seems to be caused by this logic: https://github.com/crim-ca/stac-populator/pull/23/files#diff-ea68b4da6eb5897aa6022852045603d912245daf3f0b044cb9abbd176e758455R196-R199
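For reference, skipping the access_urls step without triggering the AttributeError could look like the sketch below. The helper name collect_access_urls is hypothetical, and ds stands for any object that may or may not expose access_urls; this is a defensive workaround, not a fix for the underlying cause.

```python
def collect_access_urls(ds):
    """Return a copy of ds.access_urls, or an empty dict if it is absent."""
    # getattr with a default avoids the AttributeError when the attribute is missing.
    urls = getattr(ds, "access_urls", None)
    return dict(urls) if urls else {}
```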
Should (?) be fixed via #25.