Potentially incorrect representation of variables in data cube extension #52

Closed
dchandan opened this issue Feb 23, 2024 · 8 comments · Fixed by #54

@dchandan
Collaborator

Consider the example at: https://redoak.cs.toronto.edu/stac/collections/CMIP6_UofT/items/CMIP_EC-Earth-Consortium_EC-Earth3_historical_r21i1p1f1_Amon_clt_gr

The cube variables listed are:

"cube:variables": {
      "clt": {
        "type": "data",
        "unit": "%",
        "dimensions": [
          "time",
          "lat",
          "lon"
        ],
        "description": "Total Cloud Fraction"
      },
      "lat_bnds": {
        "type": "data",
        "unit": "",
        "dimensions": [
          "lat",
          "bnds"
        ],
        "description": ""
      },
      "lon_bnds": {
        "type": "data",
        "unit": "",
        "dimensions": [
          "lon",
          "bnds"
        ],
        "description": ""
      },
      "time_bnds": {
        "type": "data",
        "unit": "days since 1850-01-01",
        "dimensions": [
          "time",
          "bnds"
        ],
        "description": ""
      }
    },

But, I see two problems:

  1. Variables like time, lat, lon are missing. I think this has to do with these lines. I don't think this is correct.
  2. Shouldn't all the bounds variables be listed as auxiliary variables (as per CF terminology) rather than data variables? Fixes to datacube extension helper #51 partly address this.

@huard you wrote the data cube extension helper code; what are your thoughts on this?

@huard
Collaborator

huard commented Feb 26, 2024

CF-xarray parses the file as:

Coordinates:
             CF Axes: * X: ['lon']
                      * Y: ['lat']
                      * T: ['time']
                        Z: n/a

      CF Coordinates: * longitude: ['lon']
                      * latitude: ['lat']
                      * time: ['time']
                        vertical: n/a

       Cell Measures:   area, volume: n/a

      Standard Names: * latitude: ['lat']
                      * longitude: ['lon']
                      * time: ['time']

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   cloud_area_fraction: ['clt']

              Bounds:   T: ['time_bnds']
                        X: ['lon_bnds']
                        Y: ['lat_bnds']
                        lat: ['lat_bnds']
                        latitude: ['lat_bnds']
                        lon: ['lon_bnds']
                        longitude: ['lon_bnds']
                        time: ['time_bnds']

       Grid Mappings:   n/a

@dchandan
Collaborator Author

dchandan commented Feb 26, 2024

You're saying CF-xarray is able to parse things correctly? I mean with regard to the bounds.

What are your thoughts on the missing variables I mentioned above? Should we remove the lines I pointed to as the likely culprit?

@huard
Collaborator

huard commented Feb 26, 2024

Yes, I got that output by doing

import xarray as xr
import cf_xarray  # registers the `.cf` accessor on xarray objects

url = "https://redoak.cs.toronto.edu/twitcher/ows/proxy/thredds/dodsC/datasets/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r2i1p1f1/Amon/clt/gr/v20201215/clt_Amon_EC-Earth3_historical_r2i1p1f1_gr_185001-201412.nc"
ds = xr.open_dataset(url)
ds.cf

Need to look into it more.

@huard
Collaborator

huard commented Feb 27, 2024

Took a bit of time to review this and refresh my memory.

Variables

I decided to put time, lon, lat in the dimensions attribute instead of the variables. They could be in both, but I didn't see how this would be useful from a catalogue perspective, where searching for variables is the primary usage. Open to counterarguments.
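
To make the two options concrete, here is a minimal sketch of the choice being discussed. It is not the helper's actual code: the function `split_cube_metadata`, its `include_coords_as_variables` flag, the simplified dimension objects, and the units in the example are all hypothetical. Tagging coordinates as "auxiliary" when they are also listed as variables is likewise an assumption.

```python
from typing import Any


def split_cube_metadata(
    variables: dict[str, dict[str, Any]],
    coord_names: set[str],
    include_coords_as_variables: bool = False,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split variables into simplified cube:dimensions and cube:variables dicts.

    Hypothetical helper: the dimension objects are reduced to a description
    and are not complete datacube-extension dimension objects.
    """
    cube_dimensions: dict[str, Any] = {}
    cube_variables: dict[str, Any] = {}
    for name, meta in variables.items():
        is_coord = name in coord_names
        if is_coord:
            cube_dimensions[name] = {"description": meta.get("description", "")}
        if is_coord and not include_coords_as_variables:
            # Option 1: coordinates appear as dimensions only.
            continue
        cube_variables[name] = {
            # Assumption: coordinates listed as variables are tagged "auxiliary".
            "type": "auxiliary" if is_coord else "data",
            "dimensions": list(meta["dimensions"]),
            "unit": meta.get("unit", ""),
            "description": meta.get("description", ""),
        }
    return cube_dimensions, cube_variables


# Illustrative input only; the coordinate units are assumed, not read from the file.
example = {
    "clt": {"dimensions": ["time", "lat", "lon"], "unit": "%",
            "description": "Total Cloud Fraction"},
    "time": {"dimensions": ["time"], "unit": "days since 1850-01-01"},
    "lat": {"dimensions": ["lat"], "unit": "degrees_north"},
    "lon": {"dimensions": ["lon"], "unit": "degrees_east"},
}
coords = {"time", "lat", "lon"}

dims, dims_only_vars = split_cube_metadata(example, coords)
_, both_vars = split_cube_metadata(example, coords, include_coords_as_variables=True)
```

With the default, only `clt` ends up in the variables dict, so a catalogue search over variables stays uncluttered. With the flag set, `time`, `lat` and `lon` appear in both places.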

Bounds

Agree with the second point. I looked at how CF-xarray does it and will prepare a small PR to port the logic here, and add a test.
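
As a rough illustration of the idea (not the code from the linked PR, and not CF-xarray's implementation), bounds variables can be found by following the CF `bounds` attribute that a coordinate variable uses to name its boundary variable. The function `classify_variables` and the attribute dictionaries below are hypothetical; the `bounds` values are assumed to match the bounds shown in the CF-xarray output above.

```python
from typing import Any


def classify_variables(
    attrs_by_name: dict[str, dict[str, Any]],
    coord_names: set[str],
) -> dict[str, str]:
    """Return {name: "data" | "auxiliary"} for every non-coordinate variable.

    Hypothetical helper: a variable is "auxiliary" if some other variable
    names it in its CF `bounds` attribute, otherwise it is "data".
    """
    # Collect every variable name referenced through a `bounds` attribute.
    bounds_names = {
        attrs["bounds"] for attrs in attrs_by_name.values() if "bounds" in attrs
    }
    return {
        name: ("auxiliary" if name in bounds_names else "data")
        for name in attrs_by_name
        if name not in coord_names
    }


# Assumed attributes for the file discussed in this issue.
attrs = {
    "time": {"standard_name": "time", "bounds": "time_bnds"},
    "lat": {"standard_name": "latitude", "bounds": "lat_bnds"},
    "lon": {"standard_name": "longitude", "bounds": "lon_bnds"},
    "clt": {"standard_name": "cloud_area_fraction"},
    "time_bnds": {},
    "lat_bnds": {},
    "lon_bnds": {},
}

variable_types = classify_variables(attrs, coord_names={"time", "lat", "lon"})
```

Under these assumptions, `clt` stays a data variable and the three `*_bnds` variables are tagged as auxiliary. Relying on the `bounds` attribute rather than a naming pattern such as `_bnds` avoids misclassifying files that name their boundary variables differently.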

@huard huard linked a pull request Feb 28, 2024 that will close this issue
@dchandan
Collaborator Author

I think we can leave lat, lon and time as both dimensions and variables, in case a future use case needs this comprehensive description provided by the data cube extension.

@dchandan
Collaborator Author

But it doesn't matter to me. David, you choose.

@huard
Collaborator

huard commented Feb 28, 2024

I'll ask around to see what people think.

@huard
Collaborator

huard commented Feb 29, 2024

For reference, the Microsoft collection lists lat, lon, and time as dimensions only.
https://planetarycomputer-staging.microsoft.com/api/stac/v1/collections/nasa-nex-gddp-cmip6/items/UKESM1-0-LL.ssp585.2100

@huard huard closed this as completed in #54 Feb 29, 2024