-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep attributes for "bounds" variables #8924
base: main
Are you sure you want to change the base?
Conversation
Issue pydata#2921 is about mismatching time units between a time variable and its "bounds" companion. However, pydata#2965 does more than fixing pydata#2921, it removes all double attributes from "bounds" variables which has the undesired side effect that there is currently no way to save them to netcdf with xarray. Since the mentioned link is a recommendation and not a hard requirement for CF compliance, these attributes should be left to the caller to prepare the dataset variables appropriately if required. Reduces the amount of surprise that attributes are not written to disk and fixes pydata#8368.
@@ -794,24 +794,4 @@ def cf_encoder(variables: T_Variables, attributes: T_Attrs): | |||
|
|||
new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()} | |||
|
|||
# Remove attrs from bounds variables (issue #2921) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be a bit more complex.
We treat the variable, and the associated bounds variable, independently in Line 795. So the approach was to set the units on both if not present, do the encoding, then delete the units etc. on bounds variables.
I think we should still delete these attributes if the user didn't set them. That means we'll have to detect these variable names in _update_bounds_encoding
and then modify this loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be a bit more complex.
We treat the variable, and the associated bounds variable, independently in Line 795. So the approach was to set the units on both if not present, do the encoding, then delete the units etc. on bounds variables.
Why do you need to set them if they are not set by the user?
I think we should still delete these attributes if the user didn't set them. That means we'll have to detect these variable names in
_update_bounds_encoding
and then modify this loop.
They shouldn't be there if the user did not set them.
Maybe I am a bit confused here, could you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to set them if they are not set by the user?
To ensure that the two variables are encoded similarly as the standard suggests.
Our encode/decode logic treats all variables independently, so we need to do this to treat the time
and time_bounds
variables identically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to set them if they are not set by the user?
To ensure that the two variables are encoded similarly as the standard suggests.
Our encode/decode logic treats all variables independently, so we need to do this to treat the
time
andtime_bounds
variables identically.
I am only talking about writing the dataset to disk, so the encoding side of it before writing. I might see the point for time variables, but this could all be handled internally in .encoding
(which is used for time variables anyway).
I still don't quite understand the logic for all the others. The standard suggets that if the attributes exist, they should be identical. If they don't exist, well then just don't bother and leave it to the reader to infer the encoding from the parent variable. I don't see the necessitiy to pollute .attrs
for that.
After encoding these and removing the removal of attributes from time bounds variables, these should be in the attributes. Removes the checks that they aren't.
Makes the `time_bounds` calendar really different from `time` and checks that they are correctly encoded to be non-CF compliant.
for more information, see https://pre-commit.ci
Issue #2921 is about mismatching time units between a time variable and its "bounds" companion.
However, #2965 does more than fixing #2921, it removes all double attributes from "bounds" variables which has the undesired side effect that there is currently no way to save them to netcdf with xarray. Since the mentioned link is a recommendation and not a hard requirement for CF compliance, these attributes should be left to the caller to prepare the dataset variables appropriately if required. Reduces the amount of surprise that attributes are not written to disk and fixes #8368.
whats-new.rst
(not sure about "user visibility" here)