Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAS-1891 - Aggregated time dimensions only include input values. #41

Merged
merged 3 commits into from
Aug 1, 2023

Conversation

owenlittlejohns
Copy link
Member

Description

This PR updates the way the NetCDF-to-Zarr service performs temporal aggregation. Previously it would ensure the aggregated temporal dimension was regularly gridded. However, some collections have 1 granule per month, and that granule is not even taken at the same time of day, so we were getting very sparse outputs. Additionally, the performance for such collections was prohibitively poor. Instead, the service will now produce a temporal dimension in the aggregated Zarr store that only contains values corresponding to those in the input granules.

Jira Issue ID

DAS-1891

Local Test Steps

  • Build the service image from this branch.
  • Rebuild Harmony-in-a-Box with this service.
  • Run the DAS-1891-local-verification Jupyter notebook attached to the Jira ticket (GitHub won't let me attach a notebook to the PR).

Note the test cases in the notebook do not use the PO.DAAC collection with reported issues, as we do not have a mirror of that in EEDTEST. But the behaviour can still be confirmed with GPM_3IMERGHH.

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • Tests added/updated and passing.
  • Documentation updated (if needed).
  • version.txt and CHANGELOG.md updated (if publishing a new release).

CHANGELOG.md Outdated
@@ -1,3 +1,10 @@
## v1.1.2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence about the semantic version number to use. This feels like a pretty significant change, but it's not changing the input parameters or anything, only the output. Open to thoughts here, for sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I have no vote... I would vote for at least a minor level bump to v1.2.0 since someone using the zarr store would possibly need to change their code (and the change to zarr is a change in behavior, not a bug fix like with most patch version updates).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chris beat me to this, we didn't break the api, but we did change the output so that a user has to do something to adapt their code. +1 for at least a minor. I could be convinced of a major.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both. I think I might opt for a minor version number change because, while this update does change the nature of the values in the temporal dimension, we're not adding/removing dimensions in the final Zarr store. Does that sound like reasonable/sane logic?

Copy link
Member

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits, but need to decide what version it should be. Nice work.

CHANGELOG.md Outdated
@@ -1,3 +1,10 @@
## v1.1.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chris beat me to this, we didn't break the api, but we did change the output so that a user has to do something to adapt their code. +1 for at least a minor. I could be convinced of a major.

```
return self._get_output_dimension(dimension_name, all_input_values,
output_dimension_units)
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good information, is there a better place for it? Is there a ticket to implement User's choice of aggregation methods later? Not actually sure what I'd want to see so I'm just putting down whatever hits my brain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - this is a good question. Honestly, I'm not sure where else to put this. We don't have a ticket for allowing a user to decide for a continuous output dimension vs just using the input values.

There's the commit history, but I assume that only you or I would know to dig back into that.

tests/unit/test_mosaic_utilities.py Outdated Show resolved Hide resolved
@owenlittlejohns owenlittlejohns merged commit de47e5d into main Aug 1, 2023
1 check passed
@owenlittlejohns owenlittlejohns deleted the DAS-1891-aggregation-rework branch August 1, 2023 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants