Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xarray percentile - support non-dask input and percentiles in a new #248

Merged
merged 12 commits into from
Sep 21, 2021

Conversation

kieranricardo
Copy link
Contributor

@kieranricardo kieranricardo commented Jul 9, 2021

This PR does a couple of things and addresses #209:

  • renames the old xr_percentile function to xr_quantile_bands
  • adds a new function named xr_quantile that outputs the quantiles as a dimension
  • adds support for non dask input in xr_quantiles and xr_quantiles_bands

@robbibt is this what you had in mind? :)

@kieranricardo kieranricardo requested a review from robbibt July 9, 2021 04:45
@robbibt
Copy link
Contributor

robbibt commented Jul 12, 2021

@kieranricardo This looks brilliant! Thanks so much for updating this - I'm testing it now and should have some feedback early this week!

Copy link
Contributor

@robbibt robbibt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Kieran, thanks so much for this, and apologies for the delay in reviewing it! The revised function works amazingly quickly and the quantile dimension is exactly what I had in mind - will make it much more compatible with code that used to user the slower .quantile methods. A few bits of feedback - very happy for some of these to be pushed back to future work depending on complexity!

  1. It's likely users will want to apply this function to data that doesn't necessarily have a time dimension (e.g. I currently have data that is labelled by "year"). From what I can tell the function does actually work for different dimension names - is it currently based on the first dimension in the list? If so, I think at minimum we should update the docstring to communicate this.
  2. Related to the 1. above: one of the most powerful features of xarray is being able to specify the dim over which to apply functions, e.g. dim=['x', 'y'] or dim='time' to reduce data spatially or temporally. It would be amazing if this function could support this kind of functionality rather than being hard-coded to time, but if we think it's best to leave it as a specialised time dimension reducing function, it would be good to emphasise this in the docstring too and point users to the slower funcs if they want to do more complicated things.
  3. The current nodata behavior could perhaps be improved if this value was read directly from the nodata attribute on the input xr.DataArray data, e.g. ds.nbart_red.nodata, with a fallback to NaN or a custom supplied value if this doesn't exist (I think this is what other odc.tools funcs do). I can see that there's a requirement for the nodata value to be the smallest/largest/NaN, so perhaps this might require a warning if this isn't the case.
  4. Rename the quantile arg to q for compatibility with the numpy and xarray implementation?

@robbibt robbibt self-requested a review September 2, 2021 06:18
Copy link
Contributor

@robbibt robbibt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to start using this func for some upcoming coastal work, so I think the best plan might be to merge this as-is, and I can raise the extra suggestions below as an issue for future work. Thanks again @kieranricardo!

@Kirill888
Copy link
Member

@kieranricardo can you please rebase against develop rather than merge, and also cleanup commit history a bit, maybe merge some commits, or even all of them.

@kieranricardo
Copy link
Contributor Author

@Kirill888 will a squash and merge be sufficient?

@codecov
Copy link

codecov bot commented Sep 21, 2021

Codecov Report

Merging #248 (0cb8a75) into develop (1339295) will increase coverage by 0.16%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #248      +/-   ##
===========================================
+ Coverage    58.61%   58.78%   +0.16%     
===========================================
  Files           79       79              
  Lines         6643     6670      +27     
===========================================
+ Hits          3894     3921      +27     
  Misses        2749     2749              
Impacted Files Coverage Δ
libs/algo/odc/algo/__init__.py 100.00% <100.00%> (ø)
libs/algo/odc/algo/_percentile.py 100.00% <100.00%> (ø)
libs/stats/odc/stats/_fc_percentiles.py 87.69% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1339295...0cb8a75. Read the comment docs.

@Kirill888
Copy link
Member

@kieranricardo I don't know is there one commit worth of changes in here? You still need to pull in changes from develop too

@kieranricardo kieranricardo merged commit 75285fe into develop Sep 21, 2021
@kieranricardo kieranricardo deleted the enhancement/xr-percentiles branch September 21, 2021 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants