Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future proof weighted pandas functionality #275

Open
williamjameshandley opened this issue Mar 23, 2023 · 1 comment
Open

Future proof weighted pandas functionality #275

williamjameshandley opened this issue Mar 23, 2023 · 1 comment
Milestone

Comments

@williamjameshandley
Copy link
Collaborator

I would much rather have the explicit args and kwargs, and that would actually be the pythonic way, but we can't maintain a full copy of pandas and keep it up to date with all the little changes going on there. Hence it would have been much nicer to have GroupBy automatically point to the corresponding methods, without needing to create a GroupBy method for every WeightedDataFrame method that we want to use...

For any anesthetic-own functions I would strive to be explicit, but when it comes to inheritance from pandas, I would try to keep things as independent and robust to changes in pandas as possible.

Originally posted by @lukashergt in #272 (comment)

As discussed by @lukashergt and @Ormorod, we ideally would have a more robust and more informative set of *args, **kwargs for our weighted functions.

@williamjameshandley
Copy link
Collaborator Author

One option we could explore is re-factoring the weighted calculations to leverage the existing infrastructure.

As an example, at the moment, WeightedSeries.mean has the signature and implementation:

def mean(self, skipna=True):                                             
    null = self.isnull() & skipna                                                         
    return np.average(masked_array(self, null), weights=self.get_weights())    

Pandas 1.5.3 has

Series.mean(axis=_NoDefault.no_default, skipna=True, level=None, numeric_only=None, **kwargs)

A future-proof way of writing this calculation would be:

def mean(self, *args, **kwargs):
      return (self.get_weights()*self).mean(*args, **kwargs) / (
              self.get_weights()*~self.isna()).mean(*args, **kwargs)                        

This exploits the fact that the weighted mean of f by w can be written as <wf>/<w>, where <.> is the unweighted mean. With care, one can write similar expressions for other weighted functions. The advantage of this is that this will forward on all of the other kwargs correctly to the underlying pandas mean function.

This does not resolve the "informative kwargs", but would ensure that we should expect the docstring and signature to be in alignment with the underlying pandas help.

@williamjameshandley williamjameshandley added this to the 2.1.0 milestone Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant