WeightedGroupBy #272

AdamOrmondroyd · 2023-03-10T16:51:06Z

Description

Statistics of `WeightedDataFrame.groupby`, such as `mean()`, ignore weights. Since my ambitious attempt to fix this in pandas itself is looking very unlikely to get through, I have attempted to fix it here. However, the pandas maintainers have suggested adding a `self._gb_cls` attribute to `Series` and `DataFrame`, holding the class that `.groupby()` returns. For now, I have copied the methods.
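
To illustrate the issue with plain pandas, here is a minimal sketch. The `weight` column and the `weighted_mean` helper are hypothetical and only for illustration; they are not anesthetic's API and not how anesthetic stores its weights.

```python
import numpy as np
import pandas as pd

# Toy data: two groups, each sample carries a weight (hypothetical column).
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, 3.0, 10.0, 20.0],
    "weight": [3.0, 1.0, 1.0, 1.0],
})

# Plain pandas treats every row equally, so the weights are ignored.
unweighted = df.groupby("group")["x"].mean()


def weighted_mean(group):
    """Weighted mean of column x within one group (hypothetical helper)."""
    return np.average(group["x"], weights=group["weight"])


# Compute the weighted mean group by group.
weighted = pd.Series(
    {name: weighted_mean(group) for name, group in df.groupby("group")}
)

print(unweighted)
print(weighted)
```

In group `a` the sample with `x = 1` carries three times the weight of the sample with `x = 3`, so the weighted mean is pulled below the unweighted one. In group `b` the weights are equal and the two results agree.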

Fixes #260

Checklist:

I have performed a self-review of my own code
My code is PEP8 compliant (flake8 anesthetic tests)
My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
New and existing unit tests pass locally with my changes (python -m pytest)
I have added tests that prove my fix is effective or that my feature works
I have appropriately incremented the semantic version number in both README.rst and anesthetic/_version.py
Get the mean().mean() case working (test currently commented out). This seems like it would require significantly more work...

codecov · 2023-03-10T17:08:06Z

Codecov Report

Merging #272 (85fa6ae) into master (1779dad) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #272   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           30        30           
  Lines         2555      2643   +88     
=========================================
+ Hits          2555      2643   +88

| Impacted Files | Coverage Δ |
|---|---|
| anesthetic/_version.py | `100.00% <100.00%> (ø)` |
| anesthetic/samples.py | `100.00% <100.00%> (ø)` |
| anesthetic/utils.py | `100.00% <100.00%> (ø)` |
| anesthetic/weighted_pandas.py | `100.00% <100.00%> (ø)` |


AdamOrmondroyd · 2023-03-10T17:08:36Z

I'm aware coverage won't be complete as the test only considers the mean. Currently trying to think of a more efficient way of writing the tests

lukashergt · 2023-03-10T19:09:19Z

> Since my ambitious attempt to fix this in pandas itself is looking very unlikely to get through

The latest comment sounded in favour of the fix, so I'd still have hope...

What does the transform in udf.groupby('group').transform('mean') do? Is that something we could use/leverage?

AdamOrmondroyd · 2023-03-13T14:38:19Z

> What does the transform in udf.groupby('group').transform('mean') do? Is that something we could use/leverage?

`pandas.core.groupby.DataFrameGroupBy.transform(func)` takes `func` and applies it to each group. What we might be able to use/create is something like `.transform('weight')` to calculate the total weight of each group, which might be usable for the mean of means...
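
A rough sketch of that idea in plain pandas follows. It assumes the weights sit in an ordinary column named `weight` (a hypothetical stand-in), and uses `transform("sum")` in place of the `.transform('weight')` suggested above, which does not exist in pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "x": [1.0, 3.0, 10.0],
    "weight": [3.0, 1.0, 4.0],  # hypothetical weight column
})

# transform broadcasts the per-group result back onto the original rows.
group_mean = df.groupby("group")["x"].transform("mean")
group_weight = df.groupby("group")["weight"].transform("sum")

# Weighted mean and total weight of each group.
total_weight = df.groupby("group")["weight"].sum()
weighted_sum = (df["x"] * df["weight"]).groupby(df["group"]).sum()
group_means = weighted_sum / total_weight

# Mean of means: each group mean is weighted by the group's total weight.
mean_of_means = (group_means * total_weight).sum() / total_weight.sum()

# Weighted mean over all samples, ignoring the grouping, for comparison.
overall_mean = (df["x"] * df["weight"]).sum() / df["weight"].sum()

print(mean_of_means, overall_mean)
```

Weighting each group mean by its total weight reproduces the weighted mean of the full data set, which is why the total weight per group looks like the missing ingredient for the `mean().mean()` case.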

anesthetic/weighted_pandas.py

AdamOrmondroyd · 2023-03-14T12:22:05Z

@lukashergt The failing tests are to do with building the documentation. To be honest, I could do with some pointers on what you would like to see here.

Edit: sticking underscores in front of everything hasn't fixed everything, and I probably should actually take care with the documentation for groupby

…erencing pandas is a pain, some of its classes lack docs

* `GroupBy` does not have its own docs.
  - Its initialisation signature looks like a core dump, hence implementing our own initialisation function.
  - Trying to cross-reference as ``:class:`pandas.core.groupby.GroupBy` `` will fail, hence dropping the link attempt. Same goes for `SeriesGroupBy` and `DataFrameGroupBy`.
* Dropping `kurtosis` from `WeightedGroupBy`, since it is not implemented in `pandas.core.groupby.GroupBy`. Leave that to tackle once/if/when we really need it.
* Add docstring adjustments for `WeightedDataFrameGroupBy` and `WeightedSeriesGroupBy` to the end of `weighted_pandas` in the same way as previously done for `WeightedDataFrame` and `WeightedSeries`.

AdamOrmondroyd · 2023-03-30T13:35:11Z

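Even in native pandas they have different formatting:

import pandas as pd

# Small example frame with one numeric column
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
df.hist('Max Speed')  # histogram via DataFrame.hist

df['Max Speed'].plot.hist()  # same column via Series.plot.hist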

AdamOrmondroyd · 2023-03-31T15:29:14Z

I guess I should probably test all the plot types...

lukashergt · 2023-04-04T20:22:37Z

anesthetic/weighted_pandas.py

+    def kurt(self, *args, **kwargs):  # noqa: D102
+        return self._add_weights("kurt", *args, **kwargs)
+
+    def kurtosis(self, *args, **kwargs):  # noqa: D102
+        return self._add_weights("kurtosis", *args, **kwargs)


Why do we need to define kurt and kurtosis here, but don't need to define skew? Yet skew still passes the tests...?

I first thought this had to do with pandas GroupBy not overriding skew as opposed to mean, median, std and var. However, the same would go for kurt and kurtosis, so now I wonder what is different/special about skew?

AdamOrmondroyd · 2023-04-04

fyi I've found that skew needs adding for pandas 2 (otherwise it fails the is weighted test), so could add it in here now

williamjameshandley · 2023-04-05

A little bit of an odd one -- pandas 1.5 doesn't actually implement groupby kurt and kurtosis (despite the fact I believe it could). @AdamOrmondroyd does it need adding for pandas 2? My instinct is to leave it in, as it doesn't do any harm.

tests/test_samples.py

williamjameshandley · 2023-04-05T09:47:31Z

Is there anything else which needs to be done?

AdamOrmondroyd · 2023-04-06T20:44:34Z

> Is there anything else which needs to be done?

I was working my way through testing all the plots, but so far I haven't found any issues with the plots themselves

AdamOrmondroyd · 2023-04-07T17:14:49Z

Had a fruitless look into why it's failing on pip but not conda, isn't as simple as numpy 1.23 vs 1.24

lukashergt · 2023-04-07T18:58:46Z

> Had a fruitless look into why it's failing on pip but not conda, isn't as simple as numpy 1.23 vs 1.24

I think it was a numerical/floating point thing where the covariance matrix turned out positive definite despite having a linearly dependent parameter. Anyhow, making the linear dependence more blatant helped (85fa6ae)...
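
The effect can be sketched with NumPy alone. This is an illustrative example, not the test added in this PR; the helper `is_positive_definite` and the toy data are hypothetical.

```python
import numpy as np


def is_positive_definite(matrix):
    """Return True if a Cholesky factorisation of `matrix` succeeds."""
    try:
        np.linalg.cholesky(matrix)
    except np.linalg.LinAlgError:
        return False
    return True


# Five samples of three parameters; the third is exactly a + b.
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 0.0, 1.0, 0.0, 3.0])
cov = np.cov(np.vstack([a, b, a + b]))

# The covariance matrix is rank deficient (rank 2 instead of 3).
rank = np.linalg.matrix_rank(cov, tol=1e-10)
print(rank)

# Whether Cholesky fails on such a borderline matrix can depend on
# rounding, so the outcome is printed rather than relied upon.
print(is_positive_definite(cov))

# A matrix with a clearly negative eigenvalue (3 and -1) fails reliably.
blatant = np.array([[1.0, 2.0], [2.0, 1.0]])
print(is_positive_definite(blatant))
```

A singular covariance matrix sits right on the boundary of positive definiteness, so rounding differences between builds can tip the factorisation either way. A test for `LinAlgError` is therefore more robust when the matrix is unambiguously not positive definite.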

AdamOrmondroyd · 2023-04-07T18:59:13Z

@lukashergt if you're happy with cov/GelmanRubin then I reckon we're good to go

lukashergt

Ok, with the Gelman--Rubin statistic fixed thanks to the functioning groupby I am happy now. If you are happy too, feel free to squash and merge.

Thanks, @AdamOrmondroyd, @williamjameshandley

AdamOrmondroyd added 9 commits March 10, 2023 16:01

first pass at WeightedGroupBy

ab63486

correct cov

e4e834e

remove duplicate cov

94aecb9

give up on cov for now

ada20d7

use Lukas' test

3abca58

remove unecessary import

626bce3

remove currently unused lines from tests

ca31333

version bump

9578b4b

sort out docstrings

8531a56

AdamOrmondroyd changed the title ~~Groupby~~ WeightedGroupby Mar 10, 2023

AdamOrmondroyd changed the title ~~WeightedGroupby~~ WeightedGroupBy Mar 10, 2023

fix indentation

9598eac

AdamOrmondroyd added 2 commits March 10, 2023 17:39

tests using cobaya chains

192bffe

test formatting

de9b4f7

AdamOrmondroyd added 2 commits March 14, 2023 12:07

reinstate median test

55e9f4e

change numeric_only to None in median

dfc4c3b

stick underscores in front to see if this fixes the documentation

244cd1f

AdamOrmondroyd and others added 7 commits March 14, 2023 12:26

Revert "stick underscores in front to see if this fixes the documenta…

e3badbb

…tion" This reverts commit 244cd1f.

add missing no cover to WeightedSeries.groupby()

ca00255

remove :show-inheritance: for weighted_pandas autodocs, cross ref…

d97b760

…erencing pandas is a pain, some of its classes lack docs

drop WeightedGroupBy.kurtosis also from tests

6703ca7

make WeightedDataFramGroupBy and WeightedSeriesGroupBy private, s…

40cd004

…ince they have essentially no documentation anyhow

make WeightedGroupBy.grouper private

0f2104e

AdamOrmondroyd added 2 commits March 30, 2023 16:29

add test for groupby().hist()

5113b61

add test for groupby().plot.hist(), not happy with the janky slicing …

918986c

…here...

AdamOrmondroyd added 7 commits March 31, 2023 16:50

add test for groupby().plot.kde()

b13b0a2

add tests for hist_1d and kde_1d

9709b38

test for fastkde_1d

338fc8a

test for hist_2d

0e2676a

plt.close('all')

9589921

test for kde_2d

4e8e50f

test for fastkde_2d

a631d60

AdamOrmondroyd mentioned this pull request Apr 3, 2023

Changes for Pandas 2 #270

Closed

11 tasks

williamjameshandley and others added 2 commits April 4, 2023 17:40

Reinstated init function to get documentation to work

132d80f

complete test coverage for explicit weight checks

369c49c

lukashergt reviewed Apr 4, 2023

williamjameshandley requested a review from lukashergt April 5, 2023 13:26

williamjameshandley added 2 commits April 5, 2023 14:29

Readme correction following handley-lab#217

122bf2b

Merge branch 'groupby' of github.com:Ormorod/anesthetic into groupby

6d686f6

lukashergt added 2 commits April 6, 2023 22:57

fix GelmanRubin method now that groupby is fixed

5a5f106

add test for LinAlgError when covariance matrix is not positive def…

9d10ce5

…inite

make linear dependence more blatant in check for LinAlgError

85fa6ae

lukashergt approved these changes Apr 7, 2023

AdamOrmondroyd merged commit 17f1ae4 into handley-lab:master Apr 7, 2023

This was referenced Apr 9, 2023

Reorganisation into wpandas and lpandas #280

Closed

Changes for pandas 2.0.0 #282

Closed
