WeightedGroupBy #272

Merged 76 commits on Apr 7, 2023
Changes from 14 commits
Commits
ab63486
first pass at WeightedGroupBy
AdamOrmondroyd Mar 10, 2023
e4e834e
correct cov
AdamOrmondroyd Mar 10, 2023
94aecb9
remove duplicate cov
AdamOrmondroyd Mar 10, 2023
ada20d7
give up on cov for now
AdamOrmondroyd Mar 10, 2023
3abca58
use Lukas' test
AdamOrmondroyd Mar 10, 2023
626bce3
remove unnecessary import
AdamOrmondroyd Mar 10, 2023
ca31333
remove currently unused lines from tests
AdamOrmondroyd Mar 10, 2023
9578b4b
version bump
AdamOrmondroyd Mar 10, 2023
8531a56
sort out docstrings
AdamOrmondroyd Mar 10, 2023
9598eac
fix indentation
AdamOrmondroyd Mar 10, 2023
192bffe
tests using cobaya chains
AdamOrmondroyd Mar 10, 2023
de9b4f7
test formatting
AdamOrmondroyd Mar 10, 2023
55e9f4e
reinstate median test
AdamOrmondroyd Mar 14, 2023
dfc4c3b
change numeric_only to None in median
AdamOrmondroyd Mar 14, 2023
244cd1f
stick underscores in front to see if this fixes the documentation
AdamOrmondroyd Mar 14, 2023
e3badbb
Revert "stick underscores in front to see if this fixes the documenta…
AdamOrmondroyd Mar 14, 2023
ca00255
add missing no cover to WeightedSeries.groupby()
AdamOrmondroyd Mar 14, 2023
d97b760
remove `:show-inheritance:` for `weighted_pandas` autodocs, cross ref…
lukashergt Mar 15, 2023
bfa2647
fix autodocs for `weighted_pandas`
lukashergt Mar 15, 2023
6703ca7
drop `WeightedGroupBy.kurtosis` also from tests
lukashergt Mar 15, 2023
40cd004
make `WeightedDataFrameGroupBy` and `WeightedSeriesGroupBy` private, s…
lukashergt Mar 15, 2023
0f2104e
make `WeightedGroupBy.grouper` private
lukashergt Mar 15, 2023
87e77c4
Merge branch 'master' into groupby
AdamOrmondroyd Mar 16, 2023
9636d5f
version bump
AdamOrmondroyd Mar 16, 2023
6e86823
Merge branch 'master' into groupby
AdamOrmondroyd Mar 20, 2023
0a83fe0
version bump
AdamOrmondroyd Mar 20, 2023
8c907f1
Removed hard-coded numeric_only arguments
williamjameshandley Mar 22, 2023
eac682a
version bump
williamjameshandley Mar 22, 2023
2721c19
Merge branch 'master' into groupby
williamjameshandley Mar 22, 2023
c495362
Updated weighted samples
williamjameshandley Mar 22, 2023
7fa1cec
Completed coverage
williamjameshandley Mar 22, 2023
99a52e3
add missing space before inline comment
AdamOrmondroyd Mar 22, 2023
2e6d54e
joint call of column name and label
AdamOrmondroyd Mar 22, 2023
e45aaa6
formatting
AdamOrmondroyd Mar 22, 2023
5725ddf
additional chains.get_group(chains) tests
AdamOrmondroyd Mar 22, 2023
788fa84
added kurtosis, kurt, skew, mad, sem
williamjameshandley Mar 23, 2023
f22a24c
fix docs for weighted groupby sample methods
lukashergt Mar 24, 2023
d2215a5
complete coverage by adding test for `WeightedSeriesGroupBy.sample`
lukashergt Mar 24, 2023
866b3b3
fix groupby test for `WeightedSeriesGroupBy.sample`
lukashergt Mar 24, 2023
57a1c1f
add quantile
AdamOrmondroyd Mar 27, 2023
aff1455
add tests for corr, line 1441 causing invalid value warning
AdamOrmondroyd Mar 27, 2023
83e2c4d
add test for cov
AdamOrmondroyd Mar 27, 2023
0654d98
move quantile to end
AdamOrmondroyd Mar 27, 2023
43f0882
add test for corrwith
AdamOrmondroyd Mar 27, 2023
f1c966d
change `i` to `mask` to make it clearer that this is not a single ind…
lukashergt Mar 27, 2023
142740c
add tests that check whether `groupby` results from `mean`, `std`, `c…
lukashergt Mar 27, 2023
7b0a8e1
add groupby tests for `mad`, `corr`, `cov` and `corrwith` that check …
lukashergt Mar 27, 2023
911f54e
add tests for groupby that explicitly check that the methods return t…
lukashergt Mar 27, 2023
23f2d3d
Added some cleaner tests for get_group
williamjameshandley Mar 28, 2023
c5391a5
Merge branch 'groupby' of github.com:Ormorod/anesthetic into groupby
williamjameshandley Mar 28, 2023
d6423aa
partial completion of covariance
williamjameshandley Mar 29, 2023
706d759
Now using rather than
williamjameshandley Mar 29, 2023
bf07118
Added a wrapper for cov, corr, corrwith
williamjameshandley Mar 29, 2023
17d4332
corr and cov now working
williamjameshandley Mar 29, 2023
a71151e
reduced code repetition
williamjameshandley Mar 29, 2023
2935434
corrwith
williamjameshandley Mar 29, 2023
93b06a0
Corrections to two extra functions
williamjameshandley Mar 29, 2023
0655a9c
skipna no longer available for cov
williamjameshandley Mar 29, 2023
33dd6e0
Completed coverage with new nan
williamjameshandley Mar 30, 2023
5ec1fec
Increase coverage
williamjameshandley Mar 30, 2023
5113b61
add test for groupby().hist()
AdamOrmondroyd Mar 30, 2023
918986c
add test for groupby().plot.hist(), not happy with the janky slicing …
AdamOrmondroyd Mar 30, 2023
b13b0a2
add test for groupby().plot.kde()
AdamOrmondroyd Mar 31, 2023
9709b38
add tests for hist_1d and kde_1d
AdamOrmondroyd Mar 31, 2023
338fc8a
test for fastkde_1d
AdamOrmondroyd Mar 31, 2023
0e2676a
test for hist_2d
AdamOrmondroyd Mar 31, 2023
9589921
plt.close('all')
AdamOrmondroyd Mar 31, 2023
4e8e50f
test for kde_2d
AdamOrmondroyd Mar 31, 2023
a631d60
test for fastkde_2d
AdamOrmondroyd Mar 31, 2023
132d80f
Reinstated init function to get documentation to work
williamjameshandley Apr 4, 2023
369c49c
complete test coverage for explicit weight checks
lukashergt Apr 4, 2023
122bf2b
Readme correction following #217
williamjameshandley Apr 5, 2023
6d686f6
Merge branch 'groupby' of github.com:Ormorod/anesthetic into groupby
williamjameshandley Apr 5, 2023
5a5f106
fix `GelmanRubin` method now that `groupby` is fixed
lukashergt Apr 7, 2023
9d10ce5
add test for `LinAlgError` when covariance matrix is not positive def…
lukashergt Apr 7, 2023
85fa6ae
make linear dependence more blatant in check for `LinAlgError`
lukashergt Apr 7, 2023
2 changes: 1 addition & 1 deletion README.rst
@@ -2,7 +2,7 @@
 anesthetic: nested sampling post-processing
 ===========================================
 :Authors: Will Handley and Lukas Hergt
-:Version: 2.0.0-beta.22
+:Version: 2.0.0-beta.23
 :Homepage: https://github.com/williamjameshandley/anesthetic
 :Documentation: http://anesthetic.readthedocs.io/

2 changes: 1 addition & 1 deletion anesthetic/_version.py
@@ -1 +1 @@
-__version__ = '2.0.0b22'
+__version__ = '2.0.0b23'
120 changes: 120 additions & 0 deletions anesthetic/weighted_pandas.py
@@ -1,14 +1,60 @@
 """Pandas DataFrame and Series with weighted samples."""

+import warnings
 from inspect import signature
 import numpy as np
 from pandas import Series, DataFrame, concat, MultiIndex
+from pandas.core.groupby import GroupBy, SeriesGroupBy, DataFrameGroupBy
+from pandas._libs import lib
+from pandas._libs.lib import no_default
+from pandas.util._exceptions import find_stack_level
 from pandas.util import hash_pandas_object
 from numpy.ma import masked_array
 from anesthetic.utils import (compress_weights, channel_capacity, quantile,
                               temporary_seed, adjust_docstrings)


+class WeightedGroupBy(GroupBy):
+    """Weighted version of :class:`pandas.core.groupby.GroupBy`."""
+
+    def mean(self, numeric_only=False):  # noqa: D102
+        result = self.agg(lambda df: self.obj._constructor(df).mean(
+            numeric_only=numeric_only))
+        return result.__finalize__(self.obj, method="groupby")
+
+    def std(self, numeric_only=False):  # noqa: D102
+        result = self.agg(lambda df: self.obj._constructor(df).std(
+            numeric_only=numeric_only))
+        return result.__finalize__(self.obj, method="groupby")
+
+    def kurtosis(self, numeric_only=False):  # noqa: D102
+        result = self.agg(lambda df: self.obj._constructor(df).kurtosis(
+            numeric_only=numeric_only))
+        return result.__finalize__(self.obj, method="groupby")
+
+    def median(self, numeric_only=None):  # noqa: D102
+        result = self.agg(lambda df: self.obj._constructor(df).median(
+            numeric_only=numeric_only))
+        return result.__finalize__(self.obj, method="groupby")
+
+    def var(self, numeric_only=False):  # noqa: D102
+        result = self.agg(lambda df: self.obj._constructor(df).var(
+            numeric_only=numeric_only))
+        return result.__finalize__(self.obj, method="groupby")
+
+
+class WeightedSeriesGroupBy(WeightedGroupBy, SeriesGroupBy):
+    """Weighted version of :class:`pandas.core.groupby.SeriesGroupBy`."""
+
+    pass
+
+
+class WeightedDataFrameGroupBy(WeightedGroupBy, DataFrameGroupBy):
+    """Weighted version of :class:`pandas.core.groupby.DataFrameGroupBy`."""
+
+    pass
+
+
 class _WeightedObject(object):
     """Common methods for `WeightedSeries` and `WeightedDataFrame`.

@@ -204,6 +250,35 @@ def _constructor(self):
     def _constructor_expanddim(self):
         return WeightedDataFrame

+    def groupby(
+        self,
+        by=None,
+        axis=0,
+        level=None,
+        as_index=True,
+        sort=True,
+        group_keys=True,
+        observed=False,
+        dropna=True,
+    ):  # noqa: D102
+        if level is None and by is None:
+            raise TypeError("You have to supply one of 'by' and 'level'")
+        if not as_index:
+            raise TypeError("as_index=False only valid with DataFrame")
+        axis = self._get_axis_number(axis)
+
+        return WeightedSeriesGroupBy(
+            obj=self,
+            keys=by,
+            axis=axis,
+            level=level,
+            as_index=as_index,
+            sort=sort,
+            group_keys=group_keys,
+            observed=observed,
+            dropna=dropna,
+        )
+

 class WeightedDataFrame(_WeightedObject, DataFrame):
     """Weighted version of :class:`pandas.DataFrame`."""
@@ -405,6 +480,51 @@ def _constructor_sliced(self):
     def _constructor(self):
         return WeightedDataFrame

+    def groupby(
+        self,
+        by=None,
+        axis=no_default,
+        level=None,
+        as_index: bool = True,
+        sort: bool = True,
+        group_keys: bool = True,
+        observed: bool = False,
+        dropna: bool = True,
+    ):  # pragma: no cover  # noqa: D102
+        if axis is not lib.no_default:
+            axis = self._get_axis_number(axis)
+            if axis == 1:
+                warnings.warn(
+                    "DataFrame.groupby with axis=1 is deprecated. Do "
+                    "`frame.T.groupby(...)` without axis instead.",
+                    FutureWarning,
+                    stacklevel=find_stack_level(),
+                )
+            else:
+                warnings.warn(
+                    "The 'axis' keyword in DataFrame.groupby is deprecated "
+                    "and will be removed in a future version.",
+                    FutureWarning,
+                    stacklevel=find_stack_level(),
+                )
+        else:
+            axis = 0
+
+        if level is None and by is None:
+            raise TypeError("You have to supply one of 'by' and 'level'")
+
+        return WeightedDataFrameGroupBy(
+            obj=self,
+            keys=by,
+            axis=axis,
+            level=level,
+            as_index=as_index,
+            sort=sort,
+            group_keys=group_keys,
+            observed=observed,
+            dropna=dropna,
+        )
+

 for cls in [WeightedDataFrame, WeightedSeries]:
     adjust_docstrings(cls, r'\bDataFrame\b', 'WeightedDataFrame')
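For readers unfamiliar with the `axis=no_default` handling in `WeightedDataFrame.groupby`, here is a minimal standalone sketch of the sentinel-default pattern it relies on. The sentinel lets a function tell an explicitly passed argument from an omitted one, so that only explicit use triggers a deprecation warning. The names `_NO_DEFAULT` and `resolve_axis` are hypothetical stand-ins, not part of pandas or anesthetic, and the warning texts are shortened.

```python
import warnings

# Hypothetical sentinel standing in for pandas' `lib.no_default`.
_NO_DEFAULT = object()


def resolve_axis(axis=_NO_DEFAULT):
    """Return the axis number, warning only if `axis` was passed explicitly."""
    if axis is _NO_DEFAULT:
        # Caller did not pass `axis`: use the default without any warning.
        return 0
    # Simplified stand-in for `_get_axis_number`.
    number = {0: 0, "index": 0, 1: 1, "columns": 1}[axis]
    if number == 1:
        message = "axis=1 is deprecated; transpose before grouping instead."
    else:
        message = "the 'axis' keyword is deprecated."
    warnings.warn(message, FutureWarning, stacklevel=2)
    return number
```

Calling `resolve_axis()` returns the default silently, whereas `resolve_axis(0)` returns the same value but emits a `FutureWarning`. A plain `axis=0` default could not make that distinction.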
20 changes: 20 additions & 0 deletions tests/test_samples.py
@@ -1332,3 +1332,23 @@ def test_old_gui():
         make_2d_axes(['x0', 'y0'], tex={'x0': '$x_0$', 'y0': '$y_0$'})
     with pytest.raises(NotImplementedError):
         make_1d_axes(['x0', 'y0'], tex={'x0': '$x_0$', 'y0': '$y_0$'})
+
+
+def test_groupby_stats():
+    mcmc = read_chains('./tests/example_data/cb')
+    chains = mcmc.groupby(('chain', '$n_\\mathrm{chain}$'), group_keys=False)
+    assert np.all(np.isclose(mcmc.loc[mcmc['chain'] == 1].mean()
+                             .to_numpy()[:-1],
+                             chains.mean().iloc[0, :].to_numpy()))
+    assert np.all(np.isclose(mcmc.loc[mcmc['chain'] == 1].std()
+                             .to_numpy()[:-1],
+                             chains.std().iloc[0, :].to_numpy()))
+    assert np.all(np.isclose(mcmc.loc[mcmc['chain'] == 1].kurtosis()
+                             .dropna().to_numpy(),
+                             chains.kurtosis().iloc[0, :].dropna().to_numpy()))
+    assert np.all(np.isclose(mcmc.loc[mcmc['chain'] == 1].median()
+                             .to_numpy()[:-1],
+                             chains.median().iloc[0, :].to_numpy()))
+    assert np.all(np.isclose(mcmc.loc[mcmc['chain'] == 1].var()
+                             .to_numpy()[:-1],
+                             chains.var().iloc[0, :].to_numpy()))