-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganisation into wpandas and lpandas #280
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #280 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 30 25 -5
Lines 2643 1892 -751
==========================================
- Hits 2643 1892 -751
... and 4 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
I was already going to restart #270 after #272, as I will need to rethink the fixes for Hopefully this will also make rethinking labels more straightforward, so e.g. |
Realised I'd missed the brackets in |
@lukashergt what do you think about this suggestion:
Is there any other reorganisation you have considered in the past? |
Looking through it at the moment... On a general level, I like it, as it cleans up how things are built up on pandas and matplotlib...
What type of scripts will break? The core functionalities using the I think it will mostly break for anybody building on |
I was more referring to if we e.g. rename |
fbf4a7a gives a revertable example of this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments inline.
wpandas/core.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to start mirroring pandas more by separating the classes in here and putting them in the matching module names?
Then WeightedSeries
and WheightedDataFrame
would go in a file frame.py
, and WeightedGroupBy
would go in a subfolder groupby
in a file groupby.py
, and WeightedSeriesGroupBy
and WeightedDataFrameGroupBy
would go in the same subfolder groupby
in a file generic.py
...
The advantage would be that it makes clearer how/where to compare this to in pandas.
The downside is that it makes the architecture more (possibly too?) complicated...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, e9fba58 begins this procedure for wpandas. Before I go and do the same for both lpandas, anesthetic, and all the documentation related changes, are we happy with this level of reorganisation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was skeptical, but it does keep the files nice and small, the old weightedpandas.py was getting a bit large for my tastes.
However, if you really want to match the pandas layout, the tests are actually included within the pandas package under pandas/tests/, so I guess we should have anesthetic/tests/, wpandas/tests/ and lpandas/tests/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm on the fence. This does blow things up quite a bit. Find it hard to judge what will be easier to maintain. Do you think the matching file structure will make it easier to adapt to changes in pandas?
I'm not sure whether the same kind of matching is needed for lpandas. The weights are such a substantial addition that they really carry through everything, but I don't think the same goes for labels.
For anesthetic I think I'd prefer keeping things simple.
Sorry, I don't have a very clear line on this, and can be easily swayed either way. I think for wpandas it might help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the lpandas structure should match (separate frame.py, series.py etc) but perhaps leave the tests where they are?
wpandas/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this becomes its proper subpackage, it would probably be good to give this a file docstring.
tests/test_weighted_pandas.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think the
wpandas
subpackage should be completely independent fromanesthetic
, so we should make sure its tests do not rely on anesthetic imports.- Should
channel_capacity
be awpandas
function?
- Should
- Should we have test subfolders splitting tests for
anesthetic
andwpandas
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, regarding the first point, channel_capacity
is already moved to wpandas
but imported in anesthetic.utils
. We should use wpandas.utils
in these tests instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have test subfolders splitting tests for anesthetic and wpandas?
Yes. I will do this (after we've decided on the degree of reorganisation)
Should channel_capacity be a wpandas function?
We could just use the scipy function np.exp(-entropy(weights))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just found out that Kish and entropy method are actually related, which is pretty cool and helped me understand where the cryptic channel capacity was actually coming from... See #285 (comment).
I was having a go at playing with the docs to see if I could get |
There is an issue in that we now have multiple parallel packages in one, which makes the auto-generated sphinx documentation not as straightforward... Where previously we would need to run sphinx-apidoc -fM -t docs/templates/ -o docs/source/ anesthetic/ this won't work anymore, we now have This makes me question whether this is really what we want... Maybe this is all overkill. If we do want the separation, then the proper way might be to create a new repository for wpandas, completely independent from anesthetic. Thoughts? |
I think a separate wpandas repo really is overkill. |
… make sure all three are in modules.rst
@williamjameshandley @lukashergt are we still keen for this reorganisation? #282 depends on this, and I'm wary of falling behind pandas. |
I agree.
Coming back to this I'm now also on the fence. @lukashergt do you want to cast a vote? |
In many ways weighted, labelled data frames would deserve their own repository (or deserve being a pandas own thing), separate from anesthetic, but I agree that that is overkill for us. However, a wpandas subpackage as designed in this PR is on some levels (notably automated documentation) more complicated than a separate repository, and certainly more complicated than a wpandas module. I am quite happy with the current |
OK, let's leave it for now, and press ahead with the anesthetic 2.0.0 release. |
Just a couple of thoughts: To butcher a classic quote: "premature modularity is the root of all evil". I agree that Modularity allows code to be reused. I think it is very unlikely that someone outside our circle would think or want to use |
Description
This arises from a conversation last summer with @lukashergt . If it is something that we still want to do, we should do it before anesthetic 2.0.0 release.
The idea here is to separate out the weighted pandas functionality from the anesthetic functionality.
The benefit of this is that it makes it easier to see (particularly when it comes to pandas plotting overrides) what is modifying pandas by adding weights (
wpandas
) or labels (lpandas
), and what then adds to this to provide new plotting tools (anesthetic
).This PR would also be an opportunity to do any other general reorganisation, for example renaming
anesthetic.samples
toanesthetic.core
, and movinganesthetic.{plot,boundary,kde}
intoanesthetic.plotting
.The downside is that by doing this tidy-up we will have to get used to new locations/names, and will break scripts that relied on
anesthetic.samples
, which is allowed when updating a major version, but not ideal when we've been in beta for so long. This will likely also cause grief for the in-progress #270, but I'm happy to help @AdamOrmondroyd merge/redo that if we agree that this PR is in principle a good idea.Checklist:
flake8 anesthetic tests
)pydocstyle --convention=numpy anesthetic
)python -m pytest
)