Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganisation into wpandas and lpandas #280

Closed
wants to merge 23 commits into from

Conversation

williamjameshandley
Copy link
Collaborator

Description

This arises from a conversation last summer with @lukashergt . If it is something that we still want to do, we should do it before anesthetic 2.0.0 release.

The idea here is to separate out the weighted pandas functionality from the anesthetic functionality.

The benefit of this is that it makes it easier to see (particularly when it comes to pandas plotting overrides) what is modifying pandas by adding weights (wpandas) or labels (lpandas), and what then adds to this to provide new plotting tools (anesthetic).

This PR would also be an opportunity to do any other general reorganisation, for example renaming anesthetic.samples to anesthetic.core, and moving anesthetic.{plot,boundary,kde} into anesthetic.plotting.

The downside is that by doing this tidy-up we will have to get used to new locations/names, and will break scripts that relied on anesthetic.samples, which is allowed when updating a major version, but not ideal when we've been in beta for so long. This will likely also cause grief for the in-progress #270, but I'm happy to help @AdamOrmondroyd merge/redo that if we agree that this PR is in principle a good idea.

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works
  • I have appropriately incremented the semantic version number in both README.rst and anesthetic/_version.py

@codecov
Copy link

codecov bot commented Apr 9, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (08905f4) 100.00% compared to head (536a56f) 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##            master      #280    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           30        25     -5     
  Lines         2643      1892   -751     
==========================================
- Hits          2643      1892   -751     
Impacted Files Coverage Δ
anesthetic/convert.py 100.00% <ø> (ø)
anesthetic/gui/plot.py 100.00% <ø> (ø)
anesthetic/read/chain.py 100.00% <ø> (ø)
anesthetic/__init__.py 100.00% <100.00%> (ø)
anesthetic/_version.py 100.00% <100.00%> (ø)
anesthetic/core.py 100.00% <100.00%> (ø)
anesthetic/examples/perfect_ns.py 100.00% <100.00%> (ø)
anesthetic/plot.py 100.00% <100.00%> (ø)
anesthetic/plotting/__init__.py 100.00% <100.00%> (ø)
anesthetic/plotting/_matplotlib/__init__.py 100.00% <100.00%> (ø)
... and 8 more

... and 4 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@AdamOrmondroyd
Copy link
Collaborator

I was already going to restart #270 after #272, as I will need to rethink the fixes for cov.

Hopefully this will also make rethinking labels more straightforward, so e.g. Samples.plot.hist() will use the proper labels, rather than the tuple looking things we have at the moment, though this may now require both lpandas/plotting/ and wpandas/plotting/

@AdamOrmondroyd
Copy link
Collaborator

Realised I'd missed the brackets in super() from #235, so have also corrected here.

@williamjameshandley williamjameshandley added this to the 2.0.0 milestone Apr 10, 2023
@AdamOrmondroyd AdamOrmondroyd mentioned this pull request Apr 11, 2023
15 tasks
@williamjameshandley
Copy link
Collaborator Author

@lukashergt what do you think about this suggestion:

This PR would also be an opportunity to do any other general reorganisation, for example renaming anesthetic.samples to anesthetic.core, and moving anesthetic.{plot,boundary,kde} into anesthetic.plotting.

Is there any other reorganisation you have considered in the past?

@lukashergt
Copy link
Collaborator

lukashergt commented Apr 11, 2023

@lukashergt what do you think about this suggestion:

Looking through it at the moment...

On a general level, I like it, as it cleans up how things are built up on pandas and matplotlib...

The downside is that by doing this tidy-up we will have to get used to new locations/names, and will break scripts that relied on anesthetic.samples,

What type of scripts will break? The core functionalities using the NestedSamples methods should continue working as expected, right? After all, weighted_pandas was mostly hidden to the user under the hood of NestedSamples...

I think it will mostly break for anybody building on weighted_pandas directly, but for that type of user these changes should probably improve things in the long run...

@williamjameshandley
Copy link
Collaborator Author

What type of scripts will break?

I was more referring to if we e.g. rename anesthetic.samples to anesthetic.core, anybody who accesses e.g. merge_nested_samples from samples would now find this failing. I guess this could be accomplished by aliasing and raising a deprecation warning for anybody importing from anesthetic.samples.

@williamjameshandley
Copy link
Collaborator Author

I guess this could be accomplished by aliasing and raising a deprecation warning for anybody importing from anesthetic.samples.

fbf4a7a gives a revertable example of this.

Copy link
Collaborator

@lukashergt lukashergt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments inline.

wpandas/core.py Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be helpful to start mirroring pandas more by separating the classes in here and putting them in the matching module names?

Then WeightedSeries and WheightedDataFrame would go in a file frame.py, and WeightedGroupBy would go in a subfolder groupby in a file groupby.py, and WeightedSeriesGroupBy and WeightedDataFrameGroupBy would go in the same subfolder groupby in a file generic.py...

The advantage would be that it makes clearer how/where to compare this to in pandas.

The downside is that it makes the architecture more (possibly too?) complicated...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, e9fba58 begins this procedure for wpandas. Before I go and do the same for both lpandas, anesthetic, and all the documentation related changes, are we happy with this level of reorganisation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was skeptical, but it does keep the files nice and small, the old weightedpandas.py was getting a bit large for my tastes.

However, if you really want to match the pandas layout, the tests are actually included within the pandas package under pandas/tests/, so I guess we should have anesthetic/tests/, wpandas/tests/ and lpandas/tests/ ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm on the fence. This does blow things up quite a bit. Find it hard to judge what will be easier to maintain. Do you think the matching file structure will make it easier to adapt to changes in pandas?

I'm not sure whether the same kind of matching is needed for lpandas. The weights are such a substantial addition that they really carry through everything, but I don't think the same goes for labels.

For anesthetic I think I'd prefer keeping things simple.

Sorry, I don't have a very clear line on this, and can be easily swayed either way. I think for wpandas it might help.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the lpandas structure should match (separate frame.py, series.py etc) but perhaps leave the tests where they are?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this becomes its proper subpackage, it would probably be good to give this a file docstring.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think the wpandas subpackage should be completely independent from anesthetic, so we should make sure its tests do not rely on anesthetic imports.
    • Should channel_capacity be a wpandas function?
  • Should we have test subfolders splitting tests for anesthetic and wpandas?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, regarding the first point, channel_capacity is already moved to wpandas but imported in anesthetic.utils. We should use wpandas.utils in these tests instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have test subfolders splitting tests for anesthetic and wpandas?

Yes. I will do this (after we've decided on the degree of reorganisation)

Should channel_capacity be a wpandas function?

We could just use the scipy function np.exp(-entropy(weights))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htjb is considering a PR which updates neff to have both the kish and entropy 'method' options. Perhaps it would be better to simply replace channel_capacity with a less cryptically named function.

Copy link
Collaborator

@lukashergt lukashergt Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just found out that Kish and entropy method are actually related, which is pretty cool and helped me understand where the cryptic channel capacity was actually coming from... See #285 (comment).

anesthetic/samples.py Outdated Show resolved Hide resolved
@htjb htjb mentioned this pull request Apr 20, 2023
6 tasks
@AdamOrmondroyd
Copy link
Collaborator

I was having a go at playing with the docs to see if I could get wpandas to work, just to get my head around sphinx. but I can't work out how to easily show the contents of e.g. wpandas/init.py, so that the api is clear.

@lukashergt
Copy link
Collaborator

I was having a go at playing with the docs to see if I could get wpandas to work, just to get my head around sphinx. but I can't work out how to easily show the contents of e.g. wpandas/init.py, so that the api is clear.

There is an issue in that we now have multiple parallel packages in one, which makes the auto-generated sphinx documentation not as straightforward... Where previously we would need to run

sphinx-apidoc -fM -t docs/templates/ -o docs/source/ anesthetic/

this won't work anymore, we now have anesthetic/ and wpandas/, two packages in parallel.

This makes me question whether this is really what we want... Maybe this is all overkill. If we do want the separation, then the proper way might be to create a new repository for wpandas, completely independent from anesthetic. Thoughts?

@htjb htjb mentioned this pull request Apr 21, 2023
6 tasks
@AdamOrmondroyd
Copy link
Collaborator

This makes me question whether this is really what we want... Maybe this is all overkill. If we do want the separation, then the proper way might be to create a new repository for wpandas, completely independent from anesthetic. Thoughts?

I think a separate wpandas repo really is overkill.

@AdamOrmondroyd
Copy link
Collaborator

@williamjameshandley @lukashergt are we still keen for this reorganisation? #282 depends on this, and I'm wary of falling behind pandas.

@williamjameshandley
Copy link
Collaborator Author

I think a separate wpandas repo really is overkill.

I agree.

@williamjameshandley @lukashergt are we still keen for this reorganisation?

Coming back to this I'm now also on the fence. @lukashergt do you want to cast a vote?

@lukashergt
Copy link
Collaborator

I think a separate wpandas repo really is overkill.

I agree.

@williamjameshandley @lukashergt are we still keen for this reorganisation?

Coming back to this I'm now also on the fence. @lukashergt do you want to cast a vote?

In many ways weighted, labelled data frames would deserve their own repository (or deserve being a pandas own thing), separate from anesthetic, but I agree that that is overkill for us. However, a wpandas subpackage as designed in this PR is on some levels (notably automated documentation) more complicated than a separate repository, and certainly more complicated than a wpandas module.

I am quite happy with the current weighted_pandas.py module, which has already come a long way from anesthetic 1.0.0, so my vote would go to leaving things the way they are for now. We can reflect on this again for a future anesthetic 3.0.0.

@williamjameshandley
Copy link
Collaborator Author

I am quite happy with the current weighted_pandas.py module, which has already come a long way from anesthetic 1.0.0, so my vote would go to leaving things the way they are for now. We can reflect on this again for a future anesthetic 3.0.0.

OK, let's leave it for now, and press ahead with the anesthetic 2.0.0 release.

@AdamOrmondroyd
Copy link
Collaborator

Just a couple of thoughts:

To butcher a classic quote: "premature modularity is the root of all evil". I agree that wpandas and lpandas, if they exist, would require their own repos. However, this would require three PRs every time e.g. pandas, matplotlib updates break something, and it's already a reasonable effort to keep up. I can also see it being confusing for new users to identify the cause of a bug if it lies in l/wpandas.

Modularity allows code to be reused. I think it is very unlikely that someone outside our circle would think or want to use l/wpandas, so until we have another project that would benefit from using them (perhaps the new C++ PolyChord could benefit from wpandas?), we should leave it as is.

@williamjameshandley williamjameshandley deleted the wpandas_reorg branch June 14, 2023 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants