-
Notifications
You must be signed in to change notification settings - Fork 11
Refactor beeswarm #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor beeswarm #254
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the beeswarm functionality to improve performance and scalability by replacing the per-point collision detection with a vectorized, histogram-based overlap reduction technique. Key changes include a revised API for beeswarm plotting, updated documentation reflecting the new parameters, and modified tests to validate the new implementation.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
ultraplot/tests/test_1dplots.py | Updated test inputs to match the new vectorized beeswarm API |
ultraplot/axes/plot.py | Refactored beeswarm implementation and docstring with new parameters, including vectorized binning and parameter renaming |
Comments suppressed due to low confidence (1)
ultraplot/axes/plot.py:3510
- The marker size parameter is named 'ss' in _apply_beeswarm, yet the tests provide a 'sizes' argument. Please standardize the parameter naming and update the documentation to ensure consistency across the API.
ss: float | np.ndarray = None,
Note to self I may want to change the max lineage to be slightly less than half the level width. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks ok but the unit tests are failing.
Unittests are a bit different visually but act similarly. |
That is, I was not expecting them to behave the same as on main I computed the colours based on the height, which is now handled implicitly through scatter functionality. |
No there is an actual error
|
Ah ok will fix. |
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
Should be good now. When merged I will draft a release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the test. The colors were first inferred and now delegated to scatter so it follow the prop cycle. |
Will need to change the bottom left but the rest looks fine to me. |
Did you check that the new and old code give consistent results? Changing the code and the test means that we cannot tell. |
I can check it; the reason why i changed is to ensure that the prop cycle is used as it's more consistent with the rest of the api. I can temp change it to confirm. |
Drafting a release now introducing the new feature. |
This PR is a major rework of the
beeswarm
functionality that was recently merged to main. While testing the newly added implementation with examples from the SHAP library, I discovered that our current approach doesn't integrate well with their plotting workflows and becomes computationally expensive for larger datasets. The former implementation processed each point individually to reduce overlap, which scales poorly as dataset size increases.This rework improves performance by vectorizing the overlap reduction function, replacing the previous O(n²) per-point approach with vectorized operations that scale better with large datasets. Our implementation remains intentionally barebones compared to the SHAP library's more sophisticated feature clustering capabilities. For users requiring advanced clustering features, we recommend the specialized SHAP library, which can achieve similar visual results with additional functionality.
The docstring is also corrected to be in line with the signature of the function. Lastly, the tests are updated to reflect the changes in the api and rely more on internal functions.
snippet