Refactor beeswarm #254

cvanelteren · 2025-06-11T01:03:46Z

This PR is a major rework of the beeswarm functionality that was recently merged to main. While testing the newly added implementation with examples from the SHAP library, I discovered that our current approach doesn't integrate well with their plotting workflows and becomes computationally expensive for larger datasets. The former implementation processed each point individually to reduce overlap, which scales poorly as dataset size increases.

This rework improves performance by vectorizing the overlap reduction function, replacing the previous O(n²) per-point approach with vectorized operations that scale better with large datasets. Our implementation remains intentionally barebones compared to the SHAP library's more sophisticated feature clustering capabilities. For users requiring advanced clustering features, we recommend the specialized SHAP library, which can achieve similar visual results with additional functionality.
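The PR text does not include the new code, so the following is only a minimal sketch of the general technique the Copilot summary describes: histogram-based binning instead of per-point collision checks, written with NumPy. The function `beeswarm_offsets` and its `nbins`/`half_width` parameters are hypothetical and are not ultraplot's API. Points are binned along the value axis in one pass, ranked within their bin through a stable sort, and spread symmetrically around the row center. The cost is one sort plus a few array operations, instead of a Python loop over points.

```python
import numpy as np


def beeswarm_offsets(values, nbins=50, half_width=0.45):
    """Hypothetical sketch: vectorized beeswarm offsets via histogram binning.

    Returns one offset per value, to be added to the row (category) position.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    if n == 0:
        return np.zeros(0)

    # Assign every value to one of `nbins` equal-width bins in a single pass.
    edges = np.histogram_bin_edges(values, bins=nbins)
    bins = np.digitize(values, edges[1:-1])  # indices 0 .. nbins - 1

    # Rank each point within its bin without a Python loop:
    # stable-sort by bin, then subtract the position where each bin starts.
    order = np.argsort(bins, kind="stable")
    counts = np.bincount(bins, minlength=nbins)
    starts = np.cumsum(counts) - counts
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n) - starts[bins[order]]

    # Spread ranks symmetrically around the center: 0, +1, -1, +2, -2, ...
    magnitude = (rank + 1) // 2
    sign = np.where(rank % 2 == 1, 1.0, -1.0)
    offsets = magnitude * sign

    # Rescale so the most crowded bin spans at most +/- half_width.
    peak = np.abs(offsets).max()
    if peak > 0:
        offsets = offsets / peak * half_width
    return offsets


rng = np.random.default_rng(0)
vals = rng.normal(size=1000)
y = 3 + beeswarm_offsets(vals)  # e.g. the fourth feature row, centered at y = 3
```

In a plot, `vals` would go on the value axis and `y` on the categorical axis before calling `scatter`.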

The docstring is also corrected to match the function signature. Lastly, the tests are updated to reflect the API changes and rely more on internal functions.

Snippet:

import numpy as np
import shap
import ultraplot as uplt
import xgboost

# Train an XGBoost model on the UCI Adult census dataset
X, y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y)

# Compute SHAP values for every sample
explainer = shap.Explainer(model, X)
shap_values = explainer(X)
print(shap_values.data.shape)

# Min-max normalize each feature to [0, 1] so it can be mapped onto the colormap
c = shap_values.data
c = (c - c.min(0)[None]) / (c.max(0) - c.min(0))[None]

fig, ax = uplt.subplots()
ax.beeswarm(shap_values.values, feature_values=c, cmap="burd", ss=1)
ax.format(
    xlabel="SHAP values",
    yticks=np.arange(c.shape[1]),
    yticklabels=shap_values.feature_names,
)

uplt.show(block=True)
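One caveat with the normalization line in the snippet: if a feature column is constant, `c.max(0) - c.min(0)` is zero for that column and the division yields NaN (with a runtime warning). A minimal guarded variant, using the hypothetical helper name `minmax_columns`, could look like this. It relies on NumPy broadcasting, so the explicit `[None]` indexing is not needed.

```python
import numpy as np


def minmax_columns(data):
    """Scale each column of a 2-D array to [0, 1]; constant columns map to 0."""
    data = np.asarray(data, dtype=float)
    lo = np.nanmin(data, axis=0)           # per-column minimum, ignoring NaNs
    span = np.nanmax(data, axis=0) - lo    # per-column range
    # Replace zero ranges with 1 so constant columns become 0 instead of NaN.
    span = np.where(span > 0, span, 1.0)
    return (data - lo) / span


c = minmax_columns([[0.0, 5.0, 2.0],
                    [10.0, 5.0, 4.0],
                    [5.0, 5.0, 3.0]])
print(c)
```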

Copilot

Pull Request Overview

This PR refactors the beeswarm functionality to improve performance and scalability by replacing the per-point collision detection with a vectorized, histogram-based overlap reduction technique. Key changes include a revised API for beeswarm plotting, updated documentation reflecting the new parameters, and modified tests to validate the new implementation.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
ultraplot/tests/test_1dplots.py	Updated test inputs to match the new vectorized beeswarm API
ultraplot/axes/plot.py	Refactored beeswarm implementation and docstring with new parameters, including vectorized binning and parameter renaming

Comments suppressed due to low confidence (1)

ultraplot/axes/plot.py:3510

The marker size parameter is named 'ss' in _apply_beeswarm, yet the tests provide a 'sizes' argument. Please standardize the parameter naming and update the documentation to ensure consistency across the API.

ss: float | np.ndarray = None,

cvanelteren · 2025-06-11T01:07:39Z

For reference, this is the plot generated from the same dataset by the shap library:

cvanelteren · 2025-06-11T07:30:02Z

Note to self: I may want to change the max lineage to be slightly less than half the level width.
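Reading "max lineage" as the maximum sideways spread of a row (an interpretation, since the note does not define it), one way to keep each row slightly narrower than half the row spacing is to rescale its offsets to a fixed fraction of that spacing. The later commits "add small space between the layers" and "do the space in fractions" point in this direction. The helper `fit_to_row` and its `gap` parameter below are hypothetical.

```python
import numpy as np


def fit_to_row(offsets, spacing=1.0, gap=0.1):
    """Rescale offsets so a row extends at most (1 - gap) * spacing / 2 on each side.

    Neighbouring rows then stay separated by at least gap * spacing.
    """
    offsets = np.asarray(offsets, dtype=float)
    half_width = 0.5 * spacing * (1.0 - gap)
    # initial=0.0 keeps max() well-defined for empty input
    peak = np.abs(offsets).max(initial=0.0)
    if peak == 0:
        return offsets
    return offsets / peak * half_width


# Two adjacent rows centered at y = 0 and y = 1
row0 = 0 + fit_to_row([3.0, -3.0, 1.0])
row1 = 1 + fit_to_row([3.0, -3.0, 1.0])
print(row1.min() - row0.max())  # smallest distance between the two rows
```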

beckermr

Code looks OK, but the unit tests are failing.

cvanelteren · 2025-06-11T17:38:21Z

Unit tests are a bit different visually but behave similarly.

cvanelteren · 2025-06-11T17:39:45Z

That is, I was not expecting them to behave the same as on main, where I computed the colours based on the height; this is now handled implicitly through the scatter functionality.

beckermr · 2025-06-11T20:03:00Z

No, there is an actual error:

FAILED ultraplot/tests/test_1dplots.py::test_beeswarm - AttributeError: PathCollection.set() got an unexpected keyword argument 'size'
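For context, this failure is Matplotlib rejecting an unknown artist property. `scatter` returns a `PathCollection`, whose size setter is `set_sizes`, so `set(sizes=...)` is accepted while `set(size=...)` raises `AttributeError`. The thread does not show exactly how the keyword was fixed (the follow-up commit is titled "change keyword in tests"), so this standalone Matplotlib reproduction is only illustrative, and the exact error text depends on the Matplotlib version.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
coll = ax.scatter([0, 1, 2], [0, 1, 2])  # returns a PathCollection

raised = False
try:
    coll.set(size=10)  # 'size' is not a PathCollection property
except AttributeError as err:
    raised = True
    print(f"AttributeError: {err}")

coll.set(sizes=[10, 20, 30])  # 'sizes' dispatches to PathCollection.set_sizes
print(coll.get_sizes())
```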

cvanelteren · 2025-06-12T06:04:26Z

Ah ok will fix.

codecov · 2025-06-12T08:16:02Z

Codecov Report

Attention: Patch coverage is 96.15385% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
ultraplot/axes/plot.py	95.12%	1 Missing and 1 partial ⚠️
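As a sanity check on the report: Codecov documents its coverage ratio as hits / (hits + partials + misses), so a partial line counts as not covered. Under that assumption, the percentages imply how many lines each figure refers to. The 2 uncovered lines at 96.15385% correspond to 52 lines in the patch, and the 1 missing plus 1 partial line in `ultraplot/axes/plot.py` at 95.12% correspond to 41 lines. The helper name below is hypothetical.

```python
def implied_total_lines(coverage_pct, uncovered_lines):
    """Total tracked lines implied by a coverage percentage, assuming
    coverage = covered / total and each missed or partial line counts
    as one line not covered."""
    return round(uncovered_lines / (1 - coverage_pct / 100))


patch_total = implied_total_lines(96.15385, 2)  # whole patch
plot_total = implied_total_lines(95.12, 2)      # ultraplot/axes/plot.py
print(patch_total, plot_total)

# Recompute the reported percentages from the implied totals
print(f"{(patch_total - 2) / patch_total:.5%}", f"{(plot_total - 2) / plot_total:.2%}")
```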


cvanelteren · 2025-06-12T09:17:38Z

Should be good now. Once merged, I will draft a release.

beckermr

Something is definitely not working with the new code. See the plot difference:

cvanelteren · 2025-06-12T14:35:06Z

I changed the test. The colors were previously inferred and are now delegated to scatter, so they follow the prop cycle.
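A minimal plain-Matplotlib illustration of what delegating colors to scatter means: when no color is passed, each `scatter` call takes the next color from the axes' property cycle (`C0`, `C1`, ...). This shows only the underlying Matplotlib behaviour; how ultraplot's `beeswarm` forwards colors internally is not shown here.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
collections = []
for row in range(3):
    x = rng.normal(size=20)
    y = np.full(20, float(row))
    # No color given: scatter picks the next color from the prop cycle.
    collections.append(ax.scatter(x, y))

# Compare each group's face color with the cycle color C0, C1, C2
matches = [
    np.allclose(coll.get_facecolor()[0], to_rgba(f"C{i}"))
    for i, coll in enumerate(collections)
]
print(matches)
```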

cvanelteren · 2025-06-12T14:37:01Z

I will need to change the bottom-left plot, but the rest looks fine to me.

beckermr · 2025-06-12T14:41:02Z

Did you check that the new and old code give consistent results? Changing the code and the test means that we cannot tell.

cvanelteren · 2025-06-12T14:45:48Z

I can check it. I changed it to ensure that the prop cycle is used, as that is more consistent with the rest of the API. I can temporarily change it back to confirm.

cvanelteren · 2025-06-12T16:03:43Z

The test output is shown next to the mocked top-left plot generated with this PR. The original has slightly more jitter because it randomly displaced both the x and y values from each point's current position; here we modify only the x or the y value, depending on the orientation (in this example, only x).
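The difference described here can be sketched as follows: the data values stay untouched on one axis, and only the categorical coordinate (the row position) receives the beeswarm offset, with the axis chosen by the orientation. The helper `place_points` and its `orientation` strings are hypothetical and only illustrate the idea. Per the comment, the previous implementation jittered both coordinates instead.

```python
import numpy as np


def place_points(values, row, offsets, orientation="horizontal"):
    """Return (x, y) for one beeswarm row.

    The data values are never modified; only the row coordinate is displaced.
    """
    values = np.asarray(values, dtype=float)
    position = row + np.asarray(offsets, dtype=float)
    if orientation == "horizontal":
        # Values along x, rows stacked along y: only y is displaced.
        return values, position
    if orientation == "vertical":
        # Rows side by side along x, values along y: only x is displaced.
        return position, values
    raise ValueError(f"unknown orientation: {orientation!r}")


x, y = place_points([1.0, 2.0, 3.0], row=2, offsets=[0.0, 0.1, -0.1],
                    orientation="vertical")
print(x, y)
```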

I will adjust the test to mimic the original and call it a day.

cvanelteren · 2025-06-13T09:45:41Z

Drafting a release now introducing the new feature.

cvanelteren added 8 commits June 11, 2025 01:51

Update doc string

3c06d95

vectorize function

c0e1d86

vectorize function

d446b69

update signatures

5643b11

change order of keywords according to use

3a395f3

Clean-up internals

65e1aae

adjust test

ae2282a

mv logic away from plotting


4389cc4

cvanelteren requested a review from Copilot June 11, 2025 01:03

Copilot AI reviewed Jun 11, 2025


cvanelteren requested a review from beckermr June 11, 2025 01:08

cvanelteren added this to the v1.52 milestone Jun 11, 2025

cvanelteren self-assigned this Jun 11, 2025

cvanelteren added the enhancement label Jun 11, 2025

beckermr requested changes Jun 11, 2025


change keyword in tests


13302f0

add small space between the layers


1f93c75

cvanelteren requested a review from beckermr June 12, 2025 09:17

cvanelteren added 2 commits June 12, 2025 11:18

do the space in fractions


f922c48

Merge branch 'main' into hotfix-beeswarm


a0f878b

beckermr requested changes Jun 12, 2025


Merge branch 'main' into hotfix-beeswarm

Loading
Loading status checks…

19067bc

simplify test


271f949

cvanelteren enabled auto-merge (squash) June 12, 2025 18:09

beckermr disabled auto-merge June 12, 2025 19:34

beckermr merged commit 44df33a into Ultraplot:main Jun 12, 2025
10 of 14 checks passed
