Vectorize continuous color computation in pl.nodes() for large speedup (#48)

colganwi · claude · web-flow · commit 883a6ff262a1 · 2026-02-25T15:04:30.000-05:00
* Vectorize continuous color computation in pl.nodes()

The previous implementation applied the colormap per-element in a Python
list comprehension (O(n) interpreter overhead), then passed a list of RGBA
tuples as `color=` to `ax.scatter()`. For a tree with ~2000 nodes this loop
alone took ~70 ms.

Replace with a vectorized approach using `pd.Series.reindex` to align
values with the plotting order, `np.ma.masked_invalid` to handle missing
nodes, and a single bulk colormap call. This is semantically identical:
missing nodes still receive `na_color`, present nodes receive the same RGBA
values, and the colorbar legend is unchanged.

Benchmark on a balanced binary tree with 2047 nodes (1024 leaves):
  color computation:  70.5 ms → 0.22 ms  (317× faster)
  color computation (internal nodes only, n=1023): 34.5 ms → 0.64 ms (54× faster)

The fix applies via `_get_colors`, which is shared with `pl.branches`, so
branch coloring benefits as well.

Co-Authored-By: Claude Sonnet 4.6 &lt;noreply@anthropic.com&gt;

* Update test_get_colors_numeric for vectorized ndarray return type

_get_colors now returns an N×4 numpy array for continuous data instead of
a list of per-element tuples/strings. Update the test assertions accordingly:
- isinstance check: list → np.ndarray
- na_color check: string equality → np.testing.assert_allclose against mcolors.to_rgba

Co-Authored-By: Claude Sonnet 4.6 &lt;noreply@anthropic.com&gt;

---------

Co-authored-by: Claude Sonnet 4.6 &lt;noreply@anthropic.com&gt;
diff --git a/src/pycea/pl/_utils.py b/src/pycea/pl/_utils.py
@@ -365,7 +365,10 @@ def _get_colors(
     if data.dtype.kind in ["i", "f"]:  # Numeric
         norm = _get_norm(vmin=vmin, vmax=vmax, data=data)
         color_map = plt.get_cmap(cmap)
-        colors = [color_map(norm(data[i])) if i in data.index else na_color for i in indicies]
+        # Vectorized: reindex to align with indicies (NaN for missing), then apply colormap in bulk
+        values = data.reindex(indicies)
+        color_map.set_bad(na_color)
+        colors = color_map(norm(np.ma.masked_invalid(values.values.astype(float))))
         legend = _cbar_legend(key, color_map, norm)
         n_categories = 0
     else:  # Categorical
diff --git a/tests/test_plot_utils.py b/tests/test_plot_utils.py
@@ -240,9 +240,9 @@ def test_get_colors_numeric():
     data = pd.Series([0, 1, 2], index=["a", "b", "c"])
     indices = ["a", "b", "c", "d"]
     colors, legend, ncat = _get_colors(tdata, "num", data, indices, cmap="viridis")
-    assert isinstance(colors, list)
+    assert isinstance(colors, np.ndarray)
     assert len(colors) == 4
-    assert colors[-1] == "lightgrey"
+    np.testing.assert_allclose(colors[-1], mcolors.to_rgba("lightgrey"))
     assert ncat == 0
     assert isinstance(legend, dict)