One idea is to do a Shapley value analysis with some faster model like XGBoost, or maybe use XGBoost to find a minimal set of features such that the loss still matches the loss given all the features, and then run SR on the features selected from that? You can in principle use PySR on this directly; it's just harder, since it's a combinatorics problem after all. So maybe increase the search size (number of populations, size of the populations, maxsize, ncycles_per_iteration, etc.) and run for longer.
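Roughly what I have in mind, as a sketch (untested; `X`/`y` are synthetic stand-ins here, and the SHAP cutoff, feature count, and PySR search settings are all placeholders you'd want to tune):

```python
import numpy as np
import shap
import xgboost as xgb
from pysr import PySRRegressor

# Stand-in data: replace with your redundant g_i feature matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
y = X[:, 0] * np.exp(X[:, 1]) + 2.0 * X[:, 2]

# 1. Fit a fast surrogate model on the full redundant library.
booster = xgb.XGBRegressor(n_estimators=500, max_depth=6)
booster.fit(X, y)

# 2. Rank features by mean |SHAP value| and keep a small subset.
explainer = shap.TreeExplainer(booster)
shap_values = explainer.shap_values(X)
importance = np.abs(shap_values).mean(axis=0)
keep = np.argsort(importance)[::-1][:8]  # top-k cutoff is arbitrary; tune it

# (Sanity check: refit XGBoost on X[:, keep] and confirm the held-out loss
# is close to the full-library loss before trusting the subset.)

# 3. Run symbolic regression on the reduced set with a larger search.
model = PySRRegressor(
    niterations=500,             # run longer than the default
    populations=40,              # more populations
    population_size=100,         # larger populations
    maxsize=40,                  # allow bigger expressions
    ncycles_per_iteration=1000,  # more mutations per iteration
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["exp", "log"],
)
model.fit(X[:, keep], y)
print(model)
```

The refit check is the point of step 2: you want to be sure the reduced library hasn't dropped anything the surrogate actually needed before spending a week of cluster time on SR.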
Hi all,
I'm using PySR on a problem where I expect the true (known) relationship to have a structured form like y = sum_i f_i(z) * g_i, where the g_i are computed feature columns and the f_i depend on a small set of derived variables z. The issue is that the feature library is highly redundant: many g_i columns are strongly correlated / nearly linearly dependent across samples due to built-in constraints, so there are lots of near-equivalent representations of y.
Empirically, PySR finds a clean expression quickly if I hand it a minimal feature set, but with a larger “unbiased” redundant library it tends to wander into complicated expressions or not converge.
Is this kind of problem well suited to PySR? If so, how could I improve things? If I chuck it on a cluster for a week, is that likely to yield good results?
I've tried various amounts of scaling/normalization, pruning via correlation/SVD, and using ExpressionSpec/templates to enforce linearity in a subset of features, but convergence still seems difficult.
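For concreteness, the correlation/SVD pruning I mean is roughly this (a simplified sketch with stand-in data; my real pipeline uses the actual g_i columns and a tuned threshold):

```python
import numpy as np

# Stand-in for the g_i feature matrix; column 1 is deliberately near-duplicate.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))
X[:, 1] = X[:, 0] + 1e-3 * rng.normal(size=1000)

# Greedy correlation pruning: drop any column that is highly correlated
# with a column we have already decided to keep.
corr = np.abs(np.corrcoef(X, rowvar=False))
threshold = 0.95
keep = []
for j in range(X.shape[1]):
    if all(corr[j, k] < threshold for k in keep):
        keep.append(j)

# SVD rank check: how many directions actually carry variance?
singular_values = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
effective_rank = int((singular_values > 1e-8 * singular_values[0]).sum())
print(f"kept {len(keep)}/{X.shape[1]} columns; effective rank ~ {effective_rank}")
```

The SVD check is just to confirm how much genuine rank the library has compared to the number of columns.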
Thanks!