Growing the basis using BIC #23
I am VERY interested in the basis growing thing and can help with that at the Julia end. Basically I could provide a function roughly like this:
This would take the old basis and first reduce it to … This would not be a big job for me, but I have never had anybody to test it.
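The function referenced above did not survive; as a purely hypothetical illustration of a reduce-then-grow step (Python rather than Julia, and every name and the scoring scheme here are invented, not the actual interface), it might look something like:

```python
import numpy as np

def grow_basis(basis, scores, keep_fraction=0.5, n_new=10):
    """Hypothetical reduce-then-grow step: keep the best-scoring fraction of
    the current basis functions, then append fresh candidate indices.
    `basis` is a list of feature indices; `scores` their relevance (higher = better)."""
    order = np.argsort(scores)[::-1]              # rank features by score, best first
    n_keep = max(1, int(keep_fraction * len(basis)))
    kept = [basis[i] for i in order[:n_keep]]     # prune the weakest features
    start = max(basis) + 1
    return kept + list(range(start, start + n_new))  # grow with new candidates

print(grow_basis([0, 1, 2, 3], [0.9, 0.1, 0.8, 0.2], keep_fraction=0.5, n_new=2))
```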
I'm happy to try different optimization strategies.
Yuri mentioned they do something a bit like this. Underneath, I think they create the big basis all at once and then iteratively unveil it.
@wcwitt "Unveiling iteratively" sounds an awful lot like forward stepwise regression: you start with the 'null' model, add features based on your favourite criterion (AIC, BIC, R²), and run until convergence. For us, I think going the other way, backward stepwise regression, could be more effective. We typically have a fair bit of collinearity, and recovering signal by going backwards appears to be more robust, since you get to "decide" between features.

However, both methods appear to be quite dated, and XGBoost seems to be the way to go nowadays. I'll experiment with it a bit more and see how it does. With XGBoost we don't get to sample committees for free, nor do we get the "zero-mean" smoothness prior native to BRR/ARD, so I think we'll have to rely more on our own smoothness priors. UQ predictions in XGBoost do seem a bit poor, though, probably because it doesn't provide direct access to a mean and variance as the Bayesian methods do. Perhaps a good middle ground is using the UQ estimates from the Bayesian solvers during training-database assembly, and then using XGBoost to fit the final model.
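Forward stepwise regression with a BIC stopping rule, as described above, fits in a few lines. A minimal numpy-only sketch (the termination rule and BIC formula are the textbook ones, not anything specific to this codebase):

```python
import numpy as np

def bic(y, y_pred, k):
    """Bayesian information criterion for a least-squares fit with k parameters."""
    n = len(y)
    rss = np.sum((y - y_pred) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def forward_stepwise(X, y):
    """Greedy forward selection: add the feature that most lowers BIC;
    stop when no remaining candidate improves it."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    best_bic = np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            scores.append((bic(y, X[:, cols] @ coef, len(cols)), j))
        score, j = min(scores)
        if score >= best_bic:
            break  # no candidate improves BIC -> converged
        best_bic = score
        selected.append(j)
        remaining.remove(j)
    return selected

# Synthetic check: only features 2 and 7 carry signal, so they should be picked.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=200)
print(sorted(forward_stepwise(X, y)))
```

Backward stepwise is the mirror image: start from the full feature set and greedily drop the feature whose removal most lowers (or least raises) the BIC.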
Is it clear that the boosted ensemble doesn't provide good UQ?
I agree with that in principle. But in the past we occasionally found that "unveiling" in a physically inspired way led to smoother fits. That said, I'm interested in growing rather than unveiling, for two reasons: (1) performance, especially for nonlinear models and iterative solvers; and (2) I want to learn the best way to truncate, which could mean growing far more deeply in some directions of basis space than in others, something our sparse selection would not cover. In practice, right now, I think shrinkage may well be the right thing to do.
@bernstei No, I meant that UQ doesn't come naturally to XGBoost, unlike in the Bayesian methods (there is no analytical description of the uncertainty). There are some extensions to XGBoost that provide UQ estimates, typically via some uncertainty calibration first. This may work very well for us; I guess we'll have to try it out. But then there is also the occasional source claiming that ARD outperforms XGBoost in terms of raw (test) accuracy. It's likely that you can always find counterexamples, depending on the database and how long you're willing to fiddle.

@cortner I agree that growing sounds much more appealing, especially considering nonlinear fitting costs. I do wonder how deeply we get to grow the (current) basis in practice, though. We're already exploring fairly high polynomial degrees (14-16) using our current "backward" methods, and I'm not sure there's a lot to gain by going higher. I think that 'message-passing'-like features (hopping, basically) actually carry much more "relevance" in defining the PES.
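For contrast with the boosted-tree case, the "analytical description of uncertainty" above refers to the closed-form predictive distribution of a Bayesian linear model. A minimal numpy sketch (with fixed, hand-picked `alpha` and `beta` hyperparameters for illustration, rather than the evidence-maximised ones BRR/ARD would estimate):

```python
import numpy as np

def bayes_ridge_posterior(X, y, alpha=1.0, beta=25.0):
    """Posterior over weights for Bayesian ridge regression with fixed
    prior precision `alpha` and noise precision `beta`."""
    d = X.shape[1]
    S = np.linalg.inv(alpha * np.eye(d) + beta * X.T @ X)  # posterior covariance
    m = beta * S @ X.T @ y                                 # posterior mean
    return m, S

def predict(Xnew, m, S, beta=25.0):
    """Predictive mean and standard deviation -- the analytic UQ that
    boosted trees do not provide natively."""
    mean = Xnew @ m
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Xnew, S, Xnew)  # noise + parameter terms
    return mean, np.sqrt(var)

# Synthetic linear data with noise std 0.2 (so beta = 1/0.2**2 = 25).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=50)
m, S = bayes_ridge_posterior(X, y)
mean, std = predict(X[:5], m, S)
```

Every prediction comes with a variance for free, which is what makes committee-free UQ during database assembly straightforward with the Bayesian solvers.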
Were you in any of the GRACE discussions? The feature selection is much more complex there, and the same principle applies. I'm thinking beyond our simple linear models.
Due to the simplicity of BIC, and from experience, it seems there's always a nice minimum. For basis optimisation, I think it may make sense to start from a low polynomial degree and increase it, terminating once the BIC rises past its minimum and then using the BIC-optimal basis. Does this make sense?
I think this simplifies and speeds up the optimisation considerably, and it naturally adds complexity by growing the ACE basis as a function of the data.
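A toy version of that stopping rule, using 1D polynomial fits as a hypothetical stand-in for an ACE basis truncated at increasing degree (the BIC formula is the standard least-squares one):

```python
import numpy as np

def bic_for_degree(x, y, degree):
    """BIC of a least-squares polynomial fit of the given degree."""
    n = len(y)
    coef = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coef, x)) ** 2)
    k = degree + 1  # number of fitted parameters
    return n * np.log(rss / n) + k * np.log(n)

def grow_until_bic_minimum(x, y, max_degree=20):
    """Grow the basis one degree at a time; terminate once BIC rises past
    its running minimum, and return the BIC-optimal degree."""
    best_deg, best = 0, np.inf
    for deg in range(max_degree + 1):
        score = bic_for_degree(x, y, deg)
        if score < best:
            best_deg, best = deg, score
        elif score > best:
            break  # past the minimum -> stop growing
    return best_deg

# Quadratic ground truth plus noise: the BIC-optimal degree should be near 2.
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 300)
y = 1.0 - 2.0 * x + 1.5 * x**2 + 0.05 * rng.normal(size=300)
print(grow_until_bic_minimum(x, y))
```

One caveat worth keeping in mind: a strict "stop at the first increase" rule can terminate early when consecutive basis additions are nearly orthogonal to the residual, so a small patience window may be safer in practice.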
@bernstei Thoughts?