
Growing the basis using BIC #23

Open
casv2 opened this issue Sep 20, 2023 · 8 comments

@casv2 (Collaborator) commented Sep 20, 2023

BIC is simple to compute and, in my experience, there is always a clear minimum. For basis optimisation I think it may make sense to start from a low polynomial degree and increase it, terminating once the BIC starts to rise past its minimum, and then keep the BIC-optimal basis. Does this make sense?

I think this simplifies and speeds up the optimisation considerably, and it naturally adds complexity by growing the ACE basis as a function of the data.
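To make the termination rule concrete, here is a minimal sketch of that loop in Python, assuming a hypothetical fit_at_degree helper that builds and fits the ACE basis at a given polynomial degree, and using the Gaussian-likelihood form of BIC (n ln(RSS/n) + k ln n, up to an additive constant):

import numpy as np

def bic(n_samples, n_params, rss):
    # Gaussian-likelihood BIC up to an additive constant:
    #   BIC = n * ln(RSS / n) + k * ln(n)
    return n_samples * np.log(rss / n_samples) + n_params * np.log(n_samples)

def grow_until_bic_rises(fit_at_degree, degrees, patience=1):
    # fit_at_degree(deg) is a hypothetical helper that builds the ACE basis at
    # polynomial degree `deg`, fits it, and returns (n_samples, n_params, rss, model).
    best_bic, best_model, worse = np.inf, None, 0
    for deg in degrees:
        n, k, rss, model = fit_at_degree(deg)
        b = bic(n, k, rss)
        if b < best_bic:
            best_bic, best_model, worse = b, model, 0
        else:
            worse += 1                    # BIC has started to rise past the minimum
            if worse > patience:          # terminate and keep the BIC-optimal basis
                break
    return best_model, best_bic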

@bernstei Thoughts?

@cortner (Member) commented Sep 20, 2023

I am VERY interested in the basis growing thing and can help with that at the Julia end. Basically I could provide a function roughly like this:

newbasis = grow(oldbasis, ikeep, steps=2)

This would take the old basis, first reduce it to oldbasis[ikeep], then find all first and second neighbours (steps=2) in the lattice of basis functions, add them, and return the resulting newbasis.
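Purely to illustrate the shape of that operation (this is not the Julia implementation; neighbours and the label set are hypothetical stand-ins): keep the selected subset, then repeatedly add everything adjacent to it in the lattice of basis functions, once per step:

def grow(old_basis, ikeep, neighbours, steps=2):
    # old_basis: list of basis-function labels; ikeep: indices to retain;
    # neighbours(label) returns the labels adjacent to `label` in the lattice
    # of basis functions. All names here are hypothetical.
    kept = {old_basis[i] for i in ikeep}
    frontier = set(kept)
    for _ in range(steps):                # steps=2: first and second neighbours
        frontier = {nb for f in frontier for nb in neighbours(f)} - kept
        kept |= frontier
    return sorted(kept)                   # the grown basis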

This would not be a big job for me, but I've never had anybody to test it.

@bernstei (Collaborator)

I'm happy to try different optimization strategies.

@wcwitt commented Sep 20, 2023

Yuri mentioned they do something a bit like this. Underneath, I think they create the big basis all at once and then iteratively unveil it.

@casv2 (Collaborator, Author) commented Sep 23, 2023

@wcwitt "Unveiling iteratively" sounds an awful lot like forward stepwise regression? You start with the 'null' model and add features based on your favourite criterion (AIC, BIC, R²) and run until convergence. For us I think going the other way, backward stepwise regression, could be more effective. We typically have a fair bit of collinearity and recovering signal by going backwards appears to be more robust as you get to "decide" between features.

However, both methods appear to be quite dated, and XGBoost seems to be the way to go nowadays. I'll experiment with it a bit more and see how it does. With XGBoost we don't get to sample committees for free, nor do we get the "zero-mean" smoothness prior native to BRR/ARD, so I think we'll have to rely more on our own smoothness priors. UQ predictions from XGBoost do seem a bit poor, though, probably because it doesn't provide direct access to a mean and variance the way the Bayesian methods do.
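For comparison, the Bayesian solvers hand back a predictive mean and standard deviation directly; a minimal scikit-learn example with BayesianRidge/ARDRegression as stand-ins for BRR/ARD (toy data, just to show the interface):

import numpy as np
from sklearn.linear_model import ARDRegression, BayesianRidge

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = X_train @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(20, 10))

# Bayesian ridge: predictive mean and std come straight from the posterior
brr = BayesianRidge().fit(X_train, y_train)
y_mean, y_std = brr.predict(X_test, return_std=True)

# ARD adds per-feature relevance determination (prunes irrelevant columns)
ard = ARDRegression().fit(X_train, y_train)
y_mean_ard, y_std_ard = ard.predict(X_test, return_std=True)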

Perhaps a good middle ground is using the UQ estimates from the Bayesian solvers during training database assembly, and then use XGBoost to fit a final model.

@bernstei (Collaborator)

Is it clear that the boosted ensemble doesn't provide good UQ?

@cortner (Member) commented Sep 23, 2023

> For us I think going the other way, backward stepwise regression, could be more effective

I agree with that in principle. But in the past we occasionally found that "unveiling" in a physically inspired way led to smoother fits.

But I'm interested in growing rather than unveiling, for two reasons: (1) performance, especially for nonlinear models and iterative solvers, and (2) I want to learn the best way to truncate, which could mean growing far more deeply in some directions of basis space than in others; that would not be covered by our sparse selection.

In practice right now I think shrinkage may well be the right thing to do.

@casv2 (Collaborator, Author) commented Sep 23, 2023

@bernstei No, I meant that UQ doesn't come naturally to XGBoost, unlike the Bayesian methods (there is no analytical description of the uncertainty). There are some extensions to XGBoost that provide UQ estimates, typically after an uncertainty-calibration step. This may work very well for us; I guess we'll have to try it out.
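For what it's worth, one common stand-in (not XGBoost itself, and not the calibration-based extensions mentioned above) is quantile-loss boosting, e.g. scikit-learn's GradientBoostingRegressor; it only brackets predictions rather than giving a posterior mean and variance. A toy sketch:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 5))
y = np.sin(3 * X[:, 0]) + 0.2 * rng.normal(size=500)

# Fit lower/upper quantile models to bracket the prediction, plus a central model
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)
med = GradientBoostingRegressor(loss="squared_error").fit(X, y)

X_new = rng.uniform(-1, 1, size=(10, 5))
interval = np.column_stack([lo.predict(X_new), hi.predict(X_new)])  # crude 90% band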

But then there is also the occasional source claiming that ARD outperforms XGBoost in terms of raw (test) accuracy. You can probably always find counterexamples, depending on the database and on how long you're willing to fiddle.

@cortner I agree that growing sounds much more appealing, especially considering nonlinear fitting costs. I do wonder how deeply we can grow the (current) basis in practice, though. We're already exploring fairly high-degree polynomials (14-16) with our current "backward" methods, and I'm not sure there's a lot to gain by going higher. I think that 'message-passing'-like features (basically hopping) actually carry much more "relevance" in defining the PES.

@cortner (Member) commented Sep 23, 2023

Were you in any of the GRACE discussions? The feature selection is much more complex there, and the same principle applies. I'm thinking beyond our simple linear models.
