Growing the basis using BIC #23
I am VERY interested in the basis growing thing and can help with that at the Julia end. Basically I could provide a function roughly like this:
This would take the old basis and first reduce it to … This would not be a big job for me, but I have never had anybody to test it.
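The function referenced above did not survive; as a purely hypothetical illustration of a reduce-then-grow step (Python rather than Julia, and every name and the scoring scheme here are invented, not the actual interface), it might look something like:

```python
import numpy as np

def grow_basis(basis, scores, keep_fraction=0.5, n_new=10):
    """Hypothetical reduce-then-grow step: keep the best-scoring fraction of
    the current basis functions, then append fresh candidate indices.
    `basis` is a list of feature indices; `scores` their relevance (higher = better)."""
    order = np.argsort(scores)[::-1]              # rank features by score, best first
    n_keep = max(1, int(keep_fraction * len(basis)))
    kept = [basis[i] for i in order[:n_keep]]     # prune the weakest features
    start = max(basis) + 1
    return kept + list(range(start, start + n_new))  # grow with new candidates

print(grow_basis([0, 1, 2, 3], [0.9, 0.1, 0.8, 0.2], keep_fraction=0.5, n_new=2))
```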
I'm happy to try different optimization strategies.
Yuri mentioned they do something a bit like this. Underneath, I think they create the big basis all at once and then iteratively unveil it.
@wcwitt "Unveiling iteratively" sounds an awful lot like forward stepwise regression: you start with the 'null' model, add features based on your favourite criterion (AIC, BIC, R²), and run until convergence. For us, I think going the other way, backward stepwise regression, could be more effective. We typically have a fair bit of collinearity, and recovering signal by going backwards appears to be more robust, since you get to "decide" between features.

However, both methods appear to be quite dated, and XGBoost seems to be the way to go nowadays. I'll experiment with it a bit more and see how it does. With XGBoost we don't get to sample committees for free, nor do we get the "zero-mean" smoothness prior native to BRR/ARD, so I think we'll have to rely more on our own smoothness priors. UQ predictions in XGBoost do seem a bit poor, though, probably because it doesn't provide direct access to a mean and variance as the Bayesian methods do. Perhaps a good middle ground is using the UQ estimates from the Bayesian solvers during training-database assembly, and then using XGBoost to fit the final model.
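Forward stepwise regression with a BIC stopping rule, as described above, fits in a few lines. A minimal numpy-only sketch (the termination rule and BIC formula are the textbook ones, not anything specific to this codebase):

```python
import numpy as np

def bic(y, y_pred, k):
    """Bayesian information criterion for a least-squares fit with k parameters."""
    n = len(y)
    rss = np.sum((y - y_pred) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def forward_stepwise(X, y):
    """Greedy forward selection: add the feature that most lowers BIC;
    stop when no remaining candidate improves it."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    best_bic = np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            scores.append((bic(y, X[:, cols] @ coef, len(cols)), j))
        score, j = min(scores)
        if score >= best_bic:
            break  # no candidate improves BIC -> converged
        best_bic = score
        selected.append(j)
        remaining.remove(j)
    return selected

# Synthetic check: only features 2 and 7 carry signal, so they should be picked.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=200)
print(sorted(forward_stepwise(X, y)))
```

Backward stepwise is the mirror image: start from the full feature set and greedily drop the feature whose removal most lowers (or least raises) the BIC.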
Is it clear that the boosted ensemble doesn't provide good UQ?
I agree with that in principle. But in the past we occasionally found that "unveiling" in a physically inspired way led to smoother fits. That said, I'm interested in growing rather than unveiling, for two reasons: (1) performance, especially for nonlinear models and iterative solvers; and (2) I want to learn the best way to truncate, which could mean growing far more deeply in some directions of basis space than in others, something our sparse selection would not cover. In practice, right now, I think shrinkage may well be the right thing to do.
@bernstei No, I meant that UQ doesn't come naturally to XGBoost, unlike in the Bayesian methods (there is no analytical description of the uncertainty). There are some extensions to XGBoost that provide UQ estimates, typically via some uncertainty calibration first. This may work very well for us; I guess we'll have to try it out. But then there is also the occasional source claiming that ARD outperforms XGBoost in terms of raw (test) accuracy. It's likely that you can always find counterexamples, depending on the database and how long you're willing to fiddle.

@cortner I agree that growing sounds much more appealing, especially considering nonlinear fitting costs. I do wonder how deeply we get to grow the (current) basis in practice, though. We're already exploring fairly high polynomial degrees (14-16) using our current "backward" methods, and I'm not sure there's a lot to gain by going higher. I think that 'message-passing'-like features (hopping, basically) actually carry much more "relevance" in defining the PES.
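For contrast with the boosted-tree case, the "analytical description of uncertainty" above refers to the closed-form predictive distribution of a Bayesian linear model. A minimal numpy sketch (with fixed, hand-picked `alpha` and `beta` hyperparameters for illustration, rather than the evidence-maximised ones BRR/ARD would estimate):

```python
import numpy as np

def bayes_ridge_posterior(X, y, alpha=1.0, beta=25.0):
    """Posterior over weights for Bayesian ridge regression with fixed
    prior precision `alpha` and noise precision `beta`."""
    d = X.shape[1]
    S = np.linalg.inv(alpha * np.eye(d) + beta * X.T @ X)  # posterior covariance
    m = beta * S @ X.T @ y                                 # posterior mean
    return m, S

def predict(Xnew, m, S, beta=25.0):
    """Predictive mean and standard deviation -- the analytic UQ that
    boosted trees do not provide natively."""
    mean = Xnew @ m
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Xnew, S, Xnew)  # noise + parameter terms
    return mean, np.sqrt(var)

# Synthetic linear data with noise std 0.2 (so beta = 1/0.2**2 = 25).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=50)
m, S = bayes_ridge_posterior(X, y)
mean, std = predict(X[:5], m, S)
```

Every prediction comes with a variance for free, which is what makes committee-free UQ during database assembly straightforward with the Bayesian solvers.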
Were you in any of the GRACE discussions? The feature selection is much more complex there, and the same principle applies. I'm thinking beyond our simple linear models.
Due to the simplicity of BIC, and from experience, it seems there's always a nice minimum. For basis optimisation, I think it may make sense to start from a low polynomial degree and increase it, terminating once the BIC rises past its minimum and then using the BIC-optimal basis. Does this make sense?
I think this simplifies and speeds up the optimisation considerably, and it naturally adds complexity by growing the ACE basis as a function of the data.
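A toy version of that stopping rule, using 1D polynomial fits as a hypothetical stand-in for an ACE basis truncated at increasing degree (the BIC formula is the standard least-squares one):

```python
import numpy as np

def bic_for_degree(x, y, degree):
    """BIC of a least-squares polynomial fit of the given degree."""
    n = len(y)
    coef = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coef, x)) ** 2)
    k = degree + 1  # number of fitted parameters
    return n * np.log(rss / n) + k * np.log(n)

def grow_until_bic_minimum(x, y, max_degree=20):
    """Grow the basis one degree at a time; terminate once BIC rises past
    its running minimum, and return the BIC-optimal degree."""
    best_deg, best = 0, np.inf
    for deg in range(max_degree + 1):
        score = bic_for_degree(x, y, deg)
        if score < best:
            best_deg, best = deg, score
        elif score > best:
            break  # past the minimum -> stop growing
    return best_deg

# Quadratic ground truth plus noise: the BIC-optimal degree should be near 2.
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 300)
y = 1.0 - 2.0 * x + 1.5 * x**2 + 0.05 * rng.normal(size=300)
print(grow_until_bic_minimum(x, y))
```

One caveat worth keeping in mind: a strict "stop at the first increase" rule can terminate early when consecutive basis additions are nearly orthogonal to the residual, so a small patience window may be safer in practice.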
@bernstei Thoughts?