[WIP][R] Drop support for CV. #11027
Conversation
cc @david-cortes @mayer79 Please let me know what you think.
I actually love xgb.cv().
@mayer79 Do you think the cv split (resampling) in …
I'm open to suggestions. We can keep the CV as it is and start a new one if needed. Would love to hear your opinions.
Side remark: an example of how {mlr3} seems to tackle random-search CV with early stopping. I don't know yet whether it is much slower than xgb.cv.

library(mlr3verse)
library(mlr3tuning)
library(shapviz)
set.seed(2)
task <- as_task_regr(iris[1:4], target="Sepal.Length")
lrn_xgb <- lrn("regr.xgboost")
split <- partition(task, ratio = 0.8)
search_space = ps(
  eta = p_dbl(lower = 0.05, upper = 0.2),
  min_child_weight = p_dbl(lower = 1, upper = 10),
  subsample = p_dbl(lower = 0.7, upper = 1),
  colsample_bylevel = p_dbl(lower = 0.7, upper = 1),
  nrounds = p_int(lower = 1, upper = 1000)
)
at = auto_tuner(
  tuner = tnr("random_search", batch_size = 1),
  learner = lrn_xgb,
  resampling = rsmp("cv", folds = 5),
  measure = msr("regr.mse"),
  search_space = search_space,
  terminator = trm("stagnation")
)
at$train(task, row_ids = split$train)
# colsample_bylevel=0.9876, eta=0.06583, min_child_weight=4.939, nrounds=508, nthread=1, subsample=0.7116,
at$learner
xvars <- at$learner$model$feature_names
sv <- shapviz(at$learner$model, data.matrix(iris[xvars]))
sv_dependence(sv, v = xvars)
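
For a concrete speed comparison with the built-in CV function this PR proposed to drop, here is a minimal sketch using the classic xgb.cv() interface on the same data; the parameter values are illustrative, not tuned.

library(xgboost)
# Same regression setup as the mlr3 task above: predict Sepal.Length
# from the remaining numeric columns of iris.
X <- data.matrix(iris[, 2:4])
dtrain <- xgb.DMatrix(X, label = iris$Sepal.Length)
cv <- xgb.cv(
  params = list(objective = "reg:squarederror", eta = 0.1, max_depth = 4),
  data = dtrain,
  nrounds = 500,
  nfold = 5,
  early_stopping_rounds = 20,
  verbose = FALSE
)
cv$best_iteration  # round at which early stopping settled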
I'd say there's no need to drop it.
It's also quite useful if one wants to use it with xgboost-specific functionality, such as custom objectives, callbacks, base scores, etc., which are not easy to integrate into higher-level frameworks. It would be ideal to have a new, more idiomatic CV function working with R objects like data frames, but it shouldn't phase out xgb.cv.
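
As a hedged illustration of that point, here is a minimal sketch of passing a custom objective (and a base score) to xgb.cv() through the classic interface; the data set and objective are placeholders, not anything from this PR.

library(xgboost)
dtrain <- xgb.DMatrix(data.matrix(mtcars[, -1]), label = mtcars$mpg)
# A hand-written squared-error objective: xgb.cv() accepts it directly
# via `obj`, a hook that higher-level frameworks rarely expose.
custom_se <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(grad = preds - labels, hess = rep(1, length(preds)))
}
cv <- xgb.cv(
  params = list(eta = 0.1, max_depth = 3, base_score = mean(mtcars$mpg)),
  data = dtrain,
  nrounds = 100,
  nfold = 5,
  obj = custom_se,
  verbose = FALSE
)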
Thank you for the suggestions, will close this PR.
Thanks to the revised R interface (#9810), we now have a significantly improved training interface. I think we should restrict the new R package to that interface to reduce its scope for the next release. There are excellent choices like the mlr3 package, which does a far better job than the built-in CV function in XGBoost. Also, XGBoost already enjoys good integration with it through its learner repository. I plan to add the following features to the mlr3 integration after the next release is officially out (I might put a WIP PR there before we make it to CRAN):

- Support for factor (categorical) features.
- The device parameter (a sketch follows the list).

In addition, I will put a vignette in XGBoost to share the integration with a wider audience.
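
For context on the second item, this is what the device parameter does in core XGBoost terms; a hedged sketch using the classic xgb.train() interface, since the mlr3 exposure is still only planned (assumes XGBoost >= 2.0 and CUDA hardware).

library(xgboost)
dtrain <- xgb.DMatrix(data.matrix(mtcars[, -1]), label = mtcars$mpg)
# `device` selects where training runs, e.g. "cpu" or "cuda".
params <- list(tree_method = "hist", device = "cuda",
               objective = "reg:squarederror")
bst <- xgb.train(params = params, data = dtrain, nrounds = 50)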
This PR drops the implementation of the CV function in XGBoost. This function hasn't been a main maintenance target for a while. We can bring it back in the future with a revised interface and support for sharing quantile cuts (so the histogram bin boundaries need not be recomputed for every fold). But for the next release, it's best if we reduce the number of legacy interfaces to a minimum, giving us more room to develop the new interface. It's really difficult to change a package once it's submitted to CRAN.