
[WIP][R] Drop support for CV. #11027

Closed
wants to merge 1 commit into from

Conversation

trivialfis
Member

Thanks to the revised R interface (#9810), we now have a significantly improved training interface. I think we should restrict the new R package to that interface to reduce the scope of this package for the next release. There are excellent choices like the mlr3 package, which does a far better job than the built-in CV function in XGBoost. Also, XGBoost already enjoys a good integration with it through its learner repository.

I plan to add the following features to the mlr3 integration after the next release is officially out (I might put a WIP PR there before we make it to CRAN).

  • QuantileDMatrix
  • Quantile regression
  • Support for factors.
  • The device parameter.

In addition, I will put a vignette in XGBoost to share the integration with wider audiences.

This PR drops the implementation of the CV function in XGBoost. This function hasn't been the main maintenance target for a while. We can bring it back in the future with a revised interface and support for sharing quantile cuts. But for the next release, it's best if we can reduce the number of legacy interfaces to the minimum, giving us more room to develop the new interface. It's really difficult to make any change to a package once it's submitted to CRAN.

@trivialfis trivialfis marked this pull request as draft November 29, 2024 08:18
@trivialfis
Member Author

cc @david-cortes @mayer79 Please let me know what you think.

@mayer79
Contributor

mayer79 commented Nov 29, 2024

I actually love xgb.cv() and would greatly miss it. I use it throughout all my lecture notes and tutorials. Let's see what David thinks.

@trivialfis
Member Author

trivialfis commented Nov 29, 2024

@mayer79 Do you think the cv split (resampling) in mlr3 can be a suitable replacement? If not, do you mind if I make some breaking changes to the cv function?

@trivialfis
Member Author

I'm open to suggestions. We can keep the CV function as it is and start a new one if needed. I would love to hear more of your opinions.

@mayer79
Contributor

mayer79 commented Nov 29, 2024

Side remark: an example of how {mlr3} seems to tackle grid-search CV with early stopping. I don't know yet whether it is much slower than xgb.cv().

library(mlr3verse)
library(mlr3tuning)
library(shapviz)

set.seed(2)

# Regression task: predict Sepal.Length from the other iris measurements
task <- as_task_regr(iris[1:4], target = "Sepal.Length")
lrn_xgb <- lrn("regr.xgboost")
split <- partition(task, ratio = 0.8)

# Hyperparameter search space, including the number of boosting rounds
search_space <- ps(
  eta = p_dbl(lower = 0.05, upper = 0.2),
  min_child_weight = p_dbl(lower = 1, upper = 10),
  subsample = p_dbl(lower = 0.7, upper = 1),
  colsample_bylevel = p_dbl(lower = 0.7, upper = 1),
  nrounds = p_int(lower = 1, upper = 1000)
)

# Random search with 5-fold CV, terminating when performance stagnates
at <- auto_tuner(
  tuner = tnr("random_search", batch_size = 1),
  learner = lrn_xgb,
  resampling = rsmp("cv", folds = 5),
  measure = msr("regr.mse"),
  search_space = search_space,
  terminator = trm("stagnation")
)

at$train(task, row_ids = split$train)

# Best configuration found, e.g.:
# colsample_bylevel=0.9876, eta=0.06583, min_child_weight=4.939, nrounds=508, nthread=1, subsample=0.7116
at$learner
xvars <- at$learner$model$feature_names

# SHAP dependence plots for the tuned model
sv <- shapviz(at$learner$model, data.matrix(iris[xvars]))
sv_dependence(sv, v = xvars)

(image: SHAP dependence plots for the tuned model)

@david-cortes
Contributor

I'd say there's no need to drop it.

xgb.cv works with xgboost's DMatrix objects and supports functionality that is outside the scope of frameworks like mlr3, such as learning-to-rank.

It's also quite useful when one wants to combine it with xgboost-specific functionality, such as custom objectives, callbacks, and base scores, which are not easy to integrate into higher-level frameworks.

It would be ideal to have a new, more idiomatic CV function working with R objects like data frames, but it shouldn't phase out xgb.cv, just like xgboost() doesn't phase out xgb.train, as they aren't 1-to-1 replacements.
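A minimal sketch of the kind of xgboost-specific workflow described above: a DMatrix plus a custom objective passed directly to xgb.cv(). This assumes the classic R xgboost API (`obj`/`feval` arguments of xgb.cv); the squared-error objective and the parameter values are illustrative only, not part of this PR.

```r
library(xgboost)

# Custom objective: squared error, returning per-observation gradient and hessian
sq_err_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(grad = preds - labels, hess = rep(1, length(preds)))
}

# Custom evaluation metric: RMSE
rmse_eval <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(metric = "rmse", value = sqrt(mean((preds - labels)^2)))
}

# Build a DMatrix directly; higher-level frameworks abstract this step away
x <- data.matrix(iris[, 2:4])
dtrain <- xgb.DMatrix(x, label = iris$Sepal.Length)

# 5-fold CV with the custom objective and metric
cv <- xgb.cv(
  params = list(eta = 0.1, max_depth = 3),
  data = dtrain,
  nrounds = 50,
  nfold = 5,
  obj = sq_err_obj,
  feval = rmse_eval,
  verbose = 0
)
cv$evaluation_log
```

This level of control (raw DMatrix, custom gradient/hessian, custom metric) is what the comment argues is hard to reproduce through a higher-level wrapper.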

@trivialfis
Member Author

Thank you for the suggestions; I will close this PR.

@trivialfis trivialfis closed this Dec 2, 2024
@trivialfis trivialfis deleted the r-drop-cv branch December 2, 2024 08:47