
Port to MLJ? #9

Closed
azev77 opened this issue Jun 7, 2021 · 12 comments

@azev77

azev77 commented Jun 7, 2021

Hey and thank you for this package!
I've been hoping for a CatBoost interface for a while!!

Have you considered porting this to MLJ.jl?
This would be an awesome addition, as MLJ currently supports XGBoost & LightGBM.
@ablaom @tlienart

BTW, I noticed some Julia wrappers wrap ML models via their high-level code (Python/R), while others wrap the underlying low-level code (e.g., GLMNet.jl wraps the Fortran code behind R's glmnet). Wrapping the underlying CatBoost code would probably be a pain, but would there be a performance difference?

@ablaom
Member

ablaom commented Jun 8, 2021

For the record, there is also an MLJ interface for EvoTrees.jl, another pure-Julia implementation of gradient tree boosting, so I would expect it to make a good template for adding an MLJ interface to CatBoost.jl. It includes, for example, an appropriate implementation of MLJ's update method, which makes "warm restarts" possible and allows one to wrap these models in an iterated control strategy (e.g., early stopping based on out-of-sample losses).
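
For a flavor of what that buys you, here's a rough, untested sketch of wrapping EvoTrees in MLJ's iterated control (the model choice, control settings, and the data X, y are just placeholders):

using MLJ  # re-exports IteratedModel, Holdout, rms, and the iteration controls

EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees

iterated_model = IteratedModel(model=EvoTreeRegressor(),
                               resampling=Holdout(fraction_train=0.8),
                               measure=rms,
                               controls=[Step(10),           # add 10 iterations per control cycle
                                         Patience(5),        # stop after 5 non-improving cycles
                                         NumberLimit(100)])  # hard cap on the number of cycles

mach = machine(iterated_model, X, y)
fit!(mach)  # early-stops using the out-of-sample rms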

cc @jeremiedb

@femtomc
Collaborator

femtomc commented Jun 25, 2021

Hi everyone -- thanks for the interest.

I suspect wrapping the low-level code would be a pain. In terms of performance, a native wrapper would of course be faster than calling through the Python runtime -- but the penalty incurred by going through the runtime should be negligible compared to the time it takes to train a model. So we have no intention of trying to wrap the native C++ code (this may change if CatBoost ever offers a C API, although IIRC they don't export one).

We considered implementing the MLJ interface previously -- but ultimately decided that the way CatBoost does things and the way MLJ does things are different enough that the impedance mismatch was not worth seriously trying to fix, given our priorities. Our perspective became: CatBoost.jl would be a pure wrapper package -- and if someone wants to implement an MLJCatBoost.jl package, we would welcome it.

In particular, consider the process of fitting with MLJ (https://alan-turing-institute.github.io/MLJ.jl/dev/quick_start_guide_to_adding_models/#Model-type-and-constructor), and compare it to the (essentially API-restricted) way of fitting CatBoost models:

using CatBoost  # exports `Pool` and the `catboost` Python module (via PyCall)

# Create pools (x_train, y_train, queries_train, etc. assumed already defined).
train = Pool(; data=x_train, label=y_train, group_id=queries_train)
test = Pool(; data=x_test, label=y_test, group_id=queries_test)

# Small number of iterations so as not to slow down CI too much.
default_parameters = Dict("iterations" => 10, "loss_function" => "RMSE",
                          "custom_metric" => ["MAP:top=10", "PrecisionAt:top=10",
                                              "RecallAt:top=10"], "verbose" => false,
                          "random_seed" => 314159)

function fit_model(params, train_pool, test_pool)
    model = catboost.CatBoost(params)  # Python CatBoost object
    model.fit(train_pool; eval_set=test_pool, plot=false)
    return model
end

Hyperparameters are passed over the line in Dict form to the Python runtime -- and there is a very large number of them available for user customization. So supporting a generic mutable CatBoostModel struct that satisfies the MLJ interfaces seemed more restrictive than just exposing this API to the user directly.
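
For contrast, a rough, untested sketch of what an MLJ-style struct might look like (the name CatBoostRegressorSketch and its three fields are hypothetical; a real wrapper would need an explicit field for every one of those many options):

import MLJModelInterface as MMI
using CatBoost  # for `catboost` and `Pool`

# Hypothetical: every hyperparameter becomes an explicit, typed field with a
# default, whereas the Python API takes an open-ended Dict of options.
MMI.@mlj_model mutable struct CatBoostRegressorSketch <: MMI.Deterministic
    iterations::Int       = 1000::(_ > 0)
    loss_function::String = "RMSE"
    random_seed::Int      = 0
end

function MMI.fit(model::CatBoostRegressorSketch, verbosity, X, y)
    params = Dict("iterations" => model.iterations,
                  "loss_function" => model.loss_function,
                  "random_seed" => model.random_seed,
                  "verbose" => verbosity > 0)
    booster = catboost.CatBoost(params)    # Python object, via PyCall
    booster.fit(Pool(; data=X, label=y))
    return booster, nothing, NamedTuple()  # (fitresult, cache, report)
end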

Again, if either of you are interested in creating an MLJCatBoost wrapper library -- we would welcome it! But we are not prioritizing it.

Thank you.

@ablaom
Member

ablaom commented Jun 28, 2021

Comment to self: this is not a pure-Julia implementation, but a wrapper around Python code (which wraps C, presumably).

@ericphanson
Collaborator

ericphanson commented Jun 28, 2021

Yep, a popular C++ library: https://github.com/catboost/catboost (if it were C, we might try to wrap it directly instead of its Python interface). This is a pretty minimal wrapper that just uses PyCall and tries to make it a bit more convenient to send and receive tabular data.
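
For example, sending a DataFrame through should look something like this (untested sketch; assumes Pool accepts any Tables.jl-compatible table):

using CatBoost, DataFrames

df = DataFrame(feature1=[1.0, 2.0, 3.0], feature2=[0.5, 0.6, 0.7])
pool = Pool(; data=df, label=[0, 1, 0])  # the table is converted on the way in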

@ericphanson
Collaborator

I’d be interested in adding an MLJ interface directly to CatBoost.jl here; I think it would add a lot of value. I bet we can find a way to make the interfaces work.

@azev77
Author

azev77 commented Oct 3, 2022

Any progress?

@ericphanson
Collaborator

No, sorry. I was building a new model and considered doing it with CatBoost, but ended up going with XGBoost after a quick check showed similar performance in this case (I have seen CatBoost do noticeably better in other cases, though). Hopefully we can find time to do it at some point, but for now it's not a priority for me.

@ablaom
Member

ablaom commented Oct 5, 2022

BTW, it looks like XGBoost.jl is getting a much-needed rewrite: dmlc/XGBoost.jl#111

🤞🏾

@ericphanson
Collaborator

Closed by #16

v0.3.0 will have MLJ integration, thanks to @tylerjthomas9 and @ablaom!
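
Usage should look roughly like this (untested here; the model name CatBoostRegressor and the data X, y, Xnew are assumptions -- see the docs for the authoritative example):

using MLJ, CatBoost

CatBoostRegressor = @load CatBoostRegressor pkg=CatBoost

model = CatBoostRegressor(iterations=100)
mach = machine(model, X, y)
fit!(mach)
ŷ = predict(mach, Xnew)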

@azev77
Author

azev77 commented Feb 4, 2023

It would be great if the MLJ docs were updated to reflect this.

@ablaom
Member

ablaom commented Feb 4, 2023

That will happen when I update the model registry shortly. I'll re-open this to flag that it hasn't happened yet.

@ablaom
Member

ablaom commented Feb 4, 2023

Oh, I can't reopen. I'll create the issue at MLJModels now instead.
