Port to MLJ? #9
For the record, there is also an MLJ interface for EvoTrees.jl, another pure Julia implementation of gradient tree boosting. So that should make a good template for adding an MLJ interface to CatBoost.jl, I would expect. It includes, for example, an appropriate implementation of MLJ's
cc @jeremiedb
Hi everyone -- thanks for the interest.

I suspect wrapping the low-level code will be a pain. In terms of performance, a native wrapper would of course be faster than calling through the Python runtime, but the penalty incurred by going through the runtime should be negligible next to the time it takes for a model to train. So we have no intention of trying to wrap the native C++ code. (If CatBoost offered a C API this might change, although IIRC they don't export one.)

We considered implementing the MLJ interface previously, but ultimately decided that the way CatBoost does things and the way MLJ does things are different enough that the impedance mismatch was not worth seriously trying to fix, given our priorities. Our perspective then changed: this CatBoost.jl package would be a pure wrapper package, and if someone wants to implement an MLJ interface, it can be built on top of it.

In particular, one point: consider https://alan-turing-institute.github.io/MLJ.jl/dev/quick_start_guide_to_adding_models/#Model-type-and-constructor (the process of fitting with MLJ), and compare it to the (essentially API-restricted) way of fitting CatBoost models:
# A little setup is assumed here: using CatBoost has been run, and x_train, y_train,
# queries_train (plus their test-set equivalents) are already defined.
# Create pools.
train = Pool(; data=x_train, label=y_train, group_id=queries_train)
test = Pool(; data=x_test, label=y_test, group_id=queries_test)
# small number of iterations to not slow down CI too much
default_parameters = Dict("iterations" => 10, "loss_function" => "RMSE",
"custom_metric" => ["MAP:top=10", "PrecisionAt:top=10",
"RecallAt:top=10"], "verbose" => false,
"random_seed" => 314159)
function fit_model(params, train_pool, test_pool)
model = catboost.CatBoost(params)
model.fit(train_pool; eval_set=test_pool, plot=false)
return model
end

Hyperparameters are passed over the line in a Dict, as with default_parameters above.

Again, if either of you are interested in creating an MLJ interface, that would be welcome.

Thank you.
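To make the impedance mismatch concrete, here is a rough sketch of what a minimal MLJ model wrapping the fitting pattern above could look like. This is illustrative only: the type name CatBoostRegressorSketch and its hyperparameter set are made up for the example, and the interface that eventually shipped (see #16 below) need not look like this.

import MLJModelInterface as MMI

MMI.@mlj_model mutable struct CatBoostRegressorSketch <: MMI.Deterministic
    iterations::Int = 1000
    loss_function::String = "RMSE"
    verbose::Bool = false
end

function MMI.fit(model::CatBoostRegressorSketch, verbosity, X, y)
    # The translation step: turn MLJ's struct of hyperparameters into the flat
    # Dict of strings that CatBoost expects.
    params = Dict("iterations" => model.iterations,
                  "loss_function" => model.loss_function,
                  "verbose" => model.verbose)
    pool = Pool(; data=MMI.matrix(X), label=y)
    fitted = catboost.CatBoost(params)
    fitted.fit(pool)
    return fitted, nothing, NamedTuple()   # (fitresult, cache, report)
end

MMI.predict(::CatBoostRegressorSketch, fitresult, Xnew) =
    fitresult.predict(Pool(; data=MMI.matrix(Xnew)))

Most of the work is the hyperparameter translation and Pool construction; the training call itself is unchanged from the snippet above.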
Comment to self: this is not a pure Julia implementation, but a wrapper around Python code (which wraps C, presumably).
Yep, a popular C++ library: https://github.com/catboost/catboost (if it were C, we might try to wrap it directly instead of its Python interface). This is a pretty minimal wrapper that just uses PyCall and tries to make it a bit more convenient to send and receive tabular data.
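As a rough sketch of the wrapping approach described here (assumed for illustration, not the package's actual source), the Python module is imported once via PyCall and Julia tables are converted to plain arrays before being handed over:

using PyCall, Tables

# The Python catboost package must be installed in the Python environment PyCall uses.
const catboost = pyimport("catboost")

# Hypothetical helper: convert any Tables.jl-compatible table to a dense matrix,
# which PyCall turns into a numpy array on the way into Python.
function make_pool(table, labels)
    return catboost.Pool(; data=Tables.matrix(table), label=labels)
end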
I’d be interested in adding an MLJ interface directly to CatBoost.jl here; I think it would add a lot of value. I bet we can find a way to make the interfaces work.
Any progress?
No, sorry. I was writing a new model and was thinking about doing it with CatBoost, but ended up going with XGBoost after a quick check showed similar performance in this case. (I have seen CatBoost do noticeably better in other cases, though.) Hopefully we can find time to do it at some point, but for now it's not a priority for me.
BTW, it looks like XGBoost.jl is getting a much-needed rewrite. dmlc/XGBoost.jl#111 🤞🏾
Closed by #16. v0.3.0 will have MLJ integration, thanks to @tylerjthomas9 and @ablaom!
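For anyone arriving here later, usage through MLJ should then look roughly like the following (the model name and hyperparameter below are assumptions; check MLJ's model registry for whatever v0.3.0 actually registers):

using MLJ

# Toy regression data; in practice X is any Tables.jl-compatible table.
X, y = make_regression(100, 3)

CatBoostRegressor = @load CatBoostRegressor pkg=CatBoost  # assumed model name
model = CatBoostRegressor(iterations=50)                  # assumed hyperparameter name

mach = machine(model, X, y)
fit!(mach)
yhat = predict(mach, X)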
It would be great if the MLJ docs were updated to reflect this.
Will happen when I update the model registry shortly. I'll re-open this to flag that it hasn't happened yet.
Oh, I can't reopen. I'll create the issue at MLJModels now instead.
Hey and thank you for this package!
I've been hoping for a CatBoost interface for a while!!
Have you considered porting this to MLJ.jl?
This would be an awesome addition as they currently support XGBoost & LightGBM.
@ablaom @tlienart
BTW, I noticed some Julia wrappers wrap ML models through their high-level code (like the Python/R packages).
Other wrappers wrap the underlying low-level code (e.g. GLMNet.jl wraps the Fortran code behind R's glmnet).
Wrapping the underlying CatBoost code would probably be a pain, but would there be a performance difference?