
consider how we can bring more lightgbm parameters #23

Open
yalwan-iqvia opened this issue Feb 19, 2020 · 11 comments

Comments

@yalwan-iqvia
Collaborator

https://lightgbm.readthedocs.io/en/latest/Parameters.html

@yalwan-iqvia
Collaborator Author

For completeness: not all of the parameters you might be interested in tweaking are currently exposed. An important one comes to mind: boosting

@yalwan-iqvia
Collaborator Author

Defaults might not be matching main LightGBM: #42 (comment)
Another vote for the boosting parameter: #42 (comment)

@azev77

azev77 commented Apr 16, 2020

Btw, Caret has an interface to many R packages, just like MLJ.
In their interface they expose xgbDART, xgbLinear, and xgbTree as three separate models, instead of a single xgb model with a parameter, e.g. xgb(boost="dart"), xgb(boost="linear"), etc.

Might this be worth doing w/ your interface to MLJ?
LGBM_gbdt (default)
LGBM_rf
LGBM_dart
LGBM_goss

@yalwan-iqvia
Collaborator Author

In implementation terms, doing this is likely to result in massive amounts of code duplication.
I'm also not really sure what the advantage is, in real terms, to the user.

As you can see from recent commit ee01161, we actually went in the opposite direction, removing the object-level distinction between binary and multiclass (which just corresponds to a change in the value of the objective parameter).

However, I agree that the missing support for the boosting parameter needs to be added.
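The idea can be sketched in plain Julia (hypothetical type and field names, not the actual LightGBM.jl structs): a single classifier type carries an objective field, so switching between binary and multiclass is just a parameter change rather than a separate model type.

```julia
# Hypothetical sketch: one classifier type parameterised by `objective`,
# instead of separate Binary / Multiclass model types.
mutable struct SketchClassifier
    objective::String      # e.g. "binary" or "multiclass"
    num_iterations::Int
end

# A single kwargs constructor covers both cases.
SketchClassifier(; objective = "binary", num_iterations = 100) =
    SketchClassifier(objective, num_iterations)

binary = SketchClassifier()
multi  = SketchClassifier(objective = "multiclass")
```

Under this design, adding another objective costs nothing structurally, whereas the Caret-style approach would need a new wrapper type per variant.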

@azev77

azev77 commented Apr 17, 2020

You’re probably right.
The only reason I brought it up is that it can make it look like MLJ has more models.
Caret claims to have > 200 models, but a lot of them are the same model with different hyperparameters:

xgbDART, xgbLinear, xgbTree

@yalwan-iqvia
Collaborator Author

Just to let you know, I haven't forgotten this. I've been buried with other work, but I'm hoping in the coming weeks to bundle a few features together and create a new release.

@sbeura
Contributor

sbeura commented May 22, 2020

I found these important parameters missing:

boosting: defines the type of algorithm you want to run, default = gbdt
gbdt: traditional Gradient Boosting Decision Tree
rf: random forest
dart: Dropouts meet Multiple Additive Regression Trees
goss: Gradient-based One-Side Sampling

max_cat_group: when the number of categories is large, finding the split point on them easily overfits, so LightGBM merges categories into max_cat_group groups and finds the split points on the group boundaries, default: 64

application: this is the most important parameter and specifies the application of your model, i.e. whether it is a regression problem or a classification problem. LightGBM will by default treat the model as a regression model. (While we have created two estimators separately for regression and classification, I found that the two are defined under this parameter.)

num_boost_round: number of boosting iterations, typically 100+

ignore_column: same as categorical_feature, except instead of considering specific columns as categorical, it will completely ignore them.

@yalwan-iqvia
Collaborator Author

num_boost_round is an alias for num_iterations, which we already have.

I am not yet sure what to do about supporting all of the aliases, because you cannot reasonably alias struct fields (that I know of), and reproducing LightGBM's aliasing logic seems like a bit of a wasted effort.
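One possible shape for this (a sketch, not what LightGBM.jl actually does) is to normalise aliases at the kwargs boundary with a lookup table, rather than trying to alias the struct fields themselves. The alias pairs below are real LightGBM aliases; the function name is hypothetical.

```julia
# Hypothetical alias table: LightGBM parameter aliases -> canonical names.
const PARAM_ALIASES = Dict(
    "num_boost_round" => "num_iterations",
    "n_estimators"    => "num_iterations",
    "eta"             => "learning_rate",
)

# Rewrite any aliased keys to their canonical form before constructing
# the estimator; canonical keys pass through unchanged.
normalize_params(kwargs::Dict{String,Any}) =
    Dict(get(PARAM_ALIASES, k, k) => v for (k, v) in kwargs)

params = normalize_params(Dict{String,Any}("num_boost_round" => 500, "boosting" => "dart"))
```

This keeps the struct fields canonical while still accepting the alias spellings users know from the LightGBM docs, at the cost of maintaining the table by hand.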

@yalwan-iqvia
Collaborator Author

As for application, you can see it is supported: https://github.com/IQVIA-ML/LightGBM.jl/blob/master/src/estimators.jl#L7
But the kwargs constructor doesn't take it:
https://github.com/IQVIA-ML/LightGBM.jl/blob/master/src/estimators.jl#L93
So that one is an easy fix.
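The shape of the fix can be sketched as follows (a simplified, hypothetical estimator, not the actual estimators.jl code): the field already exists on the struct, so it only needs to be added to the keyword constructor's argument list and passed through.

```julia
# Simplified sketch: `application` exists as a struct field, but the
# kwargs constructor doesn't accept it.
mutable struct SketchEstimator
    application::String
    num_iterations::Int
end

# Before: the kwargs constructor omits `application`, hard-coding the default.
# SketchEstimator(; num_iterations = 100) = SketchEstimator("regression", num_iterations)

# After: expose it as a keyword argument with the same default.
SketchEstimator(; application = "regression", num_iterations = 100) =
    SketchEstimator(application, num_iterations)

est = SketchEstimator(application = "binary")
```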

@yalwan-iqvia
Collaborator Author

PR #62 just merged; it brings support for the boosting parameter and those related to DART and GOSS.

@yalwan-iqvia
Collaborator Author

#79 merged, bringing quite a few additional parameters. They will hopefully become available soon, when we cut the next release.
