-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MLJ compliant docstrings #130
base: master
Are you sure you want to change the base?
Conversation
Hi @josephsdavid A couple of remarks;
|
src/MLJInterface.jl
Outdated
weights=true, | ||
descr="Microsoft LightGBM FFI wrapper: Classifier", | ||
weights=true | ||
# descr="Microsoft LightGBM FFI wrapper: Classifier", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, how come you're commenting these ones out? And if there's a good reason for it, I'd expect it to be deleted rather than commented out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh whoops! I meant to delete them! There is a good reason, the existence of a docstring after the model metadata is created overwrites the descr field i believe, making it no longer needed (paging @ablaom to confirm, there is a reason but i may have mixed it up :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MLJ model trait
docstring (alias descr
) used to be for a short summary string, which was not that useful, in retrospect. Now it is not to be overloaded but instead falls back to the full docstring (the one @josephsdavid has worked on here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, yes, these should be deleted.
) | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to ask that you revert most of the changes above these lines back (except the descr ones, after explanations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh oops, did not realize i made changes above these lines 😅 Will change back so we have proper git blames :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't tell if this has been addressed. @yaxxie / @josephsdavid Is this done?
hah i was so excited to have all the parameters documented i missed the other pieces of work 😓 |
src/MLJInterface.jl
Outdated
) | ||
|
||
""" | ||
`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a framework for gradient boosting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josephsdavid I don't see a proper header for the docstring, or the code that auto-generates one? Is this because we are adding to an existing docstring? Or maybe some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved.
src/MLJInterface.jl
Outdated
- `feature_fraction::Float64 = 1.0`: The fraction of features to select before fitting a tree. Can be used to speed up training and reduce over-fitting. | ||
- `feature_fraction_bynode::Float64 = 1.0`: The fraction of features to select for each tree node. Can be used to reduce over-fitting. | ||
- `feature_fraction_seed::Int = 2`: Random seed to use for the gesture fraction | ||
- `bagging_fraction::Float64 = 1.0`: The fraction of samples to use before fitting fitting a tree. Can be used to speed up training and reduce over-fitting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please keep the length of lines below 92 (Blue style recommendation). I can't see easily see the end of the lines to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
|
||
predict(lgbm, train) | ||
``` | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of suggestions about the example:
- I think
PrettyPrinting
andStatistics
are both redundant. The functionpretty
is exported by MLJ. Actually, I don't think we need callpretty
here anyway. - Earlier in the docstring we use
X
andy
but here it'sfeatures
andtargets
which is confusing. For consistency with other MLJ docs, I suggest sticking toX
andy
. - I don't think the
@show
lines add much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josephsdavid Thanks for this mammoth effort. 🦣
I don't see sections "Fitted parameters" or "Report", which are required.
Given the fact that all the models have a lot of hyper-parameters in common, I wonder if you would consider, for easier maintenance, interpolating a string constant for the common ones?
I've looked over the first docstring for now. Please ping me when you've addressed my comments and I'll review the others too.
Will do! going to go over more closely over the weekend :) |
Co-authored-by: Anthony Blaom, PhD <[email protected]>
@@ -402,4 +402,363 @@ MLJModelInterface.metadata_model( | |||
descr="Microsoft LightGBM FFI wrapper: Regressor", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These descr
declarations should be deleted.
To make the automatically generated header more readable, add
human_name="LightGBM regressor"
Make similar changes to the other models.
X, y = @load_boston # a table and a vector X = DataFrame(X) train, test = | ||
partition(collect(eachindex(y)), 0.70, shuffle=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing carriage return:
X, y = @load_boston # a table and a vector X = DataFrame(X) train, test = | |
partition(collect(eachindex(y)), 0.70, shuffle=true) | |
X, y = @load_boston # a table and a vector | |
X = DataFrame(X) train, test = partition(collect(eachindex(y)), 0.70, shuffle=true) |
X, y = @load_boston # a table and a vector X = DataFrame(X) train, test = | ||
partition(collect(eachindex(y)), 0.70, shuffle=true) | ||
|
||
first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing carriage return:
first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default | |
first(X, 3) | |
lgb = LGBMRegressor() #initialised a model with default |
partition(collect(eachindex(y)), 0.70, shuffle=true) | ||
|
||
first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default | ||
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this params
redundant?
@josephsdavid Be great if you can test examples before committing .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @josephsdavid for the progress! We're getting there.
Particular attention is still needed in the examples. If you could please check they run, that will save me some review time.
|
||
predict(lgbm, train) | ||
``` | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resolved.
""" | ||
$(MLJModelInterface.doc_header(LGBMRegressor)) | ||
|
||
`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say this initial "LightGBMRegressor: " is redundant. The automated header already makes it clear what we are talking about. (And I have been silently removing these before merging your PR's in other package). Also, it's confusing in this instance because the model struct is actually called LGBMRegressor
, not LightGBMRegressor
.
`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a | |
LightGBM, short for light gradient-boosting machine, is a |
Please address in all the models.
|
||
```julia | ||
|
||
using DataFrames: DataFrame using MLJ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add carriage return:
using DataFrames: DataFrame using MLJ | |
using DataFrames: DataFrame | |
using MLJ |
partition(collect(eachindex(y)), 0.70, shuffle=true) | ||
|
||
first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default | ||
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y
is a vector.
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit! | |
params lgbm = machine(lgb, X[train, :], y[train]) |> MLJ.fit! |
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit! | ||
|
||
predict(lgbm, X[test, :]) ``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add the example a couple lines showing how to access the feature importances. This is a pretty popular feature of tree boosters.
Ditto the other models.
framework for gradient boosting based on decision tree algorithms and used for | ||
ranking, classification and other machine learning tasks, with a focus on | ||
performance and scalability. This model in particular is used for various types of | ||
regression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last sentence mentions "regression". That doesn't sound right for this classifier.
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach = | ||
machine(model, X, y) Here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing carriage returns:
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach = | |
machine(model, X, y) Here: | |
# Training data | |
In MLJ or MLJBase, bind an instance `model` to data with | |
mach = machine(model, X, y) | |
Here: | |
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach = | ||
machine(model, X, y) Here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach = | |
machine(model, X, y) Here: | |
# Training data | |
In MLJ or MLJBase, bind an instance `model` to data with | |
mach = machine(model, X, y) | |
Here: | |
- `predict(mach, Xnew)`: return predictions of the target given new features | ||
`Xnew` having the same Scitype as `X` above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `predict(mach, Xnew)`: return predictions of the target given new features | |
`Xnew` having the same Scitype as `X` above. | |
- `predict(mach, Xnew)`: return predictions of the target given new features | |
`Xnew`, which should have the same scitype as `X` above. |
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit! | ||
|
||
predict(lgbm, train) ``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments apply here as in the regressor example.
Hi @josephsdavid it's been quite a while since this PR was submitted and you've put a significant effort to add the MLJ compliant docstrings. I was wondering if it was possible for you to update your branch with the latest LightGBM.jl master and push it again so hopefully the PR can be re-reviewed and merged :) Also, do you happen to know what's the best way to test these docs are rendering correctly. |
In service of #913, as documented here !