Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MLJ compliant docstrings #130

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

josephsdavid
Copy link

@josephsdavid josephsdavid commented Oct 26, 2022

In service of #913, as documented here !

@yaxxie
Copy link
Contributor

yaxxie commented Oct 27, 2022

Hi @josephsdavid
Thanks for the contribution!

A couple of remarks;

  1. Please avoid simply reformatting code. This makes diffs harder to read and muddies the purpose of the contribution
  2. Please fill in the description to the PR. For example, a link to the documentation about "MLJ docstrings" would be useful to the reader.
  3. Please also don't forget to add yourself to the contributors list CONTRIBUTORS.md 🙂

@yaxxie yaxxie changed the title Add MLJ compliant docstrings! Add MLJ compliant docstrings Oct 27, 2022
weights=true,
descr="Microsoft LightGBM FFI wrapper: Classifier",
weights=true
# descr="Microsoft LightGBM FFI wrapper: Classifier",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, how come you're commenting these ones out? And if there's a good reason for it, I'd expect it to be deleted rather than commented out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops! I meant to delete them! There is a good reason, the existence of a docstring after the model metadata is created overwrites the descr field i believe, making it no longer needed (paging @ablaom to confirm, there is a reason but i may have mixed it up :) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MLJ model trait docstring (alias descr) used to be for a short summary string, which was not that useful, in retrospect. Now it is not to be overloaded but instead falls back to the full docstring (the one @josephsdavid has worked on here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, yes, these should be deleted.

)

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to ask that you revert most of the changes above these lines back (except the descr ones, after explanations)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh oops, did not realize i made changes above these lines 😅 Will change back so we have proper git blames :)

Copy link
Contributor

@ablaom ablaom Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell if this has been addressed. @yaxxie / @josephsdavid Is this done?

@josephsdavid
Copy link
Author

  1. Please fill in the description to the PR. For example, a link to the documentation about "MLJ docstrings" would be useful to the reader.

hah i was so excited to have all the parameters documented i missed the other pieces of work 😓

)

"""
`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a framework for gradient boosting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josephsdavid I don't see a proper header for the docstring, or the code that auto-generates one? Is this because we are adding to an existing docstring? Or maybe some other reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

src/MLJInterface.jl Outdated Show resolved Hide resolved
src/MLJInterface.jl Outdated Show resolved Hide resolved
src/MLJInterface.jl Outdated Show resolved Hide resolved
src/MLJInterface.jl Outdated Show resolved Hide resolved
- `feature_fraction::Float64 = 1.0`: The fraction of features to select before fitting a tree. Can be used to speed up training and reduce over-fitting.
- `feature_fraction_bynode::Float64 = 1.0`: The fraction of features to select for each tree node. Can be used to reduce over-fitting.
- `feature_fraction_seed::Int = 2`: Random seed to use for the gesture fraction
- `bagging_fraction::Float64 = 1.0`: The fraction of samples to use before fitting fitting a tree. Can be used to speed up training and reduce over-fitting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please keep the length of lines below 92 (Blue style recommendation). I can't see easily see the end of the lines to review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

src/MLJInterface.jl Outdated Show resolved Hide resolved
src/MLJInterface.jl Outdated Show resolved Hide resolved
src/MLJInterface.jl Outdated Show resolved Hide resolved
src/MLJInterface.jl Outdated Show resolved Hide resolved

predict(lgbm, train)
```
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of suggestions about the example:

  • I think PrettyPrinting and Statistics are both redundant. The function pretty is exported by MLJ. Actually, I don't think we need call pretty here anyway.
  • Earlier in the docstring we use X and y but here it's features and targets which is confusing. For consistency with other MLJ docs, I suggest sticking to X and y.
  • I don't think the @show lines add much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved.

Copy link
Contributor

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josephsdavid Thanks for this mammoth effort. 🦣

I don't see sections "Fitted parameters" or "Report", which are required.

Given the fact that all the models have a lot of hyper-parameters in common, I wonder if you would consider, for easier maintenance, interpolating a string constant for the common ones?

I've looked over the first docstring for now. Please ping me when you've addressed my comments and I'll review the others too.

@josephsdavid
Copy link
Author

@josephsdavid Thanks for this mammoth effort. 🦣

I don't see sections "Fitted parameters" or "Report", which are required.

Given the fact that all the models have a lot of hyper-parameters in common, I wonder if you would consider, for easier maintenance, interpolating a string constant for the common ones?

I've looked over the first docstring for now. Please ping me when you've addressed my comments and I'll review the others too.

Will do! going to go over more closely over the weekend :)

@@ -402,4 +402,363 @@ MLJModelInterface.metadata_model(
descr="Microsoft LightGBM FFI wrapper: Regressor",
Copy link
Contributor

@ablaom ablaom Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These descr declarations should be deleted.

To make the automatically generated header more readable, add

    human_name="LightGBM regressor"

Make similar changes to the other models.

Comment on lines +573 to +574
X, y = @load_boston # a table and a vector X = DataFrame(X) train, test =
partition(collect(eachindex(y)), 0.70, shuffle=true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing carriage return:

Suggested change
X, y = @load_boston # a table and a vector X = DataFrame(X) train, test =
partition(collect(eachindex(y)), 0.70, shuffle=true)
X, y = @load_boston # a table and a vector
X = DataFrame(X) train, test = partition(collect(eachindex(y)), 0.70, shuffle=true)

X, y = @load_boston # a table and a vector X = DataFrame(X) train, test =
partition(collect(eachindex(y)), 0.70, shuffle=true)

first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing carriage return:

Suggested change
first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default
first(X, 3)
lgb = LGBMRegressor() #initialised a model with default

partition(collect(eachindex(y)), 0.70, shuffle=true)

first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this params redundant?

@josephsdavid Be great if you can test examples before committing .

Copy link
Contributor

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @josephsdavid for the progress! We're getting there.

Particular attention is still needed in the examples. If you could please check they run, that will save me some review time.


predict(lgbm, train)
```
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved.

"""
$(MLJModelInterface.doc_header(LGBMRegressor))

`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this initial "LightGBMRegressor: " is redundant. The automated header already makes it clear what we are talking about. (And I have been silently removing these before merging your PR's in other package). Also, it's confusing in this instance because the model struct is actually called LGBMRegressor, not LightGBMRegressor.

Suggested change
`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a
LightGBM, short for light gradient-boosting machine, is a

Please address in all the models.


```julia

using DataFrames: DataFrame using MLJ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add carriage return:

Suggested change
using DataFrames: DataFrame using MLJ
using DataFrames: DataFrame
using MLJ

partition(collect(eachindex(y)), 0.70, shuffle=true)

first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y is a vector.

Suggested change
params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!
params lgbm = machine(lgb, X[train, :], y[train]) |> MLJ.fit!

params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!

predict(lgbm, X[test, :]) ```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add the example a couple lines showing how to access the feature importances. This is a pretty popular feature of tree boosters.

Ditto the other models.

framework for gradient boosting based on decision tree algorithms and used for
ranking, classification and other machine learning tasks, with a focus on
performance and scalability. This model in particular is used for various types of
regression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence mentions "regression". That doesn't sound right for this classifier.

Comment on lines +594 to +595
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
machine(model, X, y) Here:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing carriage returns:

Suggested change
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
machine(model, X, y) Here:
# Training data
In MLJ or MLJBase, bind an instance `model` to data with
mach = machine(model, X, y)
Here:

Comment on lines +414 to +415
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
machine(model, X, y) Here:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
machine(model, X, y) Here:
# Training data
In MLJ or MLJBase, bind an instance `model` to data with
mach = machine(model, X, y)
Here:

Comment on lines +607 to +608
- `predict(mach, Xnew)`: return predictions of the target given new features
`Xnew` having the same Scitype as `X` above.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `predict(mach, Xnew)`: return predictions of the target given new features
`Xnew` having the same Scitype as `X` above.
- `predict(mach, Xnew)`: return predictions of the target given new features
`Xnew`, which should have the same scitype as `X` above.

params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!

predict(lgbm, train) ```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments apply here as in the regressor example.

@kainkad
Copy link
Contributor

kainkad commented Jan 14, 2025

Hi @josephsdavid it's been quite a while since this PR was submitted and you've put a significant effort to add the MLJ compliant docstrings. I was wondering if it was possible for you to update your branch with the latest LightGBM.jl master and push it again so hopefully the PR can be re-reviewed and merged :) Also, do you happen to know what's the best way to test these docs are rendering correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants