Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PartitionedLS model to the registry #552

Closed
boborbt opened this issue Apr 5, 2024 · 17 comments · Fixed by JuliaAI/MLJ.jl#1103
Closed

Add PartitionedLS model to the registry #552

boborbt opened this issue Apr 5, 2024 · 17 comments · Fixed by JuliaAI/MLJ.jl#1103

Comments

@boborbt
Copy link

boborbt commented Apr 5, 2024

Hi, I would like to have PartitionedLS added to the registry.

A PartitionedLS model solves a constrained least squared problem where the features are partitioned by the user into groups. The solution is constrained to have all features in a group contributing in the same "direction" (i.e., positively or negatively) to the solution. Also, the solution is given in terms of two kind of variables: $\beta$ variables specify how much and in which direction each group contributes to the solution, $\alpha$ variables specify how much each feature contributes to each group.

The implementation is here. It supports MLJ interface since version 1.0.0.

The code supporting the MLJ interface can be found here.

More information about the technique can be found in this paper (currently submitted to the Machine Learning Journal after a major revision).

@ablaom
Copy link
Member

ablaom commented Apr 7, 2024

Great to have this contribution, thanks.

A cursory look at the code suggests this looks good. Can you please bring the the hyperparmeter struct docstring up to spec?

@ablaom To do:

@ablaom
Copy link
Member

ablaom commented Apr 7, 2024

@boborbt
Copy link
Author

boborbt commented Apr 8, 2024

Hi,

A cursory look at the code suggests this looks good. Can you please bring the the hyperparmeter struct docstring up to spec?

I've tried to follow the spec, but I was not able to use the $(MLJModelInterface.doc_header(...)) call. For some reason if I include it precompiling the package fails asserting that MLJModelInterface is not defined (regardless to importing or using it).

To overcome the problem I updated the docstring for PartLS to match the required format. I left the docstring I was trying to use at the end of the file, with the only modification that $(MLJModelInterface... is commented out. If you have any suggestion about how to fix the problem I will be glad to incorporate it.

You can find the updated files in commit bcafc93.

@boborbt
Copy link
Author

boborbt commented Apr 8, 2024

Waiting for:

This has been fixed in commit 6b9a3f7.

@ablaom
Copy link
Member

ablaom commented Apr 8, 2024

Thanks @boborbt for that. Would you consider updating the docstring as indicated above?

@ablaom
Copy link
Member

ablaom commented Apr 9, 2024

Okay, I see this is done, many thanks! Please see my proposed update, ml-unito/PartitionedLS.jl#7.

@boborbt
Copy link
Author

boborbt commented Apr 9, 2024

Thanks @boborbt for that. Would you consider updating the docstring as indicated above?

As I mentioned above, I tried my best, but was not able to use the $(MLJModelInterface.doc_header(...)) call. It breaks compilation and I could not figure out why -- the error message says it cannot find MMI (which is defined as an alias for MLJModelInterface).

As a workaround I reported the docstring as a "normal" docstring for the struct. If you have any hint about to overcome the compilation problem I am more than willing to incorporate the fix.

Thanks.

@ablaom
Copy link
Member

ablaom commented Apr 9, 2024

I've tried to follow the spec, but I was not able to use the $(MLJModelInterface.doc_header(...)) call. For some reason if I include it precompiling the package fails asserting that MLJModelInterface is not defined (regardless to importing or using it).

Sorry, I completely missed this.

I don't think we need to worry about including the MLJModelInterface.doc_header call; it's just a convenience macro. It is used without problems for dozens of other models; sorry I can't guess the issue and don't have the bandwidth to investigate just now.

Now you have made Float64 generic, I suggest we amend the docstring as follows:

- `X`: any matrix with element type `Float64`, or any table with columns of type `Float64`

is replaced with

- `X`: any matrix or table with `Continuous` element scitype. Check column scitypes of a table `X` with `schema(X)`.

- `y`: any vector with `Continuous` element scitype. Check scitype with `scitype(y)`. 

Then I think we're ready to roll.

@boborbt
Copy link
Author

boborbt commented Apr 10, 2024

I just updated the documentation as suggested (see commit b1ca7ba9).

By the way, thanks for all the suggestions you gave me. It was worth registering to MLJ just for the free code review it comes with it 😉.

@boborbt
Copy link
Author

boborbt commented Apr 12, 2024

Any news about this issue? I believe everything needed is in place now.

@ablaom
Copy link
Member

ablaom commented Apr 12, 2024

Sorry, I posted an comment but must have forgotten to click "comment".

I'm waiting for you to tag a new release to make the doc strings live. The registration process grabs the doc strings from the latest tagged release to include in the registry metadata (which is searchable from MLJ without loading model-providing code) and appears here also.

Please ping me when this is ready.

@boborbt
Copy link
Author

boborbt commented Apr 12, 2024

No problem @ablaom, I just tagged version 1.0.4, registered on Julia general registry and generated a new release on github.
The tagged version is this one.

@boborbt
Copy link
Author

boborbt commented Apr 13, 2024

Fixed the issue you opened yesterday.

@boborbt
Copy link
Author

boborbt commented Apr 13, 2024

I would like to add support for PrecompileTools, but to do so I would need to add a workload including fitting the whole model which implies using MLJBase.machine. This makes MLJBase a dependency of the package, which was frown upon in one of your previous messages.

What do you suggest? Do I drop support of PrecompileTools? Do I add MLJBase to the dependencies? Do you see any way to add it as a non-hard depenedency?

If you need to have a look, you find the candidate code in branch precompile-tools.

@ablaom
Copy link
Member

ablaom commented Apr 13, 2024

I suggest you only add a precompile load for your "native" API. If your native API depends on MLJBase to work, then we ought to be able to fix that. For example, if you use MLJModelInterface.matrix (which won't run properly without MLJBase loaded) then you can use import Tables; Tables.matrix instead. Let me know what causes you issues.

There is a known issue preventing precompilation loads for MLJBase having much impact: JuliaAI/MLJBase.jl#900 . This has to do with the way we made MLJBase an "optional" dependency for model-providing packages before Julia introduced weak dependencies and package extensions (and recompilation workloads) to do this. And we can't "modernise" without breaking MLJModelInterface, which we really don't want to do right now.

I think this issue would carry over to your case: If you make MLJBase a hard dependency, then you won't actually realize much benefit.

I'm no expert on this issue, but this is my general impression.

@ablaom
Copy link
Member

ablaom commented Apr 14, 2024

@ablaom
Copy link
Member

ablaom commented Apr 15, 2024

Thanks @boborbt for your patience with the model adding process. It was helpful to have the prompt responses to my suggestions to get this sorted quickly. Congratulations again on the new package.

If you have not already done so, you may like to announce it here at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants