Commit

more doc tweaks
ablaom committed Nov 22, 2024
1 parent 1449814 commit 105d7ff
Showing 9 changed files with 83 additions and 78 deletions.
2 changes: 1 addition & 1 deletion docs/make.jl
@@ -18,8 +18,8 @@ makedocs(
"fit/update" => "fit_update.md",
"predict/transform" => "predict_transform.md",
"Kinds of Target Proxy" => "kinds_of_target_proxy.md",
"target/weights/features" => "target_weights_features.md",
"obs" => "obs.md",
"target/weights/features" => "target_weights_features.md",
"Accessor Functions" => "accessor_functions.md",
"Learner Traits" => "traits.md",
],
16 changes: 5 additions & 11 deletions docs/src/anatomy_of_an_implementation.md
@@ -462,21 +462,14 @@ LearnAPI.predict(model::RidgeFitted, ::Point, Xnew) =

### `target` and `features` methods

We provide an additional overloading of [`LearnAPI.target`](@ref) to handle the additional
supported data argument of `fit`:
In the general case, we only need to implement [`LearnAPI.target`](@ref) and
[`LearnAPI.features`](@ref) to handle all possible output of `obs(learner, data)`, and now
the fallback for `LearnAPI.features` mentioned before is inadequate.

```@example anatomy2
LearnAPI.target(::Ridge, observations::RidgeFitObs) = observations.y
```

Similarly, we must overload [`LearnAPI.features`](@ref), which extracts features from
training data (objects that can be passed to `predict`) like this

```@example anatomy2
LearnAPI.features(::Ridge, observations::RidgeFitObs) = observations.A
```
as the fallback mentioned above is no longer adequate.


### Important notes:

@@ -501,7 +494,8 @@ interfaces](@ref data_interfaces) for details.

### Addition of signatures for user convenience

As above, we add a signature which plays no role vis-à-vis LearnAPI.jl.
As above, we add a signature for convenience, which the LearnAPI.jl specification
neither requires nor forbids:

```@example anatomy2
LearnAPI.fit(learner::Ridge, X, y; kwargs...) = fit(learner, (X, y); kwargs...)
2 changes: 1 addition & 1 deletion docs/src/common_implementation_patterns.md
@@ -3,7 +3,7 @@
!!! important

This section is only an implementation guide. The definitive specification of the
Learn API is given in [Reference](@ref reference).
LearnAPI is given in [Reference](@ref reference).

This guide is intended to be consulted after reading [Anatomy of an Implementation](@ref),
which introduces the main interface objects and terminology.
16 changes: 13 additions & 3 deletions docs/src/index.md
@@ -1,5 +1,15 @@
```@raw html
<script async defer src="https://buttons.github.io/buttons.js"></script>
<div style="font-size:1.4em;font-weight:bold;">
<a href="anatomy_of_an_implementation.html"
style="color: #389826;">Tutorial</a> &nbsp;|&nbsp;
<a href="reference.html"
style="color: #9558B2;">Reference</a> &nbsp;|&nbsp;
<a href="common_implementation_patterns.html"
style="color: #9558B2;">Patterns</a>
</div>
<span style="color: #9558B2;font-size:4.5em;">
LearnAPI.jl</span>
<br>
@@ -86,11 +96,11 @@ opts out. Moreover, the `fit` and `predict` methods will also be able to consume
alternative data representations, for performance benefits in some situations.

The fallback data interface is the [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl)
`getobs/numobs` interface (here tagged as [`LearnAPI.RandomAccess()`](@ref)) and if the
`getobs/numobs` interface, here tagged as [`LearnAPI.RandomAccess()`](@ref), and if the
input consumed by the algorithm already implements that interface (tables, arrays, etc.)
then overloading `obs` is completely optional. Plain iteration interfaces, with or without
knowledge of the number of observations, can also be specified (to support, e.g., data
loaders reading images from disk).
knowledge of the number of observations, can also be specified, to support, e.g., data
loaders reading images from disk.

## Learning more

8 changes: 4 additions & 4 deletions docs/src/reference.md
@@ -170,15 +170,15 @@ minimal (but useless) implementation, see the implementation of `SmallLearner`
- [`inverse_transform`](@ref operations): for inverting the output of
`transform` ("inverting" broadly understood)

- [`LearnAPI.target`](@ref input), [`LearnAPI.weights`](@ref input),
[`LearnAPI.features`](@ref): for extracting relevant parts of training data, where
defined.

- [`obs`](@ref data_interface): method for exposing to the user
learner-specific representations of data, which are additionally guaranteed to
implement the observation access API specified by
[`LearnAPI.data_interface(learner)`](@ref).

- [`LearnAPI.target`](@ref input), [`LearnAPI.weights`](@ref input),
[`LearnAPI.features`](@ref): for extracting relevant parts of training data, where
defined.

- [Accessor functions](@ref accessor_functions): these include functions like
`LearnAPI.feature_importances` and `LearnAPI.training_losses`, for extracting, from
training outcomes, information common to many learners. This includes
13 changes: 8 additions & 5 deletions docs/src/target_weights_features.md
@@ -1,11 +1,13 @@
# [`target`, `weights`, and `features`](@id input)

Methods for extracting parts of training data:
Methods for extracting parts of training observations. Here "observations" means the
output of [`obs(learner, data)`](@ref); if `obs` is not overloaded for `learner`, then
"observations" is any `data` supported in calls of the form [`fit(learner, data)`](@ref)

```julia
LearnAPI.target(learner, data) -> <target variable>
LearnAPI.weights(learner, data) -> <per-observation weights>
LearnAPI.features(learner, data) -> <training "features", suitable input for `predict` or `transform`>
LearnAPI.target(learner, observations) -> <target variable>
LearnAPI.weights(learner, observations) -> <per-observation weights>
LearnAPI.features(learner, observations) -> <training "features", suitable input for `predict` or `transform`>
```

Here `data` is something supported in a call of the form `fit(learner, data)`.
@@ -19,7 +21,8 @@ Supposing `learner` is a supervised classifier predicting a one-dimensional vect
target:

```julia
model = fit(learner, data)
observations = obs(learner, data)
model = fit(learner, observations)
X = LearnAPI.features(learner, data)
y = LearnAPI.target(learner, data)
ŷ = predict(model, Point(), X)
12 changes: 2 additions & 10 deletions src/obs.jl
@@ -61,8 +61,8 @@ using `MLUtils.getobs`, with the obvious interpretation applying to the outcomes
calls (e.g., if *all* observations are subsampled, then outcomes should be the same as if
using the original data).
Implicit in preceding requirements is that `obs(learner, _)` and `obs(model, _)` are
involutive, meaning both the following hold:
It is required that `obs(learner, _)` and `obs(model, _)` are involutive, meaning both the
following hold:
```julia
obs(learner, obs(learner, data)) == obs(learner, data)
@@ -81,14 +81,6 @@ only of suitable tables and arrays, then `obs` and `LearnAPI.data_interface` do
to be overloaded. However, the user will get no performance benefits by using `obs` in
that case.
If overloading `obs(learner, data)` to output new model-specific representations of
data, it may be necessary to also overload [`LearnAPI.features(learner,
observations)`](@ref), [`LearnAPI.target(learner, observations)`](@ref) (supervised
learners), and/or [`LearnAPI.weights(learner, observations)`](@ref) (if weights are
supported), for each kind output `observations` of `obs(learner, data)`. Moreover, the
outputs of these methods, applied to `observations`, must also implement the interface
specified by [`LearnAPI.data_interface(learner)`](@ref).
## Sample implementation
Refer to the ["Anatomy of an
88 changes: 47 additions & 41 deletions src/target_weights_features.jl
@@ -1,13 +1,13 @@
"""
LearnAPI.target(learner, data) -> target
LearnAPI.target(learner, observations) -> target
Return, for each form of `data` supported in a call of the form [`fit(learner,
data)`](@ref), the target variable part of `data`. If `nothing` is returned, the
Return, for every conceivable `observations` returned by a call of the form [`obs(learner,
data)`](@ref), the target variable part of `observations`. If `nothing` is returned, the
`learner` does not see a target variable in training (is unsupervised).
The returned object `y` has the same number of observations as `data`. If `data` is the
output of an [`obs`](@ref) call, then `y` is additionally guaranteed to implement the
data interface specified by [`LearnAPI.data_interface(learner)`](@ref).
The returned object `y` has the same number of observations as `observations` does and is
guaranteed to implement the data interface specified by
[`LearnAPI.data_interface(learner)`](@ref).
# Extended help
@@ -21,57 +21,61 @@ the LearnAPI.jl documentation.
## New implementations
A fallback returns `nothing`. The method must be overloaded if `fit` consumes data
including a target variable.
A fallback returns `nothing`. The method must be overloaded if [`fit`](@ref) consumes data
that includes a target variable. If `obs` is not being overloaded, then `observations`
above is any `data` supported in calls of the form [`fit(learner, data)`](@ref). The form
of the output `y` should be suitable for pairing with the output of [`predict`](@ref), in
the evaluation of a loss function, for example.
If overloading [`obs`](@ref), ensure that the return value, unless `nothing`, implements
the data interface specified by [`LearnAPI.data_interface(learner)`](@ref), in the special
case that `data` is the output of an `obs` call.
Ensure the object `y` returned by `LearnAPI.target`, unless `nothing`, implements the data
interface specified by [`LearnAPI.data_interface(learner)`](@ref).
$(DOC_IMPLEMENTED_METHODS(":(LearnAPI.target)"; overloaded=true))
"""
target(::Any, data) = nothing
target(::Any, observations) = nothing

"""
LearnAPI.weights(learner, data) -> weights
LearnAPI.weights(learner, observations) -> weights
Return, for each form of `data` supported in a call of the form [`fit(learner,
data)`](@ref), the per-observation weights part of `data`. Where `nothing` is returned, no
weights are part of `data`, which is to be interpreted as uniform weighting.
Return, for every conceivable `observations` returned by a call of the form [`obs(learner,
data)`](@ref), the weights part of `observations`. Where `nothing` is returned, no weights
are part of `data`, which is to be interpreted as uniform weighting.
The returned object `w` has the same number of observations as `data`. If `data` is the
output of an [`obs`](@ref) call, then `w` is additionally guaranteed to implement the
data interface specified by [`LearnAPI.data_interface(learner)`](@ref).
The returned object `w` has the same number of observations as `observations` does and is
guaranteed to implement the data interface specified by
[`LearnAPI.data_interface(learner)`](@ref).
# Extended help
# New implementations
Overloading is optional. A fallback returns `nothing`.
Overloading is optional. A fallback returns `nothing`. If `obs` is not being overloaded,
then `observations` above is any `data` supported in calls of the form [`fit(learner,
data)`](@ref).
If overloading [`obs`](@ref), ensure that the return value, unless `nothing`, implements
the data interface specified by [`LearnAPI.data_interface(learner)`](@ref), in the special
case that `data` is the output of an `obs` call.
Ensure the returned object, unless `nothing`, implements the data interface specified by
[`LearnAPI.data_interface(learner)`](@ref).
$(DOC_IMPLEMENTED_METHODS(":(LearnAPI.weights)"; overloaded=true))
"""
weights(::Any, data) = nothing
weights(::Any, observations) = nothing

"""
LearnAPI.features(learner, data)
LearnAPI.features(learner, observations)
Return, for each form of `data` supported in a call of the form [`fit(learner,
data)`](@ref), the "features" part of `data` (as opposed to the target
variable, for example).
Return, for every conceivable `observations` returned by a call of the form [`obs(learner,
data)`](@ref), the "features" part of `data` (as opposed to the target variable, for
example).
The returned object `X` may always be passed to `predict` or `transform`, where
implemented, as in the following sample workflow:
```julia
model = fit(learner, data)
X = LearnAPI.features(learner, data)
observations = obs(learner, data)
model = fit(learner, observations)
X = LearnAPI.features(learner, observations)
ŷ = predict(model, kind_of_proxy, X) # eg, `kind_of_proxy = Point()`
```
@@ -80,28 +84,30 @@ For supervised models (i.e., where `:(LearnAPI.target) in LearnAPI.functions(lea
data)`, the training target.
The object `X` returned by `LearnAPI.target` has the same number of observations as
`data`. If `data` is the output of an [`obs`](@ref) call, then `X` is additionally
guaranteed to implement the data interface specified by
`observations` does and is guaranteed to implement the data interface specified by
[`LearnAPI.data_interface(learner)`](@ref).
# Extended help
# New implementations
A fallback returns `first(observations)` if `observations` is a tuple, and otherwise
returns `observations`. New implementations may need to overload this method if this
fallback is inadequate.
For density estimators, whose `fit` typically consumes *only* a target variable, you
should overload this method to return `nothing`.
should overload this method to return `nothing`. If `obs` is not being overloaded, then
`observations` above is any `data` supported in calls of the form [`fit(learner,
data)`](@ref).
It must otherwise be possible to pass the return value `X` to `predict` and/or
`transform`, and `X` must have same number of observations as `data`. A fallback returns
`first(data)` if `data` is a tuple, and otherwise returns `data`.
`transform`, and `X` must have same number of observations as `data`.
Further overloadings may be necessary to handle the case that `data` is the output of
[`obs(learner, data)`](@ref), if `obs` is being overloaded. In this case, be sure that
`X`, unless `nothing`, implements the data interface specified by
Ensure the returned object, unless `nothing`, implements the data interface specified by
[`LearnAPI.data_interface(learner)`](@ref).
"""
features(learner, data) = _first(data)
_first(data) = data
_first(data::Tuple) = first(data)
features(learner, observations) = _first(observations)
_first(observations) = observations
_first(observations::Tuple) = first(observations)
# note the factoring above guards against method ambiguities
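
For readers following the diff of `src/target_weights_features.jl`, the fallback for `LearnAPI.features` can be tried in isolation. The following is a minimal standalone sketch, not the package code itself: it redefines `features` and `_first` locally without loading LearnAPI.jl, and `MyLearner` and the field name `A` are hypothetical names introduced only for illustration.

```julia
# Standalone sketch; nothing here is imported from LearnAPI.jl.

# Hypothetical learner type, used only to show an implementer's overload:
struct MyLearner end

# Fallback, factored as in the diff: the public method is unconstrained in
# both arguments, and the tuple special case lives in a private helper.
features(learner, observations) = _first(observations)
_first(observations) = observations
_first(observations::Tuple) = first(observations)

# A hypothetical overload, constrained only in the first argument. It is
# strictly more specific than the fallback above, so dispatch is unambiguous.
features(::MyLearner, observations) = observations.A

X, y = rand(3, 5), rand(5)

@assert features(nothing, (X, y)) === X   # tuple: the first element is returned
@assert features(nothing, X) === X        # non-tuple: returned unchanged
@assert features(MyLearner(), (A = X, y = y)) === X   # overload takes precedence
```

Had the tuple case been written directly as a method `features(learner, observations::Tuple)`, a call with a `MyLearner` and a tuple would match both that method and the overload, with neither more specific, and Julia would raise an ambiguity error. This appears to be what the comment about factoring refers to.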
4 changes: 2 additions & 2 deletions src/traits.jl
@@ -387,7 +387,7 @@ iteration_parameter(::Any) = nothing
Return an upper bound `S` on the scitype of individual observations guaranteed to work
when calling `fit`: if `observations = obs(learner, data)` and
`ScientificTypes.scitype(o) <:S` for each `o` in `observations`, then the call
`ScientificTypes.scitype(collect(o)) <:S` for each `o` in `observations`, then the call
`fit(learner, data)` is supported.
$DOC_EXPLAIN_EACHOBS
@@ -396,7 +396,7 @@ See also [`LearnAPI.target_observation_scitype`](@ref).
# New implementations
Optional. The fallback return value is `Union{}`.
Optional. The fallback return value is `Union{}`.
"""
fit_observation_scitype(::Any) = Union{}
