more doc tweaks

JuliaAI · Nov 22, 2024 · 105d7ff
1 parent 1449814
commit 105d7ff
Showing 9 changed files with 83 additions and 78 deletions.
diff --git a/docs/make.jl b/docs/make.jl
@@ -18,8 +18,8 @@ makedocs(
             "fit/update" => "fit_update.md",
             "predict/transform" => "predict_transform.md",
             "Kinds of Target Proxy" => "kinds_of_target_proxy.md",
-            "target/weights/features" => "target_weights_features.md",
             "obs" => "obs.md",
+            "target/weights/features" => "target_weights_features.md",
             "Accessor Functions" => "accessor_functions.md",
             "Learner Traits" => "traits.md",
         ],

diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md
@@ -462,21 +462,14 @@ LearnAPI.predict(model::RidgeFitted, ::Point, Xnew) =
 
 ### `target` and `features` methods
 
-We provide an additional overloading of [`LearnAPI.target`](@ref) to handle the additional
-supported data argument of `fit`:
+In the general case, we only need to implement [`LearnAPI.target`](@ref) and
+[`LearnAPI.features`](@ref) to handle all possible output of `obs(learner, data)`, and now
+the fallback for `LearnAPI.features` mentioned before is inadequate.
 
 ```@example anatomy2
 LearnAPI.target(::Ridge, observations::RidgeFitObs) = observations.y
-```
-
-Similarly, we must overload [`LearnAPI.features`](@ref), which extracts features from
-training data (objects that can be passed to `predict`) like this
-
-```@example anatomy2
 LearnAPI.features(::Ridge, observations::RidgeFitObs) = observations.A
 ```
-as the fallback mentioned above is no longer adequate.
-
 
 ### Important notes:
 
@@ -501,7 +494,8 @@ interfaces](@ref data_interfaces) for details.
 
 ### Addition of signatures for user convenience
 
-As above, we add a signature which plays no role vis-à-vis LearnAPI.jl.
+As above, we add a signature for convenience, which the LearnAPI.jl specification
+neither requires nor forbids:
 
 ```@example anatomy2
 LearnAPI.fit(learner::Ridge, X, y; kwargs...)  = fit(learner, (X, y); kwargs...)

diff --git a/docs/src/common_implementation_patterns.md b/docs/src/common_implementation_patterns.md
@@ -3,7 +3,7 @@
 !!! important
 
 	This section is only an implementation guide. The definitive specification of the
-	Learn API is given in [Reference](@ref reference).
+	LearnAPI is given in [Reference](@ref reference).
 
 This guide is intended to be consulted after reading [Anatomy of an Implementation](@ref),
 which introduces the main interface objects and terminology.

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -1,5 +1,15 @@
 ```@raw html
 <script async defer src="https://buttons.github.io/buttons.js"></script>
+
+<div style="font-size:1.4em;font-weight:bold;">
+  <a href="anatomy_of_an_implementation.html"
+    style="color: #389826;">Tutorial</a>           &nbsp;|&nbsp;
+  <a href="reference.html"
+    style="color: #9558B2;">Reference</a>      &nbsp;|&nbsp;
+  <a href="common_implementation_patterns.html"
+    style="color: #9558B2;">Patterns</a>
+</div>
+
 <span style="color: #9558B2;font-size:4.5em;">
 LearnAPI.jl</span>
 <br>
@@ -86,11 +96,11 @@ opts out. Moreover, the `fit` and `predict` methods will also be able to consume
 alternative data representations, for performance benefits in some situations.
 
 The fallback data interface is the [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl)
-`getobs/numobs` interface (here tagged as [`LearnAPI.RandomAccess()`](@ref)) and if the
+`getobs/numobs` interface, here tagged as [`LearnAPI.RandomAccess()`](@ref), and if the
 input consumed by the algorithm already implements that interface (tables, arrays, etc.)
 then overloading `obs` is completely optional. Plain iteration interfaces, with or without
-knowledge of the number of observations, can also be specified (to support, e.g., data
-loaders reading images from disk).
+knowledge of the number of observations, can also be specified, to support, e.g., data
+loaders reading images from disk.
 
 ## Learning more
 

diff --git a/docs/src/reference.md b/docs/src/reference.md
@@ -170,15 +170,15 @@ minimal (but useless) implementation, see the implementation of `SmallLearner`
 - [`inverse_transform`](@ref operations): for inverting the output of
   `transform` ("inverting" broadly understood)
 
-- [`LearnAPI.target`](@ref input), [`LearnAPI.weights`](@ref input),
-  [`LearnAPI.features`](@ref): for extracting relevant parts of training data, where
-  defined.
-
 - [`obs`](@ref data_interface): method for exposing to the user
   learner-specific representations of data, which are additionally guaranteed to
   implement the observation access API specified by
   [`LearnAPI.data_interface(learner)`](@ref).
 
+- [`LearnAPI.target`](@ref input), [`LearnAPI.weights`](@ref input),
+  [`LearnAPI.features`](@ref): for extracting relevant parts of training data, where
+  defined.
+
 - [Accessor functions](@ref accessor_functions): these include functions like
   `LearnAPI.feature_importances` and `LearnAPI.training_losses`, for extracting, from
   training outcomes, information common to many learners. This includes

diff --git a/docs/src/target_weights_features.md b/docs/src/target_weights_features.md
@@ -1,11 +1,13 @@
 # [`target`, `weights`, and `features`](@id input)
 
-Methods for extracting parts of training data:
+Methods for extracting parts of training observations. Here "observations" means the
+output of [`obs(learner, data)`](@ref); if `obs` is not overloaded for `learner`, then
+"observations" is any `data` supported in calls of the form [`fit(learner, data)`](@ref).
 
 ```julia
-LearnAPI.target(learner, data) -> <target variable>
-LearnAPI.weights(learner, data) -> <per-observation weights>
-LearnAPI.features(learner, data) -> <training "features", suitable input for `predict` or `transform`>
+LearnAPI.target(learner, observations) -> <target variable>
+LearnAPI.weights(learner, observations) -> <per-observation weights>
+LearnAPI.features(learner, observations) -> <training "features", suitable input for `predict` or `transform`>
 ```
 
 Here `data` is something supported in a call of the form `fit(learner, data)`. 
@@ -19,7 +21,8 @@ Supposing `learner` is a supervised classifier predicting a one-dimensional vect
 target:
 
 ```julia
-model = fit(learner, data)
+observations = obs(learner, data)
+model = fit(learner, observations)
 X = LearnAPI.features(learner, data)
 y = LearnAPI.target(learner, data)
 ŷ = predict(model, Point(), X)

diff --git a/src/obs.jl b/src/obs.jl
@@ -61,8 +61,8 @@ using `MLUtils.getobs`, with the obvious interpretation applying to the outcomes
 calls (e.g., if *all* observations are subsampled, then outcomes should be the same as if
 using the original data).
 
-Implicit in preceding requirements is that `obs(learner, _)` and `obs(model, _)` are
-involutive, meaning both the following hold:
+It is required that `obs(learner, _)` and `obs(model, _)` are involutive, meaning both the
+following hold:
 
 ```julia
 obs(learner, obs(learner, data)) == obs(learner, data)
@@ -81,14 +81,6 @@ only of suitable tables and arrays, then `obs` and `LearnAPI.data_interface` do
 to be overloaded. However, the user will get no performance benefits by using `obs` in
 that case.
 
-If overloading `obs(learner, data)` to output new model-specific representations of
-data, it may be necessary to also overload [`LearnAPI.features(learner,
-observations)`](@ref), [`LearnAPI.target(learner, observations)`](@ref) (supervised
-learners), and/or [`LearnAPI.weights(learner, observations)`](@ref) (if weights are
-supported), for each kind output `observations` of `obs(learner, data)`. Moreover, the
-outputs of these methods, applied to `observations`, must also implement the interface
-specified by [`LearnAPI.data_interface(learner)`](@ref).
-
 ## Sample implementation
 
 Refer to the ["Anatomy of an

diff --git a/src/target_weights_features.jl b/src/target_weights_features.jl
@@ -1,13 +1,13 @@
 """
-    LearnAPI.target(learner, data) -> target
+    LearnAPI.target(learner, observations) -> target
 
-Return, for each form of `data` supported in a call of the form [`fit(learner,
-data)`](@ref), the target variable part of `data`. If `nothing` is returned, the
+Return, for every conceivable `observations` returned by a call of the form [`obs(learner,
+data)`](@ref), the target variable part of `observations`. If `nothing` is returned, the
 `learner` does not see a target variable in training (is unsupervised).
 
-The returned object `y` has the same number of observations as `data`. If `data` is the
-output of an [`obs`](@ref) call, then `y` is additionally guaranteed to implement the
-data interface specified by [`LearnAPI.data_interface(learner)`](@ref).
+The returned object `y` has the same number of observations as `observations` does and is
+guaranteed to implement the data interface specified by
+[`LearnAPI.data_interface(learner)`](@ref).
 
 # Extended help
 
@@ -21,57 +21,61 @@ the LearnAPI.jl documentation.
 
 ## New implementations
 
-A fallback returns `nothing`. The method must be overloaded if `fit` consumes data
-including a target variable.
+A fallback returns `nothing`. The method must be overloaded if [`fit`](@ref) consumes data
+that includes a target variable. If `obs` is not being overloaded, then `observations`
+above is any `data` supported in calls of the form [`fit(learner, data)`](@ref).  The form
+of the output `y` should be suitable for pairing with the output of [`predict`](@ref), in
+the evaluation of a loss function, for example.
 
-If overloading [`obs`](@ref), ensure that the return value, unless `nothing`, implements
-the data interface specified by [`LearnAPI.data_interface(learner)`](@ref), in the special
-case that `data` is the output of an `obs` call.
+Ensure the object `y` returned by `LearnAPI.target`, unless `nothing`, implements the data
+interface specified by [`LearnAPI.data_interface(learner)`](@ref).
 
 $(DOC_IMPLEMENTED_METHODS(":(LearnAPI.target)"; overloaded=true))
 
 """
-target(::Any, data) = nothing
+target(::Any, observations) = nothing
 
 """
-    LearnAPI.weights(learner, data) -> weights
+    LearnAPI.weights(learner, observations) -> weights
 
-Return, for each form of `data` supported in a call of the form [`fit(learner,
-data)`](@ref), the per-observation weights part of `data`. Where `nothing` is returned, no
-weights are part of `data`, which is to be interpreted as uniform weighting.
+Return, for every conceivable `observations` returned by a call of the form [`obs(learner,
+data)`](@ref), the weights part of `observations`. Where `nothing` is returned, no weights
+are part of `data`, which is to be interpreted as uniform weighting.
 
-The returned object `w` has the same number of observations as `data`. If `data` is the
-output of an [`obs`](@ref) call, then `w` is additionally guaranteed to implement the
-data interface specified by [`LearnAPI.data_interface(learner)`](@ref).
+The returned object `w` has the same number of observations as `observations` does and is
+guaranteed to implement the data interface specified by
+[`LearnAPI.data_interface(learner)`](@ref).
 
 # Extended help
 
 # New implementations
 
-Overloading is optional. A fallback returns `nothing`.
+Overloading is optional. A fallback returns `nothing`. If `obs` is not being overloaded,
+then `observations` above is any `data` supported in calls of the form [`fit(learner,
+data)`](@ref).
 
-If overloading [`obs`](@ref), ensure that the return value, unless `nothing`, implements
-the data interface specified by [`LearnAPI.data_interface(learner)`](@ref), in the special
-case that `data` is the output of an `obs` call.
+Ensure the returned object, unless `nothing`, implements the data interface specified by
+[`LearnAPI.data_interface(learner)`](@ref).
 
 $(DOC_IMPLEMENTED_METHODS(":(LearnAPI.weights)"; overloaded=true))
 
 """
-weights(::Any, data) = nothing
+weights(::Any, observations) = nothing
 
 """
-    LearnAPI.features(learner, data)
+    LearnAPI.features(learner, observations)
 
-Return, for each form of `data` supported in a call of the form [`fit(learner,
-data)`](@ref), the "features" part of `data` (as opposed to the target
-variable, for example).
+Return, for every conceivable `observations` returned by a call of the form [`obs(learner,
+data)`](@ref), the "features" part of `data` (as opposed to the target variable, for
+example).
 
 The returned object `X` may always be passed to `predict` or `transform`, where
 implemented, as in the following sample workflow:
 
 ```julia
-model = fit(learner, data)
-X = LearnAPI.features(learner, data)
+observations = obs(learner, data)
+model = fit(learner, observations)
+X = LearnAPI.features(learner, observations)
 ŷ = predict(model, kind_of_proxy, X) # eg, `kind_of_proxy = Point()`
 ```
 
@@ -80,28 +84,30 @@ For supervised models (i.e., where `:(LearnAPI.target) in LearnAPI.functions(lea
 data)`, the training target.
 
 The object `X` returned by `LearnAPI.target` has the same number of observations as
-`data`. If `data` is the output of an [`obs`](@ref) call, then `X` is additionally
-guaranteed to implement the data interface specified by
+`observations` does and is guaranteed to implement the data interface specified by
 [`LearnAPI.data_interface(learner)`](@ref).
 
 # Extended help
 
 # New implementations
 
+A fallback returns `first(observations)` if `observations` is a tuple, and otherwise
+returns `observations`. New implementations may need to overload this method if this
+fallback is inadequate.
+
 For density estimators, whose `fit` typically consumes *only* a target variable, you
-should overload this method to return `nothing`.
+should overload this method to return `nothing`.  If `obs` is not being overloaded, then
+`observations` above is any `data` supported in calls of the form [`fit(learner,
+data)`](@ref).
 
 It must otherwise be possible to pass the return value `X` to `predict` and/or
-`transform`, and `X` must have same number of observations as `data`. A fallback returns
-`first(data)` if `data` is a tuple, and otherwise returns `data`.
+`transform`, and `X` must have the same number of observations as `data`.
 
-Further overloadings may be necessary to handle the case that `data` is the output of
-[`obs(learner, data)`](@ref), if `obs` is being overloaded. In this case, be sure that
-`X`, unless `nothing`, implements the data interface specified by
+Ensure the returned object, unless `nothing`, implements the data interface specified by
 [`LearnAPI.data_interface(learner)`](@ref).
 
 """
-features(learner, data) = _first(data)
-_first(data) = data
-_first(data::Tuple) = first(data)
+features(learner, observations) = _first(observations)
+_first(observations) = observations
+_first(observations::Tuple) = first(observations)
 # note the factoring above guards against method ambiguities
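
The closing comment about method ambiguities can be illustrated with a small stand-alone sketch. It assumes nothing from LearnAPI.jl is loaded: `Ridge`, `naive_features` and `safe_features` are hypothetical names used only to contrast a fallback that dispatches directly on a `Tuple` with the factored form used in this diff.

```julia
# Stand-alone sketch; `Ridge`, `naive_features` and `safe_features` are
# hypothetical names and nothing here is loaded from LearnAPI.jl.
struct Ridge end

# Naive fallback: two methods that dispatch on the second argument.
naive_features(learner, observations) = observations
naive_features(learner, observations::Tuple) = first(observations)

# An implementer then specialises on the learner alone:
naive_features(::Ridge, observations) = :ridge_features

# Factored fallback, mirroring the diff: one public method, and a helper
# that does the tuple/non-tuple split.
safe_features(learner, observations) = _first(observations)
_first(observations) = observations
_first(observations::Tuple) = first(observations)
safe_features(::Ridge, observations) = :ridge_features

# In the naive version, `(Any, Tuple)` and `(Ridge, Any)` both match a call
# with a `Ridge` and a tuple, and neither method is more specific.
naive_is_ambiguous = try
    naive_features(Ridge(), ([1.0 2.0], [3.0]))
    false
catch err
    err isa MethodError
end

# The factored version dispatches cleanly to the learner-specific method,
# while other learners still get the tuple-aware fallback.
safe_result = safe_features(Ridge(), ([1.0 2.0], [3.0]))
fallback_result = safe_features(nothing, ([1.0 2.0], [3.0]))
```

With the factored form, an implementer who overloads the method for their learner type and an untyped `observations` argument never competes with a `Tuple`-specific method of the same function.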
diff --git a/src/traits.jl b/src/traits.jl
@@ -387,7 +387,7 @@ iteration_parameter(::Any) = nothing
 
 Return an upper bound `S` on the scitype of individual observations guaranteed to work
 when calling `fit`: if `observations = obs(learner, data)` and
-`ScientificTypes.scitype(o) <:S` for each `o` in `observations`, then the call
+`ScientificTypes.scitype(collect(o)) <:S` for each `o` in `observations`, then the call
 `fit(learner, data)` is supported.
 
 $DOC_EXPLAIN_EACHOBS
@@ -396,7 +396,7 @@ See also [`LearnAPI.target_observation_scitype`](@ref).
 
 # New implementations
 
-Optional. The fallback return value is `Union{}`. 
+Optional. The fallback return value is `Union{}`.
 
 """
 fit_observation_scitype(::Any) = Union{}
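
One way to read the `Union{}` fallback is that it makes no non-trivial promise: `Union{}` is Julia's bottom type, so the condition "`... <: S`" cannot hold for the type of any actual observation. The sketch below uses plain Julia types as a stand-in for scitypes (ScientificTypes.jl is not loaded), and `declared_bound` is a hypothetical bound chosen only for illustration.

```julia
# Plain Julia types stand in for scitypes here; ScientificTypes.jl is not loaded.
fallback_bound = Union{}                  # the value the fallback trait returns
declared_bound = AbstractVector{<:Real}   # hypothetical bound a learner might declare

candidates = (Vector{Float64}, Vector{Int}, Vector{String})

# No candidate type is a subtype of the bottom type, so the fallback
# guarantees nothing about which observations `fit` supports.
passes_fallback = [T <: fallback_bound for T in candidates]

# A declared bound separates supported from unsupported candidates.
passes_declared = [T <: declared_bound for T in candidates]
```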