From dc395c509e7e361262b9b5d39890b810d65fbf6f Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Thu, 3 Oct 2024 03:35:16 +0000 Subject: [PATCH] build based on 15340c1 --- dev/.documenter-siteinfo.json | 2 +- dev/accessor_functions/index.html | 2 +- dev/anatomy_of_an_implementation/index.html | 22 +++++++++---------- dev/common_implementation_patterns/index.html | 2 +- dev/fit_update/index.html | 6 ++--- dev/index.html | 2 +- dev/kinds_of_target_proxy/index.html | 2 +- dev/minimize/index.html | 2 +- dev/obs/index.html | 2 +- dev/patterns/classification/index.html | 2 +- dev/patterns/clusterering/index.html | 2 +- dev/patterns/dimension_reduction/index.html | 2 +- dev/patterns/feature_engineering/index.html | 2 +- .../incremental_algorithms/index.html | 2 +- dev/patterns/incremental_models/index.html | 2 +- dev/patterns/iterative_algorithms/index.html | 2 +- .../index.html | 2 +- dev/patterns/meta_algorithms/index.html | 2 +- .../missing_value_imputation/index.html | 2 +- dev/patterns/outlier_detection/index.html | 2 +- dev/patterns/regression/index.html | 2 +- dev/patterns/static_algorithms/index.html | 2 +- .../supervised_bayesian_algorithms/index.html | 2 +- .../supervised_bayesian_models/index.html | 2 +- dev/patterns/survival_analysis/index.html | 2 +- .../time_series_classification/index.html | 2 +- .../time_series_forecasting/index.html | 2 +- dev/predict_transform/index.html | 6 ++--- dev/reference/index.html | 2 +- dev/target_weights_features/index.html | 4 ++-- dev/testing_an_implementation/index.html | 2 +- dev/traits/index.html | 8 +++---- 32 files changed, 50 insertions(+), 50 deletions(-) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 324134a3..f3e333ab 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-10-03T03:32:42","documenter_version":"1.7.0"}} \ No newline at end of file 
+{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-10-03T03:35:13","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/accessor_functions/index.html b/dev/accessor_functions/index.html index 29abfa2d..09e62460 100644 --- a/dev/accessor_functions/index.html +++ b/dev/accessor_functions/index.html @@ -1,3 +1,3 @@ Accessor Functions · LearnAPI.jl

Accessor Functions

The sole argument of an accessor function is the output, model, of fit. Algorithms are free to implement any number of these, or none of them.

Algorithm-specific accessor functions may also be implemented. The names of all accessor functions are included in the list returned by LearnAPI.functions(algorithm).

Implementation guide

All new implementations must implement LearnAPI.algorithm. While all others are optional, any implemented accessor functions must be added to the list returned by LearnAPI.functions.
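
The requirement above can be sketched with a self-contained toy. This is a minimal sketch, not LearnAPI.jl itself: the module ToyAPI and the types MeanRegressor and MeanRegressorFitted are hypothetical stand-ins invented for illustration, and only the pattern (implement the algorithm accessor, then declare it in the functions trait) is taken from the text.

```julia
# Toy stand-in for the LearnAPI namespace; NOT the real package.
module ToyAPI
    function fit end
    function algorithm end
    # Trait listing the methods an algorithm supports; empty by default.
    functions(::Any) = ()
end

struct MeanRegressor end            # hypothetical algorithm, no hyperparameters

struct MeanRegressorFitted          # hypothetical output of `fit`
    algorithm::MeanRegressor
    mean::Float64
end

ToyAPI.fit(algorithm::MeanRegressor, y; verbosity=1) =
    MeanRegressorFitted(algorithm, sum(y) / length(y))

# The one compulsory accessor: recover the algorithm from the model.
ToyAPI.algorithm(model::MeanRegressorFitted) = model.algorithm

# Every implemented accessor is declared in the `functions` trait.
ToyAPI.functions(::MeanRegressor) = (:(ToyAPI.fit), :(ToyAPI.algorithm))

algorithm = MeanRegressor()
model = ToyAPI.fit(algorithm, [1.0, 2.0, 3.0])
@assert ToyAPI.algorithm(model) == algorithm
@assert :(ToyAPI.algorithm) in ToyAPI.functions(algorithm)
```

A caller can then test for support of an optional accessor by checking membership in the tuple returned by the trait, before calling the accessor.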

Reference

LearnAPI.algorithmFunction
LearnAPI.algorithm(model)
-LearnAPI.algorithm(minimized_model)

Recover the algorithm used to train model or the output of minimize(model).

In other words, if model = fit(algorithm, data...), for some algorithm and data, then

LearnAPI.algorithm(model) == algorithm == LearnAPI.algorithm(minimize(model))

is true.

New implementations

Implementation is compulsory for new algorithm types. The behaviour described above is the only contract. If implemented, you must include :(LearnAPI.algorithm) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.extrasFunction
LearnAPI.extras(model)

Return miscellaneous byproducts of an algorithm's computation, from the object model returned by a call of the form fit(algorithm, data).

For "static" algorithms (those without training data) it may be necessary to first call transform or predict on model.

See also fit.

New implementations

Implementation is discouraged for byproducts already covered by other LearnAPI.jl accessor functions: LearnAPI.algorithm, LearnAPI.coefficients, LearnAPI.intercept, LearnAPI.tree, LearnAPI.trees, LearnAPI.feature_importances, LearnAPI.training_labels, LearnAPI.training_losses, LearnAPI.training_predictions, LearnAPI.training_scores and LearnAPI.components.

If implemented, you must include :(LearnAPI.extras) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.coefficientsFunction
LearnAPI.coefficients(model)

For a linear model, return the learned coefficients. The value returned has the form of an abstract vector of feature_or_class::Symbol => coefficient::Real pairs (e.g., [:gender => 0.23, :height => 0.7, :weight => 0.1]) or, in the case of multi-targets, feature::Symbol => coefficients::AbstractVector{<:Real} pairs.

The model reports coefficients if :(LearnAPI.coefficients) in LearnAPI.functions(LearnAPI.algorithm(model)).

See also LearnAPI.intercept.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.coefficients) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.interceptFunction
LearnAPI.intercept(model)

For a linear model, return the learned intercept. The value returned is Real (single target) or an AbstractVector{<:Real} (multi-target).

The model reports intercept if :(LearnAPI.intercept) in LearnAPI.functions(LearnAPI.algorithm(model)).

See also LearnAPI.coefficients.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.intercept) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.treeFunction
LearnAPI.tree(model)

Return a user-friendly tree, in the form of a root object implementing the following interface defined in AbstractTrees.jl:

  • subtypes AbstractTrees.AbstractNode{T}
  • implements AbstractTrees.children()
  • implements AbstractTrees.printnode()

Such a tree can be visualized using the TreeRecipe.jl package, for example.

See also LearnAPI.trees.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.tree) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.treesFunction
LearnAPI.trees(model)

For some ensemble model, return a vector of trees. See LearnAPI.tree for the form of such trees.

See also LearnAPI.tree.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.trees) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.feature_importancesFunction
LearnAPI.feature_importances(model)

Return the algorithm-specific feature importances of a model output by fit(algorithm, ...) for some algorithm. The value returned has the form of an abstract vector of feature::Symbol => importance::Real pairs (e.g., [:gender => 0.23, :height => 0.7, :weight => 0.1]).

The algorithm supports feature importances if :(LearnAPI.feature_importances) in LearnAPI.functions(algorithm).

If an algorithm is sometimes unable to report feature importances then LearnAPI.feature_importances will return all importances as 0.0, as in [:gender => 0.0, :height => 0.0, :weight => 0.0].

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.feature_importances) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.training_lossesFunction
LearnAPI.training_losses(model)

Return the training losses obtained when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

Implement for iterative algorithms that compute and record training losses as part of training (e.g. neural networks).

If implemented, you must include :(LearnAPI.training_losses) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.training_predictionsFunction
LearnAPI.training_predictions(model)

Return internally computed training predictions when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

Implement for iterative algorithms that compute and record training predictions as part of training (e.g. neural networks).

If implemented, you must include :(LearnAPI.training_predictions) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.training_scoresFunction
LearnAPI.training_scores(model)

Return the training scores obtained when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

Implement for algorithms, such as outlier detection algorithms, which associate a score with each observation during training, where these scores are of interest in later processes (e.g., in defining normalized scores for new data).

If implemented, you must include :(LearnAPI.training_scores) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.training_labelsFunction
LearnAPI.training_labels(model)

Return the training labels obtained when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

If implemented, you must include :(LearnAPI.training_labels) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.componentsFunction
LearnAPI.components(model)

For a composite model, return the component models (fit outputs). These will be in the form of a vector of named pairs, property_name::Symbol => component_model. Here property_name is the name of some algorithm-valued property (hyper-parameter) of algorithm = LearnAPI.algorithm(model).

A composite model is one for which the corresponding algorithm includes one or more algorithm-valued properties, and for which LearnAPI.is_composite(algorithm) is true.

See also is_composite.

New implementations

Implement if and only if model is a composite model.

If implemented, you must include :(LearnAPI.components) in the tuple returned by the LearnAPI.functions trait.

source
+LearnAPI.algorithm(minimized_model)

Recover the algorithm used to train model or the output of minimize(model).

In other words, if model = fit(algorithm, data...), for some algorithm and data, then

LearnAPI.algorithm(model) == algorithm == LearnAPI.algorithm(minimize(model))

is true.

New implementations

Implementation is compulsory for new algorithm types. The behaviour described above is the only contract. If implemented, you must include :(LearnAPI.algorithm) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.extrasFunction
LearnAPI.extras(model)

Return miscellaneous byproducts of an algorithm's computation, from the object model returned by a call of the form fit(algorithm, data).

For "static" algorithms (those without training data) it may be necessary to first call transform or predict on model.

See also fit.

New implementations

Implementation is discouraged for byproducts already covered by other LearnAPI.jl accessor functions: LearnAPI.algorithm, LearnAPI.coefficients, LearnAPI.intercept, LearnAPI.tree, LearnAPI.trees, LearnAPI.feature_importances, LearnAPI.training_labels, LearnAPI.training_losses, LearnAPI.training_predictions, LearnAPI.training_scores and LearnAPI.components.

If implemented, you must include :(LearnAPI.extras) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.coefficientsFunction
LearnAPI.coefficients(model)

For a linear model, return the learned coefficients. The value returned has the form of an abstract vector of feature_or_class::Symbol => coefficient::Real pairs (e.g., [:gender => 0.23, :height => 0.7, :weight => 0.1]) or, in the case of multi-targets, feature::Symbol => coefficients::AbstractVector{<:Real} pairs.

The model reports coefficients if :(LearnAPI.coefficients) in LearnAPI.functions(LearnAPI.algorithm(model)).

See also LearnAPI.intercept.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.coefficients) in the tuple returned by the LearnAPI.functions trait.
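
The return form described above can be illustrated without loading any package. This is a rough sketch only: ToyLinearModel and the unqualified coefficients function are hypothetical names, and a real implementation would instead add a method to LearnAPI.coefficients.

```julia
# Hypothetical fitted single-target linear model.
struct ToyLinearModel
    names::Vector{Symbol}      # feature names
    weights::Vector{Float64}   # learned coefficients, one per feature
end

# Return `feature::Symbol => coefficient::Real` pairs, as documented above.
coefficients(model::ToyLinearModel) =
    [name => weight for (name, weight) in zip(model.names, model.weights)]

model = ToyLinearModel([:gender, :height, :weight], [0.23, 0.7, 0.1])
coefs = coefficients(model)

# The result is an abstract vector of pairs, so it converts directly to a Dict:
@assert coefs isa AbstractVector{<:Pair{Symbol,<:Real}}
@assert Dict(coefs)[:height] == 0.7
```

Returning pairs rather than a bare vector keeps each coefficient attached to its feature name, so a consumer does not need to know the column order used in training.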

source
LearnAPI.interceptFunction
LearnAPI.intercept(model)

For a linear model, return the learned intercept. The value returned is Real (single target) or an AbstractVector{<:Real} (multi-target).

The model reports intercept if :(LearnAPI.intercept) in LearnAPI.functions(LearnAPI.algorithm(model)).

See also LearnAPI.coefficients.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.intercept) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.treeFunction
LearnAPI.tree(model)

Return a user-friendly tree, in the form of a root object implementing the following interface defined in AbstractTrees.jl:

  • subtypes AbstractTrees.AbstractNode{T}
  • implements AbstractTrees.children()
  • implements AbstractTrees.printnode()

Such a tree can be visualized using the TreeRecipe.jl package, for example.

See also LearnAPI.trees.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.tree) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.treesFunction
LearnAPI.trees(model)

For some ensemble model, return a vector of trees. See LearnAPI.tree for the form of such trees.

See also LearnAPI.tree.

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.trees) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.feature_importancesFunction
LearnAPI.feature_importances(model)

Return the algorithm-specific feature importances of a model output by fit(algorithm, ...) for some algorithm. The value returned has the form of an abstract vector of feature::Symbol => importance::Real pairs (e.g., [:gender => 0.23, :height => 0.7, :weight => 0.1]).

The algorithm supports feature importances if :(LearnAPI.feature_importances) in LearnAPI.functions(algorithm).

If an algorithm is sometimes unable to report feature importances then LearnAPI.feature_importances will return all importances as 0.0, as in [:gender => 0.0, :height => 0.0, :weight => 0.0].

New implementations

Implementation is optional.

If implemented, you must include :(LearnAPI.feature_importances) in the tuple returned by the LearnAPI.functions trait.
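
As an illustration of consuming the documented return form, the following sketch ranks features and detects the all-zero case described above. The helper names rank_features and importances_unavailable are hypothetical and not part of LearnAPI.jl; the input vectors are the examples quoted in the text.

```julia
# Sort `feature => importance` pairs from most to least important.
rank_features(importances) = sort(importances; by=last, rev=true)

# An all-zero vector signals that importances could not be reported.
importances_unavailable(importances) = all(iszero ∘ last, importances)

importances = [:gender => 0.23, :height => 0.7, :weight => 0.1]
ranked = rank_features(importances)

@assert first.(ranked) == [:height, :gender, :weight]
@assert !importances_unavailable(importances)
@assert importances_unavailable([:gender => 0.0, :height => 0.0, :weight => 0.0])
```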

source
LearnAPI.training_lossesFunction
LearnAPI.training_losses(model)

Return the training losses obtained when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

Implement for iterative algorithms that compute and record training losses as part of training (e.g. neural networks).

If implemented, you must include :(LearnAPI.training_losses) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.training_predictionsFunction
LearnAPI.training_predictions(model)

Return internally computed training predictions when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

Implement for iterative algorithms that compute and record training predictions as part of training (e.g. neural networks).

If implemented, you must include :(LearnAPI.training_predictions) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.training_scoresFunction
LearnAPI.training_scores(model)

Return the training scores obtained when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

Implement for algorithms, such as outlier detection algorithms, which associate a score with each observation during training, where these scores are of interest in later processes (e.g., in defining normalized scores for new data).

If implemented, you must include :(LearnAPI.training_scores) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.training_labelsFunction
LearnAPI.training_labels(model)

Return the training labels obtained when running model = fit(algorithm, ...) for some algorithm.

See also fit.

New implementations

If implemented, you must include :(LearnAPI.training_labels) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.componentsFunction
LearnAPI.components(model)

For a composite model, return the component models (fit outputs). These will be in the form of a vector of named pairs, property_name::Symbol => component_model. Here property_name is the name of some algorithm-valued property (hyper-parameter) of algorithm = LearnAPI.algorithm(model).

A composite model is one for which the corresponding algorithm includes one or more algorithm-valued properties, and for which LearnAPI.is_composite(algorithm) is true.

See also is_composite.

New implementations

Implement if and only if model is a composite model.

If implemented, you must include :(LearnAPI.components) in the tuple returned by the LearnAPI.functions trait.

source
diff --git a/dev/anatomy_of_an_implementation/index.html b/dev/anatomy_of_an_implementation/index.html index 4f7e61ad..d17210ed 100644 --- a/dev/anatomy_of_an_implementation/index.html +++ b/dev/anatomy_of_an_implementation/index.html @@ -68,13 +68,13 @@ ytrain = y[train] model = fit(algorithm, (Xtrain, ytrain)) # `fit(algorithm, Xtrain, ytrain)` will also work ŷ = predict(model, Tables.subset(X, test))
4-element Vector{Float64}:
- 1.7787839661879423
- 1.474736217080038
- 0.8258896239485449
- 0.8872860616793237

Extracting coefficients:

LearnAPI.coefficients(model)
3-element Vector{Pair{Symbol, Float64}}:
- :a => 1.3075877481847373
- :b => 0.7281276945713715
- :c => 1.251451735167346

Serialization/deserialization:

using Serialization
+ 3.1200819968645677
+ 1.3058374044936492
+ 3.1093884946089787
+ 0.4275542762330104

Extracting coefficients:

LearnAPI.coefficients(model)
3-element Vector{Pair{Symbol, Float64}}:
+ :a => 1.8733889088746318
+ :b => 0.11444988311319655
+ :c => 1.8143489770611856

Serialization/deserialization:

using Serialization
 small_model = minimize(model)
 filename = tempname()
 serialize(filename, small_model)
recovered_model = deserialize(filename)
@@ -124,7 +124,7 @@
 model = fit(algorithm, MLUtils.getobs(observations_for_fit, train))
 observations_for_predict = obs(model, X)
 ẑ = predict(model, MLUtils.getobs(observations_for_predict, test))
4-element Vector{Float64}:
- 2.038937874999595
- 2.039772619460228
- 2.185727075357462
- 3.169261028759361
@assert ẑ == ŷ

For an application of obs to efficient cross-validation, see here.


¹ In LearnAPI.jl a table is any object X implementing the Tables.jl interface, additionally satisfying Tables.istable(X) == true and implementing DataAPI.nrow (and whence MLUtils.numobs). Tables that are also (unnamed) tuples are disallowed.

² An implementation can provide further accessor functions, if necessary, but like the native ones, they must be included in the LearnAPI.functions declaration.

³ The last index must be the observation index.

⁴ The data = (X, y) pattern implemented here is not the only supported pattern. For example, data might be a single table containing both features and target variable. In this case, it will be necessary to overload LearnAPI.features in addition to LearnAPI.target; the name of the target column would need to be a hyperparameter.

+ 2.071540862735006
+ 2.203838455483823
+ 0.49668895250195255
+ 2.2997395734156787
@assert ẑ == ŷ

For an application of obs to efficient cross-validation, see here.


¹ In LearnAPI.jl a table is any object X implementing the Tables.jl interface, additionally satisfying Tables.istable(X) == true and implementing DataAPI.nrow (and whence MLUtils.numobs). Tables that are also (unnamed) tuples are disallowed.

² An implementation can provide further accessor functions, if necessary, but like the native ones, they must be included in the LearnAPI.functions declaration.

³ The last index must be the observation index.

⁴ The data = (X, y) pattern implemented here is not the only supported pattern. For example, data might be a single table containing both features and target variable. In this case, it will be necessary to overload LearnAPI.features in addition to LearnAPI.target; the name of the target column would need to be a hyperparameter.

diff --git a/dev/common_implementation_patterns/index.html b/dev/common_implementation_patterns/index.html index ef7ffc35..2b2e83ef 100644 --- a/dev/common_implementation_patterns/index.html +++ b/dev/common_implementation_patterns/index.html @@ -1,2 +1,2 @@ -Common Implementation Patterns · LearnAPI.jl

Common Implementation Patterns

🚧
Warning

Under construction

Warning

This section is only an implementation guide. The definitive specification of the Learn API is given in Reference.

This guide is intended to be consulted after reading Anatomy of an Implementation, which introduces the main interface objects and terminology.

Although an implementation is defined purely by the methods and traits it implements, most implementations fall into one (or more) of the following informally understood patterns or "tasks":

+Common Implementation Patterns · LearnAPI.jl

Common Implementation Patterns

🚧
Warning

Under construction

Warning

This section is only an implementation guide. The definitive specification of the Learn API is given in Reference.

This guide is intended to be consulted after reading Anatomy of an Implementation, which introduces the main interface objects and terminology.

Although an implementation is defined purely by the methods and traits it implements, most implementations fall into one (or more) of the following informally understood patterns or "tasks":

diff --git a/dev/fit_update/index.html b/dev/fit_update/index.html index 8b7a7f3f..a0e12f69 100644 --- a/dev/fit_update/index.html +++ b/dev/fit_update/index.html @@ -23,16 +23,16 @@ # But two-line version exposes byproducts of the clustering algorithm (e.g., outliers): LearnAPI.extras(model)

Implementation guide

Training

method | fallback | compulsory?
fit(algorithm, data; verbosity=1) | ignores data and applies signature below | yes, unless static
fit(algorithm; verbosity=1) | none | no, unless static

Updating

method | fallback | compulsory?
update(model, data; verbosity=1, hyperparameter_updates...) | none | no
update_observations(model, data; verbosity=1, hyperparameter_updates...) | none | no
update_features(model, data; verbosity=1, hyperparameter_updates...) | none | no

There are some contracts regarding the behaviour of the update methods, as they relate to a previous fit call. Consult the document strings for details.

Reference

LearnAPI.fitFunction
fit(algorithm, data; verbosity=1)
 fit(algorithm; verbosity=1)

Execute the algorithm with configuration algorithm using the provided training data, returning an object, model, on which other methods, such as predict or transform, can be dispatched. LearnAPI.functions(algorithm) returns a list of methods that can be applied to either algorithm or model.

The second signature is provided by algorithms that do not generalize to new observations ("static" algorithms). In that case, transform(model, data) or predict(model, ..., data) carries out the actual algorithm execution, writing any byproducts of that operation to the mutable object model returned by fit.

Whenever fit expects a tuple form of argument, data = (X1, ..., Xn), then the signature fit(algorithm, X1, ..., Xn) is also provided.

For example, a supervised classifier will typically admit this workflow:

model = fit(algorithm, (X, y)) # or `fit(algorithm, X, y)`
-ŷ = predict(model, Xnew)

Use verbosity=0 for warnings only, and -1 for silent training.

See also predict, transform, inverse_transform, LearnAPI.functions, obs.

Extended help

New implementations

Implementation is compulsory. The signature must include verbosity. A fallback for the first signature calls the second, ignoring data:

fit(algorithm, data; kwargs...) = fit(algorithm; kwargs...)

Fallbacks also provide the data slurping versions.

Assumptions about data

By default, it is assumed that data supports the LearnAPI.RandomAccess interface (this includes all matrices, with observations as columns, most tables, and tuples thereof). See LearnAPI.RandomAccess for details. If this is not the case then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to document strings for details.

source
LearnAPI.updateFunction
update(model, data; verbosity=1, hyperparam_replacements...)

Return an updated version of the model object returned by a previous fit or update call, but with the specified hyperparameter replacements, in the form p1=value1, p2=value2, ....

Provided that data is identical with the data presented in a preceding fit call, as in the example below, execution is semantically equivalent to the call fit(algorithm, data), where algorithm is LearnAPI.algorithm(model) with the specified replacements. In some cases (typically, when changing an iteration parameter) there may be a performance benefit to using update instead of retraining ab initio.

If data differs from that in the preceding fit or update call, then behaviour is algorithm-specific.

algorithm = MyForest(ntrees=100)
+ŷ = predict(model, Xnew)

Use verbosity=0 for warnings only, and -1 for silent training.

See also predict, transform, inverse_transform, LearnAPI.functions, obs.

Extended help

New implementations

Implementation is compulsory. The signature must include verbosity. A fallback for the first signature calls the second, ignoring data:

fit(algorithm, data; kwargs...) = fit(algorithm; kwargs...)

Fallbacks also provide the data slurping versions.
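
One way the "data slurping" signature could be provided is sketched below with package-free toy code. This is an assumption about the mechanism, not the LearnAPI.jl source: the unqualified fit function and the types ToyClassifier and ToyClassifierFitted are hypothetical.

```julia
struct ToyClassifier end             # hypothetical algorithm

struct ToyClassifierFitted           # hypothetical output of `fit`
    algorithm::ToyClassifier
    nobservations::Int
end

# Core method: the implementer writes only the tuple form, `data = (X, y)`.
function fit(algorithm::ToyClassifier, data::Tuple; verbosity=1)
    X, y = data
    verbosity > 0 && @info "Training on $(length(y)) observations."
    return ToyClassifierFitted(algorithm, length(y))
end

# Generic "slurping" fallback: `fit(algorithm, X1, ..., Xn)` forwards to the
# tuple form, passing keyword arguments through unchanged.
fit(algorithm, X1, X2, Xs...; kwargs...) =
    fit(algorithm, (X1, X2, Xs...); kwargs...)

X = [1.0 2.0 3.0; 4.0 5.0 6.0]       # observations as columns
y = [:a, :b, :a]

model1 = fit(ToyClassifier(), (X, y); verbosity=0)
model2 = fit(ToyClassifier(), X, y; verbosity=0)
@assert model1.nobservations == model2.nobservations == 3
```

Because the fallback requires at least three positional arguments, it does not collide with the two-argument tuple method.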

Assumptions about data

By default, it is assumed that data supports the LearnAPI.RandomAccess interface (this includes all matrices, with observations as columns, most tables, and tuples thereof). See LearnAPI.RandomAccess for details. If this is not the case then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to document strings for details.

source
LearnAPI.updateFunction
update(model, data; verbosity=1, hyperparam_replacements...)

Return an updated version of the model object returned by a previous fit or update call, but with the specified hyperparameter replacements, in the form p1=value1, p2=value2, ....

Provided that data is identical with the data presented in a preceding fit call, as in the example below, execution is semantically equivalent to the call fit(algorithm, data), where algorithm is LearnAPI.algorithm(model) with the specified replacements. In some cases (typically, when changing an iteration parameter) there may be a performance benefit to using update instead of retraining ab initio.

If data differs from that in the preceding fit or update call, then behaviour is algorithm-specific.

algorithm = MyForest(ntrees=100)
 
 # train with 100 trees:
 model = fit(algorithm, data)
 
 # add 50 more trees:
-model = update(model, data; ntrees=150)

See also fit, update_observations, update_features.

New implementations

Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.update_observationsFunction
update_observations(model, new_data; verbosity=1, parameter_replacements...)

Return an updated version of the model object returned by a previous fit or update call given the new observations present in new_data. One may additionally specify hyperparameter replacements in the form p1=value1, p2=value2, ....

When following the call fit(algorithm, data), the update call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements. Behaviour is otherwise algorithm-specific.

algorithm = MyNeuralNetwork(epochs=10, learning_rate=0.01)
+model = update(model, data; ntrees=150)

See also fit, update_observations, update_features.

New implementations

Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update) in the tuple returned by the LearnAPI.functions trait.
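
The contract that update on identical data is semantically equivalent to a fresh fit with the replaced hyperparameters can be sketched with a deterministic toy. All names here (ToyForest, ToyForestFitted, train_tree, and the unqualified fit and update) are hypothetical; a "tree" is just an integer so that the equivalence can be checked exactly.

```julia
struct ToyForest                     # hypothetical algorithm
    ntrees::Int
end

struct ToyForestFitted               # hypothetical output of `fit`
    algorithm::ToyForest
    trees::Vector{Int}               # integer stand-ins for trained trees
end

# Deterministic stand-in for training the i-th tree.
train_tree(data, i) = sum(data) + i

fit(algorithm::ToyForest, data; verbosity=1) =
    ToyForestFitted(algorithm, [train_tree(data, i) for i in 1:algorithm.ntrees])

# Reuse existing trees and train only the extra ones; retrain from scratch
# if the requested number of trees is smaller than what is already there.
function update(model::ToyForestFitted, data; verbosity=1,
                ntrees=model.algorithm.ntrees)
    old = model.trees
    ntrees < length(old) && return fit(ToyForest(ntrees), data; verbosity)
    new = [train_tree(data, i) for i in (length(old) + 1):ntrees]
    return ToyForestFitted(ToyForest(ntrees), vcat(old, new))
end

data = [1, 2, 3]
model = fit(ToyForest(100), data)
model = update(model, data; ntrees=150)

# Same result as training ab initio with the replaced hyperparameter:
@assert model.trees == fit(ToyForest(150), data).trees
@assert model.algorithm == ToyForest(150)
```

The performance benefit mentioned above shows up here as training 50 stand-in trees in update instead of 150.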

source
LearnAPI.update_observationsFunction
update_observations(model, new_data; verbosity=1, parameter_replacements...)

Return an updated version of the model object returned by a previous fit or update call given the new observations present in new_data. One may additionally specify hyperparameter replacements in the form p1=value1, p2=value2, ....

When following the call fit(algorithm, data), the update call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements. Behaviour is otherwise algorithm-specific.

algorithm = MyNeuralNetwork(epochs=10, learning_rate=0.01)
 
 # train for ten epochs:
 model = fit(algorithm, data)
 
 # train for two more epochs using new data and new learning rate:
-model = update_observations(model, new_data; epochs=2, learning_rate=0.1)

See also fit, update, update_features.

Extended help

New implementations

Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update_observations) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.update_featuresFunction
update_features(model, new_data; verbosity=1, parameter_replacements...)

Return an updated version of the model object returned by a previous fit or update call given the new features encapsulated in new_data. One may additionally specify hyperparameter replacements in the form p1=value1, p2=value2, ....

When following the call fit(algorithm, data), the update call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements. Behaviour is otherwise algorithm-specific.

See also fit, update, update_observations.

Extended help

New implementations

Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update_features) in the tuple returned by the LearnAPI.functions trait.

source
+model = update_observations(model, new_data; epochs=2, learning_rate=0.1)

See also fit, update, update_features.

Extended help

New implementations

Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update_observations) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.update_featuresFunction
update_features(model, new_data; verbosity=1, parameter_replacements...)

Return an updated version of the model object returned by a previous fit or update call given the new features encapsulated in new_data. One may additionally specify hyperparameter replacements in the form p1=value1, p2=value2, ....

When following the call fit(algorithm, data), the update call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements. Behaviour is otherwise algorithm-specific.

See also fit, update, update_observations.

Extended help

New implementations

Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update_features) in the tuple returned by the LearnAPI.functions trait.

source
diff --git a/dev/index.html b/dev/index.html index a47b160c..aa9fc854 100644 --- a/dev/index.html +++ b/dev/index.html @@ -32,4 +32,4 @@ # Recover saved model and algorithm configuration: recovered_model = deserialize("my_random_forest.jls") @assert LearnAPI.algorithm(recovered_model) == forest -@assert predict(recovered_model, Point(), Xnew) == ŷ

Distribution and Point are singleton types owned by LearnAPI.jl. They allow dispatch based on the kind of target proxy, a key LearnAPI.jl concept. LearnAPI.jl places more emphasis on the notion of target variables and target proxies than on the usual supervised/unsupervised learning dichotomy. From this point of view, a supervised algorithm is simply one in which a target variable exists, and happens to appear as an input to training but not to prediction.

Data interfaces

Algorithms are free to consume data in any format. However, a method called obs (read as "observations") gives users and meta-algorithms access to an algorithm-specific representation of input data, which is also guaranteed to implement a standard interface for accessing individual observations, unless the algorithm explicitly opts out. Moreover, the fit and predict methods will also be able to consume these alternative data representations, for performance benefits in some situations.

The fallback data interface is the MLUtils.jl getobs/numobs interface (here tagged as LearnAPI.RandomAccess()) and if the input consumed by the algorithm already implements that interface (tables, arrays, etc.) then overloading obs is completely optional. Plain iteration interfaces, with or without knowledge of the number of observations, can also be specified (to support, e.g., data loaders reading images from disk).

Learning more

+@assert predict(recovered_model, Point(), Xnew) == ŷ

Distribution and Point are singleton types owned by LearnAPI.jl. They allow dispatch based on the kind of target proxy, a key LearnAPI.jl concept. LearnAPI.jl places more emphasis on the notion of target variables and target proxies than on the usual supervised/unsupervised learning dichotomy. From this point of view, a supervised algorithm is simply one in which a target variable exists, and happens to appear as an input to training but not to prediction.

Data interfaces

Algorithms are free to consume data in any format. However, a method called obs (read as "observations") gives users and meta-algorithms access to an algorithm-specific representation of input data, which is also guaranteed to implement a standard interface for accessing individual observations, unless the algorithm explicitly opts out. Moreover, the fit and predict methods will also be able to consume these alternative data representations, for performance benefits in some situations.

The fallback data interface is the MLUtils.jl getobs/numobs interface (here tagged as LearnAPI.RandomAccess()) and if the input consumed by the algorithm already implements that interface (tables, arrays, etc.) then overloading obs is completely optional. Plain iteration interfaces, with or without knowledge of the number of observations, can also be specified (to support, e.g., data loaders reading images from disk).

Learning more

diff --git a/dev/kinds_of_target_proxy/index.html b/dev/kinds_of_target_proxy/index.html index 202d2623..9a3fdef9 100644 --- a/dev/kinds_of_target_proxy/index.html +++ b/dev/kinds_of_target_proxy/index.html @@ -1,2 +1,2 @@ -Kinds of Target Proxy · LearnAPI.jl

Kinds of Target Proxy

The available kinds of target proxy (used for predict dispatch) are classified by subtypes of LearnAPI.KindOfProxy. These types are intended for dispatch only and have no fields.

LearnAPI.KindOfProxy · Type
LearnAPI.KindOfProxy

Abstract type whose concrete subtypes T each represent a different kind of proxy for some target variable, associated with some algorithm. Instances T() are used to request the form of target predictions in predict calls.

See LearnAPI.jl documentation for an explanation of "targets" and "target proxies".

For example, Distribution is a concrete subtype of LearnAPI.KindOfProxy and a call like predict(model, Distribution(), Xnew) returns a data object whose observations are probability density/mass functions, assuming the algorithm supports predictions of that form.

The instances of LearnAPI.KindOfProxy are: ConfidenceInterval(), Continuous(), Distribution(), Expectile(), Fuzzy(), HazardFunction(), LabelAmbiguous(), LabelAmbiguousDistribution(), LabelAmbiguousFuzzy(), LabelAmbiguousSampleable(), LogDistribution(), LogProbability(), OutlierScore(), Parametric(), Point(), ProbabilisticFuzzy(), Probability(), Quantile(), Sampleable(), SurvivalDistribution(), SurvivalFunction(), SingleDistribution(), SingleLogDistribution(), SingleSampleable(), JointDistribution(), JointLogDistribution() and JointSampleable().

source

Simple target proxies

LearnAPI.IID · Type
LearnAPI.IID <: LearnAPI.KindOfProxy

Abstract subtype of LearnAPI.KindOfProxy. If kind_of_proxy is an instance of LearnAPI.IID then, given data consisting of $n$ observations, the following must hold:

  • ŷ = LearnAPI.predict(model, kind_of_proxy, data) is data also consisting of $n$ observations.

  • The $j$th observation of ŷ, for any $j$, depends only on the $j$th observation of the provided data (no correlation between observations).

See also LearnAPI.KindOfProxy.

Extended help

| type | form of an observation |
|------|------------------------|
| LearnAPI.Point | same as target observations; may have the interpretation of a 50% quantile, 50% expectile or mode |
| LearnAPI.Sampleable | object that can be sampled to obtain object of the same form as target observation |
| LearnAPI.Distribution | explicit probability density/mass function whose sample space is all possible target observations |
| LearnAPI.LogDistribution | explicit log-probability density/mass function whose sample space is possible target observations |
| LearnAPI.Probability¹ | numerical probability or probability vector |
| LearnAPI.LogProbability¹ | log-probability or log-probability vector |
| LearnAPI.Parametric¹ | a list of parameters (e.g., mean and variance) describing some distribution |
| LearnAPI.LabelAmbiguous | collections of labels (in case of multi-class target) but without a known correspondence to the original target labels (and of possibly different number) as in, e.g., clustering |
| LearnAPI.LabelAmbiguousSampleable | sampleable version of LabelAmbiguous; see Sampleable above |
| LearnAPI.LabelAmbiguousDistribution | pdf/pmf version of LabelAmbiguous; see Distribution above |
| LearnAPI.LabelAmbiguousFuzzy | same as LabelAmbiguous but with multiple values of indeterminate number |
| LearnAPI.Quantile² | same as target but with quantile interpretation |
| LearnAPI.Expectile² | same as target but with expectile interpretation |
| LearnAPI.ConfidenceInterval² | confidence interval |
| LearnAPI.Fuzzy | finite but possibly varying number of target observations |
| LearnAPI.ProbabilisticFuzzy | as for Fuzzy but labeled with probabilities (not necessarily summing to one) |
| LearnAPI.SurvivalFunction | survival function |
| LearnAPI.SurvivalDistribution | probability distribution for survival time |
| LearnAPI.SurvivalHazardFunction | hazard function for survival time |
| LearnAPI.OutlierScore | numerical score reflecting degree of outlierness (not necessarily normalized) |
| LearnAPI.Continuous | real-valued approximation/interpolation of a discrete-valued target, such as a count (e.g., number of phone calls) |

¹Provided for completeness but discouraged to avoid ambiguities in representation.

²The level will be controlled by a hyper-parameter; models providing only quantiles or expectiles at 50% will provide Point instead.

source

Proxies for density estimation algorithms

LearnAPI.Single · Type
Single <: KindOfProxy

Abstract subtype of LearnAPI.KindOfProxy. It applies only to algorithms for which predict has no data argument, i.e., is of the form predict(model, kind_of_proxy). An example is an algorithm learning a probability distribution from samples, where we regard the samples as drawn from the "target" variable. If, in this case, kind_of_proxy is an instance of LearnAPI.Single, then predict(model, kind_of_proxy) returns a single object representing a probability distribution.

| type T | form of output of predict(model, ::T) |
|--------|----------------------------------------|
| LearnAPI.SingleSampleable | object that can be sampled to obtain a single target observation |
| LearnAPI.SingleDistribution | explicit probability density/mass function for sampling the target |
| LearnAPI.SingleLogDistribution | explicit log-probability density/mass function for sampling the target |
source

Joint probability distributions

LearnAPI.Joint · Type
Joint <: KindOfProxy

Abstract subtype of LearnAPI.KindOfProxy. If kind_of_proxy is an instance of LearnAPI.Joint then, given data consisting of $n$ observations, predict(model, kind_of_proxy, data) represents a single probability distribution for the sample space $Y^n$, where $Y$ is the space from which the target variable takes its values.

| type T | form of output of predict(model, ::T, data) |
|--------|----------------------------------------------|
| LearnAPI.JointSampleable | object that can be sampled to obtain a vector whose elements have the form of target observations; the vector length matches the number of observations in data |
| LearnAPI.JointDistribution | explicit probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in data |
| LearnAPI.JointLogDistribution | explicit log-probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in data |
source
+Kinds of Target Proxy · LearnAPI.jl

Kinds of Target Proxy

The available kinds of target proxy (used for predict dispatch) are classified by subtypes of LearnAPI.KindOfProxy. These types are intended for dispatch only and have no fields.

LearnAPI.KindOfProxy · Type
LearnAPI.KindOfProxy

Abstract type whose concrete subtypes T each represent a different kind of proxy for some target variable, associated with some algorithm. Instances T() are used to request the form of target predictions in predict calls.

See LearnAPI.jl documentation for an explanation of "targets" and "target proxies".

For example, Distribution is a concrete subtype of LearnAPI.KindOfProxy and a call like predict(model, Distribution(), Xnew) returns a data object whose observations are probability density/mass functions, assuming the algorithm supports predictions of that form.
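The dispatch mechanism described above can be sketched without LearnAPI.jl itself. The following is a minimal illustration only: ToyKindOfProxy, ToyPoint, ToyDistribution, ToyModel and toy_predict are hypothetical stand-ins invented for this sketch, not names from the package.

```julia
# Hypothetical stand-ins for LearnAPI.KindOfProxy and two of its concrete subtypes.
# Like the real types, they have no fields and exist only for dispatch.
abstract type ToyKindOfProxy end
struct ToyPoint <: ToyKindOfProxy end
struct ToyDistribution <: ToyKindOfProxy end

# A toy "model": a learned mean and standard deviation (assumed for illustration).
struct ToyModel
    mean::Float64
    std::Float64
end

# One method per supported kind of proxy; the second argument selects the output form.
toy_predict(model::ToyModel, ::ToyPoint, Xnew) = fill(model.mean, length(Xnew))

function toy_predict(model::ToyModel, ::ToyDistribution, Xnew)
    # Each "observation" of the output is a Gaussian probability density function.
    pdf(y) = exp(-((y - model.mean) / model.std)^2 / 2) / (model.std * sqrt(2π))
    return [pdf for _ in Xnew]
end

model = ToyModel(1.0, 2.0)
Xnew = [10, 20, 30]
point_predictions = toy_predict(model, ToyPoint(), Xnew)
density_predictions = toy_predict(model, ToyDistribution(), Xnew)
```

Because the proxy types carry no data, requesting a different form of prediction costs nothing at runtime beyond method selection.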

The instances of LearnAPI.KindOfProxy are: ConfidenceInterval(), Continuous(), Distribution(), Expectile(), Fuzzy(), HazardFunction(), LabelAmbiguous(), LabelAmbiguousDistribution(), LabelAmbiguousFuzzy(), LabelAmbiguousSampleable(), LogDistribution(), LogProbability(), OutlierScore(), Parametric(), Point(), ProbabilisticFuzzy(), Probability(), Quantile(), Sampleable(), SurvivalDistribution(), SurvivalFunction(), SingleDistribution(), SingleLogDistribution(), SingleSampleable(), JointDistribution(), JointLogDistribution() and JointSampleable().

source

Simple target proxies

LearnAPI.IID · Type
LearnAPI.IID <: LearnAPI.KindOfProxy

Abstract subtype of LearnAPI.KindOfProxy. If kind_of_proxy is an instance of LearnAPI.IID then, given data consisting of $n$ observations, the following must hold:

  • ŷ = LearnAPI.predict(model, kind_of_proxy, data) is data also consisting of $n$ observations.

  • The $j$th observation of ŷ, for any $j$, depends only on the $j$th observation of the provided data (no correlation between observations).
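The two conditions above can be checked on a toy example. This is a sketch under stated assumptions: ToyScaler and toy_predict are hypothetical names, and the "model" is just an elementwise scaling, chosen because it trivially satisfies both conditions.

```julia
# Hypothetical toy model whose predictions are an elementwise function of the input.
struct ToyScaler
    slope::Float64
end

# One output per input observation, each computed independently of the others.
toy_predict(model::ToyScaler, data::AbstractVector) = model.slope .* data

model = ToyScaler(2.0)
data = [1.0, 2.0, 3.0, 4.0]
ŷ = toy_predict(model, data)

# Condition 1: the output has the same number of observations as the input.
same_count = length(ŷ) == length(data)

# Condition 2: predicting on a subset gives the same subset of the predictions.
subset = [2, 4]
subset_consistent = toy_predict(model, data[subset]) == ŷ[subset]
```

A predictor that, say, normalized its output by the mean of the whole batch would fail the second check, which is the kind of behaviour the IID requirement rules out.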

See also LearnAPI.KindOfProxy.

Extended help

| type | form of an observation |
|------|------------------------|
| LearnAPI.Point | same as target observations; may have the interpretation of a 50% quantile, 50% expectile or mode |
| LearnAPI.Sampleable | object that can be sampled to obtain object of the same form as target observation |
| LearnAPI.Distribution | explicit probability density/mass function whose sample space is all possible target observations |
| LearnAPI.LogDistribution | explicit log-probability density/mass function whose sample space is possible target observations |
| LearnAPI.Probability¹ | numerical probability or probability vector |
| LearnAPI.LogProbability¹ | log-probability or log-probability vector |
| LearnAPI.Parametric¹ | a list of parameters (e.g., mean and variance) describing some distribution |
| LearnAPI.LabelAmbiguous | collections of labels (in case of multi-class target) but without a known correspondence to the original target labels (and of possibly different number) as in, e.g., clustering |
| LearnAPI.LabelAmbiguousSampleable | sampleable version of LabelAmbiguous; see Sampleable above |
| LearnAPI.LabelAmbiguousDistribution | pdf/pmf version of LabelAmbiguous; see Distribution above |
| LearnAPI.LabelAmbiguousFuzzy | same as LabelAmbiguous but with multiple values of indeterminate number |
| LearnAPI.Quantile² | same as target but with quantile interpretation |
| LearnAPI.Expectile² | same as target but with expectile interpretation |
| LearnAPI.ConfidenceInterval² | confidence interval |
| LearnAPI.Fuzzy | finite but possibly varying number of target observations |
| LearnAPI.ProbabilisticFuzzy | as for Fuzzy but labeled with probabilities (not necessarily summing to one) |
| LearnAPI.SurvivalFunction | survival function |
| LearnAPI.SurvivalDistribution | probability distribution for survival time |
| LearnAPI.SurvivalHazardFunction | hazard function for survival time |
| LearnAPI.OutlierScore | numerical score reflecting degree of outlierness (not necessarily normalized) |
| LearnAPI.Continuous | real-valued approximation/interpolation of a discrete-valued target, such as a count (e.g., number of phone calls) |

¹Provided for completeness but discouraged to avoid ambiguities in representation.

²The level will be controlled by a hyper-parameter; models providing only quantiles or expectiles at 50% will provide Point instead.

source

Proxies for density estimation algorithms

LearnAPI.Single · Type
Single <: KindOfProxy

Abstract subtype of LearnAPI.KindOfProxy. It applies only to algorithms for which predict has no data argument, i.e., is of the form predict(model, kind_of_proxy). An example is an algorithm learning a probability distribution from samples, where we regard the samples as drawn from the "target" variable. If, in this case, kind_of_proxy is an instance of LearnAPI.Single, then predict(model, kind_of_proxy) returns a single object representing a probability distribution.

| type T | form of output of predict(model, ::T) |
|--------|----------------------------------------|
| LearnAPI.SingleSampleable | object that can be sampled to obtain a single target observation |
| LearnAPI.SingleDistribution | explicit probability density/mass function for sampling the target |
| LearnAPI.SingleLogDistribution | explicit log-probability density/mass function for sampling the target |
source

Joint probability distributions

LearnAPI.Joint · Type
Joint <: KindOfProxy

Abstract subtype of LearnAPI.KindOfProxy. If kind_of_proxy is an instance of LearnAPI.Joint then, given data consisting of $n$ observations, predict(model, kind_of_proxy, data) represents a single probability distribution for the sample space $Y^n$, where $Y$ is the space from which the target variable takes its values.

| type T | form of output of predict(model, ::T, data) |
|--------|----------------------------------------------|
| LearnAPI.JointSampleable | object that can be sampled to obtain a vector whose elements have the form of target observations; the vector length matches the number of observations in data |
| LearnAPI.JointDistribution | explicit probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in data |
| LearnAPI.JointLogDistribution | explicit log-probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in data |
source
diff --git a/dev/minimize/index.html b/dev/minimize/index.html index 14917138..01545b39 100644 --- a/dev/minimize/index.html +++ b/dev/minimize/index.html @@ -15,4 +15,4 @@ transform(minimize(model; options...), args...; kwargs...) == transform(model, args...; kwargs...) inverse_transform(minimize(model; options), args...; kwargs...) == - inverse_transform(model, args...; kwargs...)

Additionally:

minimize(minimize(model)) == minimize(model)
source + inverse_transform(model, args...; kwargs...)

Additionally:

minimize(minimize(model)) == minimize(model)
source diff --git a/dev/obs/index.html b/dev/obs/index.html index a57e8d5d..2e744b49 100644 --- a/dev/obs/index.html +++ b/dev/obs/index.html @@ -48,4 +48,4 @@ predict_observations = obs(model, X) ẑ = predict(model, Point(), MLUtils.getobs(predict_observations, 101:150)) -@assert ẑ == ŷ

See also LearnAPI.data_interface.

Extended help

New implementations

Implementation is typically optional.

For each supported form of data in fit(algorithm, data), it must be true that model = fit(algorithm, observations) is equivalent to model = fit(algorithm, data), whenever observations = obs(algorithm, data). For each supported form of data in calls predict(model, ..., data) and transform(model, data), where implemented, the calls predict(model, ..., observations) and transform(model, observations) are supported alternatives, whenever observations = obs(model, data).

The fallback for obs is obs(model_or_algorithm, data) = data, and the fallback for LearnAPI.data_interface(algorithm) is LearnAPI.RandomAccess(). For details refer to the LearnAPI.data_interface document string.

In particular, if the data to be consumed by fit, predict or transform consists only of suitable tables and arrays, then obs and LearnAPI.data_interface do not need to be overloaded. However, the user will get no performance benefits by using obs in that case.

When overloading obs(algorithm, data) to output new model-specific representations of data, it may be necessary to also overload LearnAPI.features, LearnAPI.target (supervised algorithms), and/or LearnAPI.weights (if weights are supported), for extracting relevant parts of the representation.

Sample implementation

Refer to the "Anatomy of an Implementation" section of the LearnAPI.jl manual.

source

Data interfaces

New implementations must overload LearnAPI.data_interface(algorithm) if the output of obs does not implement LearnAPI.RandomAccess. (Arrays, most tables, and all tuples thereof, implement RandomAccess.)

LearnAPI.RandomAccess · Type
LearnAPI.RandomAccess

A data interface type. We say that data implements the RandomAccess interface if data implements the methods getobs and numobs from MLUtils.jl. The first method allows one to grab observations specified by an arbitrary index set, as in MLUtils.getobs(data, [2, 3, 5]), while the second method returns the total number of available observations, which is assumed to be known and finite.

All arrays implement RandomAccess, with the last index being the observation index (observations-as-columns in matrices).

A Tables.jl compatible table data implements RandomAccess if Tables.istable(data) is true and if data implements DataAPI.nrows. This includes many tables, and in particular, DataFrames. Tables that are also tuples are explicitly excluded.

Any tuple of objects implementing RandomAccess also implements RandomAccess.

If LearnAPI.data_interface(algorithm) takes the value RandomAccess(), then obs(algorithm, ...) is guaranteed to return objects implementing the RandomAccess interface, and the same holds for obs(model, ...), whenever LearnAPI.algorithm(model) == algorithm.

Implementing RandomAccess for new data types

Typically, implementing RandomAccess for a new data type requires only implementing Base.getindex and Base.length, which are the fallbacks for MLUtils.getobs and MLUtils.numobs; this avoids making MLUtils.jl a package dependency.

See also LearnAPI.FiniteIterable, LearnAPI.Iterable.

source
LearnAPI.FiniteIterable · Type
LearnAPI.FiniteIterable

A data interface type. We say that data implements the FiniteIterable interface if it implements Julia's iterate interface, including Base.length, and if Base.IteratorSize(typeof(data)) == Base.HasLength(). For example, this is true if:

  • data implements the LearnAPI.RandomAccess interface (arrays and most tables)

  • data isa MLUtils.DataLoader, which includes output from MLUtils.eachobs.

If LearnAPI.data_interface(algorithm) takes the value FiniteIterable(), then obs(algorithm, ...) is guaranteed to return objects implementing the FiniteIterable interface, and the same holds for obs(model, ...), whenever LearnAPI.algorithm(model) == algorithm.

See also LearnAPI.RandomAccess, LearnAPI.Iterable.

source
LearnAPI.Iterable · Type
LearnAPI.Iterable

A data interface type. We say that data implements the Iterable interface if it implements Julia's basic iterate interface. (Such objects may not implement MLUtils.numobs or Base.length.)

If LearnAPI.data_interface(algorithm) takes the value Iterable(), then obs(algorithm, ...) is guaranteed to return objects implementing Iterable, and the same holds for obs(model, ...), whenever LearnAPI.algorithm(model) == algorithm.

See also LearnAPI.FiniteIterable, LearnAPI.RandomAccess.

source
+@assert ẑ == ŷ

See also LearnAPI.data_interface.

Extended help

New implementations

Implementation is typically optional.

For each supported form of data in fit(algorithm, data), it must be true that model = fit(algorithm, observations) is equivalent to model = fit(algorithm, data), whenever observations = obs(algorithm, data). For each supported form of data in calls predict(model, ..., data) and transform(model, data), where implemented, the calls predict(model, ..., observations) and transform(model, observations) are supported alternatives, whenever observations = obs(model, data).
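The equivalence required of fit can be illustrated with a self-contained toy. This is only a sketch: ToyRidge, ToyObs, toy_obs and toy_fit are hypothetical names standing in for an algorithm, its internal data representation, obs and fit; the ridge regression itself is an assumption made for the example.

```julia
using LinearAlgebra

# Hypothetical algorithm with one hyperparameter.
struct ToyRidge
    lambda::Float64
end

# Hypothetical algorithm-specific representation of training data.
struct ToyObs
    A::Matrix{Float64}   # features, one observation per column
    y::Vector{Float64}   # target
end

# Convert user-facing data (named tuple of feature columns, target) to the internal form.
toy_obs(::ToyRidge, data::Tuple) = ToyObs(permutedims(hcat(values(data[1])...)), data[2])
toy_obs(::ToyRidge, observations::ToyObs) = observations   # already converted

# Core training works on the internal representation only.
function toy_fit(algorithm::ToyRidge, observations::ToyObs)
    A, y = observations.A, observations.y
    return (A * A' + algorithm.lambda * I) \ (A * y)
end

# User-facing data is converted first, so both entry points give the same result.
toy_fit(algorithm::ToyRidge, data) = toy_fit(algorithm, toy_obs(algorithm, data))

algorithm = ToyRidge(0.1)
data = ((a = [1.0, 2.0, 3.0], b = [0.0, 1.0, 0.0]), [1.0, 2.0, 3.0])
coefficients_from_data = toy_fit(algorithm, data)
coefficients_from_obs = toy_fit(algorithm, toy_obs(algorithm, data))
```

Routing the user-facing method through the conversion is one simple way to make the equivalence hold by construction; a meta-algorithm can then convert once and reuse the result across repeated fits.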

The fallback for obs is obs(model_or_algorithm, data) = data, and the fallback for LearnAPI.data_interface(algorithm) is LearnAPI.RandomAccess(). For details refer to the LearnAPI.data_interface document string.

In particular, if the data to be consumed by fit, predict or transform consists only of suitable tables and arrays, then obs and LearnAPI.data_interface do not need to be overloaded. However, the user will get no performance benefits by using obs in that case.

When overloading obs(algorithm, data) to output new model-specific representations of data, it may be necessary to also overload LearnAPI.features, LearnAPI.target (supervised algorithms), and/or LearnAPI.weights (if weights are supported), for extracting relevant parts of the representation.

Sample implementation

Refer to the "Anatomy of an Implementation" section of the LearnAPI.jl manual.

source

Data interfaces

New implementations must overload LearnAPI.data_interface(algorithm) if the output of obs does not implement LearnAPI.RandomAccess. (Arrays, most tables, and all tuples thereof, implement RandomAccess.)

LearnAPI.RandomAccess · Type
LearnAPI.RandomAccess

A data interface type. We say that data implements the RandomAccess interface if data implements the methods getobs and numobs from MLUtils.jl. The first method allows one to grab observations specified by an arbitrary index set, as in MLUtils.getobs(data, [2, 3, 5]), while the second method returns the total number of available observations, which is assumed to be known and finite.

All arrays implement RandomAccess, with the last index being the observation index (observations-as-columns in matrices).

A Tables.jl compatible table data implements RandomAccess if Tables.istable(data) is true and if data implements DataAPI.nrows. This includes many tables, and in particular, DataFrames. Tables that are also tuples are explicitly excluded.

Any tuple of objects implementing RandomAccess also implements RandomAccess.

If LearnAPI.data_interface(algorithm) takes the value RandomAccess(), then obs(algorithm, ...) is guaranteed to return objects implementing the RandomAccess interface, and the same holds for obs(model, ...), whenever LearnAPI.algorithm(model) == algorithm.

Implementing RandomAccess for new data types

Typically, implementing RandomAccess for a new data type requires only implementing Base.getindex and Base.length, which are the fallbacks for MLUtils.getobs and MLUtils.numobs; this avoids making MLUtils.jl a package dependency.
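As a minimal sketch of that recipe, the container below overloads only Base.getindex and Base.length. The type LabeledColumns and its fields are hypothetical, invented for this illustration; whether a given version of MLUtils.jl picks these methods up as fallbacks is taken from the text above rather than verified here.

```julia
# Hypothetical container: observations are the columns of a matrix, each with a label.
struct LabeledColumns
    X::Matrix{Float64}      # one observation per column
    y::Vector{String}       # one label per observation
end

# Total number of observations.
Base.length(data::LabeledColumns) = length(data.y)

# A single observation, as a (features, label) pair.
Base.getindex(data::LabeledColumns, i::Integer) = (data.X[:, i], data.y[i])

# A subsample specified by an arbitrary index set, returned as the same container type.
Base.getindex(data::LabeledColumns, I::AbstractVector{<:Integer}) =
    LabeledColumns(data.X[:, I], data.y[I])

data = LabeledColumns([1.0 2.0 3.0; 4.0 5.0 6.0], ["a", "b", "c"])
subsample = data[[1, 3]]
```

Returning the same container type from index-set access means a subsample can itself be subsampled, which is convenient for resampling schemes such as cross-validation.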

See also LearnAPI.FiniteIterable, LearnAPI.Iterable.

source
LearnAPI.FiniteIterable · Type
LearnAPI.FiniteIterable

A data interface type. We say that data implements the FiniteIterable interface if it implements Julia's iterate interface, including Base.length, and if Base.IteratorSize(typeof(data)) == Base.HasLength(). For example, this is true if:

  • data implements the LearnAPI.RandomAccess interface (arrays and most tables)

  • data isa MLUtils.DataLoader, which includes output from MLUtils.eachobs.

If LearnAPI.data_interface(algorithm) takes the value FiniteIterable(), then obs(algorithm, ...) is guaranteed to return objects implementing the FiniteIterable interface, and the same holds for obs(model, ...), whenever LearnAPI.algorithm(model) == algorithm.
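For illustration, here is a small custom iterator that meets the stated conditions using only Base. ToyBatches is a hypothetical type standing in for something like a data loader; it is not part of LearnAPI.jl or MLUtils.jl.

```julia
# Hypothetical finite iterator over `n` mini-batches of index ranges.
struct ToyBatches
    n::Int
    batchsize::Int
end

# Julia's iteration protocol: return (item, next_state), or `nothing` when exhausted.
function Base.iterate(b::ToyBatches, state=1)
    state > b.n && return nothing
    start = (state - 1) * b.batchsize + 1
    return (start:(start + b.batchsize - 1), state + 1)
end

# Declare the number of items; `Base.IteratorSize` already defaults to `HasLength()`.
Base.length(b::ToyBatches) = b.n
Base.eltype(::Type{ToyBatches}) = UnitRange{Int}

batches = ToyBatches(3, 2)
collected = collect(batches)
```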

See also LearnAPI.RandomAccess, LearnAPI.Iterable.

source
LearnAPI.Iterable · Type
LearnAPI.Iterable

A data interface type. We say that data implements the Iterable interface if it implements Julia's basic iterate interface. (Such objects may not implement MLUtils.numobs or Base.length.)

If LearnAPI.data_interface(algorithm) takes the value Iterable(), then obs(algorithm, ...) is guaranteed to return objects implementing Iterable, and the same holds for obs(model, ...), whenever LearnAPI.algorithm(model) == algorithm.
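A sketch of data that is iterable but has no known length follows. ToyStream is a hypothetical type; the vector inside it merely stands in for an external source such as a file or socket.

```julia
# Hypothetical stream that yields observations until a negative sentinel is seen.
struct ToyStream
    source::Vector{Int}   # stands in for, e.g., a file or socket
end

# Stop at the first negative value, so the number of observations is not known upfront.
function Base.iterate(s::ToyStream, state=1)
    (state > length(s.source) || s.source[state] < 0) && return nothing
    return (s.source[state], state + 1)
end

# Declare that the size is unknown, so generic code does not call `length`.
Base.IteratorSize(::Type{ToyStream}) = Base.SizeUnknown()
Base.eltype(::Type{ToyStream}) = Int

stream = ToyStream([5, 7, 9, -1, 11])
observations = collect(stream)
```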

See also LearnAPI.FiniteIterable, LearnAPI.RandomAccess.

source
diff --git a/dev/patterns/classification/index.html b/dev/patterns/classification/index.html index 7e24c040..539741e4 100644 --- a/dev/patterns/classification/index.html +++ b/dev/patterns/classification/index.html @@ -1,2 +1,2 @@ -Classification · LearnAPI.jl
+Classification · LearnAPI.jl
diff --git a/dev/patterns/clusterering/index.html b/dev/patterns/clusterering/index.html index d11887a4..1f2852b0 100644 --- a/dev/patterns/clusterering/index.html +++ b/dev/patterns/clusterering/index.html @@ -1,2 +1,2 @@ -Clusterering · LearnAPI.jl
+Clusterering · LearnAPI.jl
diff --git a/dev/patterns/dimension_reduction/index.html b/dev/patterns/dimension_reduction/index.html index 38140a2e..6724b852 100644 --- a/dev/patterns/dimension_reduction/index.html +++ b/dev/patterns/dimension_reduction/index.html @@ -1,2 +1,2 @@ -Dimension Reduction · LearnAPI.jl
+Dimension Reduction · LearnAPI.jl
diff --git a/dev/patterns/feature_engineering/index.html b/dev/patterns/feature_engineering/index.html index 7d657e35..3e876757 100644 --- a/dev/patterns/feature_engineering/index.html +++ b/dev/patterns/feature_engineering/index.html @@ -1,2 +1,2 @@ -Feature Engineering · LearnAPI.jl

Feature Engineering

  • For a simple feature selection algorithm (no "learning") see [these examples](https://github.com/JuliaAI/LearnAPI.jl/blob/dev/test/integration/static_algorithms.jl) from tests.

+Feature Engineering · LearnAPI.jl

Feature Engineering

  • For a simple feature selection algorithm (no "learning") see [these examples](https://github.com/JuliaAI/LearnAPI.jl/blob/dev/test/integration/static_algorithms.jl) from tests.

diff --git a/dev/patterns/incremental_algorithms/index.html b/dev/patterns/incremental_algorithms/index.html index 4ac5bf8a..80345286 100644 --- a/dev/patterns/incremental_algorithms/index.html +++ b/dev/patterns/incremental_algorithms/index.html @@ -1,2 +1,2 @@ -Incremental Models · LearnAPI.jl
+Incremental Models · LearnAPI.jl
diff --git a/dev/patterns/incremental_models/index.html b/dev/patterns/incremental_models/index.html index efad4ec7..3108b0a0 100644 --- a/dev/patterns/incremental_models/index.html +++ b/dev/patterns/incremental_models/index.html @@ -1,2 +1,2 @@ -Incremental Algorithms · LearnAPI.jl
+Incremental Algorithms · LearnAPI.jl
diff --git a/dev/patterns/iterative_algorithms/index.html b/dev/patterns/iterative_algorithms/index.html index e85134cd..a38727c3 100644 --- a/dev/patterns/iterative_algorithms/index.html +++ b/dev/patterns/iterative_algorithms/index.html @@ -1,2 +1,2 @@ -Iterative Algorithms · LearnAPI.jl
+Iterative Algorithms · LearnAPI.jl
diff --git a/dev/patterns/learning_a_probability_distribution/index.html b/dev/patterns/learning_a_probability_distribution/index.html index eee6464b..98d21eb0 100644 --- a/dev/patterns/learning_a_probability_distribution/index.html +++ b/dev/patterns/learning_a_probability_distribution/index.html @@ -1,2 +1,2 @@ -Learning a Probability Distribution · LearnAPI.jl
+Learning a Probability Distribution · LearnAPI.jl
diff --git a/dev/patterns/meta_algorithms/index.html b/dev/patterns/meta_algorithms/index.html index 057430ef..18d95a25 100644 --- a/dev/patterns/meta_algorithms/index.html +++ b/dev/patterns/meta_algorithms/index.html @@ -1,2 +1,2 @@ -Meta-algorithms · LearnAPI.jl
+Meta-algorithms · LearnAPI.jl
diff --git a/dev/patterns/missing_value_imputation/index.html b/dev/patterns/missing_value_imputation/index.html index bedc5e19..b1e74a4e 100644 --- a/dev/patterns/missing_value_imputation/index.html +++ b/dev/patterns/missing_value_imputation/index.html @@ -1,2 +1,2 @@ -Missing Value Imputation · LearnAPI.jl
+Missing Value Imputation · LearnAPI.jl
diff --git a/dev/patterns/outlier_detection/index.html b/dev/patterns/outlier_detection/index.html index 0f812cd0..e1375cdd 100644 --- a/dev/patterns/outlier_detection/index.html +++ b/dev/patterns/outlier_detection/index.html @@ -1,2 +1,2 @@ -Outlier Detection · LearnAPI.jl
+Outlier Detection · LearnAPI.jl
diff --git a/dev/patterns/regression/index.html b/dev/patterns/regression/index.html index 25525b98..c67fc8de 100644 --- a/dev/patterns/regression/index.html +++ b/dev/patterns/regression/index.html @@ -1,2 +1,2 @@ -Regression · LearnAPI.jl
+Regression · LearnAPI.jl
diff --git a/dev/patterns/static_algorithms/index.html b/dev/patterns/static_algorithms/index.html index 914df388..47199014 100644 --- a/dev/patterns/static_algorithms/index.html +++ b/dev/patterns/static_algorithms/index.html @@ -1,2 +1,2 @@ -Static Algorithms · LearnAPI.jl
+Static Algorithms · LearnAPI.jl
diff --git a/dev/patterns/supervised_bayesian_algorithms/index.html b/dev/patterns/supervised_bayesian_algorithms/index.html index d7a1c9a5..7e420a3e 100644 --- a/dev/patterns/supervised_bayesian_algorithms/index.html +++ b/dev/patterns/supervised_bayesian_algorithms/index.html @@ -1,2 +1,2 @@ -Supervised Bayesian Models · LearnAPI.jl
+Supervised Bayesian Models · LearnAPI.jl
diff --git a/dev/patterns/supervised_bayesian_models/index.html b/dev/patterns/supervised_bayesian_models/index.html index 931677bb..be52e38f 100644 --- a/dev/patterns/supervised_bayesian_models/index.html +++ b/dev/patterns/supervised_bayesian_models/index.html @@ -1,2 +1,2 @@ -Supervised Bayesian Algorithms · LearnAPI.jl
+Supervised Bayesian Algorithms · LearnAPI.jl
diff --git a/dev/patterns/survival_analysis/index.html b/dev/patterns/survival_analysis/index.html index bdfa4f14..1db61b47 100644 --- a/dev/patterns/survival_analysis/index.html +++ b/dev/patterns/survival_analysis/index.html @@ -1,2 +1,2 @@ -Survival Analysis · LearnAPI.jl
+Survival Analysis · LearnAPI.jl
diff --git a/dev/patterns/time_series_classification/index.html b/dev/patterns/time_series_classification/index.html index c63400ae..8c3da83b 100644 --- a/dev/patterns/time_series_classification/index.html +++ b/dev/patterns/time_series_classification/index.html @@ -1,2 +1,2 @@ -Time Series Classification · LearnAPI.jl
+Time Series Classification · LearnAPI.jl
diff --git a/dev/patterns/time_series_forecasting/index.html b/dev/patterns/time_series_forecasting/index.html index c1149773..e3d62a7c 100644 --- a/dev/patterns/time_series_forecasting/index.html +++ b/dev/patterns/time_series_forecasting/index.html @@ -1,2 +1,2 @@ -Time Series Forecasting · LearnAPI.jl
+Time Series Forecasting · LearnAPI.jl
diff --git a/dev/predict_transform/index.html b/dev/predict_transform/index.html index 7455a228..341e849b 100644 --- a/dev/predict_transform/index.html +++ b/dev/predict_transform/index.html @@ -9,8 +9,8 @@ transform(algorithm, data) # `fit` implied

For example, if fit(algorithm, X) is defined, then predict(algorithm, X) will be shorthand for

model = fit(algorithm, X)
 predict(model, X)
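One way such a shorthand can be provided is a convenience method that dispatches on the algorithm type. The sketch below is illustrative only: ToyCenterDistance, ToyCenterDistanceModel, toy_fit and toy_predict are hypothetical names, and the scoring rule is an assumption made for the example.

```julia
# Hypothetical algorithm (no hyperparameters) and model (learned state).
struct ToyCenterDistance end
struct ToyCenterDistanceModel
    center::Float64
end

# Training learns the mean of the data.
toy_fit(::ToyCenterDistance, X::AbstractVector) = ToyCenterDistanceModel(sum(X) / length(X))

# Prediction scores each observation by its distance from the learned center.
toy_predict(model::ToyCenterDistanceModel, X::AbstractVector) = abs.(X .- model.center)

# Convenience method: `fit` is implied when an algorithm is passed instead of a model.
toy_predict(algorithm::ToyCenterDistance, X::AbstractVector) =
    toy_predict(toy_fit(algorithm, X), X)

X = [1.0, 2.0, 3.0]
one_step = toy_predict(ToyCenterDistance(), X)
two_step = toy_predict(toy_fit(ToyCenterDistance(), X), X)
```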

Reference

LearnAPI.predict · Function
predict(model, kind_of_proxy::LearnAPI.KindOfProxy, data)
 predict(model, data)

The first signature returns target predictions, or proxies for target predictions, for input features data, according to some model returned by fit. Where supported, these are literally target predictions if kind_of_proxy = Point(), and probability density/mass functions if kind_of_proxy = Distribution(). List all options with LearnAPI.kinds_of_proxy(algorithm), where algorithm = LearnAPI.algorithm(model).

The shortcut predict(model, data) calls the first method with an algorithm-specific kind_of_proxy, namely the first element of LearnAPI.kinds_of_proxy(algorithm), which lists all supported target proxies.
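The delegation performed by the shortcut can be sketched as follows. All names here (DemoProxy, DemoPoint, DemoInterval, DemoModel, demo_kinds_of_proxy, demo_predict) are hypothetical; in particular, the trait in this sketch is defined on the model for brevity, whereas the text above describes it as a trait of the algorithm.

```julia
# Hypothetical kinds of proxy, used for dispatch only.
abstract type DemoProxy end
struct DemoPoint <: DemoProxy end
struct DemoInterval <: DemoProxy end

# Hypothetical model with a single learned parameter.
struct DemoModel
    offset::Float64
end

# Hypothetical trait: supported kinds of proxy, with the default listed first.
demo_kinds_of_proxy(::DemoModel) = (DemoPoint(), DemoInterval())

# One method per supported kind of proxy.
demo_predict(m::DemoModel, ::DemoPoint, data) = data .+ m.offset
demo_predict(m::DemoModel, ::DemoInterval, data) =
    [(x + m.offset - 1, x + m.offset + 1) for x in data]

# Shortcut: delegate to the first supported kind of proxy.
demo_predict(m::DemoModel, data) = demo_predict(m, first(demo_kinds_of_proxy(m)), data)

model = DemoModel(0.5)
data = [1.0, 2.0]
shortcut = demo_predict(model, data)
explicit = demo_predict(model, DemoPoint(), data)
```

Under this arrangement, the order of the tuple returned by the trait determines what the two-argument call returns.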

The argument model is anything returned by a call of the form fit(algorithm, ...).

Example

In the following, algorithm is some supervised learning algorithm with training features X, training target y, and test features Xnew:

model = fit(algorithm, (X, y)) # or `fit(algorithm, X, y)`
-predict(model, Point(), Xnew)

See also fit, transform, inverse_transform.

Extended help

If predict supports data in the form of a tuple data = (X1, ..., Xn), then a slurping signature is also provided, as in predict(model, X1, ..., Xn).

Note predict does not mutate any argument, except in the special case LearnAPI.predict_or_transform_mutates(algorithm) = true.

New implementations

If there is no notion of a "target" variable in the LearnAPI.jl sense, or you need an operation with an inverse, implement transform instead.

Implementation is optional. Only the first signature is implemented, but each kind_of_proxy that gets an implementation must be added to the list returned by LearnAPI.kinds_of_proxy.

If implemented, you must include :(LearnAPI.predict) in the tuple returned by the LearnAPI.functions trait.

If, additionally, minimize(model) is overloaded, then the following identity must hold:

predict(minimize(model), args...) = predict(model, args...)

If LearnAPI.predict_or_transform_mutates(algorithm) is overloaded to return true, then predict may mutate its first argument, but not in a way that alters the result of a subsequent call to predict, transform or inverse_transform. This is necessary for some non-generalizing algorithms but is otherwise discouraged. See more at fit.

Assumptions about data

By default, it is assumed that data supports the LearnAPI.RandomAccess interface; this includes all matrices (with observations as columns), most tables, and tuples thereof. See LearnAPI.RandomAccess for details. If this is not the case, then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to document strings for details.

source
LearnAPI.transformFunction
transform(model, data)

Return a transformation of some data, using some model, as returned by fit.

For data that consists of a tuple, a slurping version is also provided, i.e., you can do transform(model, X1, X2, X3) in place of transform(model, (X1, X2, X3)).

Example

Below, X and Xnew are data of the same form.

For an algorithm that generalizes to new data ("learns"):

model = fit(algorithm, X; verbosity=0)
+predict(model, Point(), Xnew)

See also fit, transform, inverse_transform.

Extended help

If predict supports data in the form of a tuple data = (X1, ..., Xn), then a slurping signature is also provided, as in predict(model, X1, ..., Xn).

Note predict does not mutate any argument, except in the special case LearnAPI.predict_or_transform_mutates(algorithm) = true.

New implementations

If there is no notion of a "target" variable in the LearnAPI.jl sense, or you need an operation with an inverse, implement transform instead.

Implementation is optional. Only the first signature is implemented, but each kind_of_proxy that gets an implementation must be added to the list returned by LearnAPI.kinds_of_proxy.

If implemented, you must include :(LearnAPI.predict) in the tuple returned by the LearnAPI.functions trait.

If, additionally, minimize(model) is overloaded, then the following identity must hold:

predict(minimize(model), args...) = predict(model, args...)

If LearnAPI.predict_or_transform_mutates(algorithm) is overloaded to return true, then predict may mutate its first argument, but not in a way that alters the result of a subsequent call to predict, transform or inverse_transform. This is necessary for some non-generalizing algorithms but is otherwise discouraged. See more at fit.

Assumptions about data

By default, it is assumed that data supports the LearnAPI.RandomAccess interface; this includes all matrices (with observations as columns), most tables, and tuples thereof. See LearnAPI.RandomAccess for details. If this is not the case, then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to document strings for details.

source
LearnAPI.transformFunction
transform(model, data)

Return a transformation of some data, using some model, as returned by fit.

For data that consists of a tuple, a slurping version is also provided, i.e., you can do transform(model, X1, X2, X3) in place of transform(model, (X1, X2, X3)).

Example

Below, X and Xnew are data of the same form.

For an algorithm that generalizes to new data ("learns"):

model = fit(algorithm, X; verbosity=0)
 transform(model, Xnew)

For a static (non-generalizing) transformer:

model = fit(algorithm)
-W = transform(model, X)

or, in one step (where supported):

W = transform(algorithm, X)

Note transform does not mutate any argument, except in the special case LearnAPI.predict_or_transform_mutates(algorithm) = true.

See also fit, predict, inverse_transform.

Extended help

New implementations

Implementation for new LearnAPI.jl algorithms is optional. A fallback provides the slurping version. If implemented, you must include :(LearnAPI.transform) in the tuple returned by the LearnAPI.functions trait.

If, additionally, minimize(model) is overloaded, then the following identity must hold:

transform(minimize(model), args...) = transform(model, args...)

If LearnAPI.predict_or_transform_mutates(algorithm) is overloaded to return true, then transform may mutate its first argument, but not in a way that alters the result of a subsequent call to predict, transform or inverse_transform. This is necessary for some non-generalizing algorithms but is otherwise discouraged. See more at fit.

Assumptions about data

By default, it is assumed that data supports the LearnAPI.RandomAccess interface; this includes all matrices (with observations as columns), most tables, and tuples thereof. See LearnAPI.RandomAccess for details. If this is not the case, then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to document strings for details.

source
LearnAPI.inverse_transformFunction
inverse_transform(model, data)

Inverse transform data according to some model returned by fit. Here "inverse" is to be understood broadly, e.g., as an approximate right inverse for transform.

Example

In the following, algorithm is some dimension-reducing algorithm that generalizes to new data (such as PCA); Xtrain is the training input and Xnew the input to be reduced:

model = fit(algorithm, Xtrain)
+W = transform(model, X)

or, in one step (where supported):

W = transform(algorithm, X)

Note transform does not mutate any argument, except in the special case LearnAPI.predict_or_transform_mutates(algorithm) = true.

See also fit, predict, inverse_transform.

Extended help

New implementations

Implementation for new LearnAPI.jl algorithms is optional. A fallback provides the slurping version. If implemented, you must include :(LearnAPI.transform) in the tuple returned by the LearnAPI.functions trait.

If, additionally, minimize(model) is overloaded, then the following identity must hold:

transform(minimize(model), args...) = transform(model, args...)

If LearnAPI.predict_or_transform_mutates(algorithm) is overloaded to return true, then transform may mutate its first argument, but not in a way that alters the result of a subsequent call to predict, transform or inverse_transform. This is necessary for some non-generalizing algorithms but is otherwise discouraged. See more at fit.

Assumptions about data

By default, it is assumed that data supports the LearnAPI.RandomAccess interface; this includes all matrices (with observations as columns), most tables, and tuples thereof. See LearnAPI.RandomAccess for details. If this is not the case, then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to document strings for details.

source
LearnAPI.inverse_transformFunction
inverse_transform(model, data)

Inverse transform data according to some model returned by fit. Here "inverse" is to be understood broadly, e.g., as an approximate right inverse for transform.

Example

In the following, algorithm is some dimension-reducing algorithm that generalizes to new data (such as PCA); Xtrain is the training input and Xnew the input to be reduced:

model = fit(algorithm, Xtrain)
 W = transform(model, Xnew)       # reduced version of `Xnew`
-Ŵ = inverse_transform(model, W)  # embedding of `W` in original space

See also fit, transform, predict.

Extended help

New implementations

Implementation is optional. If implemented, you must include :(LearnAPI.inverse_transform) in the tuple returned by the LearnAPI.functions trait.

If, additionally, minimize(model) is overloaded, then the following identity must hold:

inverse_transform(minimize(model), args...) = inverse_transform(model, args...)
source
+Ŵ = inverse_transform(model, W) # embedding of `W` in original space

See also fit, transform, predict.

Extended help

New implementations

Implementation is optional. If implemented, you must include :(LearnAPI.inverse_transform) in the tuple returned by the LearnAPI.functions trait.

If, additionally, minimize(model) is overloaded, then the following identity must hold:

inverse_transform(minimize(model), args...) = inverse_transform(model, args...)
source diff --git a/dev/reference/index.html b/dev/reference/index.html index 1fdb4ca9..2b47be09 100644 --- a/dev/reference/index.html +++ b/dev/reference/index.html @@ -8,4 +8,4 @@ end GradientRidgeRegressor(; learning_rate=0.01, epochs=10, l2_regularization=0.01) = GradientRidgeRegressor(learning_rate, epochs, l2_regularization) -LearnAPI.constructor(::GradientRidgeRegressor) = GradientRidgeRegressor

Documentation

Attach public LearnAPI.jl-related documentation for an algorithm to its constructor, rather than to the struct defining its type. In this way, an algorithm can implement multiple interfaces, in addition to the LearnAPI interface, with separate document strings for each.

Methods

Compulsory methods

All new algorithm types must implement fit, LearnAPI.algorithm, LearnAPI.constructor and LearnAPI.functions.

Most algorithms will also implement predict and/or transform.

List of methods


¹ We acknowledge users may not like this terminology, and may know "algorithm" by some other name, such as "strategy", "options", "hyperparameter set", "configuration", or "model". Consensus on this point is difficult; see, e.g., this Julia Discourse discussion.

+LearnAPI.constructor(::GradientRidgeRegressor) = GradientRidgeRegressor

Documentation

Attach public LearnAPI.jl-related documentation for an algorithm to its constructor, rather than to the struct defining its type. In this way, an algorithm can implement multiple interfaces, in addition to the LearnAPI interface, with separate document strings for each.

Methods

Compulsory methods

All new algorithm types must implement fit, LearnAPI.algorithm, LearnAPI.constructor and LearnAPI.functions.

Most algorithms will also implement predict and/or transform.

List of methods


¹ We acknowledge users may not like this terminology, and may know "algorithm" by some other name, such as "strategy", "options", "hyperparameter set", "configuration", or "model". Consensus on this point is difficult; see, e.g., this Julia Discourse discussion.

diff --git a/dev/target_weights_features/index.html b/dev/target_weights_features/index.html index 0eec40ce..0de39255 100644 --- a/dev/target_weights_features/index.html +++ b/dev/target_weights_features/index.html @@ -5,7 +5,7 @@ X = LearnAPI.features(algorithm, data) y = LearnAPI.target(algorithm, data) ŷ = predict(model, Point(), X) -training_loss = sum(ŷ .!= y)

Implementation guide

The fallback for LearnAPI.features returns first(data) if data is a tuple, and data otherwise.

method | fallback | compulsory?
LearnAPI.target | returns nothing | no
LearnAPI.weights | returns nothing | no
LearnAPI.features | see docstring | only if fallback fails

Reference

LearnAPI.targetFunction
LearnAPI.target(algorithm, data) -> target

Return, for each form of data supported in a call of the form fit(algorithm, data), the target variable part of data. If nothing is returned, the algorithm does not see a target variable in training (is unsupervised).

Refer to LearnAPI.jl documentation for the precise meaning of "target".

New implementations

A fallback returns nothing. Must be implemented if fit consumes data including a target variable.

If overloaded, you must include :(LearnAPI.target) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.weightsFunction
LearnAPI.weights(algorithm, data) -> weights

Return, for each form of data supported in a call of the form fit(algorithm, data), the per-observation weights part of data. Where nothing is returned, no weights are part of data, which is to be interpreted as uniform weighting.

New implementations

Overloading is optional. A fallback returns nothing.

If overloaded, you must include :(LearnAPI.weights) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.featuresFunction
LearnAPI.features(algorithm, data)

Return, for each form of data supported in a call of the form fit(algorithm, data), the "features" part of data (as opposed to the target variable, for example).

The returned object X may always be passed to predict or transform, where implemented, as in the following sample workflow:

model = fit(algorithm, data)
+training_loss = sum(ŷ .!= y)

Implementation guide

The fallback for LearnAPI.features returns first(data) if data is a tuple, and data otherwise.

method | fallback | compulsory?
LearnAPI.target | returns nothing | no
LearnAPI.weights | returns nothing | no
LearnAPI.features | see docstring | only if fallback fails

Reference

LearnAPI.targetFunction
LearnAPI.target(algorithm, data) -> target

Return, for each form of data supported in a call of the form fit(algorithm, data), the target variable part of data. If nothing is returned, the algorithm does not see a target variable in training (is unsupervised).

Refer to LearnAPI.jl documentation for the precise meaning of "target".

New implementations

A fallback returns nothing. Must be implemented if fit consumes data including a target variable.

If overloaded, you must include :(LearnAPI.target) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.weightsFunction
LearnAPI.weights(algorithm, data) -> weights

Return, for each form of data supported in a call of the form fit(algorithm, data), the per-observation weights part of data. Where nothing is returned, no weights are part of data, which is to be interpreted as uniform weighting.

New implementations

Overloading is optional. A fallback returns nothing.

If overloaded, you must include :(LearnAPI.weights) in the tuple returned by the LearnAPI.functions trait.

source
LearnAPI.featuresFunction
LearnAPI.features(algorithm, data)

Return, for each form of data supported in a call of the form fit(algorithm, data), the "features" part of data (as opposed to the target variable, for example).

The returned object X may always be passed to predict or transform, where implemented, as in the following sample workflow:

model = fit(algorithm, data)
 X = LearnAPI.features(algorithm, data)
 ŷ = predict(model, kind_of_proxy, X) # e.g., `kind_of_proxy = Point()`

The return value has the same number of observations as data does. For supervised models (i.e., where :(LearnAPI.target) in LearnAPI.functions(algorithm)), ŷ above is generally intended to be an approximate proxy for LearnAPI.target(algorithm, data), the training target.

New implementations

The only contract features must satisfy is the one about passability of the output to predict or transform, for each supported input data. The following fallbacks typically make overloading LearnAPI.features unnecessary:

LearnAPI.features(algorithm, data) = data
-LearnAPI.features(algorithm, data::Tuple) = first(data)

Overloading may be necessary if obs(algorithm, data) is overloaded to return some algorithm-specific representation of training data. For density estimators, whose fit typically consumes only a target variable, you should overload this method to return nothing.

source
+LearnAPI.features(algorithm, data::Tuple) = first(data)

Overloading may be necessary if obs(algorithm, data) is overloaded to return some algorithm-specific representation of training data. For density estimators, whose fit typically consumes only a target variable, you should overload this method to return nothing.
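To make the fallback behaviour concrete, here is a minimal self-contained sketch that reproduces the two fallback methods under a hypothetical name, `sketch_features`, together with the kind of overload suggested for density estimators. The two algorithm types are invented for illustration and are not part of LearnAPI.jl.

```julia
# Hypothetical stand-ins for the two fallbacks quoted above.
sketch_features(algorithm, data) = data
sketch_features(algorithm, data::Tuple) = first(data)

struct SketchRegressor end          # hypothetical supervised algorithm
struct SketchDensityEstimator end   # hypothetical density estimator

# A density estimator's `fit` sees only a target, so it has no features.
# Both methods are given so that tuple data does not hit a method ambiguity.
sketch_features(::SketchDensityEstimator, data) = nothing
sketch_features(::SketchDensityEstimator, data::Tuple) = nothing

X = [1.0 2.0; 3.0 4.0]
y = [0.5, 1.5]

features_from_tuple = sketch_features(SketchRegressor(), (X, y))  # picks `X`
features_from_bare = sketch_features(SketchRegressor(), X)        # returns `X`
no_features = sketch_features(SketchDensityEstimator(), y)        # `nothing`
```

The supervised case needs no overload at all, which is the point of the fallbacks; only the density estimator has to opt out.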

source diff --git a/dev/testing_an_implementation/index.html b/dev/testing_an_implementation/index.html index 83596d87..396647bb 100644 --- a/dev/testing_an_implementation/index.html +++ b/dev/testing_an_implementation/index.html @@ -1,2 +1,2 @@ -Testing an Implementation · LearnAPI.jl
+Testing an Implementation · LearnAPI.jl
diff --git a/dev/traits/index.html b/dev/traits/index.html index 90d9ebdc..7bca7302 100644 --- a/dev/traits/index.html +++ b/dev/traits/index.html @@ -10,12 +10,12 @@ julia> algorithm2.lambda 0.2

New implementations

All new implementations must overload this trait.

Attach public LearnAPI.jl-related documentation for an algorithm to the constructor, not the algorithm struct.

It must be possible to recover an algorithm from the constructor returned as follows:

properties = propertynames(algorithm)
 named_properties = NamedTuple{properties}(getproperty.(Ref(algorithm), properties))
-@assert algorithm == LearnAPI.constructor(algorithm)(; named_properties...)

The keyword constructor provided by LearnAPI.constructor must provide default values for all properties, with the exception of those that can take other LearnAPI.jl algorithms as values.

source
LearnAPI.functionsFunction
LearnAPI.functions(algorithm)

Return a tuple of expressions representing functions that can be meaningfully applied with algorithm, or an associated model (an object returned by fit(algorithm, ...)), as the first argument. Algorithm traits (methods for which algorithm is the only argument) are excluded.

The returned tuple may include expressions like :(DecisionTree.print_tree), which reference functions not owned by LearnAPI.jl.

The understanding is that algorithm is a LearnAPI-compliant object whenever the return value is non-empty.

Extended help

New implementations

All new implementations must overload this trait. Here's a checklist for elements in the return value:

symbol | implementation/overloading compulsory? | include in returned tuple?
:(LearnAPI.fit) | yes | yes
:(LearnAPI.algorithm) | yes | yes
:(LearnAPI.minimize) | no | yes
:(LearnAPI.obs) | no | yes
:(LearnAPI.features) | no | yes, unless fit consumes no data
:(LearnAPI.update) | no | only if implemented
:(LearnAPI.update_observations) | no | only if implemented
:(LearnAPI.update_features) | no | only if implemented
:(LearnAPI.target) | no | only if implemented
:(LearnAPI.weights) | no | only if implemented
:(LearnAPI.predict) | no | only if implemented
:(LearnAPI.transform) | no | only if implemented
:(LearnAPI.inverse_transform) | no | only if implemented
<accessor functions> | no | only if implemented

Also include any implemented accessor functions, both those owned by LearnAPI.jl and any algorithm-specific ones. The LearnAPI.jl accessor functions are: LearnAPI.extras, LearnAPI.algorithm, LearnAPI.coefficients, LearnAPI.intercept, LearnAPI.tree, LearnAPI.trees, LearnAPI.feature_importances, LearnAPI.training_labels, LearnAPI.training_losses, LearnAPI.training_predictions, LearnAPI.training_scores and LearnAPI.components.

source
LearnAPI.kinds_of_proxyFunction
LearnAPI.kinds_of_proxy(algorithm)

Returns a tuple of all instances, kind, for which predict(algorithm, kind, data...) has a guaranteed implementation. Each such kind is an instance of a subtype of LearnAPI.KindOfProxy. Examples are Point() (for predicting actual target values) and Distribution() (for predicting probability mass/density functions).

The call predict(model, data) always returns predict(model, kind, data), where kind is the first element of the trait's return value.

See also LearnAPI.predict, LearnAPI.KindOfProxy.

Extended help

New implementations

Must be overloaded whenever predict is implemented.

Elements of the returned tuple must be instances of types in the return value of LearnAPI.kinds_of_proxy(), i.e., one of the following, described further in LearnAPI.jl documentation: ConfidenceInterval(), Continuous(), Distribution(), Expectile(), Fuzzy(), HazardFunction(), LabelAmbiguous(), LabelAmbiguousDistribution(), LabelAmbiguousFuzzy(), LabelAmbiguousSampleable(), LogDistribution(), LogProbability(), OutlierScore(), Parametric(), Point(), ProbabilisticFuzzy(), Probability(), Quantile(), Sampleable(), SurvivalDistribution(), SurvivalFunction(), SingleDistribution(), SingleLogDistribution(), SingleSampeable(), JointDistribution(), JointLogDistribution() and JointSampleable().

Suppose, for example, we have the following implementation of a supervised learner returning only probabilistic predictions:

LearnAPI.predict(algorithm::MyNewAlgorithmType, ::LearnAPI.Distribution, Xnew) = ...

Then we can declare

@trait MyNewAlgorithmType kinds_of_proxy = (LearnAPI.Distribution(),)

LearnAPI.jl provides the fallback for predict(model, data).

For more on target variables and target proxies, refer to the LearnAPI documentation.

source
LearnAPI.tagsFunction
LearnAPI.tags(algorithm)

Lists one or more suggestive algorithm tags. Call LearnAPI.tags() to list all possible tags.

Warning

The value of this trait guarantees no particular behavior. The trait is intended for informal classification purposes only.

New implementations

This trait should return a tuple of strings, as in ("classifier", "text analysis").

source
LearnAPI.is_pure_juliaFunction
LearnAPI.is_pure_julia(algorithm)

Returns true if training algorithm requires evaluation of pure Julia code only.

New implementations

The fallback is false.

source
LearnAPI.pkg_nameFunction
LearnAPI.pkg_name(algorithm)

Return the name of the package module which supplies the core training algorithm for algorithm. This is not necessarily the package providing the LearnAPI interface.

Returns "unknown" if the algorithm implementation has not overloaded the trait.

New implementations

Must return a string, as in "DecisionTree".

source
LearnAPI.pkg_licenseFunction
LearnAPI.pkg_license(algorithm)

Return the name of the software license, such as "MIT", applying to the package where the core algorithm for algorithm is implemented.

source
LearnAPI.doc_urlFunction
LearnAPI.doc_url(algorithm)

Return a URL where the core algorithm for algorithm is documented.

Returns "unknown" if the algorithm implementation has not overloaded the trait.

New implementations

Must return a string, such as "https://en.wikipedia.org/wiki/Decision_tree_learning".

source
LearnAPI.load_pathFunction
LearnAPI.load_path(algorithm)

Return a string indicating where in code the definition of the algorithm's constructor can be found, beginning with the name of the package module defining it. By "constructor" we mean the return value of LearnAPI.constructor(algorithm).

Implementation

For example, a return value of "FastTrees.LearnAPI.DecisionTreeClassifier" means the following Julia code will not error:

import FastTrees
+@assert algorithm == LearnAPI.constructor(algorithm)(; named_properties...)

The keyword constructor provided by LearnAPI.constructor must provide default values for all properties, with the exception of those that can take other LearnAPI.jl algorithms as values.
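The recovery contract above can be exercised without LearnAPI.jl at all. The sketch below uses a hypothetical algorithm struct, `SketchRidge`, and a hypothetical stand-in, `sketch_constructor`, for `LearnAPI.constructor`; only the three-line check in the middle is taken from the text.

```julia
# Hypothetical algorithm type: a plain immutable container of hyperparameters.
struct SketchRidge
    lambda::Float64
    epochs::Int
end

# Keyword constructor with defaults for every property.
SketchRidge(; lambda = 0.1, epochs = 10) = SketchRidge(lambda, epochs)

# Stand-in for `LearnAPI.constructor`.
sketch_constructor(::SketchRidge) = SketchRidge

algorithm = SketchRidge(lambda = 0.2)

# The check from the text: rebuild the algorithm from its own properties.
properties = propertynames(algorithm)
named_properties = NamedTuple{properties}(getproperty.(Ref(algorithm), properties))
rebuilt = sketch_constructor(algorithm)(; named_properties...)
@assert algorithm == rebuilt
```

The comparison succeeds here because `SketchRidge` is immutable with plain-data fields, so the default `==` already compares by value; a mutable struct or one holding arrays would presumably need its own `==` method for the assertion to pass.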

source
LearnAPI.functionsFunction
LearnAPI.functions(algorithm)

Return a tuple of expressions representing functions that can be meaningfully applied with algorithm, or an associated model (an object returned by fit(algorithm, ...)), as the first argument. Algorithm traits (methods for which algorithm is the only argument) are excluded.

The returned tuple may include expressions like :(DecisionTree.print_tree), which reference functions not owned by LearnAPI.jl.

The understanding is that algorithm is a LearnAPI-compliant object whenever the return value is non-empty.

Extended help

New implementations

All new implementations must overload this trait. Here's a checklist for elements in the return value:

symbol | implementation/overloading compulsory? | include in returned tuple?
:(LearnAPI.fit) | yes | yes
:(LearnAPI.algorithm) | yes | yes
:(LearnAPI.minimize) | no | yes
:(LearnAPI.obs) | no | yes
:(LearnAPI.features) | no | yes, unless fit consumes no data
:(LearnAPI.update) | no | only if implemented
:(LearnAPI.update_observations) | no | only if implemented
:(LearnAPI.update_features) | no | only if implemented
:(LearnAPI.target) | no | only if implemented
:(LearnAPI.weights) | no | only if implemented
:(LearnAPI.predict) | no | only if implemented
:(LearnAPI.transform) | no | only if implemented
:(LearnAPI.inverse_transform) | no | only if implemented
<accessor functions> | no | only if implemented

Also include any implemented accessor functions, both those owned by LearnAPI.jl and any algorithm-specific ones. The LearnAPI.jl accessor functions are: LearnAPI.extras, LearnAPI.algorithm, LearnAPI.coefficients, LearnAPI.intercept, LearnAPI.tree, LearnAPI.trees, LearnAPI.feature_importances, LearnAPI.training_labels, LearnAPI.training_losses, LearnAPI.training_predictions, LearnAPI.training_scores and LearnAPI.components.
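For orientation, a return value consistent with the checklist might look like the sketch below, for a hypothetical supervised learner that implements `predict` and one accessor. The function name `sketch_functions` is invented; a real implementation would overload `LearnAPI.functions` instead. Quoted expressions such as `:(LearnAPI.fit)` are plain `Expr` objects, so the snippet runs without LearnAPI.jl loaded.

```julia
# Hypothetical return value for a supervised learner with a `predict` method.
sketch_functions() = (
    :(LearnAPI.fit),          # compulsory
    :(LearnAPI.algorithm),    # compulsory
    :(LearnAPI.minimize),     # always listed
    :(LearnAPI.obs),          # always listed
    :(LearnAPI.features),     # listed because `fit` consumes data
    :(LearnAPI.target),       # implemented: `fit` sees a target
    :(LearnAPI.predict),      # implemented
    :(LearnAPI.coefficients), # implemented accessor function
)

# Downstream code can then query capabilities by membership:
supports_predict = :(LearnAPI.predict) in sketch_functions()
supports_transform = :(LearnAPI.transform) in sketch_functions()
```

Membership tests of this kind are how the text elsewhere distinguishes supervised algorithms, via `:(LearnAPI.target) in LearnAPI.functions(algorithm)`.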

source
LearnAPI.kinds_of_proxyFunction
LearnAPI.kinds_of_proxy(algorithm)

Returns a tuple of all instances, kind, for which predict(algorithm, kind, data...) has a guaranteed implementation. Each such kind is an instance of a subtype of LearnAPI.KindOfProxy. Examples are Point() (for predicting actual target values) and Distribution() (for predicting probability mass/density functions).

The call predict(model, data) always returns predict(model, kind, data), where kind is the first element of the trait's return value.

See also LearnAPI.predict, LearnAPI.KindOfProxy.

Extended help

New implementations

Must be overloaded whenever predict is implemented.

Elements of the returned tuple must be instances of types in the return value of LearnAPI.kinds_of_proxy(), i.e., one of the following, described further in LearnAPI.jl documentation: ConfidenceInterval(), Continuous(), Distribution(), Expectile(), Fuzzy(), HazardFunction(), LabelAmbiguous(), LabelAmbiguousDistribution(), LabelAmbiguousFuzzy(), LabelAmbiguousSampleable(), LogDistribution(), LogProbability(), OutlierScore(), Parametric(), Point(), ProbabilisticFuzzy(), Probability(), Quantile(), Sampleable(), SurvivalDistribution(), SurvivalFunction(), SingleDistribution(), SingleLogDistribution(), SingleSampeable(), JointDistribution(), JointLogDistribution() and JointSampleable().

Suppose, for example, we have the following implementation of a supervised learner returning only probabilistic predictions:

LearnAPI.predict(algorithm::MyNewAlgorithmType, ::LearnAPI.Distribution, Xnew) = ...

Then we can declare

@trait MyNewAlgorithmType kinds_of_proxy = (LearnAPI.Distribution(),)

LearnAPI.jl provides the fallback for predict(model, data).

For more on target variables and target proxies, refer to the LearnAPI documentation.

source
LearnAPI.tagsFunction
LearnAPI.tags(algorithm)

Lists one or more suggestive algorithm tags. Call LearnAPI.tags() to list all possible tags.

Warning

The value of this trait guarantees no particular behavior. The trait is intended for informal classification purposes only.

New implementations

This trait should return a tuple of strings, as in ("classifier", "text analysis").

source
LearnAPI.is_pure_juliaFunction
LearnAPI.is_pure_julia(algorithm)

Returns true if training algorithm requires evaluation of pure Julia code only.

New implementations

The fallback is false.

source
LearnAPI.pkg_nameFunction
LearnAPI.pkg_name(algorithm)

Return the name of the package module which supplies the core training algorithm for algorithm. This is not necessarily the package providing the LearnAPI interface.

Returns "unknown" if the algorithm implementation has not overloaded the trait.

New implementations

Must return a string, as in "DecisionTree".

source
LearnAPI.pkg_licenseFunction
LearnAPI.pkg_license(algorithm)

Return the name of the software license, such as "MIT", applying to the package where the core algorithm for algorithm is implemented.

source
LearnAPI.doc_urlFunction
LearnAPI.doc_url(algorithm)

Return a URL where the core algorithm for algorithm is documented.

Returns "unknown" if the algorithm implementation has not overloaded the trait.

New implementations

Must return a string, such as "https://en.wikipedia.org/wiki/Decision_tree_learning".

source
LearnAPI.load_pathFunction
LearnAPI.load_path(algorithm)

Return a string indicating where in code the definition of the algorithm's constructor can be found, beginning with the name of the package module defining it. By "constructor" we mean the return value of LearnAPI.constructor(algorithm).

Implementation

For example, a return value of "FastTrees.LearnAPI.DecisionTreeClassifier" means the following Julia code will not error:

import FastTrees
 import LearnAPI
-@assert FastTrees.LearnAPI.DecisionTreeClassifier == LearnAPI.constructor(algorithm)

Returns "unknown" if the algorithm implementation has not overloaded the trait.

source
LearnAPI.is_compositeFunction
LearnAPI.is_composite(algorithm)

Returns true if one or more properties (fields) of algorithm may themselves be algorithms, and false otherwise.

See also LearnAPI.components.

New implementations

This trait should be overloaded if one or more properties (fields) of algorithm may take algorithm values. Fallback return value is false. The keyword constructor for such an algorithm need not prescribe defaults for algorithm-valued properties. Implementation of the accessor function LearnAPI.components is recommended.

The value of the trait must depend only on the type of algorithm.

source
LearnAPI.human_nameFunction
LearnAPI.human_name(algorithm)

Return a human-readable string representation of typeof(algorithm). Primarily intended for auto-generation of documentation.

New implementations

Optional. A fallback takes the type name, inserts spaces and removes capitalization. For example, KNNRegressor becomes "knn regressor". Better would be to overload the trait to return "K-nearest neighbors regressor". Ideally, this is a "concrete" noun like "ridge regressor" rather than an "abstract" noun like "ridge regression".
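The fallback's described behaviour (insert spaces, remove capitalization) can be approximated as follows. This is only a sketch of one way to get "knn regressor" from `KNNRegressor`; the function name `sketch_human_name` and the word-splitting rule are assumptions, and the actual LearnAPI.jl fallback may be implemented differently.

```julia
# Assumed rule: a new word starts at an uppercase letter that is immediately
# followed by a lowercase one, so acronyms such as "KNN" stay together.
function sketch_human_name(T::Type)
    chars = collect(string(nameof(T)))
    out = IOBuffer()
    for (i, c) in enumerate(chars)
        starts_word = i > 1 && isuppercase(c) &&
                      i < length(chars) && islowercase(chars[i + 1])
        starts_word && print(out, ' ')
        print(out, lowercase(c))
    end
    return String(take!(out))
end

struct KNNRegressor end     # hypothetical algorithm types for the demo
struct RidgeRegressor end

knn_name = sketch_human_name(KNNRegressor)
ridge_name = sketch_human_name(RidgeRegressor)
```

An automatic rule like this cannot expand acronyms, which is why overloading the trait to return something like "K-nearest neighbors regressor" is the better option when the type name is abbreviated.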

source
LearnAPI.data_interfaceFunction
LearnAPI.data_interface(algorithm)

Return the data interface supported by algorithm for accessing individual observations in representations of input data returned by obs(algorithm, data) or obs(model, data), whenever algorithm == LearnAPI.algorithm(model). Here data is fit, predict, or transform-consumable data.

Possible return values are LearnAPI.RandomAccess, LearnAPI.FiniteIterable, and LearnAPI.Iterable.

See also obs.

New implementations

The fallback returns LearnAPI.RandomAccess, which applies to arrays, most tables, and tuples of these. See the doc-string for details.

source
LearnAPI.iteration_parameterFunction
LearnAPI.iteration_parameter(algorithm)

The name of the iteration parameter of algorithm, or nothing if the algorithm is not iterative.

New implementations

Implement if algorithm is iterative. Returns a symbol or nothing.

source
LearnAPI.fit_observation_scitypeFunction
LearnAPI.fit_observation_scitype(algorithm)

Return an upper bound S on the scitype of individual observations guaranteed to work when calling fit: if observations = obs(algorithm, data) and ScientificTypes.scitype(o) <:S for each o in observations, then the call fit(algorithm, data) is supported.

Here, "for each o in observations" is understood in the sense of LearnAPI.data_interface(algorithm). For example, if LearnAPI.data_interface(algorithm) == LearnAPI.RandomAccess(), then this means "for o in MLUtils.eachobs(observations)".

See also LearnAPI.target_observation_scitype.

New implementations

Optional. The fallback return value is Union{}. Ordinarily, at most one of the following should be overloaded for a given algorithm: LearnAPI.fit_scitype, LearnAPI.fit_type, LearnAPI.fit_observation_scitype, LearnAPI.fit_observation_type.

LearnAPI.target_observation_scitype - Function
LearnAPI.target_observation_scitype(algorithm)

Return an upper bound S on the scitype of each observation of an applicable target variable. Specifically:

  • If :(LearnAPI.target) in LearnAPI.functions(algorithm) (i.e., fit consumes target variables) then "target" means anything returned by LearnAPI.target(algorithm, data), where data is an admissible argument in the call fit(algorithm, data).

  • S will always be an upper bound on the scitype of observations that could be conceivably extracted from the output of predict.

To illustrate the second case, suppose we have

model = fit(algorithm, data)
-ŷ = predict(model, Sampleable(), data_new)

Then each individual sample generated by each "observation" of ŷ (a vector of sampleable objects, say) will be bounded in scitype by S.

See also LearnAPI.fit_observation_scitype.

New implementations

Optional. The fallback return value is Any.

LearnAPI.predict_or_transform_mutates - Function
LearnAPI.predict_or_transform_mutates(algorithm)

Returns true if predict or transform possibly mutate their first argument, model, when LearnAPI.algorithm(model) == algorithm. If false, no arguments are ever mutated.

New implementations

This trait, falling back to false, may only be overloaded when fit has no data arguments (algorithm does not generalize to new data). See more at fit.

LearnAPI.@trait - Macro
@trait(TypeEx, trait1=value1, trait2=value2, ...)

Overload a number of traits for algorithms of type TypeEx. For example, the code

@trait(
+@assert FastTrees.LearnAPI.DecisionTreeClassifier == LearnAPI.constructor(algorithm)

Returns "unknown" if the algorithm implementation has not overloaded the trait.

LearnAPI.is_composite - Function
LearnAPI.is_composite(algorithm)

Returns true if one or more properties (fields) of algorithm may themselves be algorithms, and false otherwise.

See also LearnAPI.components.

New implementations

This trait should be overloaded if one or more properties (fields) of algorithm may take algorithm values. Fallback return value is false. The keyword constructor for such an algorithm need not prescribe defaults for algorithm-valued properties. Implementation of the accessor function LearnAPI.components is recommended.

The value of the trait must depend only on the type of algorithm.
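
The overloading pattern described above can be sketched as follows. This is only an illustration under stated assumptions: `toy_is_composite` is a stand-in for `LearnAPI.is_composite`, and `ToyRidge` and `ToyEnsemble` are invented types, so the block runs without LearnAPI installed.

```julia
# Stand-in for `LearnAPI.is_composite` (hypothetical); the fallback is `false`.
toy_is_composite(::Any) = false

# A hypothetical atomic algorithm with one hyperparameter.
struct ToyRidge
    lambda::Float64
end

# A hypothetical wrapper whose `atom` field is itself an algorithm.
struct ToyEnsemble{A}
    atom::A
    n::Int
end

# Keyword constructor: no default is prescribed for the algorithm-valued field.
ToyEnsemble(; atom, n=10) = ToyEnsemble(atom, n)

# The method dispatches on the type alone, so the trait value cannot
# depend on the values stored in the fields.
toy_is_composite(::ToyEnsemble) = true

ensemble = ToyEnsemble(atom=ToyRidge(0.1))
```

Because the overload is attached to the type, `toy_is_composite(ensemble)` gives the same answer whatever `atom` and `n` happen to be.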

LearnAPI.human_name - Function
LearnAPI.human_name(algorithm)

Return a human-readable string representation of typeof(algorithm). Primarily intended for auto-generation of documentation.

New implementations

Optional. A fallback takes the type name, inserts spaces and removes capitalization. For example, KNNRegressor becomes "knn regressor". It is better to overload the trait to return "K-nearest neighbors regressor". Ideally, the name is a "concrete" noun like "ridge regressor" rather than an "abstract" noun like "ridge regression".
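
A fallback of the kind described above might look like the sketch below. It is a hypothetical re-creation, not LearnAPI's actual implementation: the name `toy_human_name` and the two regular expressions are assumptions chosen to reproduce the documented "knn regressor" behavior.

```julia
# Hypothetical re-creation of the fallback: type name -> spaced, lowercase string.
function toy_human_name(T::Type)
    name = string(nameof(T))
    # Split a lowercase letter or digit from a following capital:
    # "RidgeRegressor" becomes "Ridge Regressor".
    name = replace(name, r"([a-z0-9])([A-Z])" => s"\1 \2")
    # Split an acronym from a following capitalized word:
    # "KNNRegressor" becomes "KNN Regressor".
    name = replace(name, r"([A-Z]+)([A-Z][a-z])" => s"\1 \2")
    return lowercase(name)
end
toy_human_name(algorithm) = toy_human_name(typeof(algorithm))

struct KNNRegressor end
struct RidgeRegressor end

# An explicit overload supplying a more descriptive name for one type.
toy_human_name(::KNNRegressor) = "K-nearest neighbors regressor"
```

Here the type `KNNRegressor` still maps to the generated name through the `Type` method, while instances of `KNNRegressor` use the hand-written overload.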

LearnAPI.data_interface - Function
LearnAPI.data_interface(algorithm)

Return the data interface supported by algorithm for accessing individual observations in representations of input data returned by obs(algorithm, data) or obs(model, data), whenever algorithm == LearnAPI.algorithm(model). Here data is fit, predict, or transform-consumable data.

Possible return values are LearnAPI.RandomAccess, LearnAPI.FiniteIterable, and LearnAPI.Iterable.

See also obs.

New implementations

The fallback returns LearnAPI.RandomAccess, which applies to arrays, most tables, and tuples of these. See the doc-string for details.

LearnAPI.iteration_parameter - Function
LearnAPI.iteration_parameter(algorithm)

The name of the iteration parameter of algorithm, or nothing if the algorithm is not iterative.

New implementations

Implement if algorithm is iterative. Returns a symbol or nothing.
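
As a rough sketch of how generic code could use such a trait, the block below reads the iteration count of an arbitrary algorithm. The trait `toy_iteration_parameter` is a stand-in for `LearnAPI.iteration_parameter`, its `nothing` fallback is an assumption of this sketch, and `ToyBooster`, `ToyMeanPredictor` and `iterations` are invented names.

```julia
# Stand-in for `LearnAPI.iteration_parameter` (hypothetical).
# Assumption: algorithms that are not iterative fall back to `nothing`.
toy_iteration_parameter(::Any) = nothing

# A hypothetical iterative algorithm and a non-iterative one.
struct ToyBooster
    n_rounds::Int
    learning_rate::Float64
end
struct ToyMeanPredictor end

# The trait returns the *name* of the iteration parameter, as a symbol.
toy_iteration_parameter(::ToyBooster) = :n_rounds

# Generic helper: look up the current iteration count, if there is one.
function iterations(algorithm)
    name = toy_iteration_parameter(algorithm)
    return name === nothing ? nothing : getproperty(algorithm, name)
end
```

Returning the property name rather than its value lets wrappers both read and replace the iteration count without knowing the concrete algorithm type.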

LearnAPI.fit_observation_scitype - Function
LearnAPI.fit_observation_scitype(algorithm)

Return an upper bound S on the scitype of individual observations guaranteed to work when calling fit: if observations = obs(algorithm, data) and ScientificTypes.scitype(o) <: S for each o in observations, then the call fit(algorithm, data) is supported.

Here, "for each o in observations" is understood in the sense of LearnAPI.data_interface(algorithm). For example, if LearnAPI.data_interface(algorithm) == LearnAPI.RandomAccess(), then this means "for o in MLUtils.eachobs(observations)".

See also LearnAPI.target_observation_scitype.

New implementations

Optional. The fallback return value is Union{}. Ordinarily, at most one of the following should be overloaded for a given algorithm: LearnAPI.fit_scitype, LearnAPI.fit_type, LearnAPI.fit_observation_scitype, LearnAPI.fit_observation_type.
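
The "upper bound" contract, and why a Union{} fallback promises nothing, can be illustrated with a simplified sketch. To stay self-contained it uses ordinary Julia types and `typeof` in place of scientific types and `ScientificTypes.scitype`, and it skips the `obs` step. All names here (`toy_fit_observation_type`, `guaranteed_to_fit`, the two structs) are hypothetical.

```julia
# Stand-in trait (hypothetical). The `Union{}` fallback mirrors the documented one.
toy_fit_observation_type(::Any) = Union{}

struct ToyRegressor end
struct ToyUndeclared end

# Hypothetical declaration: each observation is a vector of real numbers.
toy_fit_observation_type(::ToyRegressor) = AbstractVector{<:Real}

# `true` only if every observation is covered by the declared upper bound.
function guaranteed_to_fit(algorithm, observations)
    S = toy_fit_observation_type(algorithm)
    return all(o -> typeof(o) <: S, observations)
end

observations = [[1.0, 2.0], [3.0, 4.0]]
```

No value has type `Union{}`, so for `ToyUndeclared` the check fails on any non-empty collection of observations: the fallback makes no guarantee, which is the safe default for an algorithm that has declared nothing.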

LearnAPI.target_observation_scitype - Function
LearnAPI.target_observation_scitype(algorithm)

Return an upper bound S on the scitype of each observation of an applicable target variable. Specifically:

  • If :(LearnAPI.target) in LearnAPI.functions(algorithm) (i.e., fit consumes target variables) then "target" means anything returned by LearnAPI.target(algorithm, data), where data is an admissible argument in the call fit(algorithm, data).

  • S will always be an upper bound on the scitype of observations that could be conceivably extracted from the output of predict.

To illustrate the second case, suppose we have

model = fit(algorithm, data)
+ŷ = predict(model, Sampleable(), data_new)

Then each individual sample generated by each "observation" of ŷ (a vector of sampleable objects, say) will be bounded in scitype by S.

See also LearnAPI.fit_observation_scitype.

New implementations

Optional. The fallback return value is Any.

LearnAPI.predict_or_transform_mutates - Function
LearnAPI.predict_or_transform_mutates(algorithm)

Returns true if predict or transform possibly mutate their first argument, model, when LearnAPI.algorithm(model) == algorithm. If false, no arguments are ever mutated.

New implementations

This trait, falling back to false, may only be overloaded when fit has no data arguments (algorithm does not generalize to new data). See more at fit.

LearnAPI.@trait - Macro
@trait(TypeEx, trait1=value1, trait2=value2, ...)

Overload a number of traits for algorithms of type TypeEx. For example, the code

@trait(
     RidgeRegressor,
     tags = ("regression", ),
     doc_url = "https://some.cool.documentation",
 )

is equivalent to

LearnAPI.tags(::RidgeRegressor) = ("regression", )
-LearnAPI.doc_url(::RidgeRegressor) = "https://some.cool.documentation"
+LearnAPI.doc_url(::RidgeRegressor) = "https://some.cool.documentation"
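
For readers curious how a macro can produce such an expansion, here is a minimal sketch. It is not LearnAPI's implementation: `@toy_trait`, `toy_tags` and `toy_doc_url` are hypothetical stand-ins defined in the calling module, and the "unknown" fallback is an assumption made for illustration.

```julia
# Stand-in trait functions with fallbacks (hypothetical, not LearnAPI's definitions).
toy_tags(::Any) = ()
toy_doc_url(::Any) = "unknown"

# Hypothetical macro: turn each `trait = value` pair into a one-line method definition.
macro toy_trait(type_ex, exs...)
    program = Expr(:block)
    for ex in exs
        # Accept both expression heads that `trait = value` could parse to.
        (ex isa Expr && ex.head in (:(=), :kw)) ||
            error("expected `trait = value`, got `$ex`")
        trait, value = ex.args
        # Builds the expression `trait(::TypeEx) = value`.
        push!(program.args, :($trait(::$type_ex) = $value))
    end
    push!(program.args, nothing)  # the macro call itself evaluates to `nothing`
    return esc(program)           # resolve all names in the caller's scope
end

struct RidgeRegressor end

@toy_trait(
    RidgeRegressor,
    toy_tags = ("regression",),
    toy_doc_url = "https://some.cool.documentation",
)
```

After the macro call, `toy_tags(RidgeRegressor())` and `toy_doc_url(RidgeRegressor())` return the declared values, while any other argument still hits the fallbacks.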