From 89db95d390a5acd3821e856b485e60347a853795 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Wed, 8 May 2024 21:32:58 +0000
Subject: [PATCH] build based on 14441aa

---
 dev/composition/index.html   |  2 +-
 dev/datasets/index.html      | 10 +++++-----
 dev/distributions/index.html |  6 +++---
 dev/index.html               |  2 +-
 dev/resampling/index.html    | 12 ++++++------
 dev/search/index.html        |  2 +-
 dev/search_index.js          |  2 +-
 dev/utilities/index.html     | 34 +++++++++++++++++-----------------
 8 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/dev/composition/index.html b/dev/composition/index.html
index b0bbebbe..c07f295f 100644
--- a/dev/composition/index.html
+++ b/dev/composition/index.html
@@ -1,2 +1,2 @@
-Composition · MLJBase.jl
+Composition · MLJBase.jl
diff --git a/dev/datasets/index.html b/dev/datasets/index.html
index 6ab74c58..f56cb635 100644
--- a/dev/datasets/index.html
+++ b/dev/datasets/index.html
@@ -4,8 +4,8 @@ categorical=true)

Load it with DelimitedFiles and Tables

data_raw, data_header = readdlm(fpath, ',', header=true)
 data_table = Tables.table(data_raw; header=Symbol.(vec(data_header)))

Retrieve the conversions:

for (n, st) in zip(names(data), scitype_union.(eachcol(data)))
     println(":$n=>$st,")
-end

Copy and paste the result in a coerce

data_table = coerce(data_table, ...)
MLJBase.load_datasetMethod
load_dataset(fpath, coercions)

Load one of the standard datasets, such as Boston, assuming the file is comma-separated and has a header row.

source
MLJBase.load_sunspotsMethod

Load a well-known sunspot time series (table with one column). See https://www.sws.bom.gov.au/Educational/2/3/6

source
MLJBase.@load_amesMacro

Load the full version of the well-known Ames Housing task.

source
MLJBase.@load_bostonMacro

Load a well-known public regression dataset with Continuous features.

source
MLJBase.@load_crabsMacro

Load a well-known crab classification dataset with nominal features.

source
MLJBase.@load_irisMacro

Load a well-known public classification task with nominal features.

source
MLJBase.@load_reduced_amesMacro

Load a reduced version of the well-known Ames Housing task

source
MLJBase.@load_smarketMacro

Load the S&P Stock Market dataset, as used in An Introduction to Statistical Learning with Applications in R (https://rdrr.io/cran/ISLR/man/Smarket.html), by Witten et al. (2013), Springer-Verlag, New York.

source
MLJBase.@load_sunspotsMacro

Load a well-known sunspot time series (single table with one column).

source

Synthetic datasets

MLJBase.augment_XMethod
augment_X(X, fit_intercept)

Given a matrix X, append a column of ones if fit_intercept is true. See make_regression.

source
MLJBase.finalize_XyMethod
finalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)

Internal function to finalize the make_* functions.

source
MLJBase.make_blobsFunction
X, y = make_blobs(n=100, p=2; kwargs...)

Generate Gaussian blobs for clustering and classification problems.

Return value

By default, a table X with p columns (features) and n rows (observations), together with a corresponding vector of n Multiclass target observations y, indicating blob membership.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • centers=3: either a number of centers or a c x p matrix with c pre-determined centers,

  • cluster_std=1.0: the standard deviation(s) of each blob,

  • center_box=(-10. => 10.): the limits of the p-dimensional cube within which the cluster centers are drawn if they are not provided,

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])
source
MLJBase.make_circlesFunction
X, y = make_circles(n=100; kwargs...)

Generate n labeled points close to two concentric circles for classification and clustering models.

Return value

By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the smaller or larger circle, respectively.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • noise=0: standard deviation of the Gaussian noise added to the data,

  • factor=0.8: ratio of the smaller radius over the larger one,

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_circles(100; noise=0.5, factor=0.3)
source
MLJBase.make_moonsFunction
make_moons(n::Int=100; kwargs...)

Generates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.

Return value

By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the left or right semi-circle.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • noise=0.1: standard deviation of the Gaussian noise added to the data,

  • xshift=1.0: horizontal translation of the second center with respect to the first one.

  • yshift=0.3: vertical translation of the second center with respect to the first one.

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_moons(100; noise=0.5)
source
MLJBase.make_regressionFunction
make_regression(n, p; kwargs...)

Generate Gaussian input features and a linear response with Gaussian noise, for use with regression models.

Return value

By default, a tuple (X, y) where table X has p columns and n rows (observations), together with a corresponding vector of n Continuous target observations y.

Keywords

  • intercept=true: Whether to generate data from a model with intercept.

  • n_targets=1: Number of columns in the target.

  • sparse=0: Proportion of the generating weight vector that is sparse.

  • noise=0.1: Standard deviation of the Gaussian noise added to the response (target).

  • outliers=0: Proportion of the response vector to make as outliers by adding a random quantity with high variance. (Only applied if binary is false.)

  • as_table=true: Whether X (and y, if n_targets > 1) should be a table or a matrix.

  • eltype=Float64: Element type for X and y. Must subtype AbstractFloat.

  • binary=false: Whether the target should be binarized (via a sigmoid).

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false).

Example

X, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)
source
MLJBase.outlify!Method

Add outliers to portion s of vector.

source
MLJBase.runif_abMethod
runif_ab(rng, n, p, a, b)

Internal function to generate n points in [a, b]ᵖ uniformly at random.

source
MLJBase.sigmoidMethod
sigmoid(x)

Return the sigmoid computed in a numerically stable way: $σ(x) = 1/(1+exp(-x))$

source
MLJBase.sparsify!Method
sparsify!(rng, θ, s)

Make portion s of vector θ exactly 0.

source

Utility functions

MLJBase.complementMethod
complement(folds, i)

The complement of the ith fold of folds in the concatenation of all elements of folds. Here folds is a vector or tuple of integer vectors, typically representing row indices of a vector, matrix or table.

complement(([1,2], [3,], [4, 5]), 2) # [1, 2, 4, 5]
source
MLJBase.corestrictMethod
corestrict(X, folds, i)

The restriction of X, a vector, matrix or table, to the complement of the ith fold of folds, where folds is a tuple of vectors of row indices.

The method is curried, so that corestrict(folds, i) is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i).

Example

folds = ([1, 2], [3, 4, 5],  [6,])
-corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
source
MLJBase.partitionMethod
partition(X, fractions...;
+end

Copy and paste the result in a coerce

data_table = coerce(data_table, ...)
MLJBase.load_datasetMethod
load_dataset(fpath, coercions)

Load one of the standard datasets, such as Boston, assuming the file is comma-separated and has a header row.

source
MLJBase.load_sunspotsMethod

Load a well-known sunspot time series (table with one column). See https://www.sws.bom.gov.au/Educational/2/3/6

source

Synthetic datasets

MLJBase.finalize_XyMethod
finalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)

Internal function to finalize the make_* functions.

source
MLJBase.make_blobsFunction
X, y = make_blobs(n=100, p=2; kwargs...)

Generate Gaussian blobs for clustering and classification problems.

Return value

By default, a table X with p columns (features) and n rows (observations), together with a corresponding vector of n Multiclass target observations y, indicating blob membership.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • centers=3: either a number of centers or a c x p matrix with c pre-determined centers,

  • cluster_std=1.0: the standard deviation(s) of each blob,

  • center_box=(-10. => 10.): the limits of the p-dimensional cube within which the cluster centers are drawn if they are not provided,

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])
source
MLJBase.make_circlesFunction
X, y = make_circles(n=100; kwargs...)

Generate n labeled points close to two concentric circles for classification and clustering models.

Return value

By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the smaller or larger circle, respectively.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • noise=0: standard deviation of the Gaussian noise added to the data,

  • factor=0.8: ratio of the smaller radius over the larger one,

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_circles(100; noise=0.5, factor=0.3)
source
MLJBase.make_moonsFunction
make_moons(n::Int=100; kwargs...)

Generates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.

Return value

By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the left or right semi-circle.

Keyword arguments

  • shuffle=true: whether to shuffle the resulting points,

  • noise=0.1: standard deviation of the Gaussian noise added to the data,

  • xshift=1.0: horizontal translation of the second center with respect to the first one.

  • yshift=0.3: vertical translation of the second center with respect to the first one.

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.

Example

X, y = make_moons(100; noise=0.5)
source
MLJBase.make_regressionFunction
make_regression(n, p; kwargs...)

Generate Gaussian input features and a linear response with Gaussian noise, for use with regression models.

Return value

By default, a tuple (X, y) where table X has p columns and n rows (observations), together with a corresponding vector of n Continuous target observations y.

Keywords

  • intercept=true: Whether to generate data from a model with intercept.

  • n_targets=1: Number of columns in the target.

  • sparse=0: Proportion of the generating weight vector that is sparse.

  • noise=0.1: Standard deviation of the Gaussian noise added to the response (target).

  • outliers=0: Proportion of the response vector to make as outliers by adding a random quantity with high variance. (Only applied if binary is false.)

  • as_table=true: Whether X (and y, if n_targets > 1) should be a table or a matrix.

  • eltype=Float64: Element type for X and y. Must subtype AbstractFloat.

  • binary=false: Whether the target should be binarized (via a sigmoid).

  • eltype=Float64: machine type of points (any subtype of AbstractFloat).

  • rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).

  • as_table=true: whether to return the points as a table (true) or a matrix (false).

Example

X, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)
source
MLJBase.runif_abMethod
runif_ab(rng, n, p, a, b)

Internal function to generate n points in [a, b]ᵖ uniformly at random.

source
MLJBase.sigmoidMethod
sigmoid(x)

Return the sigmoid computed in a numerically stable way: $σ(x) = 1/(1+exp(-x))$

source
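The "numerically stable" wording in the sigmoid entry can be illustrated with a minimal sketch. The helper name stable_sigmoid is hypothetical and this is not claimed to be MLJBase's implementation; it only shows one common way to evaluate σ(x) = 1/(1+exp(-x)) without ever calling exp on a large positive argument.

```julia
# Hypothetical helper for illustration; not MLJBase's own code.
# Branch on the sign of x so that exp is only applied to non-positive values.
function stable_sigmoid(x::Real)
    if x >= 0
        return 1 / (1 + exp(-x))   # exp(-x) <= 1 here
    else
        z = exp(x)                 # exp(x) < 1 here
        return z / (1 + z)
    end
end

stable_sigmoid(0.0)      # midpoint of the curve
stable_sigmoid(-1000.0)  # saturates towards 0 without overflow
stable_sigmoid(1000.0)   # saturates towards 1 without overflow
```

Both branches compute the same mathematical function; the split only controls which exponential is evaluated.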

Utility functions

MLJBase.complementMethod
complement(folds, i)

The complement of the ith fold of folds in the concatenation of all elements of folds. Here folds is a vector or tuple of integer vectors, typically representing row indices of a vector, matrix or table.

complement(([1,2], [3,], [4, 5]), 2) # [1, 2, 4, 5]
source
MLJBase.corestrictMethod
corestrict(X, folds, i)

The restriction of X, a vector, matrix or table, to the complement of the ith fold of folds, where folds is a tuple of vectors of row indices.

The method is curried, so that corestrict(folds, i) is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i).

Example

folds = ([1, 2], [3, 4, 5],  [6,])
+corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
source
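The fold semantics described for complement and corestrict can be sketched in plain Julia. The names fold_complement, fold_restrict and fold_corestrict are hypothetical stand-ins, restricted to vectors for brevity (the documented functions also accept matrices and tables); the sketch only mirrors the documented behaviour, including the curried form.

```julia
# Hypothetical stand-ins for illustration; not MLJBase's own code.

# Concatenate every fold except the ith one.
fold_complement(folds, i) = vcat((folds[j] for j in eachindex(folds) if j != i)...)

# Keep only the rows of X listed in the ith fold.
fold_restrict(X::AbstractVector, folds, i) = X[folds[i]]

# Keep only the rows of X that are NOT in the ith fold.
fold_corestrict(X::AbstractVector, folds, i) = X[fold_complement(folds, i)]

# Curried form: returns an operator that can be applied to data later.
fold_corestrict(folds, i) = X -> fold_corestrict(X, folds, i)

folds = ([1, 2], [3, 4, 5], [6])
X = [:x1, :x2, :x3, :x4, :x5, :x6]

fold_complement(folds, 2)       # row indices outside fold 2
fold_restrict(X, folds, 2)      # elements in fold 2
fold_corestrict(folds, 2)(X)    # elements outside fold 2, via the curried form
```

The curried form is convenient when the same fold selection has to be applied to several objects, for example features and target.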
MLJBase.partitionMethod
partition(X, fractions...;
           shuffle=nothing,
           rng=Random.GLOBAL_RNG,
           stratify=nothing,
@@ -19,8 +19,8 @@
 ([1 6], [2 7; 3 8], [4 9; 5 10])
 
 julia> X, y = make_blobs() # a table and vector
-julia> Xtrain, Xtest = partition(X, 0.8, stratify=y)

Here's an example of synchronized partitioning of multiple objects:

julia> (Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)

Keywords

  • shuffle=nothing: if set to true, shuffles the rows before taking fractions.

  • rng=Random.GLOBAL_RNG: specifies the random number generator to be used; can be an integer seed. If rng is specified and shuffle === nothing, then shuffle is interpreted as true.

  • stratify=nothing: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle cannot be false.

  • multi=false: if true then X is expected to be a tuple of objects sharing a common length, which are each partitioned separately using the same specified fractions and the same row shuffling. Returns a tuple of partitions (a tuple of tuples).

source
MLJBase.restrictMethod
restrict(X, folds, i)

The restriction of X, a vector, matrix or table, to the ith fold of folds, where folds is a tuple of vectors of row indices.

The method is curried, so that restrict(folds, i) is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i).

Example

folds = ([1, 2], [3, 4, 5],  [6,])
-restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]

See also corestrict

source
MLJBase.skipinvalidMethod
skipinvalid(itr)

Return an iterator over the elements in itr skipping missing and NaN values. Behaviour is similar to skipmissing.

skipinvalid(A, B)

For vectors A and B of the same length, return a tuple of vectors (A[mask], B[mask]) where mask[i] is true if and only if A[i] and B[i] are both valid (non-missing and non-NaN). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing from the element types if present in the original iterators.

source
MLJBase.unpackMethod
unpack(table, f1, f2, ... fk;
+julia> Xtrain, Xtest = partition(X, 0.8, stratify=y)

Here's an example of synchronized partitioning of multiple objects:

julia> (Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)

Keywords

  • shuffle=nothing: if set to true, shuffles the rows before taking fractions.

  • rng=Random.GLOBAL_RNG: specifies the random number generator to be used; can be an integer seed. If rng is specified and shuffle === nothing, then shuffle is interpreted as true.

  • stratify=nothing: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle cannot be false.

  • multi=false: if true then X is expected to be a tuple of objects sharing a common length, which are each partitioned separately using the same specified fractions and the same row shuffling. Returns a tuple of partitions (a tuple of tuples).

source
MLJBase.restrictMethod
restrict(X, folds, i)

The restriction of X, a vector, matrix or table, to the ith fold of folds, where folds is a tuple of vectors of row indices.

The method is curried, so that restrict(folds, i) is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i).

Example

folds = ([1, 2], [3, 4, 5],  [6,])
+restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]

See also corestrict

source
MLJBase.skipinvalidMethod
skipinvalid(itr)

Return an iterator over the elements in itr skipping missing and NaN values. Behaviour is similar to skipmissing.

skipinvalid(A, B)

For vectors A and B of the same length, return a tuple of vectors (A[mask], B[mask]) where mask[i] is true if and only if A[i] and B[i] are both valid (non-missing and non-NaN). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing from the element types if present in the original iterators.

source
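The two-argument masking rule described for skipinvalid(A, B) can be sketched as follows. The names is_valid and paired_valid are hypothetical, and the sketch handles vectors only; it is an illustration of the stated rule, not MLJBase's implementation.

```julia
# Hypothetical helpers for illustration; not MLJBase's own code.

# An entry is valid when it is neither missing nor NaN.
is_valid(x) = !ismissing(x) && !(x isa Number && isnan(x))

# Keep position i only when both A[i] and B[i] are valid.
function paired_valid(A::AbstractVector, B::AbstractVector)
    mask = is_valid.(A) .& is_valid.(B)
    return A[mask], B[mask]
end

A = [1.0, missing, 3.0, NaN, 5.0]
B = [10.0, 20.0, missing, 40.0, 50.0]

a, b = paired_valid(A, B)   # only positions 1 and 5 survive
eltype(a)                   # Missing stays in the element type, as the docstring notes
```

Indexing with a Boolean mask preserves the element type of the original vector, which is why Missing is not removed from it.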
MLJBase.unpackMethod
unpack(table, f1, f2, ... fk;
        wrap_singles=false,
        shuffle=false,
        rng::Union{AbstractRNG,Int,Nothing}=nothing,
@@ -49,4 +49,4 @@
 julia> W  # the column(s) left over
 2-element Vector{String}:
  "A"
- "B"

Whenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true.

If coerce_options are specified then table is first replaced with coerce(table, coerce_options). See ScientificTypes.coerce for details.

If shuffle=true then the rows of table are first shuffled, using the global RNG, unless rng is specified; if rng is an integer, it specifies the seed of an automatically generated Mersenne twister. If rng is specified then shuffle=true is implicit.

source
+ "B"

Whenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true.

If coerce_options are specified then table is first replaced with coerce(table, coerce_options). See ScientificTypes.coerce for details.

If shuffle=true then the rows of table are first shuffled, using the global RNG, unless rng is specified; if rng is an integer, it specifies the seed of an automatically generated Mersenne twister. If rng is specified then shuffle=true is implicit.

source
diff --git a/dev/distributions/index.html b/dev/distributions/index.html
index fa4b4325..4d02308e 100644
--- a/dev/distributions/index.html
+++ b/dev/distributions/index.html
@@ -26,6 +26,6 @@
 [5.0, 5.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 221
 [5.5, 6.0) ┤ 0
 [6.0, 6.5) ┤▇▇▇▇▇▇▇▇▇▇▇ 89
- └ ┘source
MLJBase.iteratorMethod
iterator([rng, ], r::NominalRange, [,n])
-iterator([rng, ], r::NumericRange, n)

Return an iterator (currently a vector) for a ParamRange object r. In the first case iteration is over all values stored in the range (or just the first n, if n is specified). In the second case, the iteration is over approximately n ordered values, generated as follows:

(i) First, exactly n values are generated between U and L, with a spacing determined by r.scale (uniform if scale=:linear) where U and L are given by the following table:

| r.lower | r.upper | L | U |
|---------|---------|---|---|
| finite | finite | r.lower | r.upper |
| -Inf | finite | r.upper - 2r.unit | r.upper |
| finite | Inf | r.lower | r.lower + 2r.unit |
| -Inf | Inf | r.origin - r.unit | r.origin + r.unit |

(ii) If a callable f is provided as scale, then a uniform spacing is always applied in (i) but f is broadcast over the results. (Unlike ordinary scales, this alters the effective range of values generated, instead of just altering the spacing.)

(iii) If r is a discrete numeric range (r isa NumericRange{<:Integer}) then the values are additionally rounded, with any duplicate values removed. Otherwise all the values are used (and there are exactly n of them).

(iv) Finally, if a random number generator rng is specified, then the values are returned in random order (sampling without replacement), and otherwise they are returned in numeric order, or in the order provided to the range constructor, in the case of a NominalRange.

source
MLJBase.scaleMethod
scale(r::ParamRange)

Return the scale associated with a ParamRange object r. The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a callable object).

source
StatsAPI.fitMethod
Distributions.fit(D, r::MLJBase.NumericRange)

Fit and return a distribution d of type D to the one-dimensional range r.

Only types D in the table below are supported.

The distribution d is constructed in two stages. First, a distribution d0, characterized by the conditions in the second column of the table, is fit to r. Then d0 is truncated between r.lower and r.upper to obtain d.

| Distribution type D | Characterization of d0 |
|---------------------|------------------------|
| Arcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight | minimum(d) = r.lower, maximum(d) = r.upper |
| Normal, Gamma, InverseGaussian, Logistic, LogNormal | mean(d) = r.origin, std(d) = r.unit |
| Cauchy, Gumbel, Laplace, (Normal) | Dist.location(d) = r.origin, Dist.scale(d) = r.unit |
| Poisson | Dist.mean(d) = r.unit |

Here Dist = Distributions.

source
Base.rangeMethod
r = range(model, :hyper; values=nothing)

Define a one-dimensional NominalRange object for a field hyper of model. Note that r is not directly iterable but iterator(r) is.

A nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the max_depth hyperparameter of the submodel model.atom.

r = range(model, :hyper; upper=nothing, lower=nothing,
-          scale=nothing, values=nothing)

Assuming values is not specified, define a one-dimensional NumericRange object for a Real field hyper of model. Note that r is not directly iterable but iterator(r, n) is an iterator of length n. To generate random elements from r, instead apply rand methods to sampler(r). The supported scales are :linear, :log, :logminus, :log10, :log10minus, :log2, or a callable object.

Note that r is not directly iterable, but iterator(r, n) is, for given resolution (length) n.

By default, the behaviour of the constructed object depends on the type of the value of the hyperparameter :hyper at model at the time of construction. To override this behaviour (for instance if model is not available) specify a type in place of model so the behaviour is determined by the value of the specified type.

A nested hyperparameter is specified using dot notation (see above).

If scale is unspecified, it is set to :linear, :log, :log10minus, or :linear, according to whether the interval (lower, upper) is bounded, right-unbounded, left-unbounded, or doubly unbounded, respectively. Note upper=Inf and lower=-Inf are allowed.

If values is specified, the other keyword arguments are ignored and a NominalRange object is returned (see above).

See also: iterator, sampler

source

Utility functions

+ └ ┘source
MLJBase.iteratorMethod
iterator([rng, ], r::NominalRange, [,n])
+iterator([rng, ], r::NumericRange, n)

Return an iterator (currently a vector) for a ParamRange object r. In the first case iteration is over all values stored in the range (or just the first n, if n is specified). In the second case, the iteration is over approximately n ordered values, generated as follows:

(i) First, exactly n values are generated between U and L, with a spacing determined by r.scale (uniform if scale=:linear) where U and L are given by the following table:

| r.lower | r.upper | L | U |
|---------|---------|---|---|
| finite | finite | r.lower | r.upper |
| -Inf | finite | r.upper - 2r.unit | r.upper |
| finite | Inf | r.lower | r.lower + 2r.unit |
| -Inf | Inf | r.origin - r.unit | r.origin + r.unit |

(ii) If a callable f is provided as scale, then a uniform spacing is always applied in (i) but f is broadcast over the results. (Unlike ordinary scales, this alters the effective range of values generated, instead of just altering the spacing.)

(iii) If r is a discrete numeric range (r isa NumericRange{<:Integer}) then the values are additionally rounded, with any duplicate values removed. Otherwise all the values are used (and there are exactly n of them).

(iv) Finally, if a random number generator rng is specified, then the values are returned in random order (sampling without replacement), and otherwise they are returned in numeric order, or in the order provided to the range constructor, in the case of a NominalRange.

source
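Steps (i) and (iii) of the iterator description can be sketched for the linear scale only. The helpers grid_bounds, linear_grid and integer_grid are hypothetical, and the arguments lower, upper, origin and unit are plain numbers standing in for the corresponding fields of r; other scales, callable scales and the rng shuffling of step (iv) are deliberately left out.

```julia
# Hypothetical helpers for illustration; not MLJBase's own code.

# Step (i), part 1: choose the finite end points L and U from the bounds table.
function grid_bounds(lower, upper, origin, unit)
    if isfinite(lower) && isfinite(upper)
        return lower, upper
    elseif isfinite(upper)            # lower == -Inf
        return upper - 2unit, upper
    elseif isfinite(lower)            # upper == Inf
        return lower, lower + 2unit
    else                              # doubly unbounded
        return origin - unit, origin + unit
    end
end

# Step (i), part 2: exactly n uniformly spaced values (the :linear case).
linear_grid(L, U, n) = collect(range(L, stop=U, length=n))

# Step (iii): for integer ranges, round and drop duplicates,
# so fewer than n values may remain.
integer_grid(L, U, n) = unique(round.(Int, linear_grid(L, U, n)))

L, U = grid_bounds(-Inf, 10.0, 0.0, 1.0)   # left-unbounded range
values = linear_grid(L, U, 5)
ints = integer_grid(1, 4, 6)
```

The integer case shows why the docstring speaks of "approximately n" values: rounding can merge neighbouring grid points.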
MLJBase.scaleMethod
scale(r::ParamRange)

Return the scale associated with a ParamRange object r. The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a callable object).

source
StatsAPI.fitMethod
Distributions.fit(D, r::MLJBase.NumericRange)

Fit and return a distribution d of type D to the one-dimensional range r.

Only types D in the table below are supported.

The distribution d is constructed in two stages. First, a distribution d0, characterized by the conditions in the second column of the table, is fit to r. Then d0 is truncated between r.lower and r.upper to obtain d.

| Distribution type D | Characterization of d0 |
|---------------------|------------------------|
| Arcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight | minimum(d) = r.lower, maximum(d) = r.upper |
| Normal, Gamma, InverseGaussian, Logistic, LogNormal | mean(d) = r.origin, std(d) = r.unit |
| Cauchy, Gumbel, Laplace, (Normal) | Dist.location(d) = r.origin, Dist.scale(d) = r.unit |
| Poisson | Dist.mean(d) = r.unit |

Here Dist = Distributions.

source
Base.rangeMethod
r = range(model, :hyper; values=nothing)

Define a one-dimensional NominalRange object for a field hyper of model. Note that r is not directly iterable but iterator(r) is.

A nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the max_depth hyperparameter of the submodel model.atom.

r = range(model, :hyper; upper=nothing, lower=nothing,
+          scale=nothing, values=nothing)

Assuming values is not specified, define a one-dimensional NumericRange object for a Real field hyper of model. Note that r is not directly iterable but iterator(r, n) is an iterator of length n. To generate random elements from r, instead apply rand methods to sampler(r). The supported scales are :linear, :log, :logminus, :log10, :log10minus, :log2, or a callable object.

Note that r is not directly iterable, but iterator(r, n) is, for given resolution (length) n.

By default, the behaviour of the constructed object depends on the type of the value of the hyperparameter :hyper at model at the time of construction. To override this behaviour (for instance if model is not available) specify a type in place of model so the behaviour is determined by the value of the specified type.

A nested hyperparameter is specified using dot notation (see above).

If scale is unspecified, it is set to :linear, :log, :log10minus, or :linear, according to whether the interval (lower, upper) is bounded, right-unbounded, left-unbounded, or doubly unbounded, respectively. Note upper=Inf and lower=-Inf are allowed.

If values is specified, the other keyword arguments are ignored and a NominalRange object is returned (see above).

See also: iterator, sampler

source

Utility functions

diff --git a/dev/index.html b/dev/index.html
index a2b1f2a6..62e2c597 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -1,2 +1,2 @@
-Home · MLJBase.jl

MLJBase.jl

These docs are bare-bones and auto-generated. Complete MLJ documentation is here.

For MLJBase-specific developer information, see also the README.md file.

+Home · MLJBase.jl

MLJBase.jl

These docs are bare-bones and auto-generated. Complete MLJ documentation is here.

For MLJBase-specific developer information, see also the README.md file.

diff --git a/dev/resampling/index.html b/dev/resampling/index.html
index a7d5bf8c..4fd1e6a7 100644
--- a/dev/resampling/index.html
+++ b/dev/resampling/index.html
@@ -1,9 +1,9 @@
-Resampling · MLJBase.jl

Resampling

MLJBase.CVType
cv = CV(; nfolds=6,  shuffle=nothing, rng=nothing)

Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning.

train_test_pairs(cv, rows)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector. With no row pre-shuffling, the order of rows is preserved, in the sense that rows coincides precisely with the concatenation of the test vectors, in the order they are generated. The first r test vectors have length n + 1, where n, r = divrem(length(rows), nfolds), and the remaining test vectors have length n.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the CV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.

source
MLJBase.CompactPerformanceEvaluationType
CompactPerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by evaluate (for models plus data) or evaluate! (for machines) when called with the option compact = true. Such objects have the same structure as the PerformanceEvaluation objects returned by default, except that the following fields are omitted to save memory: fitted_params_per_fold, report_per_fold, train_test_rows.

For more on the remaining fields, see PerformanceEvaluation.

source
MLJBase.HoldoutType
holdout = Holdout(; fraction_train=0.7, shuffle=nothing, rng=nothing)

Instantiate a Holdout resampling strategy, for use in evaluate!, evaluate and in tuning.

train_test_pairs(holdout, rows)

Returns the pair [(train, test)], where train and test are vectors such that rows=vcat(train, test) and length(train)/length(rows) is approximately equal to fraction_train.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the Holdout keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is specified.

source
MLJBase.InSampleType
in_sample = InSample()

Instantiate an InSample resampling strategy, for use in evaluate!, evaluate and in tuning. In this strategy the train and test sets are the same, and consist of all observations specified by the rows keyword argument. If rows is not specified, all supplied rows are used.

Example

using MLJBase, MLJModels
+Resampling · MLJBase.jl

Resampling

MLJBase.CVType
cv = CV(; nfolds=6,  shuffle=nothing, rng=nothing)

Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning.

train_test_pairs(cv, rows)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector. With no row pre-shuffling, the order of rows is preserved, in the sense that rows coincides precisely with the concatenation of the test vectors, in the order they are generated. The first r test vectors have length n + 1, where n, r = divrem(length(rows), nfolds), and the remaining test vectors have length n.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the CV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.
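
The fold sizes described above can be reproduced in plain Julia. The following is a minimal sketch for the unshuffled case only, not the MLJBase implementation; the helper names cv_test_folds and cv_train_test_pairs are hypothetical.

```julia
# Hypothetical helper (not part of MLJBase): split `rows` into `nfolds`
# consecutive test folds, the first `r` of which get one extra row.
function cv_test_folds(rows::AbstractVector{<:Integer}, nfolds::Integer)
    n, r = divrem(length(rows), nfolds)
    folds = Vector{Vector{Int}}()
    start = 1
    for k in 1:nfolds
        len = k <= r ? n + 1 : n          # first r folds have length n + 1
        push!(folds, collect(rows[start:start + len - 1]))
        start += len
    end
    return folds
end

# Each train vector is the complement (within `rows`) of its test vector.
function cv_train_test_pairs(rows, nfolds)
    return [(setdiff(rows, test), test) for test in cv_test_folds(rows, nfolds)]
end

tt_pairs = cv_train_test_pairs(1:10, 3)
test_lengths = length.(last.(tt_pairs))   # lengths of the three test vectors
```

With 10 rows and 3 folds, divrem(10, 3) gives n = 3 and r = 1, so only the first test vector receives an extra row, and concatenating the test vectors recovers rows in order.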

source
MLJBase.CompactPerformanceEvaluationType
CompactPerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by evaluate (for models plus data) or evaluate! (for machines) when called with the option compact = true. Such objects have the same structure as the PerformanceEvaluation objects returned by default, except that the following fields are omitted to save memory: fitted_params_per_fold, report_per_fold, train_test_rows.

For more on the remaining fields, see PerformanceEvaluation.

source
MLJBase.HoldoutType
holdout = Holdout(; fraction_train=0.7, shuffle=nothing, rng=nothing)

Instantiate a Holdout resampling strategy, for use in evaluate!, evaluate and in tuning.

train_test_pairs(holdout, rows)

Returns the pair [(train, test)], where train and test are vectors such that rows=vcat(train, test) and length(train)/length(rows) is approximately equal to fraction_train.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the Holdout keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is specified.
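
As a rough illustration of the holdout split, the sketch below builds a single (train, test) pair in plain Julia. The helper holdout_pair is hypothetical, and the rounding rule used to pick the train length is an assumption; MLJBase may round differently.

```julia
using Random

# Hypothetical helper (not MLJBase's implementation): one train/test pair,
# returned inside a one-element vector, as described above.
function holdout_pair(rows, fraction_train; shuffle=false, rng=Random.default_rng())
    idx = collect(rows)
    if shuffle
        idx = Random.shuffle(rng, idx)    # pre-shuffle only on request
    end
    # Assumption: the train length is the nearest integer to the requested fraction.
    ntrain = round(Int, fraction_train * length(idx))
    return [(idx[1:ntrain], idx[ntrain + 1:end])]
end

train, test = holdout_pair(1:10, 0.7)[1]

# A seeded generator makes the shuffled split reproducible.
strain, stest = holdout_pair(1:10, 0.7; shuffle=true, rng=MersenneTwister(123))[1]
```

Passing a seeded MersenneTwister mirrors what the keyword constructor is documented to do when rng is an integer.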

source
MLJBase.InSampleType
in_sample = InSample()

Instantiate an InSample resampling strategy, for use in evaluate!, evaluate and in tuning. In this strategy the train and test sets are the same, and consist of all observations specified by the rows keyword argument. If rows is not specified, all supplied rows are used.

Example

using MLJBase, MLJModels
 
 X, y = make_blobs()  # a table and a vector
 model = ConstantClassifier()
-train, test = partition(eachindex(y), 0.7)  # train:test = 70:30

Compute in-sample (training) loss:

evaluate(model, X, y, resampling=InSample(), rows=train, measure=brier_loss)

Compute the out-of-sample loss:

evaluate(model, X, y, resampling=[(train, test),], measure=brier_loss)

Or equivalently:

evaluate(model, X, y, resampling=Holdout(fraction_train=0.7), measure=brier_loss)
source
MLJBase.PerformanceEvaluationType
PerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by evaluate (for models plus data) or evaluate! (for machines). Such objects encode estimates of the performance (generalization error) of a supervised model or outlier detection model, and store other information ancillary to the computation.

If evaluate or evaluate! is called with the compact=true option, then a CompactPerformanceEvaluation object is returned instead.

When evaluate/evaluate! is called, a number of train/test pairs ("folds") of row indices are generated, according to the options provided, which are discussed in the evaluate! doc-string. Rows correspond to observations. The generated train/test pairs are recorded in the train_test_rows field of the PerformanceEvaluation struct, and the corresponding estimates, aggregated over all train/test pairs, are recorded in measurement, a vector with one entry for each measure (metric) recorded in measure.

When displayed, a PerformanceEvaluation object includes a value under the heading 1.96*SE, derived from the standard error of the per_fold entries. This value is suitable for constructing a formal 95% confidence interval for the given measurement. Such intervals should be interpreted with caution. See, for example, Bates et al. (2021).

Fields

These fields are part of the public API of the PerformanceEvaluation struct.

  • model: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.

  • measure: vector of measures (metrics) used to evaluate performance

  • measurement: vector of measurements - one for each element of measure - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure m is StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())

  • operation (e.g., predict_mode): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: predict, predict_mean, predict_mode, predict_median, or predict_joint.

  • per_fold: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.

  • per_observation: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for the ith observation in the fth test fold, evaluated using the mth measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure measure is repeated across all observations in a fold if StatisticalMeasures.can_report_unaggregated(measure) == true. If e has been computed with the per_observation=false option, then e.per_observation is a vector of missings.

  • fitted_params_per_fold: a vector containing fitted params(mach) for each machine mach trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.

  • report_per_fold: a vector containing report(mach) for each machine mach trained during resampling - one machine per train/test pair.

  • train_test_rows: a vector of tuples, each of the form (train, test), where train and test are vectors of row (observation) indices for training and evaluation respectively.

  • resampling: the user-specified resampling strategy to generate the train/test pairs (or literal train/test pairs if that was directly specified).

  • repeats: the number of times the resampling strategy was repeated.

See also CompactPerformanceEvaluation.

source
MLJBase.ResamplerType
resampler = Resampler(
+train, test = partition(eachindex(y), 0.7)  # train:test = 70:30

Compute in-sample (training) loss:

evaluate(model, X, y, resampling=InSample(), rows=train, measure=brier_loss)

Compute the out-of-sample loss:

evaluate(model, X, y, resampling=[(train, test),], measure=brier_loss)

Or equivalently:

evaluate(model, X, y, resampling=Holdout(fraction_train=0.7), measure=brier_loss)
source
MLJBase.PerformanceEvaluationType
PerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by evaluate (for models plus data) or evaluate! (for machines). Such objects encode estimates of the performance (generalization error) of a supervised model or outlier detection model, and store other information ancillary to the computation.

If evaluate or evaluate! is called with the compact=true option, then a CompactPerformanceEvaluation object is returned instead.

When evaluate/evaluate! is called, a number of train/test pairs ("folds") of row indices are generated, according to the options provided, which are discussed in the evaluate! doc-string. Rows correspond to observations. The generated train/test pairs are recorded in the train_test_rows field of the PerformanceEvaluation struct, and the corresponding estimates, aggregated over all train/test pairs, are recorded in measurement, a vector with one entry for each measure (metric) recorded in measure.

When displayed, a PerformanceEvaluation object includes a value under the heading 1.96*SE, derived from the standard error of the per_fold entries. This value is suitable for constructing a formal 95% confidence interval for the given measurement. Such intervals should be interpreted with caution. See, for example, Bates et al. (2021).
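
To make the 1.96*SE heading concrete, the sketch below computes a half-width and interval from a vector of per-fold values using the textbook standard error, std/sqrt(n). The per-fold numbers are invented for illustration, and the exact convention MLJBase uses (for example, the denominator) is not confirmed here. The mean is used as the point estimate only for simplicity; the aggregation actually applied depends on the measure.

```julia
using Statistics

# Invented per-fold measurements for a single measure (illustration only).
per_fold = [0.42, 0.38, 0.45, 0.40, 0.41, 0.39]

# Textbook standard error of the mean; an assumption, not MLJBase's code.
se = std(per_fold) / sqrt(length(per_fold))
half_width = 1.96 * se

estimate = mean(per_fold)
interval = (estimate - half_width, estimate + half_width)
```

Because the folds share training data, the per-fold values are not independent, which is one reason such intervals call for caution.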

Fields

These fields are part of the public API of the PerformanceEvaluation struct.

  • model: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.

  • measure: vector of measures (metrics) used to evaluate performance

  • measurement: vector of measurements - one for each element of measure - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure m is StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())

  • operation (e.g., predict_mode): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: predict, predict_mean, predict_mode, predict_median, or predict_joint.

  • per_fold: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.

  • per_observation: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for the ith observation in the fth test fold, evaluated using the mth measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure measure is repeated across all observations in a fold if StatisticalMeasures.can_report_unaggregated(measure) == true. If e has been computed with the per_observation=false option, then e.per_observation is a vector of missings.

  • fitted_params_per_fold: a vector containing fitted params(mach) for each machine mach trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.

  • report_per_fold: a vector containing report(mach) for each machine mach trained during resampling - one machine per train/test pair.

  • train_test_rows: a vector of tuples, each of the form (train, test), where train and test are vectors of row (observation) indices for training and evaluation respectively.

  • resampling: the user-specified resampling strategy to generate the train/test pairs (or literal train/test pairs if that was directly specified).

  • repeats: the number of times the resampling strategy was repeated.

See also CompactPerformanceEvaluation.

source
MLJBase.ResamplerType
resampler = Resampler(
     model=ConstantRegressor(),
     resampling=CV(),
     measure=nothing,
@@ -16,9 +16,9 @@
     per_observation=true,
     logger=nothing,
     compact=false,
-)

Resampling model wrapper, used internally by the fit method of TunedModel instances and IteratedModel instances. See evaluate! for options. Not intended for use by general users, who will ordinarily use evaluate! directly.

Given a machine mach = machine(resampler, args...) one obtains a performance evaluation of the specified model, performed according to the prescribed resampling strategy and other parameters, using data args..., by calling fit!(mach) followed by evaluate(mach).

On subsequent calls to fit!(mach) new train/test pairs of row indices are only regenerated if resampling, repeats or cache fields of resampler have changed. The evolution of an RNG field of resampler does not constitute a change (== for MLJType objects is not sensitive to such changes; see is_same_except).

If there is a single train/test pair, then warm-restart behavior of the wrapped model resampler.model will extend to warm-restart behavior of the wrapper resampler, with respect to mutations of the wrapped model.

The sample weights are passed to the specified performance measures that support weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.

The sample class_weights are passed to the specified performance measures that support per-class weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.

source
MLJBase.StratifiedCVType
stratified_cv = StratifiedCV(; nfolds=6,
+)

Resampling model wrapper, used internally by the fit method of TunedModel instances and IteratedModel instances. See evaluate! for options. Not intended for use by general users, who will ordinarily use evaluate! directly.

Given a machine mach = machine(resampler, args...) one obtains a performance evaluation of the specified model, performed according to the prescribed resampling strategy and other parameters, using data args..., by calling fit!(mach) followed by evaluate(mach).

On subsequent calls to fit!(mach) new train/test pairs of row indices are only regenerated if resampling, repeats or cache fields of resampler have changed. The evolution of an RNG field of resampler does not constitute a change (== for MLJType objects is not sensitive to such changes; see is_same_except).

If there is a single train/test pair, then warm-restart behavior of the wrapped model resampler.model will extend to warm-restart behavior of the wrapper resampler, with respect to mutations of the wrapped model.

The sample weights are passed to the specified performance measures that support weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.

The sample class_weights are passed to the specified performance measures that support per-class weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.

source
MLJBase.StratifiedCVType
stratified_cv = StratifiedCV(; nfolds=6,
                                shuffle=false,
-                               rng=Random.GLOBAL_RNG)

Stratified cross-validation resampling strategy, for use in evaluate!, evaluate and in tuning. Applies only to classification problems (OrderedFactor or Multiclass targets).

train_test_pairs(stratified_cv, rows, y)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices) where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector.

Unlike regular cross-validation, the distribution of the levels of the target y corresponding to each train and test is constrained, as far as possible, to replicate that of y[rows] as a whole.

The stratified train_test_pairs algorithm is invariant to label renaming. For example, if you run replace!(y, 'a' => 'b', 'b' => 'a') and then re-run train_test_pairs, the returned (train, test) pairs will be the same.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the StratifiedCV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.

source
MLJBase.TimeSeriesCVType
tscv = TimeSeriesCV(; nfolds=4)

Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning, when observations are chronological and not expected to be independent.

train_test_pairs(tscv, rows)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The rows are partitioned sequentially into nfolds + 1 approximately equal length partitions, where the first partition is the first train set, and the second partition is the first test set. The second train set consists of the first two partitions, and the second test set consists of the third partition, and so on for each fold.

The first partition (which is the first train set) has length n + r, where n, r = divrem(length(rows), nfolds + 1), and the remaining partitions (all of the test folds) have length n.

Examples

julia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)
+                               rng=Random.GLOBAL_RNG)

Stratified cross-validation resampling strategy, for use in evaluate!, evaluate and in tuning. Applies only to classification problems (OrderedFactor or Multiclass targets).

train_test_pairs(stratified_cv, rows, y)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices) where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector.

Unlike regular cross-validation, the distribution of the levels of the target y corresponding to each train and test is constrained, as far as possible, to replicate that of y[rows] as a whole.

The stratified train_test_pairs algorithm is invariant to label renaming. For example, if you run replace!(y, 'a' => 'b', 'b' => 'a') and then re-run train_test_pairs, the returned (train, test) pairs will be the same.

Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the StratifiedCV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.

If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.

source
MLJBase.TimeSeriesCVType
tscv = TimeSeriesCV(; nfolds=4)

Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning, when observations are chronological and not expected to be independent.

train_test_pairs(tscv, rows)

Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The rows are partitioned sequentially into nfolds + 1 approximately equal length partitions, where the first partition is the first train set, and the second partition is the first test set. The second train set consists of the first two partitions, and the second test set consists of the third partition, and so on for each fold.

The first partition (which is the first train set) has length n + r, where n, r = divrem(length(rows), nfolds + 1), and the remaining partitions (all of the test folds) have length n.
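
The partition rule above can be sketched in plain Julia as follows. The helper ts_train_test_pairs is hypothetical and is not the MLJBase implementation; it only mirrors the stated lengths, with the first partition of length n + r and every later partition of length n.

```julia
# Hypothetical helper (not part of MLJBase): expanding-window train sets,
# each followed by a test partition of length n.
function ts_train_test_pairs(rows::AbstractVector, nfolds::Integer)
    n, r = divrem(length(rows), nfolds + 1)
    first_len = n + r                      # length of the first train set
    return [
        (rows[1:first_len + (k - 1) * n],                      # train: everything so far
         rows[first_len + (k - 1) * n + 1:first_len + k * n])  # test: the next partition
        for k in 1:nfolds
    ]
end

ts_pairs = ts_train_test_pairs(1:10, 3)
```

For 10 rows and 3 folds, divrem(10, 4) gives n = 2 and r = 2, so the first train set has 4 rows and each test set has 2.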

Examples

julia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)
 3-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:
  (1:4, 5:6)
  (1:6, 7:8)
@@ -44,5 +44,5 @@
 _.per_observation = [missing]
 _.fitted_params_per_fold = [ … ]
 _.report_per_fold = [ … ]
-_.train_test_rows = [ … ]
source
MLJBase.evaluate!Method
evaluate!(mach; resampling=CV(), measure=nothing, options...)

Estimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector. Returns a PerformanceEvaluation object.

Available resampling strategies are CV, Holdout, InSample, StratifiedCV and TimeSeriesCV. If resampling is not an instance of one of these, then a vector of tuples of the form (train_rows, test_rows) is expected. For example, setting

resampling = [((1:100), (101:200)),
-              ((101:200), (1:100))]

gives two-fold cross-validation using the first 200 rows of data.

Any measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.

Although evaluate! is mutating, mach.model and mach.args are not mutated.

Additional keyword options

  • rows - vector of observation indices from which both train and test folds are constructed (default is all observations)

  • operation/operations=nothing - One of predict, predict_mean, predict_mode, predict_median, or predict_joint, or a vector of these of the same length as measure/measures. Automatically inferred if left unspecified. For example, predict_mode will be used for a Multiclass target, if model is a probabilistic predictor, but measure expects literal (point) target predictions. Operations actually applied can be inspected from the operation field of the object returned.

  • weights - per-sample Real weights for measures that support them (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • class_weights - dictionary of Real per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • repeats::Int=1: set to a higher value for repeated (Monte Carlo) resampling. For example, if repeats = 10, then resampling = CV(nfolds=5, shuffle=true), generates a total of 50 (train, test) pairs for evaluation and subsequent aggregation.

  • acceleration=CPU1(): acceleration/parallelization option; can be any instance of CPU1 (single-threaded computation), CPUThreads (multi-threaded computation) or CPUProcesses (multi-process computation); default is default_resource(). These types are owned by ComputationalResources.jl.

  • force=false: set to true to force cold-restart of each training event

  • verbosity::Int=1: logging level; can be negative

  • check_measure=true: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.

  • per_observation=true: whether to calculate estimates for individual observations; if false the per_observation field of the returned object is populated with missings. Setting to false may reduce compute time and allocations.

  • logger - a logger object (see MLJBase.log_evaluation)

  • compact=false - if true, the returned evaluation object excludes these fields: fitted_params_per_fold, report_per_fold, train_test_rows.

See also evaluate, PerformanceEvaluation, CompactPerformanceEvaluation.

source
MLJBase.log_evaluationMethod
log_evaluation(logger, performance_evaluation)

Log a performance evaluation to logger, an object specific to some logging platform, such as mlflow. If logger=nothing then no logging is performed. The method is called at the end of every call to evaluate/evaluate! using the logger provided by the logger keyword argument.

Implementations for new logging platforms

Julia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface) should overload log_evaluation(logger::LoggerType, performance_evaluation), where LoggerType is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.

source
+_.train_test_rows = [ … ]
source
MLJBase.evaluate!Method
evaluate!(mach; resampling=CV(), measure=nothing, options...)

Estimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector. Returns a PerformanceEvaluation object.

Available resampling strategies are CV, Holdout, InSample, StratifiedCV and TimeSeriesCV. If resampling is not an instance of one of these, then a vector of tuples of the form (train_rows, test_rows) is expected. For example, setting

resampling = [((1:100), (101:200)),
+              ((101:200), (1:100))]

gives two-fold cross-validation using the first 200 rows of data.
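
When supplying explicit (train_rows, test_rows) tuples like these, it can help to sanity-check them first. The sketch below is plain Julia, independent of MLJBase, and the checks shown are only suggestions, not requirements enforced by evaluate!.

```julia
# The explicit two-fold resampling from the text above.
resampling = [((1:100), (101:200)),
              ((101:200), (1:100))]

# Check that no row is used for both training and testing within a fold.
disjoint = all(isempty(intersect(train, test)) for (train, test) in resampling)

# Collect every row that appears in some test fold.
covered = sort(unique(reduce(vcat, [collect(test) for (_, test) in resampling])))
```

Here every one of the first 200 rows is tested exactly once across the two folds, which is what makes this a two-fold cross-validation.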

Any measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.

Although evaluate! is mutating, mach.model and mach.args are not mutated.

Additional keyword options

  • rows - vector of observation indices from which both train and test folds are constructed (default is all observations)

  • operation/operations=nothing - One of predict, predict_mean, predict_mode, predict_median, or predict_joint, or a vector of these of the same length as measure/measures. Automatically inferred if left unspecified. For example, predict_mode will be used for a Multiclass target, if model is a probabilistic predictor, but measure expects literal (point) target predictions. Operations actually applied can be inspected from the operation field of the object returned.

  • weights - per-sample Real weights for measures that support them (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • class_weights - dictionary of Real per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).

  • repeats::Int=1: set to a higher value for repeated (Monte Carlo) resampling. For example, if repeats = 10, then resampling = CV(nfolds=5, shuffle=true), generates a total of 50 (train, test) pairs for evaluation and subsequent aggregation.

  • acceleration=CPU1(): acceleration/parallelization option; can be any instance of CPU1 (single-threaded computation), CPUThreads (multi-threaded computation) or CPUProcesses (multi-process computation); default is default_resource(). These types are owned by ComputationalResources.jl.

  • force=false: set to true to force cold-restart of each training event

  • verbosity::Int=1: logging level; can be negative

  • check_measure=true: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.

  • per_observation=true: whether to calculate estimates for individual observations; if false the per_observation field of the returned object is populated with missings. Setting to false may reduce compute time and allocations.

  • logger - a logger object (see MLJBase.log_evaluation)

  • compact=false - if true, the returned evaluation object excludes these fields: fitted_params_per_fold, report_per_fold, train_test_rows.

See also evaluate, PerformanceEvaluation, CompactPerformanceEvaluation.

source
MLJBase.log_evaluationMethod
log_evaluation(logger, performance_evaluation)

Log a performance evaluation to logger, an object specific to some logging platform, such as mlflow. If logger=nothing then no logging is performed. The method is called at the end of every call to evaluate/evaluate! using the logger provided by the logger keyword argument.

Implementations for new logging platforms

Julia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface) should overload log_evaluation(logger::LoggerType, performance_evaluation), where LoggerType is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.

source
diff --git a/dev/search/index.html b/dev/search/index.html index b0e22d72..fa131424 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ -Search · MLJBase.jl

Loading search...

    +Search · MLJBase.jl

    Loading search...

      diff --git a/dev/search_index.js b/dev/search_index.js index f6c47411..956a83b6 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"distributions/#Distributions","page":"Distributions","title":"Distributions","text":"","category":"section"},{"location":"distributions/#Univariate-Finite-Distribution","page":"Distributions","title":"Univariate Finite Distribution","text":"","category":"section"},{"location":"distributions/","page":"Distributions","title":"Distributions","text":"Modules = [MLJBase]\nPages = [\"interface/univariate_finite.jl\"]","category":"page"},{"location":"distributions/#hyperparameters","page":"Distributions","title":"hyperparameters","text":"","category":"section"},{"location":"distributions/","page":"Distributions","title":"Distributions","text":"Modules = [MLJBase]\nPages = [\"hyperparam/one_dimensional_range_methods.jl\", \"hyperparam/one_dimensional_ranges.jl\"]","category":"page"},{"location":"distributions/#Distributions.sampler-Union{Tuple{T}, Tuple{NumericRange{T}, Distributions.UnivariateDistribution}} where T","page":"Distributions","title":"Distributions.sampler","text":"sampler(r::NominalRange, probs::AbstractVector{<:Real})\nsampler(r::NominalRange)\nsampler(r::NumericRange{T}, d)\n\nConstruct an object s which can be used to generate random samples from a ParamRange object r (a one-dimensional range) using one of the following calls:\n\nrand(s) # for one sample\nrand(s, n) # for n samples\nrand(rng, s [, n]) # to specify an RNG\n\nThe argument probs can be any probability vector with the same length as r.values. The second sampler method above calls the first with a uniform probs vector.\n\nThe argument d can be either an arbitrary instance of UnivariateDistribution from the Distributions.jl package, or one of a Distributions.jl types for which fit(d, ::NumericRange) is defined. 
These include: Arcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight, Normal, Gamma, InverseGaussian, Logistic, LogNormal, Cauchy, Gumbel, Laplace, and Poisson; but see the doc-string for Distributions.fit for an up-to-date list.\n\nIf d is an instance, then sampling is from a truncated form of the supplied distribution d, the truncation bounds being r.lower and r.upper (the attributes r.origin and r.unit attributes are ignored). For discrete numeric ranges (T <: Integer) the samples are rounded.\n\nIf d is a type then a suitably truncated distribution is automatically generated using Distributions.fit(d, r).\n\nImportant. Values are generated with no regard to r.scale, except in the special case r.scale is a callable object f. In that case, f is applied to all values generated by rand as described above (prior to rounding, in the case of discrete numeric ranges).\n\nExamples\n\njulia> r = range(Char, :letter, values=collect(\"abc\"))\njulia> s = sampler(r, [0.1, 0.2, 0.7])\njulia> samples = rand(s, 1000);\njulia> StatsBase.countmap(samples)\nDict{Char,Int64} with 3 entries:\n 'a' => 107\n 'b' => 205\n 'c' => 688\n\njulia> r = range(Int, :k, lower=2, upper=6) # numeric but discrete\njulia> s = sampler(r, Normal)\njulia> samples = rand(s, 1000);\njulia> UnicodePlots.histogram(samples)\n ┌ ┐\n[2.0, 2.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 119\n[2.5, 3.0) ┤ 0\n[3.0, 3.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 296\n[3.5, 4.0) ┤ 0\n[4.0, 4.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 275\n[4.5, 5.0) ┤ 0\n[5.0, 5.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 221\n[5.5, 6.0) ┤ 0\n[6.0, 6.5) ┤▇▇▇▇▇▇▇▇▇▇▇ 89\n └ ┘\n\n\n\n\n\n","category":"method"},{"location":"distributions/#MLJBase.iterator-Tuple{Random.AbstractRNG, ParamRange, Vararg{Any}}","page":"Distributions","title":"MLJBase.iterator","text":"iterator([rng, ], r::NominalRange, [,n])\niterator([rng, ], r::NumericRange, n)\n\nReturn an iterator (currently a vector) for a ParamRange object r. 
In the first case iteration is over all values stored in the range (or just the first n, if n is specified). In the second case, the iteration is over approximately n ordered values, generated as follows:\n\n(i) First, exactly n values are generated between U and L, with a spacing determined by r.scale (uniform if scale=:linear) where U and L are given by the following table:\n\nr.lower r.upper L U\nfinite finite r.lower r.upper\n-Inf finite r.upper - 2r.unit r.upper\nfinite Inf r.lower r.lower + 2r.unit\n-Inf Inf r.origin - r.unit r.origin + r.unit\n\n(ii) If a callable f is provided as scale, then a uniform spacing is always applied in (i) but f is broadcast over the results. (Unlike ordinary scales, this alters the effective range of values generated, instead of just altering the spacing.)\n\n(iii) If r is a discrete numeric range (r isa NumericRange{<:Integer}) then the values are additionally rounded, with any duplicate values removed. Otherwise all the values are used (and there are exactly n of them).\n\n(iv) Finally, if a random number generator rng is specified, then the values are returned in random order (sampling without replacement), and otherwise they are returned in numeric order, or in the order provided to the range constructor, in the case of a NominalRange.\n\n\n\n\n\n","category":"method"},{"location":"distributions/#MLJBase.scale-Tuple{NominalRange}","page":"Distributions","title":"MLJBase.scale","text":"scale(r::ParamRange)\n\nReturn the scale associated with a ParamRange object r. 
The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a callable object).\n\n\n\n\n\n","category":"method"},{"location":"distributions/#StatsAPI.fit-Union{Tuple{D}, Tuple{Type{D}, NumericRange}} where D<:Distributions.Distribution","page":"Distributions","title":"StatsAPI.fit","text":"Distributions.fit(D, r::MLJBase.NumericRange)\n\nFit and return a distribution d of type D to the one-dimensional range r.\n\nOnly types D in the table below are supported.\n\nThe distribution d is constructed in two stages. First, a distribution d0, characterized by the conditions in the second column of the table, is fit to r. Then d0 is truncated between r.lower and r.upper to obtain d.\n\nDistribution type D Characterization of d0\nArcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight minimum(d) = r.lower, maximum(d) = r.upper\nNormal, Gamma, InverseGaussian, Logistic, LogNormal mean(d) = r.origin, std(d) = r.unit\nCauchy, Gumbel, Laplace, (Normal) Dist.location(d) = r.origin, Dist.scale(d) = r.unit\nPoisson Dist.mean(d) = r.unit\n\nHere Dist = Distributions.\n\n\n\n\n\n","category":"method"},{"location":"distributions/#Base.range-Union{Tuple{D}, Tuple{Union{Model, Type}, Union{Expr, Symbol}}} where D","page":"Distributions","title":"Base.range","text":"r = range(model, :hyper; values=nothing)\n\nDefine a one-dimensional NominalRange object for a field hyper of model. Note that r is not directly iterable but iterator(r) is.\n\nA nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the max_depth hyperparameter of the submodel model.atom.\n\nr = range(model, :hyper; upper=nothing, lower=nothing,\n scale=nothing, values=nothing)\n\nAssuming values is not specified, define a one-dimensional NumericRange object for a Real field hyper of model. Note that r is not directly iterable but iterator(r, n) is an iterator of length n. 
To generate random elements from r, instead apply rand methods to sampler(r). The supported scales are :linear,:log, :logminus, :log10, :log10minus, :log2, or a callable object.\n\nNote that r is not directly iterable, but iterator(r, n) is, for given resolution (length) n.\n\nBy default, the behaviour of the constructed object depends on the type of the value of the hyperparameter :hyper at model at the time of construction. To override this behaviour (for instance if model is not available) specify a type in place of model so the behaviour is determined by the value of the specified type.\n\nA nested hyperparameter is specified using dot notation (see above).\n\nIf scale is unspecified, it is set to :linear, :log, :log10minus, or :linear, according to whether the interval (lower, upper) is bounded, right-unbounded, left-unbounded, or doubly unbounded, respectively. Note upper=Inf and lower=-Inf are allowed.\n\nIf values is specified, the other keyword arguments are ignored and a NominalRange object is returned (see above).\n\nSee also: iterator, sampler\n\n\n\n\n\n","category":"method"},{"location":"distributions/#Utility-functions","page":"Distributions","title":"Utility functions","text":"","category":"section"},{"location":"distributions/","page":"Distributions","title":"Distributions","text":"Modules = [MLJBase]\nPages = [\"distributions.jl\"]","category":"page"},{"location":"utilities/#Utilities","page":"Utilities","title":"Utilities","text":"","category":"section"},{"location":"utilities/#Machines","page":"Utilities","title":"Machines","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"machines.jl\"]","category":"page"},{"location":"utilities/#Base.replace-Union{Tuple{C}, Tuple{Machine{<:Any, <:Any, C}, Vararg{Pair}}} where C","page":"Utilities","title":"Base.replace","text":"replace(mach::Machine, field1 => value1, field2 => value2, ...)\n\nPrivate method.\n\nReturn a 
shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new instance Channel{Bool}(1).\n\nThe following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):\n\nreplace(mach, :args => (), :data => (), :data_resampled_data => (), :cache => nothing)\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.age-Tuple{Machine}","page":"Utilities","title":"MLJBase.age","text":"age(mach::Machine)\n\nReturn an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.ancestors-Tuple{Machine}","page":"Utilities","title":"MLJBase.ancestors","text":"ancestors(mach::Machine; self=false)\n\nAll ancestors of mach, including mach if self=true.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.default_scitype_check_level","page":"Utilities","title":"MLJBase.default_scitype_check_level","text":"default_scitype_check_level()\n\nReturn the current global default value for scientific type checking when constructing machines.\n\ndefault_scitype_check_level(i::Integer)\n\nSet the global default value for scientific type checking to i.\n\nThe effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:\n\nscitype_check_level Inspect scitypes? 
If Unknown in scitypes If other scitype mismatch\n0 × \n1 (value at startup) ✓ warning\n2 ✓ warning warning\n3 ✓ warning error\n4 ✓ error error\n\nSee also machine\n\n\n\n\n\n","category":"function"},{"location":"utilities/#MLJBase.fit_only!-Union{Tuple{Machine{<:Any, <:Any, cache_data}}, Tuple{cache_data}} where cache_data","page":"Utilities","title":"MLJBase.fit_only!","text":"MLJBase.fit_only!(\n mach::Machine;\n rows=nothing,\n verbosity=1,\n force=false,\n composite=nothing,\n)\n\nWithout mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:\n\nAb initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.\nTraining update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.\nNo-operation. Leave existing learned parameters untouched. Do not increment mach.state.\n\nIf the model, model, bound to mach is a symbol, then instead perform the action using the true model given by getproperty(composite, model). 
See also machine.\n\nTraining action logic\n\nFor the action to be a no-operation, either mach.frozen == true or none of the following apply:\n\n(i) mach has never been trained (mach.state == 0).\n(ii) force == true.\n(iii) The state of some other machine on which mach depends has changed since the last time mach was trained (ie, the last time mach.state was incremented).\n(iv) The specified rows have changed since the last retraining and mach.model does not have Static type.\n(v) mach.model is a model and different from the last model used for training, but has the same type.\n(vi) mach.model is a model but has a type different from the last model used for training.\n(vii) mach.model is a symbol and (composite, mach.model) is different from the last model used for training, but has the same type.\n(viii) mach.model is a symbol and (composite, mach.model) has a different type from the last model used for training.\n\nIn any of the cases (i) - (iv), (vi), or (viii), mach is trained ab initio. If (v) or (vii) is true, then a training update is applied.\n\nTo freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).\n\nImplementation details\n\nThe data to which a machine is bound is stored in mach.args. Each element of args is either a Node object, or, in the case that concrete data was bound to the machine, it is concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. 
See the \"Adding models for general use\" section of the MLJ documentation for more on these lower-level training methods.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.freeze!-Tuple{Machine}","page":"Utilities","title":"MLJBase.freeze!","text":"freeze!(mach)\n\nFreeze the machine mach so that it will never be retrained (unless thawed).\n\nSee also thaw!.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.last_model-Tuple{Any}","page":"Utilities","title":"MLJBase.last_model","text":"last_model(mach::Machine)\n\nReturn the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.\n\nReturns nothing if mach has not been trained.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.machine","page":"Utilities","title":"MLJBase.machine","text":"machine(model, args...; cache=true, scitype_check_level=1)\n\nConstruct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). 
This in turn enables generalization to new data using operations such as predict or transform:\n\nusing MLJModels\nX, y = make_regression()\n\nPCA = @load PCA pkg=MultivariateStats\nmodel = PCA()\nmach = machine(model, X)\nfit!(mach, rows=1:50)\ntransform(mach, selectrows(X, 51:100)) # or transform(mach, rows=51:100)\n\nDecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree\nmodel = DecisionTreeRegressor()\nmach = machine(model, X, y)\nfit!(mach, rows=1:50)\npredict(mach, selectrows(X, 51:100)) # or predict(mach, rows=51:100)\n\nSpecify cache=false to prioritize memory management over speed.\n\nWhen building a learning network, Node objects can be substituted for the concrete data but no type or dimension checks are applied.\n\nChecks on the types of training data\n\nA model articulates its data requirements using scientific types, i.e., using the scitype function instead of the typeof function.\n\nIf scitype_check_level > 0 then the scitype of each arg in args is computed, and this is compared with the scitypes expected by the model, unless args contains Unknown scitypes and scitype_check_level < 4, in which case no further action is taken. Whether warnings are issued or errors thrown depends on the level. For details, see default_scitype_check_level, a method to inspect or change the default level (1 at startup).\n\nMachines with model placeholders\n\nA symbol can be substituted for a model in machine constructors to act as a placeholder for a model specified at training time. 
The symbol must be the field name for a struct whose corresponding value is a model, as shown in the following example:\n\nmutable struct MyComposite\n transformer\n classifier\nend\n\nmy_composite = MyComposite(Standardizer(), ConstantClassifier)\n\nX, y = make_blobs()\nmach = machine(:classifier, X, y)\nfit!(mach, composite=my_composite)\n\nThe last two lines are equivalent to\n\nmach = machine(ConstantClassifier(), X, y)\nfit!(mach)\n\nDelaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.\n\nSee also fit!, default_scitype_check_level, MLJBase.save, serializable.\n\n\n\n\n\n","category":"function"},{"location":"utilities/#MLJBase.machine-Tuple{Union{IO, String}}","page":"Utilities","title":"MLJBase.machine","text":"machine(file::Union{String, IO})\n\nRebuild from a file a machine that has been serialized using the default Serialization module.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.report-Tuple{Machine}","page":"Utilities","title":"MLJBase.report","text":"report(mach)\n\nReturn the report for a machine mach that has been fit!, for example the coefficients in a linear model.\n\nThis is a named tuple and human-readable if possible.\n\nIf mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. 
(If multiple machines share the same model, then the value is a vector.)\n\njulia> using MLJ\njulia> @load LinearBinaryClassifier pkg=GLM\njulia> X, y = @load_crabs;\njulia> pipe = Standardizer() |> LinearBinaryClassifier();\njulia> mach = machine(pipe, X, y) |> fit!;\n\njulia> report(mach).linear_binary_classifier\n(deviance = 3.8893386087844543e-7,\n dof_residual = 195.0,\n stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],\n vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)\n\n\nSee also fitted_params\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.report_given_method-Tuple{Machine}","page":"Utilities","title":"MLJBase.report_given_method","text":"report_given_method(mach::Machine)\n\nSame as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.\n\nA specialized method intended for learning network applications.\n\nThe return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc) and the values report contributed by that method.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.restore!","page":"Utilities","title":"MLJBase.restore!","text":"restore!(mach::Machine)\n\nRestore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state=1. 
Intended for restoring deserialized machine objects to a useable form.\n\nFor an example see serializable.\n\n\n\n\n\n","category":"function"},{"location":"utilities/#MLJBase.serializable-Union{Tuple{Machine{<:Any, <:Any, C}}, Tuple{C}, Tuple{Machine{<:Any, <:Any, C}, Any}} where C","page":"Utilities","title":"MLJBase.serializable","text":"serializable(mach::Machine)\n\nReturns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.\n\nAny general purpose Julia serializer may be applied to the output of serializable (eg, JLSO, BSON, JLD) but you must call restore!(mach) on the deserialised object mach before using it. See the example below.\n\nIf using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.\n\nA machine returned by serializable is characterized by the property mach.state == -1.\n\nExample using JLSO\n\nusing MLJ\nusing JLSO\nTree = @load DecisionTreeClassifier\ntree = Tree()\nX, y = @load_iris\nmach = fit!(machine(tree, X, y))\n\n# This machine can now be serialized\nsmach = serializable(mach)\nJLSO.save(\"machine.jlso\", :machine => smach)\n\n# Deserialize and restore learned parameters to useable form:\nloaded_mach = JLSO.load(\"machine.jlso\")[:machine]\nrestore!(loaded_mach)\n\npredict(loaded_mach, X)\npredict(mach, X)\n\nSee also restore!, MLJBase.save.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.thaw!-Tuple{Machine}","page":"Utilities","title":"MLJBase.thaw!","text":"thaw!(mach)\n\nUnfreeze the machine mach so that it can be retrained.\n\nSee also freeze!.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJModelInterface.feature_importances-Tuple{Machine}","page":"Utilities","title":"MLJModelInterface.feature_importances","text":"feature_importances(mach::Machine)\n\nReturn a list of feature => importance pairs for a fitted 
machine, mach, for supported models. Otherwise return nothing.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJModelInterface.fitted_params-Tuple{Machine}","page":"Utilities","title":"MLJModelInterface.fitted_params","text":"fitted_params(mach)\n\nReturn the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.\n\nThis is a named tuple and human-readable if possible.\n\nIf mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)\n\njulia> using MLJ\njulia> @load LogisticClassifier pkg=MLJLinearModels\njulia> X, y = @load_crabs;\njulia> pipe = Standardizer() |> LogisticClassifier();\njulia> mach = machine(pipe, X, y) |> fit!;\n\njulia> fitted_params(mach).logistic_classifier\n(classes = CategoricalArrays.CategoricalValue{String,UInt32}[\"B\", \"O\"],\n coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],\n intercept = 0.0883301599726305,)\n\nSee also report\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJModelInterface.save-Tuple{Union{IO, String}, Machine}","page":"Utilities","title":"MLJModelInterface.save","text":"MLJ.save(filename, mach::Machine)\nMLJ.save(io, mach::Machine)\n\nMLJBase.save(filename, mach::Machine)\nMLJBase.save(io, mach::Machine)\n\nSerialize the machine mach to a file with path filename, or to an input/output stream io (at least IOBuffer instances are supported) using the Serialization module.\n\nTo serialise using a different format, see serializable.\n\nMachines are deserialized using the machine constructor as shown in the 
example below.\n\nThe implementation of save for machines changed in MLJ 0.18 (MLJBase 0.20). You can only restore a machine saved using older versions of MLJ using an older version.\n\nExample\n\nusing MLJ\nTree = @load DecisionTreeClassifier\nX, y = @load_iris\nmach = fit!(machine(Tree(), X, y))\n\nMLJ.save(\"tree.jls\", mach)\nmach_predict_only = machine(\"tree.jls\")\npredict(mach_predict_only, X)\n\n# using a buffer:\nio = IOBuffer()\nMLJ.save(io, mach)\nseekstart(io)\npredict_only_mach = machine(io)\npredict(predict_only_mach, X)\n\nwarning: Only load files from trusted sources\nMaliciously constructed JLS files, like pickles, and most other general purpose serialization formats, can allow for arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.\n\nSee also serializable, machine.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#StatsAPI.fit!-Tuple{Machine}","page":"Utilities","title":"StatsAPI.fit!","text":"fit!(mach::Machine, rows=nothing, verbosity=1, force=false, composite=nothing)\n\nFit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.\n\nTo attempt to fit a machine without touching any other machine, use fit_only!. 
For more on options and the internal logic of fitting see fit_only!\n\n\n\n\n\n","category":"method"},{"location":"utilities/#Parameter-Inspection","page":"Utilities","title":"Parameter Inspection","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"parameter_inspection.jl\"]","category":"page"},{"location":"utilities/#Show","page":"Utilities","title":"Show","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"show.jl\"]","category":"page"},{"location":"utilities/#MLJBase._recursive_show-Tuple{IO, MLJType, Any, Any}","page":"Utilities","title":"MLJBase._recursive_show","text":"_recursive_show(stream, object, current_depth, depth)\n\nPrivate method.\n\nGenerate a table of the properties of the MLJType object, displaying each property value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:\n\nIf f is itself a MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its properties (and so on, up to a depth of argument depth).\nOtherwise f is displayed as \"(omitted T)\" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (ie, MIME\"plain/text\") form of f is shown. 
To override this behaviour, overload the _show method for the type in question.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.abbreviated-Tuple{Any}","page":"Utilities","title":"MLJBase.abbreviated","text":"to display abbreviated versions of integers\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.color_off-Tuple{}","page":"Utilities","title":"MLJBase.color_off","text":"color_off()\n\nSuppress color and bold output at the REPL for displaying MLJ objects.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.color_on-Tuple{}","page":"Utilities","title":"MLJBase.color_on","text":"color_on()\n\nEnable color and bold output at the REPL, for enhanced display of MLJ objects.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.handle-Tuple{Any}","page":"Utilities","title":"MLJBase.handle","text":"return abbreviated object id (as string) or its registered handle (as string) if this exists\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.@constant-Tuple{Any}","page":"Utilities","title":"MLJBase.@constant","text":"@constant x = value\n\nPrivate method (used in testing).\n\nEquivalent to const x = value but registers the binding thus:\n\nMLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x\n\nRegistered objects get displayed using the variable name to which they were bound in calls to show(x), etc.\n\nWARNING: As with any const declaration, binding x to new value of the same type is not prevented and the registration will not be updated.\n\n\n\n\n\n","category":"macro"},{"location":"utilities/#MLJBase.@more-Tuple{}","page":"Utilities","title":"MLJBase.@more","text":"@more\n\nEntered at the REPL, equivalent to show(ans, 100). 
Use to get a recursive description of all properties of the last REPL value.\n\n\n\n\n\n","category":"macro"},{"location":"utilities/#Utility-functions","page":"Utilities","title":"Utility functions","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"utilities.jl\"]","category":"page"},{"location":"utilities/#MLJBase._permute_rows-Tuple{AbstractVecOrMat, Vector{Int64}}","page":"Utilities","title":"MLJBase._permute_rows","text":"_permute_rows(obj, perm)\n\nInternal function to return a vector or matrix with permuted rows given the permutation perm.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.available_name-Tuple{Any, Any}","page":"Utilities","title":"MLJBase.available_name","text":"available_name(modl::Module, name::Symbol)\n\nFunction to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.check_same_nrows-Tuple{Any, Any}","page":"Utilities","title":"MLJBase.check_same_nrows","text":"check_same_nrows(X, Y)\n\nInternal function to check two objects, each a vector or a matrix, have the same number of rows.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.chunks-Tuple{AbstractRange, Integer}","page":"Utilities","title":"MLJBase.chunks","text":"chunks(range, n)\n\nSplit an AbstractRange into n subranges of approximately equal length.\n\nExample\n\njulia> collect(chunks(1:5, 2))\n2-element Array{UnitRange{Int64},1}:\n 1:3\n 4:5\n\nPrivate method\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.flat_values-Tuple{NamedTuple}","page":"Utilities","title":"MLJBase.flat_values","text":"flat_values(t::NamedTuple)\n\nView a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the 
original tuple.\n\njulia> t = (X = (x = 1, y = 2), Y = 3);\njulia> flat_values(t)\n(1, 2, 3)\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.generate_name!-Tuple{DataType, Any}","page":"Utilities","title":"MLJBase.generate_name!","text":"generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)\n\nGiven a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case, representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.\n\nIf the snake-case representation already exists in existing_names a suitable integer is appended to the name.\n\nIf only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.\n\njulia> existing_names = [];\njulia> generate_name!(Vector{Int}, existing_names)\n:vector\n\njulia> generate_name!(Vector{Int}, existing_names)\n:vector2\n\njulia> generate_name!(AbstractFloat, existing_names)\n:abstract_float\n\njulia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)\n:not_array\n\njulia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)\n:not_array2\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.guess_model_target_observation_scitype-Tuple{Any}","page":"Utilities","title":"MLJBase.guess_model_target_observation_scitype","text":"guess_model_targetobservation_scitype(model)\n\nPrivate method\n\nTry to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). 
Return Unknown if unable to draw reliable inferrence.\n\nThe observation scitype for a table is here understood as the scitype of a row converted to a vector.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.guess_observation_scitype-Tuple{Any}","page":"Utilities","title":"MLJBase.guess_observation_scitype","text":"guess_observation_scitype(y)\n\nPrivate method.\n\nIf y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.\n\nIn all other cases, Unknown.\n\njulia> guess_observation_scitype([missing, 1, 2, 3])\nUnion{Missing, Count}\n\njulia> guess_observation_scitype(rand(3, 2))\nAbstractVector{Continuous}\n\njulia> guess_observation_scitype((x=rand(3), y=rand(Bool, 3)))\nAbstractVector{Union{Continuous, Count}}\n\njulia> guess_observation_scitype((x=[missing, 1, 2], y=[1, 2, 3]))\nUnknown\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.init_rng-Tuple{Any}","page":"Utilities","title":"MLJBase.init_rng","text":"init_rng(rng)\n\nCreate an AbstractRNG from rng. If rng is a non-negative Integer, it returns a MersenneTwister random number generator seeded with rng; If rng is an AbstractRNG object it returns rng, otherwise it throws an error.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.observation-Tuple{Type}","page":"Utilities","title":"MLJBase.observation","text":"observation(S)\n\nPrivate method.\n\nTries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations; here we view the scitype for one row of a table to be the scitype of the row converted to a vector. 
Return Unknown if unable to draw reliable inferrence.\n\nThe observation scitype for a table is here understood as the scitype of a row converted to a vector.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.prepend-Tuple{Symbol, Nothing}","page":"Utilities","title":"MLJBase.prepend","text":"MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})\n\nFor prepending symbols in expressions like :(y.w) and :(x1.x2.x3).\n\njulia> prepend(:x, :y)\n:(x.y)\n\njulia> prepend(:x, :(y.z))\n:(x.y.z)\n\njulia> prepend(:w, ans)\n:(w.x.y.z)\n\nIf the second argument is nothing, then nothing is returned.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.recursive_getproperty-Tuple{Any, Symbol}","page":"Utilities","title":"MLJBase.recursive_getproperty","text":"recursive_getproperty(object, nested_name::Expr)\n\nCall getproperty recursively on object to extract the value of some nested property, as in the following example:\n\njulia> object = (X = (x = 1, y = 2), Y = 3);\njulia> recursive_getproperty(object, :(X.y))\n2\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.recursive_setproperty!-Tuple{Any, Symbol, Any}","page":"Utilities","title":"MLJBase.recursive_setproperty!","text":"recursively_setproperty!(object, nested_name::Expr, value)\n\nSet a nested property of an object to value, as in the following example:\n\njulia> mutable struct Foo\n X\n Y\n end\n\njulia> mutable struct Bar\n x\n y\n end\n\njulia> object = Foo(Bar(1, 2), 3)\nFoo(Bar(1, 2), 3)\n\njulia> recursively_setproperty!(object, :(X.y), 42)\n42\n\njulia> object\nFoo(Bar(1, 42), 3)\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.sequence_string-Union{Tuple{Itr}, Tuple{Itr, Any}} where Itr","page":"Utilities","title":"MLJBase.sequence_string","text":"sequence_string(itr, n=3)\n\nReturn a \"sequence\" string from the first n elements generated by itr.\n\njulia> MLJBase.sequence_string(1:10, 4)\n\"1, 2, 3, 4, ...\"\n\nPrivate 
method.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.shuffle_rows-Tuple{AbstractVecOrMat, AbstractVecOrMat}","page":"Utilities","title":"MLJBase.shuffle_rows","text":"shuffle_rows(X::AbstractVecOrMat,\n Y::AbstractVecOrMat;\n rng::AbstractRNG=Random.GLOBAL_RNG)\n\nReturn row-shuffled vectors or matrices using a random permutation of X and Y. An optional random number generator can be specified using the rng argument.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.unwind-Tuple","page":"Utilities","title":"MLJBase.unwind","text":"unwind(iterators...)\n\nRepresent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last clolumn slowest.\n\nExample\n\njulia> iterators = ([1, 2], [\"a\",\"b\"], [\"x\", \"y\", \"z\"]);\njulia> MLJTuning.unwind(iterators...)\n12×3 Array{Any,2}:\n 1 \"a\" \"x\"\n 2 \"a\" \"x\"\n 1 \"b\" \"x\"\n 2 \"b\" \"x\"\n 1 \"a\" \"y\"\n 2 \"a\" \"y\"\n 1 \"b\" \"y\"\n 2 \"b\" \"y\"\n 1 \"a\" \"z\"\n 2 \"a\" \"z\"\n 1 \"b\" \"z\"\n 2 \"b\" \"z\"\n\n\n\n\n\n","category":"method"},{"location":"resampling/#Resampling","page":"Resampling","title":"Resampling","text":"","category":"section"},{"location":"resampling/","page":"Resampling","title":"Resampling","text":"Modules = [MLJBase]\nPages = [\"resampling.jl\"]","category":"page"},{"location":"resampling/#MLJBase.CV","page":"Resampling","title":"MLJBase.CV","text":"cv = CV(; nfolds=6, shuffle=nothing, rng=nothing)\n\nCross-validation resampling strategy, for use in evaluate!, evaluate and tuning.\n\ntrain_test_pairs(cv, rows)\n\nReturns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. 
Each train vector is the complement of the corresponding test vector. With no row pre-shuffling, the order of rows is preserved, in the sense that rows coincides precisely with the concatenation of the test vectors, in the order they are generated. The first r test vectors have length n + 1, where n, r = divrem(length(rows), nfolds), and the remaining test vectors have length n.\n\nPre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the CV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.\n\nIf rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.CompactPerformanceEvaluation","page":"Resampling","title":"MLJBase.CompactPerformanceEvaluation","text":"CompactPerformanceEvaluation <: AbstractPerformanceEvaluation\n\nType of object returned by evaluate (for models plus data) or evaluate! (for machines) when called with the option compact = true. Such objects have the same structure as the PerformanceEvaluation objects returned by default, except that the following fields are omitted to save memory: fitted_params_per_fold, report_per_fold, train_test_rows.\n\nFor more on the remaining fields, see PerformanceEvaluation.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.Holdout","page":"Resampling","title":"MLJBase.Holdout","text":"holdout = Holdout(; fraction_train=0.7, shuffle=nothing, rng=nothing)\n\nInstantiate a Holdout resampling strategy, for use in evaluate!, evaluate and in tuning.\n\ntrain_test_pairs(holdout, rows)\n\nReturns the pair [(train, test)], where train and test are vectors such that rows=vcat(train, test) and length(train)/length(rows) is approximately equal to fraction_train.\n\nPre-shuffling of rows is controlled by rng and shuffle. 
If rng is an integer, then the Holdout keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.\n\nIf rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is specified.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.InSample","page":"Resampling","title":"MLJBase.InSample","text":"in_sample = InSample()\n\nInstantiate an InSample resampling strategy, for use in evaluate!, evaluate and in tuning. In this strategy the train and test sets are the same, and consist of all observations specified by the rows keyword argument. If rows is not specified, all supplied rows are used.\n\nExample\n\nusing MLJBase, MLJModels\n\nX, y = make_blobs() # a table and a vector\nmodel = ConstantClassifier()\ntrain, test = partition(eachindex(y), 0.7) # train:test = 70:30\n\nCompute in-sample (training) loss:\n\nevaluate(model, X, y, resampling=InSample(), rows=train, measure=brier_loss)\n\nCompute the out-of-sample loss:\n\nevaluate(model, X, y, resampling=[(train, test),], measure=brier_loss)\n\nOr equivalently:\n\nevaluate(model, X, y, resampling=Holdout(fraction_train=0.7), measure=brier_loss)\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.PerformanceEvaluation","page":"Resampling","title":"MLJBase.PerformanceEvaluation","text":"PerformanceEvaluation <: AbstractPerformanceEvaluation\n\nType of object returned by evaluate (for models plus data) or evaluate! (for machines). Such objects encode estimates of the performance (generalization error) of a supervised model or outlier detection model, and store other information ancillary to the computation.\n\nIf evaluate or evaluate! is called with the compact=true option, then a CompactPerformanceEvaluation object is returned instead.\n\nWhen evaluate/evaluate! 
is called, a number of train/test pairs (\"folds\") of row indices are generated, according to the options provided, which are discussed in the evaluate! doc-string. Rows correspond to observations. The generated train/test pairs are recorded in the train_test_rows field of the PerformanceEvaluation struct, and the corresponding estimates, aggregated over all train/test pairs, are recorded in measurement, a vector with one entry for each measure (metric) recorded in measure.\n\nWhen displayed, a PerformanceEvaluation object includes a value under the heading 1.96*SE, derived from the standard error of the per_fold entries. This value is suitable for constructing a formal 95% confidence interval for the given measurement. Such intervals should be interpreted with caution. See, for example, Bates et al. (2021).\n\nFields\n\nThese fields are part of the public API of the PerformanceEvaluation struct.\n\nmodel: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.\nmeasure: vector of measures (metrics) used to evaluate performance\nmeasurement: vector of measurements - one for each element of measure - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure m is StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())\noperation (e.g., predict_mode): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: predict, predict_mean, predict_mode, predict_median, or predict_joint.\nper_fold: a vector of vectors of individual test fold evaluations (one vector per measure). 
Useful for obtaining a rough estimate of the variance of the performance estimate.\nper_observation: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for the ith observation in the fth test fold, evaluated using the mth measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure measure is repeated across all observations in a fold if StatisticalMeasures.can_report_unaggregated(measure) == true. If e has been computed with the per_observation=false option, then e.per_observation is a vector of missings.\nfitted_params_per_fold: a vector containing fitted params(mach) for each machine mach trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.\nreport_per_fold: a vector containing report(mach) for each machine mach trained in resampling - one machine per train/test pair.\ntrain_test_rows: a vector of tuples, each of the form (train, test), where train and test are vectors of row (observation) indices for training and evaluation respectively.\nresampling: the user-specified resampling strategy to generate the train/test pairs (or literal train/test pairs if that was directly specified).\nrepeats: the number of times the resampling strategy was repeated.\n\nSee also CompactPerformanceEvaluation.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.Resampler","page":"Resampling","title":"MLJBase.Resampler","text":"resampler = Resampler(\n model=ConstantRegressor(),\n resampling=CV(),\n measure=nothing,\n weights=nothing,\n class_weights=nothing,\n operation=predict,\n repeats = 1,\n acceleration=default_resource(),\n check_measure=true,\n per_observation=true,\n logger=nothing,\n compact=false,\n)\n\nResampling model wrapper, used internally by the fit method of TunedModel instances and IteratedModel instances. 
See evaluate! for options. Not intended for use by the general user, who will ordinarily use evaluate! directly.\n\nGiven a machine mach = machine(resampler, args...) one obtains a performance evaluation of the specified model, performed according to the prescribed resampling strategy and other parameters, using data args..., by calling fit!(mach) followed by evaluate(mach).\n\nOn subsequent calls to fit!(mach) new train/test pairs of row indices are only regenerated if resampling, repeats or cache fields of resampler have changed. The evolution of an RNG field of resampler does not constitute a change (== for MLJType objects is not sensitive to such changes; see is_same_except).\n\nIf there is a single train/test pair, then warm-restart behavior of the wrapped model resampler.model will extend to warm-restart behavior of the wrapper resampler, with respect to mutations of the wrapped model.\n\nThe sample weights are passed to the specified performance measures that support weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.\n\nThe sample class_weights are passed to the specified performance measures that support per-class weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.StratifiedCV","page":"Resampling","title":"MLJBase.StratifiedCV","text":"stratified_cv = StratifiedCV(; nfolds=6,\n shuffle=false,\n rng=Random.GLOBAL_RNG)\n\nStratified cross-validation resampling strategy, for use in evaluate!, evaluate and in tuning. 
Applies only to classification problems (OrderedFactor or Multiclass targets).\n\ntrain_test_pairs(stratified_cv, rows, y)\n\nReturns an nfolds-length iterator of (train, test) pairs of vectors (row indices) where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector.\n\nUnlike regular cross-validation, the distribution of the levels of the target y corresponding to each train and test is constrained, as far as possible, to replicate that of y[rows] as a whole.\n\nThe stratified train_test_pairs algorithm is invariant to label renaming. For example, if you run replace!(y, 'a' => 'b', 'b' => 'a') and then re-run train_test_pairs, the returned (train, test) pairs will be the same.\n\nPre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the StratifiedCV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.\n\nIf rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.TimeSeriesCV","page":"Resampling","title":"MLJBase.TimeSeriesCV","text":"tscv = TimeSeriesCV(; nfolds=4)\n\nCross-validation resampling strategy, for use in evaluate!, evaluate and tuning, when observations are chronological and not expected to be independent.\n\ntrain_test_pairs(tscv, rows)\n\nReturns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The rows are partitioned sequentially into nfolds + 1 approximately equal length partitions, where the first partition is the first train set, and the second partition is the first test set. 
The second train set consists of the first two partitions, and the second test set consists of the third partition, and so on for each fold.\n\nThe first partition (which is the first train set) has length n + r, where n, r = divrem(length(rows), nfolds + 1), and the remaining partitions (all of the test folds) have length n.\n\nExamples\n\njulia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)\n3-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:\n (1:4, 5:6)\n (1:6, 7:8)\n (1:8, 9:10)\n\njulia> model = (@load RidgeRegressor pkg=MultivariateStats verbosity=0)();\n\njulia> data = @load_sunspots;\n\njulia> X = (lag1 = data.sunspot_number[2:end-1],\n lag2 = data.sunspot_number[1:end-2]);\n\njulia> y = data.sunspot_number[3:end];\n\njulia> tscv = TimeSeriesCV(nfolds=3);\n\njulia> evaluate(model, X, y, resampling=tscv, measure=rmse, verbosity=0)\n┌───────────────────────────┬───────────────┬────────────────────┐\n│ _.measure │ _.measurement │ _.per_fold │\n├───────────────────────────┼───────────────┼────────────────────┤\n│ RootMeanSquaredError @753 │ 21.7 │ [25.4, 16.3, 22.4] │\n└───────────────────────────┴───────────────┴────────────────────┘\n_.per_observation = [missing]\n_.fitted_params_per_fold = [ … ]\n_.report_per_fold = [ … ]\n_.train_test_rows = [ … ]\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.evaluate!-Tuple{Machine{<:Union{Annotator, Supervised}}}","page":"Resampling","title":"MLJBase.evaluate!","text":"evaluate!(mach; resampling=CV(), measure=nothing, options...)\n\nEstimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector. Returns a PerformanceEvaluation object.\n\nAvailable resampling strategies are CV, Holdout, InSample, StratifiedCV and TimeSeriesCV. 
If resampling is not an instance of one of these, then a vector of tuples of the form (train_rows, test_rows) is expected. For example, setting\n\nresampling = [((1:100), (101:200)),\n ((101:200), (1:100))]\n\ngives two-fold cross-validation using the first 200 rows of data.\n\nAny measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.\n\nAlthough evaluate! is mutating, mach.model and mach.args are not mutated.\n\nAdditional keyword options\n\nrows - vector of observation indices from which both train and test folds are constructed (default is all observations)\noperation/operations=nothing - One of predict, predict_mean, predict_mode, predict_median, or predict_joint, or a vector of these of the same length as measure/measures. Automatically inferred if left unspecified. For example, predict_mode will be used for a Multiclass target, if model is a probabilistic predictor, but measure expects literal (point) target predictions. Operations actually applied can be inspected from the operation field of the object returned.\nweights - per-sample Real weights for measures that support them (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).\nclass_weights - dictionary of Real per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).\nrepeats::Int=1: set to a higher value for repeated (Monte Carlo) resampling. For example, if repeats = 10, then resampling = CV(nfolds=5, shuffle=true) generates a total of 50 (train, test) pairs for evaluation and subsequent aggregation.\nacceleration=CPU1(): acceleration/parallelization option; can be any instance of CPU1 (single-threaded computation), CPUThreads (multi-threaded computation) or CPUProcesses (multi-process computation); default is default_resource(). 
These types are owned by ComputationalResources.jl.\nforce=false: set to true to force cold-restart of each training event\nverbosity::Int=1 logging level; can be negative\ncheck_measure=true: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.\nper_observation=true: whether to calculate estimates for individual observations; if false the per_observation field of the returned object is populated with missings. Setting to false may reduce compute time and allocations.\nlogger - a logger object (see MLJBase.log_evaluation)\ncompact=false - if true, the returned evaluation object excludes these fields: fitted_params_per_fold, report_per_fold, train_test_rows.\n\nSee also evaluate, PerformanceEvaluation, CompactPerformanceEvaluation.\n\n\n\n\n\n","category":"method"},{"location":"resampling/#MLJBase.log_evaluation-Tuple{Any, Any}","page":"Resampling","title":"MLJBase.log_evaluation","text":"log_evaluation(logger, performance_evaluation)\n\nLog a performance evaluation to logger, an object specific to some logging platform, such as mlflow. If logger=nothing then no logging is performed. The method is called at the end of every call to evaluate/evaluate! using the logger provided by the logger keyword argument.\n\nImplementations for new logging platforms\n\nJulia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface) should overload log_evaluation(logger::LoggerType, performance_evaluation), where LoggerType is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.\n\n\n\n\n\n","category":"method"},{"location":"resampling/#MLJModelInterface.evaluate-Tuple{Union{Annotator, Supervised}, Vararg{Any}}","page":"Resampling","title":"MLJModelInterface.evaluate","text":"evaluate(model, data...; cache=true, options...)\n\nEquivalent to evaluate!(machine(model, data..., cache=cache); options...). 
See the machine version evaluate! for the complete list of options.\n\nReturns a PerformanceEvaluation object.\n\nSee also evaluate!.\n\n\n\n\n\n","category":"method"},{"location":"composition/#Composition","page":"Composition","title":"Composition","text":"","category":"section"},{"location":"composition/#Composites","page":"Composition","title":"Composites","text":"","category":"section"},{"location":"composition/","page":"Composition","title":"Composition","text":"Modules = [MLJBase]\nPages = [\"composition/composites.jl\"]","category":"page"},{"location":"composition/#Networks","page":"Composition","title":"Networks","text":"","category":"section"},{"location":"composition/","page":"Composition","title":"Composition","text":"Modules = [MLJBase]\nPages = [\"composition/networks.jl\"]","category":"page"},{"location":"composition/#Pipelines","page":"Composition","title":"Pipelines","text":"","category":"section"},{"location":"composition/","page":"Composition","title":"Composition","text":"Modules = [MLJBase]\nPages = [\"composition/pipeline_static.jl\", \"composition/pipelines.jl\"]","category":"page"},{"location":"#MLJBase.jl","page":"Home","title":"MLJBase.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"These docs are bare-bones and auto-generated. Complete MLJ documentation is here. 
","category":"page"},{"location":"","page":"Home","title":"Home","text":"For MLJBase-specific developer information, see also the README.md file.","category":"page"},{"location":"datasets/#Datasets","page":"Datasets","title":"Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Pages = [\"data/datasets_synthetic.jl\"]","category":"page"},{"location":"datasets/#Standard-datasets","page":"Datasets","title":"Standard datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"To add a new dataset assuming it has a header and is, at path data/newdataset.csv","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Start by loading it with CSV:","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"fpath = joinpath(\"datadir\", \"newdataset.csv\")\ndata = CSV.read(fpath, copycols=true,\n categorical=true)","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Load it with DelimitedFiles and Tables","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"data_raw, data_header = readdlm(fpath, ',', header=true)\ndata_table = Tables.table(data_raw; header=Symbol.(vec(data_header)))","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Retrieve the conversions:","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"for (n, st) in zip(names(data), scitype_union.(eachcol(data)))\n println(\":$n=>$st,\")\nend","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Copy and paste the result in a coerce","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"data_table = coerce(data_table, ...)","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MLJBase]\nPages = 
[\"data/datasets.jl\"]","category":"page"},{"location":"datasets/#MLJBase.load_dataset-Tuple{String, Tuple}","page":"Datasets","title":"MLJBase.load_dataset","text":"load_dataset(fpath, coercions)\n\nLoad one of the standard datasets, such as Boston, assuming the file is a comma-separated file with a header.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.load_sunspots-Tuple{}","page":"Datasets","title":"MLJBase.load_sunspots","text":"Load a well-known sunspot time series (table with one column). [https://www.sws.bom.gov.au/Educational/2/3/6](https://www.sws.bom.gov.au/Educational/2/3/6)\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.@load_ames-Tuple{}","page":"Datasets","title":"MLJBase.@load_ames","text":"Load the full version of the well-known Ames Housing task.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_boston-Tuple{}","page":"Datasets","title":"MLJBase.@load_boston","text":"Load a well-known public regression dataset with Continuous features.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_crabs-Tuple{}","page":"Datasets","title":"MLJBase.@load_crabs","text":"Load a well-known crab classification dataset with nominal features.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_iris-Tuple{}","page":"Datasets","title":"MLJBase.@load_iris","text":"Load a well-known public classification task with nominal features.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_reduced_ames-Tuple{}","page":"Datasets","title":"MLJBase.@load_reduced_ames","text":"Load a reduced version of the well-known Ames Housing task\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_smarket-Tuple{}","page":"Datasets","title":"MLJBase.@load_smarket","text":"Load S&P Stock Market dataset, as used in (An Introduction to Statistical Learning with applications in R)https://rdrr.io/cran/ISLR/man/Smarket.html, by Witten et al (2013), Springer-Verlag, New 
York.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_sunspots-Tuple{}","page":"Datasets","title":"MLJBase.@load_sunspots","text":"Load a well-known sunspot time series (single table with one column).\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#Synthetic-datasets","page":"Datasets","title":"Synthetic datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MLJBase]\nPages = [\"data/datasets_synthetic.jl\"]","category":"page"},{"location":"datasets/#MLJBase.augment_X-Tuple{Matrix{<:Real}, Bool}","page":"Datasets","title":"MLJBase.augment_X","text":"augment_X(X, fit_intercept)\n\nGiven a matrix X, append a column of ones if fit_intercept is true. See make_regression.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.finalize_Xy-NTuple{6, Any}","page":"Datasets","title":"MLJBase.finalize_Xy","text":"finalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)\n\nInternal function to finalize the make_* functions.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.make_blobs","page":"Datasets","title":"MLJBase.make_blobs","text":"X, y = make_blobs(n=100, p=2; kwargs...)\n\nGenerate Gaussian blobs for clustering and classification problems.\n\nReturn value\n\nBy default, a table X with p columns (features) and n rows (observations), together with a corresponding vector of n Multiclass target observations y, indicating blob membership.\n\nKeyword arguments\n\nshuffle=true: whether to shuffle the resulting points,\ncenters=3: either a number of centers or a c x p matrix with c pre-determined centers,\ncluster_std=1.0: the standard deviation(s) of each blob,\ncenter_box=(-10. 
=> 10.): the limits of the p-dimensional cube within which the cluster centers are drawn if they are not provided,\neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type. \n\nExample\n\nX, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.make_circles","page":"Datasets","title":"MLJBase.make_circles","text":"X, y = make_circles(n=100; kwargs...)\n\nGenerate n labeled points close to two concentric circles for classification and clustering models.\n\nReturn value\n\nBy default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the smaller or larger circle, respectively.\n\nKeyword arguments\n\nshuffle=true: whether to shuffle the resulting points,\nnoise=0: standard deviation of the Gaussian noise added to the data,\nfactor=0.8: ratio of the smaller radius over the larger one,\n\neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type. 
\n\nExample\n\nX, y = make_circles(100; noise=0.5, factor=0.3)\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.make_moons","page":"Datasets","title":"MLJBase.make_moons","text":"make_moons(n::Int=100; kwargs...)\n\nGenerates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.\n\nReturn value\n\nBy default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the left or right semi-circle.\n\nKeyword arguments\n\nshuffle=true: whether to shuffle the resulting points,\nnoise=0.1: standard deviation of the Gaussian noise added to the data,\nxshift=1.0: horizontal translation of the second center with respect to the first one.\nyshift=0.3: vertical translation of the second center with respect to the first one. \neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type. 
\n\nExample\n\nX, y = make_moons(100; noise=0.5)\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.make_regression","page":"Datasets","title":"MLJBase.make_regression","text":"make_regression(n, p; kwargs...)\n\nGenerate Gaussian input features and a linear response with Gaussian noise, for use with regression models.\n\nReturn value\n\nBy default, a tuple (X, y) where table X has p columns and n rows (observations), together with a corresponding vector of n Continuous target observations y.\n\nKeywords\n\nintercept=true: Whether to generate data from a model with intercept.\nn_targets=1: Number of columns in the target.\nsparse=0: Proportion of the generating weight vector that is sparse.\nnoise=0.1: Standard deviation of the Gaussian noise added to the response (target).\noutliers=0: Proportion of the response vector to make as outliers by adding a random quantity with high variance. (Only applied if binary is false.)\nas_table=true: Whether X (and y, if n_targets > 1) should be a table or a matrix.\neltype=Float64: Element type for X and y. Must subtype AbstractFloat.\nbinary=false: Whether the target should be binarized (via a sigmoid).\neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). 
\n\nExample\n\nX, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.outlify!-Tuple{Any, Any, Any}","page":"Datasets","title":"MLJBase.outlify!","text":"Add outliers to portion s of vector.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.runif_ab-NTuple{5, Any}","page":"Datasets","title":"MLJBase.runif_ab","text":"runif_ab(rng, n, p, a, b)\n\nInternal function to generate n points in [a, b]ᵖ uniformly at random.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.sigmoid-Tuple{Float64}","page":"Datasets","title":"MLJBase.sigmoid","text":"sigmoid(x)\n\nReturn the sigmoid computed in a numerically stable way: σ(x) = 1/(1+exp(-x))\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.sparsify!-Tuple{Any, Any, Any}","page":"Datasets","title":"MLJBase.sparsify!","text":"sparsify!(rng, θ, s)\n\nMake portion s of vector θ exactly 0.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#Utility-functions","page":"Datasets","title":"Utility functions","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MLJBase]\nPages = [\"data/data.jl\"]","category":"page"},{"location":"datasets/#MLJBase.complement-Tuple{Any, Any}","page":"Datasets","title":"MLJBase.complement","text":"complement(folds, i)\n\nThe complement of the ith fold of folds in the concatenation of all elements of folds. 
Here folds is a vector or tuple of integer vectors, typically representing row indices of a vector, matrix or table.\n\ncomplement(([1,2], [3,], [4, 5]), 2) # [1 ,2, 4, 5]\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.corestrict-Union{Tuple{N}, Tuple{Tuple{Vararg{T, N}} where T, Any}} where N","page":"Datasets","title":"MLJBase.corestrict","text":"corestrict(X, folds, i)\n\nThe restriction of X, a vector, matrix or table, to the complement of the ith fold of folds, where folds is a tuple of vectors of row indices.\n\nThe method is curried, so that corestrict(folds, i) is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i).\n\nExample\n\nfolds = ([1, 2], [3, 4, 5], [6,])\ncorestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.partition-Tuple{Any, Vararg{Real}}","page":"Datasets","title":"MLJBase.partition","text":"partition(X, fractions...;\n shuffle=nothing,\n rng=Random.GLOBAL_RNG,\n stratify=nothing,\n multi=false)\n\nSplits the vector, matrix or table X into a tuple of objects of the same type, whose vertical concatenation is X. The number of rows in each component of the return value is determined by the corresponding fractions of nrows(X), where valid fractions are floats between 0 and 1 whose sum is less than one. 
The last fraction is not provided, as it is inferred from the preceding ones.\n\nFor synchronized partitioning of multiple objects, use the multi=true option.\n\njulia> partition(1:1000, 0.8)\n([1,...,800], [801,...,1000])\n\njulia> partition(1:1000, 0.2, 0.7)\n([1,...,200], [201,...,900], [901,...,1000])\n\njulia> partition(reshape(1:10, 5, 2), 0.2, 0.4)\n([1 6], [2 7; 3 8], [4 9; 5 10])\n\njulia> X, y = make_blobs() # a table and vector\njulia> Xtrain, Xtest = partition(X, 0.8, stratify=y)\n\nHere's an example of synchronized partitioning of multiple objects:\n\njulia> (Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)\n\nKeywords\n\nshuffle=nothing: if set to true, shuffles the rows before taking fractions.\nrng=Random.GLOBAL_RNG: specifies the random number generator to be used, can be an integer seed. If specified, and shuffle === nothing is interpreted as true.\nstratify=nothing: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle cannot be false.\nmulti=false: if true then X is expected to be a tuple of objects sharing a common length, which are each partitioned separately using the same specified fractions and the same row shuffling. 
Returns a tuple of partitions (a tuple of tuples).\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.restrict-Union{Tuple{N}, Tuple{Tuple{Vararg{T, N}} where T, Any}} where N","page":"Datasets","title":"MLJBase.restrict","text":"restrict(X, folds, i)\n\nThe restriction of X, a vector, matrix or table, to the ith fold of folds, where folds is a tuple of vectors of row indices.\n\nThe method is curried, so that restrict(folds, i) is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i).\n\nExample\n\n\n\nfolds = ([1, 2], [3, 4, 5], [6,])\nrestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]\n\nSee also corestrict\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.skipinvalid-Tuple{Any}","page":"Datasets","title":"MLJBase.skipinvalid","text":"skipinvalid(itr)\n\nReturn an iterator over the elements in itr skipping missing and NaN values. Behaviour is similar to skipmissing.\n\nskipinvalid(A, B)\n\nFor vectors A and B of the same length, return a tuple of vectors (A[mask], B[mask]) where mask[i] is true if and only if A[i] and B[i] are both valid (non-missing and non-NaN). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing from the element types if present in the original iterators.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.unpack-Tuple{Any, Vararg{Any}}","page":"Datasets","title":"MLJBase.unpack","text":"unpack(table, f1, f2, ... fk;\n wrap_singles=false,\n shuffle=false,\n rng::Union{AbstractRNG,Int,Nothing}=nothing,\n coerce_options...)\n\nHorizontally split any Tables.jl compatible table into smaller tables or vectors by making column selections determined by the predicates f1, f2, ..., fk. Selection from the column names is without replacement. 
A predicate is any object f such that f(name) is true or false for each column name::Symbol of table.\n\nReturns a tuple of tables/vectors with length one greater than the number of supplied predicates, with the last component including all previously unselected columns.\n\njulia> table = DataFrame(x=[1,2], y=['a', 'b'], z=[10.0, 20.0], w=[\"A\", \"B\"])\n2×4 DataFrame\n Row │ x y z w\n │ Int64 Char Float64 String\n─────┼──────────────────────────────\n 1 │ 1 a 10.0 A\n 2 │ 2 b 20.0 B\n\njulia> Z, XY, W = unpack(table, ==(:z), !=(:w));\njulia> Z\n2-element Vector{Float64}:\n 10.0\n 20.0\n\njulia> XY\n2×2 DataFrame\n Row │ x y\n │ Int64 Char\n─────┼─────────────\n 1 │ 1 a\n 2 │ 2 b\n\njulia> W # the column(s) left over\n2-element Vector{String}:\n \"A\"\n \"B\"\n\nWhenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true.\n\nIf coerce_options are specified then table is first replaced with coerce(table, coerce_options). See ScientificTypes.coerce for details.\n\nIf shuffle=true then the rows of table are first shuffled, using the global RNG, unless rng is specified; if rng is an integer, it specifies the seed of an automatically generated Mersenne twister. 
If rng is specified then shuffle=true is implicit.\n\n\n\n\n\n","category":"method"}] +[{"location":"distributions/#Distributions","page":"Distributions","title":"Distributions","text":"","category":"section"},{"location":"distributions/#Univariate-Finite-Distribution","page":"Distributions","title":"Univariate Finite Distribution","text":"","category":"section"},{"location":"distributions/","page":"Distributions","title":"Distributions","text":"Modules = [MLJBase]\nPages = [\"interface/univariate_finite.jl\"]","category":"page"},{"location":"distributions/#hyperparameters","page":"Distributions","title":"hyperparameters","text":"","category":"section"},{"location":"distributions/","page":"Distributions","title":"Distributions","text":"Modules = [MLJBase]\nPages = [\"hyperparam/one_dimensional_range_methods.jl\", \"hyperparam/one_dimensional_ranges.jl\"]","category":"page"},{"location":"distributions/#Distributions.sampler-Union{Tuple{T}, Tuple{NumericRange{T}, Distributions.UnivariateDistribution}} where T","page":"Distributions","title":"Distributions.sampler","text":"sampler(r::NominalRange, probs::AbstractVector{<:Real})\nsampler(r::NominalRange)\nsampler(r::NumericRange{T}, d)\n\nConstruct an object s which can be used to generate random samples from a ParamRange object r (a one-dimensional range) using one of the following calls:\n\nrand(s) # for one sample\nrand(s, n) # for n samples\nrand(rng, s [, n]) # to specify an RNG\n\nThe argument probs can be any probability vector with the same length as r.values. The second sampler method above calls the first with a uniform probs vector.\n\nThe argument d can be either an arbitrary instance of UnivariateDistribution from the Distributions.jl package, or one of the Distributions.jl types for which fit(d, ::NumericRange) is defined. 
These include: Arcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight, Normal, Gamma, InverseGaussian, Logistic, LogNormal, Cauchy, Gumbel, Laplace, and Poisson; but see the doc-string for Distributions.fit for an up-to-date list.\n\nIf d is an instance, then sampling is from a truncated form of the supplied distribution d, the truncation bounds being r.lower and r.upper (the attributes r.origin and r.unit are ignored). For discrete numeric ranges (T <: Integer) the samples are rounded.\n\nIf d is a type then a suitably truncated distribution is automatically generated using Distributions.fit(d, r).\n\nImportant. Values are generated with no regard to r.scale, except in the special case r.scale is a callable object f. In that case, f is applied to all values generated by rand as described above (prior to rounding, in the case of discrete numeric ranges).\n\nExamples\n\njulia> r = range(Char, :letter, values=collect(\"abc\"))\njulia> s = sampler(r, [0.1, 0.2, 0.7])\njulia> samples = rand(s, 1000);\njulia> StatsBase.countmap(samples)\nDict{Char,Int64} with 3 entries:\n 'a' => 107\n 'b' => 205\n 'c' => 688\n\njulia> r = range(Int, :k, lower=2, upper=6) # numeric but discrete\njulia> s = sampler(r, Normal)\njulia> samples = rand(s, 1000);\njulia> UnicodePlots.histogram(samples)\n ┌ ┐\n[2.0, 2.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 119\n[2.5, 3.0) ┤ 0\n[3.0, 3.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 296\n[3.5, 4.0) ┤ 0\n[4.0, 4.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 275\n[4.5, 5.0) ┤ 0\n[5.0, 5.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 221\n[5.5, 6.0) ┤ 0\n[6.0, 6.5) ┤▇▇▇▇▇▇▇▇▇▇▇ 89\n └ ┘\n\n\n\n\n\n","category":"method"},{"location":"distributions/#MLJBase.iterator-Tuple{Random.AbstractRNG, ParamRange, Vararg{Any}}","page":"Distributions","title":"MLJBase.iterator","text":"iterator([rng, ], r::NominalRange, [,n])\niterator([rng, ], r::NumericRange, n)\n\nReturn an iterator (currently a vector) for a ParamRange object r. 
In the first case iteration is over all values stored in the range (or just the first n, if n is specified). In the second case, the iteration is over approximately n ordered values, generated as follows:\n\n(i) First, exactly n values are generated between U and L, with a spacing determined by r.scale (uniform if scale=:linear) where U and L are given by the following table:\n\nr.lower r.upper L U\nfinite finite r.lower r.upper\n-Inf finite r.upper - 2r.unit r.upper\nfinite Inf r.lower r.lower + 2r.unit\n-Inf Inf r.origin - r.unit r.origin + r.unit\n\n(ii) If a callable f is provided as scale, then a uniform spacing is always applied in (i) but f is broadcast over the results. (Unlike ordinary scales, this alters the effective range of values generated, instead of just altering the spacing.)\n\n(iii) If r is a discrete numeric range (r isa NumericRange{<:Integer}) then the values are additionally rounded, with any duplicate values removed. Otherwise all the values are used (and there are exactly n of them).\n\n(iv) Finally, if a random number generator rng is specified, then the values are returned in random order (sampling without replacement), and otherwise they are returned in numeric order, or in the order provided to the range constructor, in the case of a NominalRange.\n\n\n\n\n\n","category":"method"},{"location":"distributions/#MLJBase.scale-Tuple{NominalRange}","page":"Distributions","title":"MLJBase.scale","text":"scale(r::ParamRange)\n\nReturn the scale associated with a ParamRange object r. 
The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a callable object).\n\n\n\n\n\n","category":"method"},{"location":"distributions/#StatsAPI.fit-Union{Tuple{D}, Tuple{Type{D}, NumericRange}} where D<:Distributions.Distribution","page":"Distributions","title":"StatsAPI.fit","text":"Distributions.fit(D, r::MLJBase.NumericRange)\n\nFit and return a distribution d of type D to the one-dimensional range r.\n\nOnly types D in the table below are supported.\n\nThe distribution d is constructed in two stages. First, a distribution d0, characterized by the conditions in the second column of the table, is fit to r. Then d0 is truncated between r.lower and r.upper to obtain d.\n\nDistribution type D Characterization of d0\nArcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight minimum(d) = r.lower, maximum(d) = r.upper\nNormal, Gamma, InverseGaussian, Logistic, LogNormal mean(d) = r.origin, std(d) = r.unit\nCauchy, Gumbel, Laplace, (Normal) Dist.location(d) = r.origin, Dist.scale(d) = r.unit\nPoisson Dist.mean(d) = r.unit\n\nHere Dist = Distributions.\n\n\n\n\n\n","category":"method"},{"location":"distributions/#Base.range-Union{Tuple{D}, Tuple{Union{Model, Type}, Union{Expr, Symbol}}} where D","page":"Distributions","title":"Base.range","text":"r = range(model, :hyper; values=nothing)\n\nDefine a one-dimensional NominalRange object for a field hyper of model. Note that r is not directly iterable but iterator(r) is.\n\nA nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the max_depth hyperparameter of the submodel model.atom.\n\nr = range(model, :hyper; upper=nothing, lower=nothing,\n scale=nothing, values=nothing)\n\nAssuming values is not specified, define a one-dimensional NumericRange object for a Real field hyper of model. Note that r is not directly iterable, but iterator(r, n) is an iterator of length n. 
To generate random elements from r, instead apply rand methods to sampler(r). The supported scales are :linear,:log, :logminus, :log10, :log10minus, :log2, or a callable object.\n\nNote that r is not directly iterable, but iterator(r, n) is, for given resolution (length) n.\n\nBy default, the behaviour of the constructed object depends on the type of the value of the hyperparameter :hyper at model at the time of construction. To override this behaviour (for instance if model is not available) specify a type in place of model so the behaviour is determined by the value of the specified type.\n\nA nested hyperparameter is specified using dot notation (see above).\n\nIf scale is unspecified, it is set to :linear, :log, :log10minus, or :linear, according to whether the interval (lower, upper) is bounded, right-unbounded, left-unbounded, or doubly unbounded, respectively. Note upper=Inf and lower=-Inf are allowed.\n\nIf values is specified, the other keyword arguments are ignored and a NominalRange object is returned (see above).\n\nSee also: iterator, sampler\n\n\n\n\n\n","category":"method"},{"location":"distributions/#Utility-functions","page":"Distributions","title":"Utility functions","text":"","category":"section"},{"location":"distributions/","page":"Distributions","title":"Distributions","text":"Modules = [MLJBase]\nPages = [\"distributions.jl\"]","category":"page"},{"location":"utilities/#Utilities","page":"Utilities","title":"Utilities","text":"","category":"section"},{"location":"utilities/#Machines","page":"Utilities","title":"Machines","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"machines.jl\"]","category":"page"},{"location":"utilities/#Base.replace-Union{Tuple{C}, Tuple{Machine{<:Any, <:Any, C}, Vararg{Pair}}} where C","page":"Utilities","title":"Base.replace","text":"replace(mach::Machine, field1 => value1, field2 => value2, ...)\n\nPrivate method.\n\nReturn a 
shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new instance Channel{Bool}(1).\n\nThe following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):\n\nreplace(mach, :args => (), :data => (), :data_resampled_data => (), :cache => nothing)\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.age-Tuple{Machine}","page":"Utilities","title":"MLJBase.age","text":"age(mach::Machine)\n\nReturn an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.ancestors-Tuple{Machine}","page":"Utilities","title":"MLJBase.ancestors","text":"ancestors(mach::Machine; self=false)\n\nAll ancestors of mach, including mach if self=true.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.default_scitype_check_level","page":"Utilities","title":"MLJBase.default_scitype_check_level","text":"default_scitype_check_level()\n\nReturn the current global default value for scientific type checking when constructing machines.\n\ndefault_scitype_check_level(i::Integer)\n\nSet the global default value for scientific type checking to i.\n\nThe effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:\n\nscitype_check_level Inspect scitypes? 
If Unknown in scitypes If other scitype mismatch\n0 × \n1 (value at startup) ✓ warning\n2 ✓ warning warning\n3 ✓ warning error\n4 ✓ error error\n\nSee also machine\n\n\n\n\n\n","category":"function"},{"location":"utilities/#MLJBase.fit_only!-Union{Tuple{Machine{<:Any, <:Any, cache_data}}, Tuple{cache_data}} where cache_data","page":"Utilities","title":"MLJBase.fit_only!","text":"MLJBase.fit_only!(\n mach::Machine;\n rows=nothing,\n verbosity=1,\n force=false,\n composite=nothing,\n)\n\nWithout mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:\n\nAb initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.\nTraining update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.\nNo-operation. Leave existing learned parameters untouched. Do not increment mach.state.\n\nIf the model, model, bound to mach is a symbol, then instead perform the action using the true model given by getproperty(composite, model). 
See also machine.\n\nTraining action logic\n\nFor the action to be a no-operation, either mach.frozen == true or none of the following apply:\n\n(i) mach has never been trained (mach.state == 0).\n(ii) force == true.\n(iii) The state of some other machine on which mach depends has changed since the last time mach was trained (ie, the last time mach.state was incremented).\n(iv) The specified rows have changed since the last retraining and mach.model does not have Static type.\n(v) mach.model is a model and different from the last model used for training, but has the same type.\n(vi) mach.model is a model but has a type different from the last model used for training.\n(vii) mach.model is a symbol and (composite, mach.model) is different from the last model used for training, but has the same type.\n(viii) mach.model is a symbol and (composite, mach.model) has a different type from the last model used for training.\n\nIn any of the cases (i) - (iv), (vi), or (viii), mach is trained ab initio. If (v) or (vii) is true, then a training update is applied.\n\nTo freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).\n\nImplementation details\n\nThe data to which a machine is bound is stored in mach.args. Each element of args is either a Node object, or, in the case that concrete data was bound to the machine, it is concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. 
See the \"Adding models for general use\" section of the MLJ documentation for more on these lower-level training methods.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.freeze!-Tuple{Machine}","page":"Utilities","title":"MLJBase.freeze!","text":"freeze!(mach)\n\nFreeze the machine mach so that it will never be retrained (unless thawed).\n\nSee also thaw!.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.last_model-Tuple{Any}","page":"Utilities","title":"MLJBase.last_model","text":"last_model(mach::Machine)\n\nReturn the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.\n\nReturns nothing if mach has not been trained.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.machine","page":"Utilities","title":"MLJBase.machine","text":"machine(model, args...; cache=true, scitype_check_level=1)\n\nConstruct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). 
This in turn enables generalization to new data using operations such as predict or transform:\n\nusing MLJModels\nX, y = make_regression()\n\nPCA = @load PCA pkg=MultivariateStats\nmodel = PCA()\nmach = machine(model, X)\nfit!(mach, rows=1:50)\ntransform(mach, selectrows(X, 51:100)) # or transform(mach, rows=51:100)\n\nDecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree\nmodel = DecisionTreeRegressor()\nmach = machine(model, X, y)\nfit!(mach, rows=1:50)\npredict(mach, selectrows(X, 51:100)) # or predict(mach, rows=51:100)\n\nSpecify cache=false to prioritize memory management over speed.\n\nWhen building a learning network, Node objects can be substituted for the concrete data but no type or dimension checks are applied.\n\nChecks on the types of training data\n\nA model articulates its data requirements using scientific types, i.e., using the scitype function instead of the typeof function.\n\nIf scitype_check_level > 0 then the scitype of each arg in args is computed, and this is compared with the scitypes expected by the model, unless args contains Unknown scitypes and scitype_check_level < 4, in which case no further action is taken. Whether warnings are issued or errors thrown depends on the level. For details, see default_scitype_check_level, a method to inspect or change the default level (1 at startup).\n\nMachines with model placeholders\n\nA symbol can be substituted for a model in machine constructors to act as a placeholder for a model specified at training time. 
The symbol must be the field name for a struct whose corresponding value is a model, as shown in the following example:\n\nmutable struct MyComposite\n transformer\n classifier\nend\n\nmy_composite = MyComposite(Standardizer(), ConstantClassifier())\n\nX, y = make_blobs()\nmach = machine(:classifier, X, y)\nfit!(mach, composite=my_composite)\n\nThe last two lines are equivalent to\n\nmach = machine(ConstantClassifier(), X, y)\nfit!(mach)\n\nDelaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.\n\nSee also fit!, default_scitype_check_level, MLJBase.save, serializable.\n\n\n\n\n\n","category":"function"},{"location":"utilities/#MLJBase.machine-Tuple{Union{IO, String}}","page":"Utilities","title":"MLJBase.machine","text":"machine(file::Union{String, IO})\n\nRebuild from a file a machine that has been serialized using the default Serialization module.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.report-Tuple{Machine}","page":"Utilities","title":"MLJBase.report","text":"report(mach)\n\nReturn the report for a machine mach that has been fit!, for example the coefficients in a linear model.\n\nThis is a named tuple and human-readable if possible.\n\nIf mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. 
(If multiple machines share the same model, then the value is a vector.)\n\njulia> using MLJ\njulia> @load LinearBinaryClassifier pkg=GLM\njulia> X, y = @load_crabs;\njulia> pipe = Standardizer() |> LinearBinaryClassifier();\njulia> mach = machine(pipe, X, y) |> fit!;\n\njulia> report(mach).linear_binary_classifier\n(deviance = 3.8893386087844543e-7,\n dof_residual = 195.0,\n stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],\n vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)\n\n\nSee also fitted_params\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.report_given_method-Tuple{Machine}","page":"Utilities","title":"MLJBase.report_given_method","text":"report_given_method(mach::Machine)\n\nSame as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.\n\nA specialized method intended for learning network applications.\n\nThe return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc) and whose values are the reports contributed by that method.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.restore!","page":"Utilities","title":"MLJBase.restore!","text":"restore!(mach::Machine)\n\nRestore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state == -1. 
Intended for restoring deserialized machine objects to a useable form.\n\nFor an example see serializable.\n\n\n\n\n\n","category":"function"},{"location":"utilities/#MLJBase.serializable-Union{Tuple{Machine{<:Any, <:Any, C}}, Tuple{C}, Tuple{Machine{<:Any, <:Any, C}, Any}} where C","page":"Utilities","title":"MLJBase.serializable","text":"serializable(mach::Machine)\n\nReturns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.\n\nAny general purpose Julia serializer may be applied to the output of serializable (eg, JLSO, BSON, JLD) but you must call restore!(mach) on the deserialised object mach before using it. See the example below.\n\nIf using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.\n\nA machine returned by serializable is characterized by the property mach.state == -1.\n\nExample using JLSO\n\nusing MLJ\nusing JLSO\nTree = @load DecisionTreeClassifier\ntree = Tree()\nX, y = @load_iris\nmach = fit!(machine(tree, X, y))\n\n# This machine can now be serialized\nsmach = serializable(mach)\nJLSO.save(\"machine.jlso\", :machine => smach)\n\n# Deserialize and restore learned parameters to useable form:\nloaded_mach = JLSO.load(\"machine.jlso\")[:machine]\nrestore!(loaded_mach)\n\npredict(loaded_mach, X)\npredict(mach, X)\n\nSee also restore!, MLJBase.save.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.thaw!-Tuple{Machine}","page":"Utilities","title":"MLJBase.thaw!","text":"thaw!(mach)\n\nUnfreeze the machine mach so that it can be retrained.\n\nSee also freeze!.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJModelInterface.feature_importances-Tuple{Machine}","page":"Utilities","title":"MLJModelInterface.feature_importances","text":"feature_importances(mach::Machine)\n\nReturn a list of feature => importance pairs for a fitted 
machine, mach, for supported models. Otherwise return nothing.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJModelInterface.fitted_params-Tuple{Machine}","page":"Utilities","title":"MLJModelInterface.fitted_params","text":"fitted_params(mach)\n\nReturn the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.\n\nThis is a named tuple and human-readable if possible.\n\nIf mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)\n\njulia> using MLJ\njulia> @load LogisticClassifier pkg=MLJLinearModels\njulia> X, y = @load_crabs;\njulia> pipe = Standardizer() |> LogisticClassifier();\njulia> mach = machine(pipe, X, y) |> fit!;\n\njulia> fitted_params(mach).logistic_classifier\n(classes = CategoricalArrays.CategoricalValue{String,UInt32}[\"B\", \"O\"],\n coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],\n intercept = 0.0883301599726305,)\n\nSee also report\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJModelInterface.save-Tuple{Union{IO, String}, Machine}","page":"Utilities","title":"MLJModelInterface.save","text":"MLJ.save(filename, mach::Machine)\nMLJ.save(io, mach::Machine)\n\nMLJBase.save(filename, mach::Machine)\nMLJBase.save(io, mach::Machine)\n\nSerialize the machine mach to a file with path filename, or to an input/output stream io (at least IOBuffer instances are supported) using the Serialization module.\n\nTo serialise using a different format, see serializable.\n\nMachines are deserialized using the machine constructor as shown in the 
example below.\n\nThe implementation of save for machines changed in MLJ 0.18 (MLJBase 0.20). You can only restore a machine saved using older versions of MLJ using an older version.\n\nExample\n\nusing MLJ\nTree = @load DecisionTreeClassifier\nX, y = @load_iris\nmach = fit!(machine(Tree(), X, y))\n\nMLJ.save(\"tree.jls\", mach)\nmach_predict_only = machine(\"tree.jls\")\npredict(mach_predict_only, X)\n\n# using a buffer:\nio = IOBuffer()\nMLJ.save(io, mach)\nseekstart(io)\npredict_only_mach = machine(io)\npredict(predict_only_mach, X)\n\nwarning: Only load files from trusted sources\nMaliciously constructed JLS files, like pickles, and most other general purpose serialization formats, can allow for arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.\n\nSee also serializable, machine.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#StatsAPI.fit!-Tuple{Machine}","page":"Utilities","title":"StatsAPI.fit!","text":"fit!(mach::Machine, rows=nothing, verbosity=1, force=false, composite=nothing)\n\nFit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.\n\nTo attempt to fit a machine without touching any other machine, use fit_only!. 
For more on options and the internal logic of fitting see fit_only!\n\n\n\n\n\n","category":"method"},{"location":"utilities/#Parameter-Inspection","page":"Utilities","title":"Parameter Inspection","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"parameter_inspection.jl\"]","category":"page"},{"location":"utilities/#Show","page":"Utilities","title":"Show","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"show.jl\"]","category":"page"},{"location":"utilities/#MLJBase._recursive_show-Tuple{IO, MLJType, Any, Any}","page":"Utilities","title":"MLJBase._recursive_show","text":"_recursive_show(stream, object, current_depth, depth)\n\nPrivate method.\n\nGenerate a table of the properties of the MLJType object, displaying each property value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:\n\nIf f is itself a MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its properties (and so on, up to a depth of argument depth).\nOtherwise f is displayed as \"(omitted T)\" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (ie, MIME\"plain/text\") form of f is shown. 
To override this behaviour, overload the _show method for the type in question.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.abbreviated-Tuple{Any}","page":"Utilities","title":"MLJBase.abbreviated","text":"to display abbreviated versions of integers\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.color_off-Tuple{}","page":"Utilities","title":"MLJBase.color_off","text":"color_off()\n\nSuppress color and bold output at the REPL for displaying MLJ objects.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.color_on-Tuple{}","page":"Utilities","title":"MLJBase.color_on","text":"color_on()\n\nEnable color and bold output at the REPL, for enhanced display of MLJ objects.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.handle-Tuple{Any}","page":"Utilities","title":"MLJBase.handle","text":"return abbreviated object id (as string) or its registered handle (as string) if this exists\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.@constant-Tuple{Any}","page":"Utilities","title":"MLJBase.@constant","text":"@constant x = value\n\nPrivate method (used in testing).\n\nEquivalent to const x = value but registers the binding thus:\n\nMLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x\n\nRegistered objects get displayed using the variable name to which it was bound in calls to show(x), etc.\n\nWARNING: As with any const declaration, binding x to new value of the same type is not prevented and the registration will not be updated.\n\n\n\n\n\n","category":"macro"},{"location":"utilities/#MLJBase.@more-Tuple{}","page":"Utilities","title":"MLJBase.@more","text":"@more\n\nEntered at the REPL, equivalent to show(ans, 100). 
Use to get a recursive description of all properties of the last REPL value.\n\n\n\n\n\n","category":"macro"},{"location":"utilities/#Utility-functions","page":"Utilities","title":"Utility functions","text":"","category":"section"},{"location":"utilities/","page":"Utilities","title":"Utilities","text":"Modules = [MLJBase]\nPages = [\"utilities.jl\"]","category":"page"},{"location":"utilities/#MLJBase._permute_rows-Tuple{AbstractVecOrMat, Vector{Int64}}","page":"Utilities","title":"MLJBase._permute_rows","text":"_permute_rows(obj, perm)\n\nInternal function to return a vector or matrix with permuted rows given the permutation perm.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.available_name-Tuple{Any, Any}","page":"Utilities","title":"MLJBase.available_name","text":"available_name(modl::Module, name::Symbol)\n\nFunction to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.check_same_nrows-Tuple{Any, Any}","page":"Utilities","title":"MLJBase.check_same_nrows","text":"check_same_nrows(X, Y)\n\nInternal function to check two objects, each a vector or a matrix, have the same number of rows.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.chunks-Tuple{AbstractRange, Integer}","page":"Utilities","title":"MLJBase.chunks","text":"chunks(range, n)\n\nSplit an AbstractRange into n subranges of approximately equal length.\n\nExample\n\njulia> collect(chunks(1:5, 2))\n2-element Array{UnitRange{Int64},1}:\n 1:3\n 4:5\n\nPrivate method\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.flat_values-Tuple{NamedTuple}","page":"Utilities","title":"MLJBase.flat_values","text":"flat_values(t::NamedTuple)\n\nView a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the 
original tuple.\n\njulia> t = (X = (x = 1, y = 2), Y = 3);\njulia> flat_values(t)\n(1, 2, 3)\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.generate_name!-Tuple{DataType, Any}","page":"Utilities","title":"MLJBase.generate_name!","text":"generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)\n\nGiven a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case, representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.\n\nIf the snake-case representation already exists in existing_names a suitable integer is appended to the name.\n\nIf only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.\n\njulia> existing_names = [];\njulia> generate_name!(Vector{Int}, existing_names)\n:vector\n\njulia> generate_name!(Vector{Int}, existing_names)\n:vector2\n\njulia> generate_name!(AbstractFloat, existing_names)\n:abstract_float\n\njulia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)\n:not_array\n\njulia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)\n:not_array2\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.guess_model_target_observation_scitype-Tuple{Any}","page":"Utilities","title":"MLJBase.guess_model_target_observation_scitype","text":"guess_model_target_observation_scitype(model)\n\nPrivate method\n\nTry to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). 
Return Unknown if unable to draw reliable inference.\n\nThe observation scitype for a table is here understood as the scitype of a row converted to a vector.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.guess_observation_scitype-Tuple{Any}","page":"Utilities","title":"MLJBase.guess_observation_scitype","text":"guess_observation_scitype(y)\n\nPrivate method.\n\nIf y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.\n\nIn all other cases, Unknown.\n\njulia> guess_observation_scitype([missing, 1, 2, 3])\nUnion{Missing, Count}\n\njulia> guess_observation_scitype(rand(3, 2))\nAbstractVector{Continuous}\n\njulia> guess_observation_scitype((x=rand(3), y=rand(Bool, 3)))\nAbstractVector{Union{Continuous, Count}}\n\njulia> guess_observation_scitype((x=[missing, 1, 2], y=[1, 2, 3]))\nUnknown\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.init_rng-Tuple{Any}","page":"Utilities","title":"MLJBase.init_rng","text":"init_rng(rng)\n\nCreate an AbstractRNG from rng. If rng is a non-negative Integer, it returns a MersenneTwister random number generator seeded with rng; If rng is an AbstractRNG object it returns rng, otherwise it throws an error.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.observation-Tuple{Type}","page":"Utilities","title":"MLJBase.observation","text":"observation(S)\n\nPrivate method.\n\nTries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations; here we view the scitype for one row of a table to be the scitype of the row converted to a vector. 
Return Unknown if unable to draw reliable inference.\n\nThe observation scitype for a table is here understood as the scitype of a row converted to a vector.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.prepend-Tuple{Symbol, Nothing}","page":"Utilities","title":"MLJBase.prepend","text":"MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})\n\nFor prepending symbols in expressions like :(y.w) and :(x1.x2.x3).\n\njulia> prepend(:x, :y)\n:(x.y)\n\njulia> prepend(:x, :(y.z))\n:(x.y.z)\n\njulia> prepend(:w, ans)\n:(w.x.y.z)\n\nIf the second argument is nothing, then nothing is returned.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.recursive_getproperty-Tuple{Any, Symbol}","page":"Utilities","title":"MLJBase.recursive_getproperty","text":"recursive_getproperty(object, nested_name::Expr)\n\nCall getproperty recursively on object to extract the value of some nested property, as in the following example:\n\njulia> object = (X = (x = 1, y = 2), Y = 3);\njulia> recursive_getproperty(object, :(X.y))\n2\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.recursive_setproperty!-Tuple{Any, Symbol, Any}","page":"Utilities","title":"MLJBase.recursive_setproperty!","text":"recursive_setproperty!(object, nested_name::Expr, value)\n\nSet a nested property of an object to value, as in the following example:\n\njulia> mutable struct Foo\n X\n Y\n end\n\njulia> mutable struct Bar\n x\n y\n end\n\njulia> object = Foo(Bar(1, 2), 3)\nFoo(Bar(1, 2), 3)\n\njulia> recursive_setproperty!(object, :(X.y), 42)\n42\n\njulia> object\nFoo(Bar(1, 42), 3)\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.sequence_string-Union{Tuple{Itr}, Tuple{Itr, Any}} where Itr","page":"Utilities","title":"MLJBase.sequence_string","text":"sequence_string(itr, n=3)\n\nReturn a \"sequence\" string from the first n elements generated by itr.\n\njulia> MLJBase.sequence_string(1:10, 4)\n\"1, 2, 3, 4, ...\"\n\nPrivate 
method.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.shuffle_rows-Tuple{AbstractVecOrMat, AbstractVecOrMat}","page":"Utilities","title":"MLJBase.shuffle_rows","text":"shuffle_rows(X::AbstractVecOrMat,\n Y::AbstractVecOrMat;\n rng::AbstractRNG=Random.GLOBAL_RNG)\n\nReturn row-shuffled vectors or matrices using a random permutation of X and Y. An optional random number generator can be specified using the rng argument.\n\n\n\n\n\n","category":"method"},{"location":"utilities/#MLJBase.unwind-Tuple","page":"Utilities","title":"MLJBase.unwind","text":"unwind(iterators...)\n\nRepresent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.\n\nExample\n\njulia> iterators = ([1, 2], [\"a\",\"b\"], [\"x\", \"y\", \"z\"]);\njulia> MLJTuning.unwind(iterators...)\n12×3 Array{Any,2}:\n 1 \"a\" \"x\"\n 2 \"a\" \"x\"\n 1 \"b\" \"x\"\n 2 \"b\" \"x\"\n 1 \"a\" \"y\"\n 2 \"a\" \"y\"\n 1 \"b\" \"y\"\n 2 \"b\" \"y\"\n 1 \"a\" \"z\"\n 2 \"a\" \"z\"\n 1 \"b\" \"z\"\n 2 \"b\" \"z\"\n\n\n\n\n\n","category":"method"},{"location":"resampling/#Resampling","page":"Resampling","title":"Resampling","text":"","category":"section"},{"location":"resampling/","page":"Resampling","title":"Resampling","text":"Modules = [MLJBase]\nPages = [\"resampling.jl\"]","category":"page"},{"location":"resampling/#MLJBase.CV","page":"Resampling","title":"MLJBase.CV","text":"cv = CV(; nfolds=6, shuffle=nothing, rng=nothing)\n\nCross-validation resampling strategy, for use in evaluate!, evaluate and tuning.\n\ntrain_test_pairs(cv, rows)\n\nReturns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. 
Each train vector is the complement of the corresponding test vector. With no row pre-shuffling, the order of rows is preserved, in the sense that rows coincides precisely with the concatenation of the test vectors, in the order they are generated. The first r test vectors have length n + 1, where n, r = divrem(length(rows), nfolds), and the remaining test vectors have length n.\n\nPre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the CV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.\n\nIf rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.CompactPerformanceEvaluation","page":"Resampling","title":"MLJBase.CompactPerformanceEvaluation","text":"CompactPerformanceEvaluation <: AbstractPerformanceEvaluation\n\nType of object returned by evaluate (for models plus data) or evaluate! (for machines) when called with the option compact = true. Such objects have the same structure as the PerformanceEvaluation objects returned by default, except that the following fields are omitted to save memory: fitted_params_per_fold, report_per_fold, train_test_rows.\n\nFor more on the remaining fields, see PerformanceEvaluation.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.Holdout","page":"Resampling","title":"MLJBase.Holdout","text":"holdout = Holdout(; fraction_train=0.7, shuffle=nothing, rng=nothing)\n\nInstantiate a Holdout resampling strategy, for use in evaluate!, evaluate and in tuning.\n\ntrain_test_pairs(holdout, rows)\n\nReturns the pair [(train, test)], where train and test are vectors such that rows=vcat(train, test) and length(train)/length(rows) is approximately equal to fraction_train.\n\nPre-shuffling of rows is controlled by rng and shuffle. 
If rng is an integer, then the Holdout keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.\n\nIf rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is specified.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.InSample","page":"Resampling","title":"MLJBase.InSample","text":"in_sample = InSample()\n\nInstantiate an InSample resampling strategy, for use in evaluate!, evaluate and in tuning. In this strategy the train and test sets are the same, and consist of all observations specified by the rows keyword argument. If rows is not specified, all supplied rows are used.\n\nExample\n\nusing MLJBase, MLJModels\n\nX, y = make_blobs() # a table and a vector\nmodel = ConstantClassifier()\ntrain, test = partition(eachindex(y), 0.7) # train:test = 70:30\n\nCompute in-sample (training) loss:\n\nevaluate(model, X, y, resampling=InSample(), rows=train, measure=brier_loss)\n\nCompute the out-of-sample loss:\n\nevaluate(model, X, y, resampling=[(train, test),], measure=brier_loss)\n\nOr equivalently:\n\nevaluate(model, X, y, resampling=Holdout(fraction_train=0.7), measure=brier_loss)\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.PerformanceEvaluation","page":"Resampling","title":"MLJBase.PerformanceEvaluation","text":"PerformanceEvaluation <: AbstractPerformanceEvaluation\n\nType of object returned by evaluate (for models plus data) or evaluate! (for machines). Such objects encode estimates of the performance (generalization error) of a supervised model or outlier detection model, and store other information ancillary to the computation.\n\nIf evaluate or evaluate! is called with the compact=true option, then a CompactPerformanceEvaluation object is returned instead.\n\nWhen evaluate/evaluate! 
is called, a number of train/test pairs (\"folds\") of row indices are generated, according to the options provided, which are discussed in the evaluate! doc-string. Rows correspond to observations. The generated train/test pairs are recorded in the train_test_rows field of the PerformanceEvaluation struct, and the corresponding estimates, aggregated over all train/test pairs, are recorded in measurement, a vector with one entry for each measure (metric) recorded in measure.\n\nWhen displayed, a PerformanceEvaluation object includes a value under the heading 1.96*SE, derived from the standard error of the per_fold entries. This value is suitable for constructing a formal 95% confidence interval for the given measurement. Such intervals should be interpreted with caution. See, for example, Bates et al. (2021).\n\nFields\n\nThese fields are part of the public API of the PerformanceEvaluation struct.\n\nmodel: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.\nmeasure: vector of measures (metrics) used to evaluate performance\nmeasurement: vector of measurements - one for each element of measure - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure m is StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())\noperation (e.g., predict_mode): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: predict, predict_mean, predict_mode, predict_median, or predict_joint.\nper_fold: a vector of vectors of individual test fold evaluations (one vector per measure). 
Useful for obtaining a rough estimate of the variance of the performance estimate.\nper_observation: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for the ith observation in the fth test fold, evaluated using the mth measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure measure is repeated across all observations in a fold if StatisticalMeasures.can_report_unaggregated(measure) == true. If e has been computed with the per_observation=false option, then e.per_observation is a vector of missings.\nfitted_params_per_fold: a vector containing fitted params(mach) for each machine mach trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.\nreport_per_fold: a vector containing report(mach) for each machine mach trained in resampling - one machine per train/test pair.\ntrain_test_rows: a vector of tuples, each of the form (train, test), where train and test are vectors of row (observation) indices for training and evaluation respectively.\nresampling: the user-specified resampling strategy to generate the train/test pairs (or literal train/test pairs if that was directly specified).\nrepeats: the number of times the resampling strategy was repeated.\n\nSee also CompactPerformanceEvaluation.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.Resampler","page":"Resampling","title":"MLJBase.Resampler","text":"resampler = Resampler(\n model=ConstantRegressor(),\n resampling=CV(),\n measure=nothing,\n weights=nothing,\n class_weights=nothing,\n operation=predict,\n repeats = 1,\n acceleration=default_resource(),\n check_measure=true,\n per_observation=true,\n logger=nothing,\n compact=false,\n)\n\nResampling model wrapper, used internally by the fit method of TunedModel instances and IteratedModel instances. 
See evaluate! for options. Not intended for use by the general user, who will ordinarily use evaluate! directly.\n\nGiven a machine mach = machine(resampler, args...) one obtains a performance evaluation of the specified model, performed according to the prescribed resampling strategy and other parameters, using data args..., by calling fit!(mach) followed by evaluate(mach).\n\nOn subsequent calls to fit!(mach) new train/test pairs of row indices are only regenerated if resampling, repeats or cache fields of resampler have changed. The evolution of an RNG field of resampler does not constitute a change (== for MLJType objects is not sensitive to such changes; see is_same_except).\n\nIf there is a single train/test pair, then warm-restart behavior of the wrapped model resampler.model will extend to warm-restart behaviour of the wrapper resampler, with respect to mutations of the wrapped model.\n\nThe sample weights are passed to the specified performance measures that support weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.\n\nThe sample class_weights are passed to the specified performance measures that support per-class weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.StratifiedCV","page":"Resampling","title":"MLJBase.StratifiedCV","text":"stratified_cv = StratifiedCV(; nfolds=6,\n shuffle=false,\n rng=Random.GLOBAL_RNG)\n\nStratified cross-validation resampling strategy, for use in evaluate!, evaluate and in tuning. 
Applies only to classification problems (OrderedFactor or Multiclass targets).\n\ntrain_test_pairs(stratified_cv, rows, y)\n\nReturns an nfolds-length iterator of (train, test) pairs of vectors (row indices) where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector.\n\nUnlike regular cross-validation, the distribution of the levels of the target y corresponding to each train and test is constrained, as far as possible, to replicate that of y[rows] as a whole.\n\nThe stratified train_test_pairs algorithm is invariant to label renaming. For example, if you run replace!(y, 'a' => 'b', 'b' => 'a') and then re-run train_test_pairs, the returned (train, test) pairs will be the same.\n\nPre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the StratifiedCV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.\n\nIf rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.TimeSeriesCV","page":"Resampling","title":"MLJBase.TimeSeriesCV","text":"tscv = TimeSeriesCV(; nfolds=4)\n\nCross-validation resampling strategy, for use in evaluate!, evaluate and tuning, when observations are chronological and not expected to be independent.\n\ntrain_test_pairs(tscv, rows)\n\nReturns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The rows are partitioned sequentially into nfolds + 1 approximately equal length partitions, where the first partition is the first train set, and the second partition is the first test set. 
The second train set consists of the first two partitions, and the second test set consists of the third partition, and so on for each fold.\n\nThe first partition (which is the first train set) has length n + r, where n, r = divrem(length(rows), nfolds + 1), and the remaining partitions (all of the test folds) have length n.\n\nExamples\n\njulia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)\n3-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:\n (1:4, 5:6)\n (1:6, 7:8)\n (1:8, 9:10)\n\njulia> model = (@load RidgeRegressor pkg=MultivariateStats verbosity=0)();\n\njulia> data = @load_sunspots;\n\njulia> X = (lag1 = data.sunspot_number[2:end-1],\n lag2 = data.sunspot_number[1:end-2]);\n\njulia> y = data.sunspot_number[3:end];\n\njulia> tscv = TimeSeriesCV(nfolds=3);\n\njulia> evaluate(model, X, y, resampling=tscv, measure=rmse, verbosity=0)\n┌───────────────────────────┬───────────────┬────────────────────┐\n│ _.measure │ _.measurement │ _.per_fold │\n├───────────────────────────┼───────────────┼────────────────────┤\n│ RootMeanSquaredError @753 │ 21.7 │ [25.4, 16.3, 22.4] │\n└───────────────────────────┴───────────────┴────────────────────┘\n_.per_observation = [missing]\n_.fitted_params_per_fold = [ … ]\n_.report_per_fold = [ … ]\n_.train_test_rows = [ … ]\n\n\n\n\n\n","category":"type"},{"location":"resampling/#MLJBase.evaluate!-Tuple{Machine{<:Union{Annotator, Supervised}}}","page":"Resampling","title":"MLJBase.evaluate!","text":"evaluate!(mach; resampling=CV(), measure=nothing, options...)\n\nEstimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector. Returns a PerformanceEvaluation object.\n\nAvailable resampling strategies are CV, Holdout, InSample, StratifiedCV and TimeSeriesCV. 
If resampling is not an instance of one of these, then a vector of tuples of the form (train_rows, test_rows) is expected. For example, setting\n\nresampling = [((1:100), (101:200)),\n ((101:200), (1:100))]\n\ngives two-fold cross-validation using the first 200 rows of data.\n\nAny measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.\n\nAlthough evaluate! is mutating, mach.model and mach.args are not mutated.\n\nAdditional keyword options\n\nrows - vector of observation indices from which both train and test folds are constructed (default is all observations)\noperation/operations=nothing - One of predict, predict_mean, predict_mode, predict_median, or predict_joint, or a vector of these of the same length as measure/measures. Automatically inferred if left unspecified. For example, predict_mode will be used for a Multiclass target, if model is a probabilistic predictor, but measure expects literal (point) target predictions. Operations actually applied can be inspected from the operation field of the object returned.\nweights - per-sample Real weights for measures that support them (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).\nclass_weights - dictionary of Real per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).\nrepeats::Int=1: set to a higher value for repeated (Monte Carlo) resampling. 
For example, if repeats = 10, then resampling = CV(nfolds=5, shuffle=true), generates a total of 50 (train, test) pairs for evaluation and subsequent aggregation.\nacceleration=CPU1(): acceleration/parallelization option; can be any instance of CPU1 (single-threaded computation), CPUThreads (multi-threaded computation) or CPUProcesses (multi-process computation); default is default_resource(). 
These types are owned by ComputationalResources.jl.\nforce=false: set to true to force cold-restart of each training event\nverbosity::Int=1 logging level; can be negative\ncheck_measure=true: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.\nper_observation=true: whether to calculate estimates for individual observations; if false the per_observation field of the returned object is populated with missings. Setting to false may reduce compute time and allocations.\nlogger - a logger object (see MLJBase.log_evaluation)\ncompact=false - if true, the returned evaluation object excludes these fields: fitted_params_per_fold, report_per_fold, train_test_rows.\n\nSee also evaluate, PerformanceEvaluation, CompactPerformanceEvaluation.\n\n\n\n\n\n","category":"method"},{"location":"resampling/#MLJBase.log_evaluation-Tuple{Any, Any}","page":"Resampling","title":"MLJBase.log_evaluation","text":"log_evaluation(logger, performance_evaluation)\n\nLog a performance evaluation to logger, an object specific to some logging platform, such as mlflow. If logger=nothing then no logging is performed. The method is called at the end of every call to evaluate/evaluate! using the logger provided by the logger keyword argument.\n\nImplementations for new logging platforms\n\nJulia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface) should overload log_evaluation(logger::LoggerType, performance_evaluation), where LoggerType is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.\n\n\n\n\n\n","category":"method"},{"location":"resampling/#MLJModelInterface.evaluate-Tuple{Union{Annotator, Supervised}, Vararg{Any}}","page":"Resampling","title":"MLJModelInterface.evaluate","text":"evaluate(model, data...; cache=true, options...)\n\nEquivalent to evaluate!(machine(model, data..., cache=cache); options...). 
See the machine version evaluate! for the complete list of options.\n\nReturns a PerformanceEvaluation object.\n\nSee also evaluate!.\n\n\n\n\n\n","category":"method"},{"location":"composition/#Composition","page":"Composition","title":"Composition","text":"","category":"section"},{"location":"composition/#Composites","page":"Composition","title":"Composites","text":"","category":"section"},{"location":"composition/","page":"Composition","title":"Composition","text":"Modules = [MLJBase]\nPages = [\"composition/composites.jl\"]","category":"page"},{"location":"composition/#Networks","page":"Composition","title":"Networks","text":"","category":"section"},{"location":"composition/","page":"Composition","title":"Composition","text":"Modules = [MLJBase]\nPages = [\"composition/networks.jl\"]","category":"page"},{"location":"composition/#Pipelines","page":"Composition","title":"Pipelines","text":"","category":"section"},{"location":"composition/","page":"Composition","title":"Composition","text":"Modules = [MLJBase]\nPages = [\"composition/pipeline_static.jl\", \"composition/pipelines.jl\"]","category":"page"},{"location":"#MLJBase.jl","page":"Home","title":"MLJBase.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"These docs are bare-bones and auto-generated. 
Complete MLJ documentation is here.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For MLJBase-specific developer information, see also the README.md file.","category":"page"},{"location":"datasets/#Datasets","page":"Datasets","title":"Datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Pages = [\"data/datasets_synthetic.jl\"]","category":"page"},{"location":"datasets/#Standard-datasets","page":"Datasets","title":"Standard datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"To add a new dataset assuming it has a header and is, at path data/newdataset.csv","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Start by loading it with CSV:","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"fpath = joinpath(\"datadir\", \"newdataset.csv\")\ndata = CSV.read(fpath, copycols=true,\n categorical=true)","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Load it with DelimitedFiles and Tables","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"data_raw, data_header = readdlm(fpath, ',', header=true)\ndata_table = Tables.table(data_raw; header=Symbol.(vec(data_header)))","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Retrieve the conversions:","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"for (n, st) in zip(names(data), scitype_union.(eachcol(data)))\n println(\":$n=>$st,\")\nend","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Copy and paste the result in a coerce","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"data_table = coerce(data_table, 
...)","category":"page"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MLJBase]\nPages = [\"data/datasets.jl\"]","category":"page"},{"location":"datasets/#MLJBase.load_dataset-Tuple{String, Tuple}","page":"Datasets","title":"MLJBase.load_dataset","text":"load_dataset(fpath, coercions)\n\nLoad one of the standard datasets, such as Boston, assuming the file is a comma-separated file with a header.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.load_sunspots-Tuple{}","page":"Datasets","title":"MLJBase.load_sunspots","text":"Load a well-known sunspot time series (table with one column). https://www.sws.bom.gov.au/Educational/2/3/6\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.@load_ames-Tuple{}","page":"Datasets","title":"MLJBase.@load_ames","text":"Load the full version of the well-known Ames Housing task.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_boston-Tuple{}","page":"Datasets","title":"MLJBase.@load_boston","text":"Load a well-known public regression dataset with Continuous features.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_crabs-Tuple{}","page":"Datasets","title":"MLJBase.@load_crabs","text":"Load a well-known crab classification dataset with nominal features.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_iris-Tuple{}","page":"Datasets","title":"MLJBase.@load_iris","text":"Load a well-known public classification task with nominal features.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_reduced_ames-Tuple{}","page":"Datasets","title":"MLJBase.@load_reduced_ames","text":"Load a reduced version of the well-known Ames Housing task\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_smarket-Tuple{}","page":"Datasets","title":"MLJBase.@load_smarket","text":"Load S&P Stock Market dataset, as used in (An Introduction to Statistical 
Learning with applications in R)https://rdrr.io/cran/ISLR/man/Smarket.html, by Witten et al (2013), Springer-Verlag, New York.\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#MLJBase.@load_sunspots-Tuple{}","page":"Datasets","title":"MLJBase.@load_sunspots","text":"Load a well-known sunspot time series (single table with one column).\n\n\n\n\n\n","category":"macro"},{"location":"datasets/#Synthetic-datasets","page":"Datasets","title":"Synthetic datasets","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MLJBase]\nPages = [\"data/datasets_synthetic.jl\"]","category":"page"},{"location":"datasets/#MLJBase.augment_X-Tuple{Matrix{<:Real}, Bool}","page":"Datasets","title":"MLJBase.augment_X","text":"augment_X(X, fit_intercept)\n\nGiven a matrix X, append a column of ones if fit_intercept is true. See make_regression.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.finalize_Xy-NTuple{6, Any}","page":"Datasets","title":"MLJBase.finalize_Xy","text":"finalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)\n\nInternal function to finalize the make_* functions.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.make_blobs","page":"Datasets","title":"MLJBase.make_blobs","text":"X, y = make_blobs(n=100, p=2; kwargs...)\n\nGenerate Gaussian blobs for clustering and classification problems.\n\nReturn value\n\nBy default, a table X with p columns (features) and n rows (observations), together with a corresponding vector of n Multiclass target observations y, indicating blob membership.\n\nKeyword arguments\n\nshuffle=true: whether to shuffle the resulting points,\ncenters=3: either a number of centers or a c x p matrix with c pre-determined centers,\ncluster_std=1.0: the standard deviation(s) of each blob,\ncenter_box=(-10. 
=> 10.): the limits of the p-dimensional cube within which the cluster centers are drawn if they are not provided,\neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type. \n\nExample\n\nX, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.make_circles","page":"Datasets","title":"MLJBase.make_circles","text":"X, y = make_circles(n=100; kwargs...)\n\nGenerate n labeled points close to two concentric circles for classification and clustering models.\n\nReturn value\n\nBy default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the smaller or larger circle, respectively.\n\nKeyword arguments\n\nshuffle=true: whether to shuffle the resulting points,\nnoise=0: standard deviation of the Gaussian noise added to the data,\nfactor=0.8: ratio of the smaller radius over the larger one,\n\neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type. 
\n\nExample\n\nX, y = make_circles(100; noise=0.5, factor=0.3)\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.make_moons","page":"Datasets","title":"MLJBase.make_moons","text":"make_moons(n::Int=100; kwargs...)\n\nGenerates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.\n\nReturn value\n\nBy default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the left or right semi-circle.\n\nKeyword arguments\n\nshuffle=true: whether to shuffle the resulting points,\nnoise=0.1: standard deviation of the Gaussian noise added to the data,\nxshift=1.0: horizontal translation of the second center with respect to the first one.\nyshift=0.3: vertical translation of the second center with respect to the first one. \neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type. 
\n\nExample\n\nX, y = make_moons(100; noise=0.5)\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.make_regression","page":"Datasets","title":"MLJBase.make_regression","text":"make_regression(n, p; kwargs...)\n\nGenerate Gaussian input features and a linear response with Gaussian noise, for use with regression models.\n\nReturn value\n\nBy default, a tuple (X, y) where table X has p columns and n rows (observations), together with a corresponding vector of n Continuous target observations y.\n\nKeywords\n\nintercept=true: Whether to generate data from a model with intercept.\nn_targets=1: Number of columns in the target.\nsparse=0: Proportion of the generating weight vector that is sparse.\nnoise=0.1: Standard deviation of the Gaussian noise added to the response (target).\noutliers=0: Proportion of the response vector to make as outliers by adding a random quantity with high variance. (Only applied if binary is false.)\nas_table=true: Whether X (and y, if n_targets > 1) should be a table or a matrix.\neltype=Float64: Element type for X and y. Must subtype AbstractFloat.\nbinary=false: Whether the target should be binarized (via a sigmoid).\neltype=Float64: machine type of points (any subtype of AbstractFloat).\nrng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).\nas_table=true: whether to return the points as a table (true) or a matrix (false). 
\n\nExample\n\nX, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)\n\n\n\n\n\n","category":"function"},{"location":"datasets/#MLJBase.outlify!-Tuple{Any, Any, Any}","page":"Datasets","title":"MLJBase.outlify!","text":"Add outliers to portion s of vector.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.runif_ab-NTuple{5, Any}","page":"Datasets","title":"MLJBase.runif_ab","text":"runif_ab(rng, n, p, a, b)\n\nInternal function to generate n points in [a, b]ᵖ uniformly at random.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.sigmoid-Tuple{Float64}","page":"Datasets","title":"MLJBase.sigmoid","text":"sigmoid(x)\n\nReturn the sigmoid computed in a numerically stable way: σ(x) = 1/(1+exp(-x))\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.sparsify!-Tuple{Any, Any, Any}","page":"Datasets","title":"MLJBase.sparsify!","text":"sparsify!(rng, θ, s)\n\nMake portion s of vector θ exactly 0.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#Utility-functions","page":"Datasets","title":"Utility functions","text":"","category":"section"},{"location":"datasets/","page":"Datasets","title":"Datasets","text":"Modules = [MLJBase]\nPages = [\"data/data.jl\"]","category":"page"},{"location":"datasets/#MLJBase.complement-Tuple{Any, Any}","page":"Datasets","title":"MLJBase.complement","text":"complement(folds, i)\n\nThe complement of the ith fold of folds in the concatenation of all elements of folds. 
Here folds is a vector or tuple of integer vectors, typically representing row indices of a vector, matrix or table.\n\ncomplement(([1,2], [3,], [4, 5]), 2) # [1, 2, 4, 5]\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.corestrict-Union{Tuple{N}, Tuple{Tuple{Vararg{T, N}} where T, Any}} where N","page":"Datasets","title":"MLJBase.corestrict","text":"corestrict(X, folds, i)\n\nThe restriction of X, a vector, matrix or table, to the complement of the ith fold of folds, where folds is a tuple of vectors of row indices.\n\nThe method is curried, so that corestrict(folds, i) is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i).\n\nExample\n\nfolds = ([1, 2], [3, 4, 5], [6,])\ncorestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.partition-Tuple{Any, Vararg{Real}}","page":"Datasets","title":"MLJBase.partition","text":"partition(X, fractions...;\n shuffle=nothing,\n rng=Random.GLOBAL_RNG,\n stratify=nothing,\n multi=false)\n\nSplits the vector, matrix or table X into a tuple of objects of the same type, whose vertical concatenation is X. The number of rows in each component of the return value is determined by the corresponding fractions of length(nrows(X)), where valid fractions are floats between 0 and 1 whose sum is less than one. 
The last fraction is not provided, as it is inferred from the preceding ones.\n\nFor synchronized partitioning of multiple objects, use the multi=true option.\n\njulia> partition(1:1000, 0.8)\n([1,...,800], [801,...,1000])\n\njulia> partition(1:1000, 0.2, 0.7)\n([1,...,200], [201,...,900], [901,...,1000])\n\njulia> partition(reshape(1:10, 5, 2), 0.2, 0.4)\n([1 6], [2 7; 3 8], [4 9; 5 10])\n\njulia> X, y = make_blobs() # a table and vector\njulia> Xtrain, Xtest = partition(X, 0.8, stratify=y)\n\nHere's an example of synchronized partitioning of multiple objects:\n\njulia> (Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)\n\nKeywords\n\nshuffle=nothing: if set to true, shuffles the rows before taking fractions.\nrng=Random.GLOBAL_RNG: specifies the random number generator to be used, can be an integer seed. If specified, and shuffle === nothing is interpreted as true.\nstratify=nothing: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle cannot be false.\nmulti=false: if true then X is expected to be a tuple of objects sharing a common length, which are each partitioned separately using the same specified fractions and the same row shuffling. 
Returns a tuple of partitions (a tuple of tuples).\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.restrict-Union{Tuple{N}, Tuple{Tuple{Vararg{T, N}} where T, Any}} where N","page":"Datasets","title":"MLJBase.restrict","text":"restrict(X, folds, i)\n\nThe restriction of X, a vector, matrix or table, to the ith fold of folds, where folds is a tuple of vectors of row indices.\n\nThe method is curried, so that restrict(folds, i) is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i).\n\nExample\n\n\n\nfolds = ([1, 2], [3, 4, 5], [6,])\nrestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]\n\nSee also corestrict\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.skipinvalid-Tuple{Any}","page":"Datasets","title":"MLJBase.skipinvalid","text":"skipinvalid(itr)\n\nReturn an iterator over the elements in itr skipping missing and NaN values. Behaviour is similar to skipmissing.\n\nskipinvalid(A, B)\n\nFor vectors A and B of the same length, return a tuple of vectors (A[mask], B[mask]) where mask[i] is true if and only if A[i] and B[i] are both valid (non-missing and non-NaN). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing from the element types if present in the original iterators.\n\n\n\n\n\n","category":"method"},{"location":"datasets/#MLJBase.unpack-Tuple{Any, Vararg{Any}}","page":"Datasets","title":"MLJBase.unpack","text":"unpack(table, f1, f2, ... fk;\n wrap_singles=false,\n shuffle=false,\n rng::Union{AbstractRNG,Int,Nothing}=nothing,\n coerce_options...)\n\nHorizontally split any Tables.jl compatible table into smaller tables or vectors by making column selections determined by the predicates f1, f2, ..., fk. Selection from the column names is without replacement. 
A predicate is any object f such that f(name) is true or false for each column name::Symbol of table.\n\nReturns a tuple of tables/vectors with length one greater than the number of supplied predicates, with the last component including all previously unselected columns.\n\njulia> table = DataFrame(x=[1,2], y=['a', 'b'], z=[10.0, 20.0], w=[\"A\", \"B\"])\n2×4 DataFrame\n Row │ x y z w\n │ Int64 Char Float64 String\n─────┼──────────────────────────────\n 1 │ 1 a 10.0 A\n 2 │ 2 b 20.0 B\n\njulia> Z, XY, W = unpack(table, ==(:z), !=(:w));\njulia> Z\n2-element Vector{Float64}:\n 10.0\n 20.0\n\njulia> XY\n2×2 DataFrame\n Row │ x y\n │ Int64 Char\n─────┼─────────────\n 1 │ 1 a\n 2 │ 2 b\n\njulia> W # the column(s) left over\n2-element Vector{String}:\n \"A\"\n \"B\"\n\nWhenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true.\n\nIf coerce_options are specified then table is first replaced with coerce(table, coerce_options). See ScientificTypes.coerce for details.\n\nIf shuffle=true then the rows of table are first shuffled, using the global RNG, unless rng is specified; if rng is an integer, it specifies the seed of an automatically generated Mersenne twister. If rng is specified then shuffle=true is implicit.\n\n\n\n\n\n","category":"method"}] } diff --git a/dev/utilities/index.html b/dev/utilities/index.html index 145799f0..6bc932ce 100644 --- a/dev/utilities/index.html +++ b/dev/utilities/index.html @@ -1,11 +1,11 @@ -Utilities · MLJBase.jl

      Utilities

      Machines

      Base.replaceMethod
      replace(mach::Machine, field1 => value1, field2 => value2, ...)

      Private method.

      Return a shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new instance Channel{Bool}(1).

      The following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):

      replace(mach, :args => (), :data => (), :data_resampled_data => (), :cache => nothing)
      source
      MLJBase.ageMethod
      age(mach::Machine)

      Return an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.

      source
      MLJBase.ancestorsMethod
      ancestors(mach::Machine; self=false)

      All ancestors of mach, including mach if self=true.

      source
      MLJBase.default_scitype_check_levelFunction
      default_scitype_check_level()

      Return the current global default value for scientific type checking when constructing machines.

      default_scitype_check_level(i::Integer)

      Set the global default value for scientific type checking to i.

      The effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:

      scitype_check_level    Inspect scitypes?   If Unknown in scitypes   If other scitype mismatch
      0                      ×
      1 (value at startup)   ✓                                            warning
      2                      ✓                   warning                  warning
      3                      ✓                   warning                  error
      4                      ✓                   error                    error

      See also machine

      source
      MLJBase.fit_only!Method
      MLJBase.fit_only!(
      +Utilities · MLJBase.jl

      Utilities

      Machines

      Base.replaceMethod
      replace(mach::Machine, field1 => value1, field2 => value2, ...)

      Private method.

      Return a shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new instance Channel{Bool}(1).

      The following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):

      replace(mach, :args => (), :data => (), :data_resampled_data => (), :cache => nothing)
      source
      MLJBase.ageMethod
      age(mach::Machine)

      Return an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.

      source
      MLJBase.ancestorsMethod
      ancestors(mach::Machine; self=false)

      All ancestors of mach, including mach if self=true.

      source
      MLJBase.default_scitype_check_levelFunction
      default_scitype_check_level()

      Return the current global default value for scientific type checking when constructing machines.

      default_scitype_check_level(i::Integer)

      Set the global default value for scientific type checking to i.

      The effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:

      scitype_check_level    Inspect scitypes?   If Unknown in scitypes   If other scitype mismatch
      0                      ×
      1 (value at startup)   ✓                                            warning
      2                      ✓                   warning                  warning
      3                      ✓                   warning                  error
      4                      ✓                   error                    error
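
The table can also be read as a lookup from check level and kind of mismatch to an outcome. The following is a minimal sketch of that reading only, not MLJBase code: `check_outcome` and the symbols `:unknown`, `:other` and `:none` are hypothetical names introduced here, and the sketch assumes scitypes are inspected at every level above 0.

```julia
# Hypothetical helper mirroring the table; not part of MLJBase.
# `mismatch` is :unknown (an Unknown scitype was found) or :other
# (any other scitype mismatch).
function check_outcome(level::Integer, mismatch::Symbol)
    level in 0:4 || throw(ArgumentError("level must be between 0 and 4"))
    mismatch in (:unknown, :other) ||
        throw(ArgumentError("mismatch must be :unknown or :other"))
    level == 0 && return :none           # scitypes are not inspected at all
    if mismatch == :unknown
        level == 1 && return :none       # Unknown is tolerated silently
        return level <= 3 ? :warning : :error
    end
    return level <= 2 ? :warning : :error
end

check_outcome(1, :other)    # the startup level warns on an ordinary mismatch
check_outcome(4, :unknown)  # the strictest level treats Unknown as an error
```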

      See also machine

      source
      MLJBase.fit_only!Method
      MLJBase.fit_only!(
           mach::Machine;
           rows=nothing,
           verbosity=1,
           force=false,
           composite=nothing,
      -)

      Without mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:

      • Ab initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.

      • Training update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.

      • No-operation. Leave existing learned parameters untouched. Do not increment mach.state.

      If the model, model, bound to mach is a symbol, then instead perform the action using the true model given by getproperty(composite, model). See also machine.

      Training action logic

      For the action to be a no-operation, either mach.frozen == true or none of the following apply:

      • (i) mach has never been trained (mach.state == 0).

      • (ii) force == true.

      • (iii) The state of some other machine on which mach depends has changed since the last time mach was trained (i.e., the last time mach.state was incremented).

      • (iv) The specified rows have changed since the last retraining and mach.model does not have Static type.

      • (v) mach.model is a model and different from the last model used for training, but has the same type.

      • (vi) mach.model is a model but has a type different from the last model used for training.

      • (vii) mach.model is a symbol and (composite, mach.model) is different from the last model used for training, but has the same type.

      • (viii) mach.model is a symbol and (composite, mach.model) has a different type from the last model used for training.

      In any of the cases (i) - (iv), (vi), or (viii), mach is trained ab initio. If (v) or (vii) is true, then a training update is applied.

      To freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).

      Implementation details

      The data to which a machine is bound is stored in mach.args. Each element of args is either a Node object, or, in the case that concrete data was bound to the machine, it is concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. See the "Adding models for general use" section of the MLJ documentation for more on these lower-level training methods.

      source
      MLJBase.freeze!Method
      freeze!(mach)

      Freeze the machine mach so that it will never be retrained (unless thawed).

      See also thaw!.

      source
      MLJBase.last_modelMethod

      last_model(mach::Machine)

      Return the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.

      Returns nothing if mach has not been trained.

      source
      MLJBase.machineFunction
      machine(model, args...; cache=true, scitype_check_level=1)

      Construct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). This in turn enables generalization to new data using operations such as predict or transform:

      using MLJModels
      +)

      Without mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:

      • Ab initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.

      • Training update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.

      • No-operation. Leave existing learned parameters untouched. Do not increment mach.state.

      If the model, model, bound to mach is a symbol, then instead perform the action using the true model given by getproperty(composite, model). See also machine.

      Training action logic

      For the action to be a no-operation, either mach.frozen == true or none of the following apply:

      • (i) mach has never been trained (mach.state == 0).

      • (ii) force == true.

      • (iii) The state of some other machine on which mach depends has changed since the last time mach was trained (i.e., the last time mach.state was incremented).

      • (iv) The specified rows have changed since the last retraining and mach.model does not have Static type.

      • (v) mach.model is a model and different from the last model used for training, but has the same type.

      • (vi) mach.model is a model but has a type different from the last model used for training.

      • (vii) mach.model is a symbol and (composite, mach.model) is different from the last model used for training, but has the same type.

      • (viii) mach.model is a symbol and (composite, mach.model) has a different type from the last model used for training.

      In any of the cases (i) - (iv), (vi), or (viii), mach is trained ab initio. If (v) or (vii) is true, then a training update is applied.
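
The decision rule above can be summarised as a small function. This is a hedged sketch of the logic as described, not the MLJBase implementation: `training_action` and all of its keyword arguments are hypothetical names that stand in for conditions (i) to (viii), and it assumes that ab initio training takes precedence when both an ab initio and an update condition hold.

```julia
# Hypothetical summary of the training action logic; not MLJBase code.
#   state              -> (i)   state == 0 means never trained
#   force              -> (ii)
#   upstream_changed   -> (iii)
#   rows_changed       -> (iv)  assumed false for models of Static type
#   model_changed      -> (v)/(vii)   same type, different model
#   model_type_changed -> (vi)/(viii) different type
function training_action(; frozen=false, state=0, force=false,
                         upstream_changed=false, rows_changed=false,
                         model_changed=false, model_type_changed=false)
    frozen && return :no_operation
    needs_ab_initio = state == 0 || force || upstream_changed ||
                      rows_changed || model_type_changed
    needs_ab_initio && return :ab_initio   # assumed to win over an update
    model_changed && return :update
    return :no_operation
end

training_action(state=3, model_changed=true)  # only (v)/(vii) holds
training_action(state=3, frozen=true, force=true)  # frozen overrides force
```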

      To freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).

      Implementation details

      The data to which a machine is bound is stored in mach.args. Each element of args is either a Node object, or, in the case that concrete data was bound to the machine, it is concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. See the "Adding models for general use" section of the MLJ documentation for more on these lower-level training methods.

      source
      MLJBase.freeze!Method
      freeze!(mach)

      Freeze the machine mach so that it will never be retrained (unless thawed).

      See also thaw!.

      source
      MLJBase.last_modelMethod

      last_model(mach::Machine)

      Return the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.

      Returns nothing if mach has not been trained.

      source
      MLJBase.machineFunction
      machine(model, args...; cache=true, scitype_check_level=1)

      Construct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). This in turn enables generalization to new data using operations such as predict or transform:

      using MLJModels
       X, y = make_regression()
       
       PCA = @load PCA pkg=MultivariateStats
      @@ -28,7 +28,7 @@
       X, y = make_blobs()
       mach = machine(:classifier, X, y)
       fit!(mach, composite=my_composite)

      The last two lines are equivalent to

      mach = machine(ConstantClassifier(), X, y)
      -fit!(mach)

      Delaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.

      See also fit!, default_scitype_check_level, MLJBase.save, serializable.

      source
      MLJBase.machineMethod
      machine(file::Union{String, IO})

      Rebuild from a file a machine that has been serialized using the default Serialization module.

      source
      MLJBase.reportMethod
      report(mach)

      Return the report for a machine mach that has been fit!, for example the coefficients in a linear model.

      This is a named tuple and human-readable if possible.

      If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

      julia> using MLJ
      +fit!(mach)

      Delaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.

      See also fit!, default_scitype_check_level, MLJBase.save, serializable.

      source
      MLJBase.machineMethod
      machine(file::Union{String, IO})

      Rebuild from a file a machine that has been serialized using the default Serialization module.

      source
      MLJBase.reportMethod
      report(mach)

      Return the report for a machine mach that has been fit!, for example the coefficients in a linear model.

      This is a named tuple and human-readable if possible.

      If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

      julia> using MLJ
       julia> @load LinearBinaryClassifier pkg=GLM
       julia> X, y = @load_crabs;
       julia> pipe = Standardizer() |> LinearBinaryClassifier();
      @@ -39,7 +39,7 @@
        dof_residual = 195.0,
        stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],
        vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)
      -

      See also fitted_params

      source
      MLJBase.report_given_methodMethod
      report_given_method(mach::Machine)

      Same as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.

      A specialized method intended for learning network applications.

      The return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc), whose values are the reports contributed by each method.

      source
      MLJBase.restore!Function
      restore!(mach::Machine)

      Restore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state == -1. Intended for restoring deserialized machine objects to a usable form.

      For an example see serializable.

      source
      MLJBase.serializableMethod
      serializable(mach::Machine)

      Returns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.

      Any general purpose Julia serializer may be applied to the output of serializable (eg, JLSO, BSON, JLD) but you must call restore!(mach) on the deserialised object mach before using it. See the example below.

      If using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.

      A machine returned by serializable is characterized by the property mach.state == -1.

      Example using JLSO

      using MLJ
      +

      See also fitted_params

      source
      MLJBase.report_given_methodMethod
      report_given_method(mach::Machine)

      Same as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.

      A specialized method intended for learning network applications.

      The return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc), whose values are the reports contributed by each method.

      source
      MLJBase.restore!Function
      restore!(mach::Machine)

      Restore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state == -1. Intended for restoring deserialized machine objects to a usable form.

      For an example see serializable.

      source
      MLJBase.serializableMethod
      serializable(mach::Machine)

      Returns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.

      Any general purpose Julia serializer may be applied to the output of serializable (eg, JLSO, BSON, JLD) but you must call restore!(mach) on the deserialised object mach before using it. See the example below.

      If using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.

      A machine returned by serializable is characterized by the property mach.state == -1.

      Example using JLSO

      using MLJ
       using JLSO
       Tree = @load DecisionTreeClassifier
       tree = Tree()
      @@ -55,7 +55,7 @@
       restore!(loaded_mach)
       
       predict(loaded_mach, X)
      -predict(mach, X)

      See also restore!, MLJBase.save.

      source
      MLJModelInterface.fitted_paramsMethod
      fitted_params(mach)

      Return the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.

      This is a named tuple and human-readable if possible.

      If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

      julia> using MLJ
      +predict(mach, X)

      See also restore!, MLJBase.save.

      source
      MLJModelInterface.fitted_paramsMethod
      fitted_params(mach)

      Return the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.

      This is a named tuple and human-readable if possible.

      If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

      julia> using MLJ
       julia> @load LogisticClassifier pkg=MLJLinearModels
       julia> X, y = @load_crabs;
       julia> pipe = Standardizer() |> LogisticClassifier();
      @@ -64,7 +64,7 @@
       julia> fitted_params(mach).logistic_classifier
       (classes = CategoricalArrays.CategoricalValue{String,UInt32}["B", "O"],
        coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],
      - intercept = 0.0883301599726305,)

      See also report

      source
      MLJModelInterface.saveMethod
      MLJ.save(filename, mach::Machine)
       MLJ.save(io, mach::Machine)
       
       MLJBase.save(filename, mach::Machine)
      @@ -82,12 +82,12 @@
       MLJ.save(io, mach)
       seekstart(io)
       predict_only_mach = machine(io)
      -predict(predict_only_mach, X)
      Only load files from trusted sources

      Maliciously constructed JLS files, like pickles, and most other general purpose serialization formats, can allow for arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.

      See also serializable, machine.

      source
      StatsAPI.fit!Method
      fit!(mach::Machine; rows=nothing, verbosity=1, force=false, composite=nothing)

      Fit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.

      To attempt to fit a machine without touching any other machine, use fit_only!. For more on options and the internal logic of fitting, see fit_only!.

      source

      Parameter Inspection

      Show

      MLJBase._recursive_showMethod
      _recursive_show(stream, object, current_depth, depth)

      Private method.

      Generate a table of the properties of the MLJType object, displaying each property value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:

      1. If f is itself a MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its properties (and so on, up to a depth of argument depth).

      2. Otherwise f is displayed as "(omitted T)" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (ie, MIME"text/plain") form of f is shown. To override this behaviour, overload the _show method for the type in question.

      source
      MLJBase.color_offMethod
      color_off()

      Suppress color and bold output at the REPL for displaying MLJ objects.

      source
      MLJBase.color_onMethod
      color_on()

      Enable color and bold output at the REPL, for enhanced display of MLJ objects.

      source
      MLJBase.handleMethod

      Return the abbreviated object id (as a string) or its registered handle (as a string) if this exists.

      source
      MLJBase.@constantMacro
      @constant x = value

      Private method (used in testing).

      Equivalent to const x = value but registers the binding thus:

      MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x

      Registered objects are displayed using the variable name to which they were bound in calls to show(x), etc.

      WARNING: As with any const declaration, binding x to a new value of the same type is not prevented, and the registration will not be updated.

      source
      MLJBase.@moreMacro
      @more

      Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all properties of the last REPL value.

      source

      Utility functions

      MLJBase._permute_rowsMethod
      _permute_rows(obj, perm)

      Internal function to return a vector or matrix with permuted rows given the permutation perm.

      source
      MLJBase.available_nameMethod
      available_name(modl::Module, name::Symbol)

      Function to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.

      source
      MLJBase.check_same_nrowsMethod
      check_same_nrows(X, Y)

      Internal function to check two objects, each a vector or a matrix, have the same number of rows.

      source
      MLJBase.chunksMethod
      chunks(range, n)

      Split an AbstractRange into n subranges of approximately equal length.

      Example

      julia> collect(chunks(1:5, 2))
      +predict(predict_only_mach, X)
      Only load files from trusted sources

      Maliciously constructed JLS files, like pickles, and most other general purpose serialization formats, can allow for arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.

      See also serializable, machine.

      source
      StatsAPI.fit!Method
      fit!(mach::Machine, rows=nothing, verbosity=1, force=false, composite=nothing)

      Fit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.

      To attempt to fit a machine without touching any other machine, use fit_only!. For more on options and the internal logic of fitting, see fit_only!.

      source

      Parameter Inspection

      Show

      MLJBase._recursive_showMethod
      _recursive_show(stream, object, current_depth, depth)

      Private method.

      Generate a table of the properties of the MLJType object, displaying each property value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:

      1. If f is itself a MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its properties (and so on, up to a depth of argument depth).

      2. Otherwise f is displayed as "(omitted T)" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (i.e., MIME"text/plain") form of f is shown. To override this behaviour, overload the _show method for the type in question.

      source
      MLJBase.color_offMethod
      color_off()

      Suppress color and bold output at the REPL for displaying MLJ objects.

      source
      MLJBase.color_onMethod
      color_on()

      Enable color and bold output at the REPL, for enhanced display of MLJ objects.

      source
      MLJBase.handleMethod

      Return the abbreviated object id (as a string) or its registered handle (as a string) if this exists.

      source
      MLJBase.@constantMacro
      @constant x = value

      Private method (used in testing).

      Equivalent to const x = value but registers the binding thus:

      MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x

      Registered objects are displayed using the variable name to which they were bound in calls to show(x), etc.

      WARNING: As with any const declaration, binding x to a new value of the same type is not prevented, and the registration will not be updated.

      source
      MLJBase.@moreMacro
      @more

      Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all properties of the last REPL value.

      source

      Utility functions

      MLJBase._permute_rowsMethod
      _permute_rows(obj, perm)

      Internal function to return a vector or matrix with permuted rows given the permutation perm.

      source
      MLJBase.available_nameMethod
      available_name(modl::Module, name::Symbol)

      Function to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.

      source
      MLJBase.check_same_nrowsMethod
      check_same_nrows(X, Y)

      Internal function to check two objects, each a vector or a matrix, have the same number of rows.

      source
      MLJBase.chunksMethod
      chunks(range, n)

      Split an AbstractRange into n subranges of approximately equal length.

      Example

      julia> collect(chunks(1:5, 2))
       2-element Array{UnitRange{Int64},1}:
        1:3
      - 4:5

      Private method

      source
      MLJBase.flat_valuesMethod
      flat_values(t::NamedTuple)

      View a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the original tuple.

      julia> t = (X = (x = 1, y = 2), Y = 3);
      + 4:5

      Private method

      source
      MLJBase.flat_valuesMethod
      flat_values(t::NamedTuple)

      View a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the original tuple.

      julia> t = (X = (x = 1, y = 2), Y = 3);
       julia> flat_values(t)
      -(1, 2, 3)
      source
      MLJBase.generate_name!Method
      generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)

      Given a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case, representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.

      If the snake-case representation already exists in existing_names a suitable integer is appended to the name.

      If only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.

      julia> existing_names = [];
      +(1, 2, 3)
      source
      MLJBase.generate_name!Method
      generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)

      Given a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case, representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.

      If the snake-case representation already exists in existing_names a suitable integer is appended to the name.

      If only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.

      julia> existing_names = [];
       julia> generate_name!(Vector{Int}, existing_names)
       :vector
       
      @@ -101,7 +101,7 @@
       :not_array
       
       julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
      -:not_array2
      source
      MLJBase.guess_model_target_observation_scitypeMethod
      guess_model_target_observation_scitype(model)

      Private method.

      Try to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). Return Unknown if unable to draw a reliable inference.

      The observation scitype for a table is here understood as the scitype of a row converted to a vector.

      source
      MLJBase.guess_observation_scitypeMethod
      guess_observation_scitype(y)

      Private method.

      If y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.

      In all other cases, Unknown.

      julia> guess_observation_scitype([missing, 1, 2, 3])
      +:not_array2
      source
      MLJBase.guess_model_target_observation_scitypeMethod
      guess_model_target_observation_scitype(model)

      Private method.

      Try to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). Return Unknown if unable to draw a reliable inference.

      The observation scitype for a table is here understood as the scitype of a row converted to a vector.

      source
      MLJBase.guess_observation_scitypeMethod
      guess_observation_scitype(y)

      Private method.

      If y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.

      In all other cases, Unknown.

      julia> guess_observation_scitype([missing, 1, 2, 3])
       Union{Missing, Count}
       
       julia> guess_observation_scitype(rand(3, 2))
      @@ -111,16 +111,16 @@
       AbstractVector{Union{Continuous, Count}}
       
       julia> guess_observation_scitype((x=[missing, 1, 2], y=[1, 2, 3]))
      -Unknown
      source
      MLJBase.init_rngMethod
      init_rng(rng)

      Create an AbstractRNG from rng. If rng is a non-negative Integer, return a MersenneTwister random number generator seeded with rng; if rng is an AbstractRNG object, return rng itself; otherwise throw an error.

      source
      MLJBase.observationMethod
      observation(S)

      Private method.

      Tries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations; here we view the scitype for one row of a table to be the scitype of the row converted to a vector. Return Unknown if unable to draw a reliable inference.

      The observation scitype for a table is here understood as the scitype of a row converted to a vector.

      source
      MLJBase.prependMethod
      MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})

      For prepending symbols in expressions like :(y.w) and :(x1.x2.x3).

      julia> prepend(:x, :y)
      +Unknown
      source
      MLJBase.init_rngMethod
      init_rng(rng)

      Create an AbstractRNG from rng. If rng is a non-negative Integer, return a MersenneTwister random number generator seeded with rng; if rng is an AbstractRNG object, return rng itself; otherwise throw an error.

      source
      MLJBase.observationMethod
      observation(S)

      Private method.

      Tries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations; here we view the scitype for one row of a table to be the scitype of the row converted to a vector. Return Unknown if unable to draw a reliable inference.

      The observation scitype for a table is here understood as the scitype of a row converted to a vector.

      source
      MLJBase.prependMethod
      MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})

      For prepending symbols in expressions like :(y.w) and :(x1.x2.x3).

      julia> prepend(:x, :y)
       :(x.y)
       
       julia> prepend(:x, :(y.z))
       :(x.y.z)
       
       julia> prepend(:w, ans)
      -:(w.x.y.z)

      If the second argument is nothing, then nothing is returned.

      source
      MLJBase.recursive_getpropertyMethod
      recursive_getproperty(object, nested_name::Expr)

      Call getproperty recursively on object to extract the value of some nested property, as in the following example:

      julia> object = (X = (x = 1, y = 2), Y = 3);
      +:(w.x.y.z)

      If the second argument is nothing, then nothing is returned.

      source
      MLJBase.recursive_getpropertyMethod
      recursive_getproperty(object, nested_name::Expr)

      Call getproperty recursively on object to extract the value of some nested property, as in the following example:

      julia> object = (X = (x = 1, y = 2), Y = 3);
       julia> recursive_getproperty(object, :(X.y))
      -2
      source
      MLJBase.recursive_setproperty!Method
      recursive_setproperty!(object, nested_name::Expr, value)

      Set a nested property of an object to value, as in the following example:

      julia> mutable struct Foo
      +2
      source
      MLJBase.recursive_setproperty!Method
      recursive_setproperty!(object, nested_name::Expr, value)

      Set a nested property of an object to value, as in the following example:

      julia> mutable struct Foo
                  X
                  Y
              end
      @@ -137,10 +137,10 @@
       42
       
       julia> object
      -Foo(Bar(1, 42), 3)
      source
      MLJBase.sequence_stringMethod
      sequence_string(itr, n=3)

      Return a "sequence" string from the first n elements generated by itr.

      julia> MLJBase.sequence_string(1:10, 4)
      -"1, 2, 3, 4, ..."

      Private method.

      source
      MLJBase.sequence_stringMethod
      sequence_string(itr, n=3)

      Return a "sequence" string from the first n elements generated by itr.

      julia> MLJBase.sequence_string(1:10, 4)
      +"1, 2, 3, 4, ..."

      Private method.

      source
      MLJBase.shuffle_rowsMethod
      shuffle_rows(X::AbstractVecOrMat,
                    Y::AbstractVecOrMat;
      -             rng::AbstractRNG=Random.GLOBAL_RNG)

      Return row-shuffled vectors or matrices using a random permutation of X and Y. An optional random number generator can be specified using the rng argument.

      source
      MLJBase.unwindMethod
      unwind(iterators...)

      Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.

      Example

      julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
      +             rng::AbstractRNG=Random.GLOBAL_RNG)

      Return row-shuffled vectors or matrices using a random permutation of X and Y. An optional random number generator can be specified using the rng argument.

      source
      MLJBase.unwindMethod
      unwind(iterators...)

      Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.

      Example

      julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
       julia> MLJTuning.unwind(iterators...)
       12×3 Array{Any,2}:
        1  "a"  "x"
      @@ -154,4 +154,4 @@
        1  "a"  "z"
        2  "a"  "z"
        1  "b"  "z"
      - 2  "b"  "z"
      source
      + 2  "b"  "z"
      source