To add a new dataset, assuming it has a header and is at path data/newdataset.csv:
Start by loading it with CSV:
fpath = joinpath("datadir", "newdataset.csv")
data = CSV.read(fpath, copycols=true, categorical=true)
Load it with DelimitedFiles and Tables:
data_raw, data_header = readdlm(fpath, ',', header=true)
data_table = Tables.table(data_raw; header=Symbol.(vec(data_header)))
Retrieve the conversions:
for (n, st) in zip(names(data), scitype_union.(eachcol(data)))
    println(":$n=>$st,")
end
Copy and paste the result into a coerce call:
data_table = coerce(data_table, ...)
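For illustration, pasting the printed pairs into coerce might look like this (the column names and scitypes here are hypothetical):
data_table = coerce(data_table,
                    :age => Continuous,
                    :gender => Multiclass,
                    :n_devices => Count)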
MLJBase.load_dataset — Method
load_dataset(fpath, coercions)
Load one of the standard datasets, such as Boston, assuming the file is a comma-separated file with a header.
MLJBase.load_sunspots — Method
Load a well-known sunspot time series (table with one column). https://www.sws.bom.gov.au/Educational/2/3/6
MLJBase.@load_ames — Macro
Load the full version of the well-known Ames Housing task.
MLJBase.@load_boston — Macro
Load a well-known public regression dataset with Continuous features.
MLJBase.@load_crabs — Macro
Load a well-known crab classification dataset with nominal features.
MLJBase.@load_iris — Macro
Load a well-known public classification task with nominal features.
MLJBase.@load_reduced_ames — Macro
Load a reduced version of the well-known Ames Housing task.
MLJBase.@load_smarket — Macro
Load the S&P Stock Market dataset, as used in An Introduction to Statistical Learning with applications in R (https://rdrr.io/cran/ISLR/man/Smarket.html), by Witten et al (2013), Springer-Verlag, New York.
MLJBase.@load_sunspots — Macro
Load a well-known sunspot time series (single table with one column).
MLJBase.x — Constant
finalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)
Internal function to finalize the make_* functions.
MLJBase.augment_X — Method
augment_X(X, fit_intercept)
Given a matrix X, append a column of ones if fit_intercept is true. See make_regression.
MLJBase.make_blobs — Function
X, y = make_blobs(n=100, p=2; kwargs...)
Generate Gaussian blobs for clustering and classification problems.
Return value
By default, a table X with p columns (features) and n rows (observations), together with a corresponding vector of n Multiclass target observations y, indicating blob membership.
Keyword arguments
shuffle=true: whether to shuffle the resulting points;
centers=3: either a number of centers or a c x p matrix with c pre-determined centers;
cluster_std=1.0: the standard deviation(s) of each blob;
center_box=(-10. => 10.): the limits of the p-dimensional cube within which the cluster centers are drawn if they are not provided;
eltype=Float64: machine type of points (any subtype of AbstractFloat);
rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility);
as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.
Example
X, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])
MLJBase.make_circles — Function
X, y = make_circles(n=100; kwargs...)
Generate n labeled points close to two concentric circles for classification and clustering models.
Return value
By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the smaller or larger circle, respectively.
Keyword arguments
shuffle=true: whether to shuffle the resulting points;
noise=0: standard deviation of the Gaussian noise added to the data;
factor=0.8: ratio of the smaller radius over the larger one;
eltype=Float64: machine type of points (any subtype of AbstractFloat);
rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility);
as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.
Example
X, y = make_circles(100; noise=0.5, factor=0.3)
MLJBase.make_moons — Function
make_moons(n::Int=100; kwargs...)
Generates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.
Return value
By default, a table X with 2 columns and n rows (observations), together with a corresponding vector of n Multiclass target observations y. The target is either 0 or 1, corresponding to membership to the left or right semi-circle.
Keyword arguments
shuffle=true: whether to shuffle the resulting points;
noise=0.1: standard deviation of the Gaussian noise added to the data;
xshift=1.0: horizontal translation of the second center with respect to the first one;
yshift=0.3: vertical translation of the second center with respect to the first one;
eltype=Float64: machine type of points (any subtype of AbstractFloat);
rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility);
as_table=true: whether to return the points as a table (true) or a matrix (false). If false the target y has integer element type.
Example
X, y = make_moons(100; noise=0.5)
MLJBase.make_regression — Function
make_regression(n, p; kwargs...)
Generate Gaussian input features and a linear response with Gaussian noise, for use with regression models.
Return value
By default, a tuple (X, y) where table X has p columns and n rows (observations), together with a corresponding vector of n Continuous target observations y.
Keywords
intercept=true: whether to generate data from a model with intercept;
n_targets=1: number of columns in the target;
sparse=0: proportion of the generating weight vector that is sparse;
noise=0.1: standard deviation of the Gaussian noise added to the response (target);
outliers=0: proportion of the response vector to make as outliers by adding a random quantity with high variance (only applied if binary is false);
binary=false: whether the target should be binarized (via a sigmoid);
as_table=true: whether X (and y, if n_targets > 1) should be a table or a matrix. If false, y has integer element type;
eltype=Float64: element type for X and y (any subtype of AbstractFloat);
rng=Random.GLOBAL_RNG: any AbstractRNG object, or integer to seed a MersenneTwister (for reproducibility).
Example
X, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)
MLJBase.outlify! — Method
Add outliers to a portion s of a vector.
MLJBase.runif_ab — Method
runif_ab(rng, n, p, a, b)
Internal function to generate n points in [a, b]ᵖ uniformly at random.
MLJBase.sigmoid — Method
sigmoid(x)
Return the sigmoid computed in a numerically stable way:
$σ(x) = 1/(1 + e^{-x})$
MLJBase.sparsify! — Method
sparsify!(rng, θ, s)
Make a portion s of the vector θ exactly 0.
MLJBase.complement — Method
complement(folds, i)
The complement of the ith fold of folds in the concatenation of all elements of folds. Here folds is a vector or tuple of integer vectors, typically representing row indices.
complement(([1,2], [3,], [4, 5]), 2) # [1, 2, 4, 5]
MLJBase.corestrict — Method
corestrict(X, folds, i)
The restriction of X, a vector, matrix or table, to the complement of the ith fold of folds, where folds is a tuple of vectors of row indices.
The method is curried, so that corestrict(folds, i) is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i).
Example
folds = ([1, 2], [3, 4, 5], [6,])
corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
MLJBase.partition — Method
partition(X, fractions...;
          shuffle=nothing,
          rng=Random.GLOBAL_RNG,
          stratify=nothing,
          multi=false)
Splits the vector, matrix or table X into a tuple of objects of the same type, whose vertical concatenation is X. The number of rows in each component of the return value is determined by the corresponding fractions of nrows(X), where valid fractions are floats between 0 and 1 whose sum is less than one. The last fraction is not provided, as it is inferred from the preceding ones.
For "synchronized" partitioning of multiple objects, use the multi=true option described below.
julia> partition(1:1000, 0.8)
([1,...,800], [801,...,1000])

julia> partition(1:1000, 0.2, 0.7)
([1,...,200], [201,...,900], [901,...,1000])

julia> partition(reshape(1:10, 5, 2), 0.2, 0.4)
([1 6], [2 7; 3 8], [4 9; 5 10])

X, y = make_blobs() # a table and vector
Xtrain, Xtest = partition(X, 0.8, stratify=y)

(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)
Keywords
shuffle=nothing: if set to true, shuffles the rows before taking fractions.
rng=Random.GLOBAL_RNG: specifies the random number generator to be used; can be an integer seed. If specified, and shuffle === nothing, then shuffle is interpreted as true.
stratify=nothing: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle cannot be false.
multi=false: if true then X is expected to be a tuple of objects sharing a common length, which are each partitioned separately using the same specified fractions and the same row shuffling. Returns a tuple of partitions (a tuple of tuples).
MLJBase.restrict — Method
restrict(X, folds, i)
The restriction of X, a vector, matrix or table, to the ith fold of folds, where folds is a tuple of vectors of row indices.
The method is curried, so that restrict(folds, i) is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i).
Example
folds = ([1, 2], [3, 4, 5], [6,])
restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]
See also corestrict.
MLJBase.skipinvalid — Method
skipinvalid(itr)
Return an iterator over the elements in itr, skipping missing and NaN values. Behaviour is similar to skipmissing.
skipinvalid(A, B)
For vectors A and B of the same length, return a tuple of vectors (A[mask], B[mask]) where mask[i] is true if and only if A[i] and B[i] are both valid (non-missing and non-NaN). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing from the element types if present in the original iterators.
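A minimal sketch of both forms (expected results as comments; element types may retain Missing):
using MLJBase
collect(skipinvalid([1.0, missing, NaN, 4.0]))  # [1.0, 4.0]
A = [1.0, missing, 3.0]
B = [NaN, 5.0, 6.0]
skipinvalid(A, B)  # ([3.0], [6.0]): only index 3 is valid in both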
MLJBase.unpack — Method
unpack(table, f1, f2, ... fk;
       wrap_singles=false,
       shuffle=false,
       rng::Union{AbstractRNG,Int,Nothing}=nothing,
       coerce_options...)
Horizontally split any Tables.jl compatible table into smaller tables or vectors by making column selections determined by the predicates f1, f2, ..., fk. Selection from the column names is without replacement. A predicate is any object f such that f(name) is true or false for each column name::Symbol of table.
Returns a tuple of tables/vectors with length one greater than the number of supplied predicates, with the last component including all previously unselected columns.
julia> table = DataFrame(x=[1,2], y=['a', 'b'], z=[10.0, 20.0], w=["A", "B"])
2×4 DataFrame
 Row │ x      y     z        w
     │ Int64  Char  Float64  String
─────┼──────────────────────────────
   1 │     1  a        10.0  A
   2 │     2  b        20.0  B

julia> Z, XY, W = unpack(table, ==(:z), !=(:w));

julia> Z
2-element Vector{Float64}:
 10.0
 20.0

julia> XY
2×2 DataFrame
 Row │ x      y
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  b

julia> W # the column(s) left over
2-element Vector{String}:
 "A"
 "B"
Whenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true.
If coerce_options are specified then table is first replaced with coerce(table, coerce_options). See ScientificTypes.coerce for details.
If shuffle=true then the rows of table are first shuffled, using the global RNG, unless rng is specified; if rng is an integer, it specifies the seed of an automatically generated Mersenne twister. If rng is specified then shuffle=true is implicit.
Distributions.sampler — Method
sampler(r::NominalRange, probs::AbstractVector{<:Real})
sampler(r::NominalRange)
sampler(r::NumericRange{T}, d)
Construct an object s which can be used to generate random samples from a ParamRange object r (a one-dimensional range) using one of the following calls:
rand(s)             # for one sample
rand(s, n)          # for n samples
rand(rng, s [, n])  # to specify an RNG
The argument probs can be any probability vector with the same length as r.values. The second sampler method above calls the first with a uniform probs vector.
The argument d can be either an arbitrary instance of UnivariateDistribution from the Distributions.jl package, or one of the Distributions.jl types for which fit(d, ::NumericRange) is defined. These include: Arcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight, Normal, Gamma, InverseGaussian, Logistic, LogNormal, Cauchy, Gumbel, Laplace, and Poisson; but see the doc-string for Distributions.fit for an up-to-date list.
If d is an instance, then sampling is from a truncated form of the supplied distribution d, the truncation bounds being r.lower and r.upper (the attributes r.origin and r.unit are ignored). For discrete numeric ranges (T <: Integer) the samples are rounded.
If d is a type then a suitably truncated distribution is automatically generated using Distributions.fit(d, r).
Important. Values are generated with no regard to r.scale, except in the special case r.scale is a callable object f. In that case, f is applied to all values generated by rand as described above (prior to rounding, in the case of discrete numeric ranges).
Examples
r = range(Char, :letter, values=collect("abc"))
s = sampler(r, [0.1, 0.2, 0.7])
samples = rand(s, 1000);
StatsBase.countmap(samples)
Dict{Char,Int64} with 3 entries:
  'a' => 107
  'b' => 205
  'c' => 688

r = range(Int, :k, lower=2, upper=6) # numeric but discrete
s = sampler(r, Normal)
samples = rand(s, 1000);
UnicodePlots.histogram(samples)
           ┌                                        ┐
[2.0, 2.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 119
[2.5, 3.0) ┤ 0
[3.0, 3.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 296
[3.5, 4.0) ┤ 0
[4.0, 4.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 275
[4.5, 5.0) ┤ 0
[5.0, 5.5) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 221
[5.5, 6.0) ┤ 0
[6.0, 6.5) ┤▇▇▇▇▇▇▇▇▇▇▇ 89
           └                                        ┘
MLJBase.iterator — Method
iterator([rng, ], r::NominalRange, [,n])
iterator([rng, ], r::NumericRange, n)
Return an iterator (currently a vector) for a ParamRange object r. In the first case iteration is over all values stored in the range (or just the first n, if n is specified). In the second case, the iteration is over approximately n ordered values, generated as follows:
(i) First, exactly n values are generated between L and U, with a spacing determined by r.scale (uniform if scale=:linear), where L and U are given by the following table:
r.lower | r.upper | L                 | U
--------|---------|-------------------|------------------
finite  | finite  | r.lower           | r.upper
-Inf    | finite  | r.upper - 2r.unit | r.upper
finite  | Inf     | r.lower           | r.lower + 2r.unit
-Inf    | Inf     | r.origin - r.unit | r.origin + r.unit
(ii) If a callable f is provided as scale, then a uniform spacing is always applied in (i) but f is broadcast over the results. (Unlike ordinary scales, this alters the effective range of values generated, instead of just altering the spacing.)
(iii) If r is a discrete numeric range (r isa NumericRange{<:Integer}) then the values are additionally rounded, with any duplicate values removed. Otherwise all the values are used (and there are exactly n of them).
(iv) Finally, if a random number generator rng is specified, then the values are returned in random order (sampling without replacement), and otherwise they are returned in numeric order, or in the order provided to the range constructor, in the case of a NominalRange.
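As an illustration of case (iii), a discrete numeric range on a linear scale (expected output in comments):
r = range(Int, :k, lower=2, upper=6)
iterator(r, 3)    # [2, 4, 6]: three evenly spaced values, rounded
iterator(r, 100)  # [2, 3, 4, 5, 6]: duplicates removed after rounding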
MLJBase.scale — Method
scale(r::ParamRange)
Return the scale associated with a ParamRange object r. The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a callable object).
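For example (expected output in comments):
r1 = range(Float64, :lambda, lower=1e-3, upper=10.0, scale=:log10)
scale(r1)  # :log10
r2 = range(Char, :letter, values=collect("abc"))
scale(r2)  # :none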
StatsAPI.fit — Method
Distributions.fit(D, r::MLJBase.NumericRange)
Fit and return a distribution d of type D to the one-dimensional range r.
Only types D in the table below are supported.
The distribution d is constructed in two stages. First, a distribution d0, characterized by the conditions in the second column of the table, is fit to r. Then d0 is truncated between r.lower and r.upper to obtain d.
Distribution type D                                                            | Characterization of d0
-------------------------------------------------------------------------------|----------------------------------------------------
Arcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight | minimum(d) = r.lower, maximum(d) = r.upper
Normal, Gamma, InverseGaussian, Logistic, LogNormal                            | mean(d) = r.origin, std(d) = r.unit
Cauchy, Gumbel, Laplace, (Normal)                                              | Dist.location(d) = r.origin, Dist.scale(d) = r.unit
Poisson                                                                        | Dist.mean(d) = r.unit
Here Dist = Distributions.
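For example, per the table above, fitting a Normal to a bounded range yields a distribution with mean r.origin and standard deviation r.unit, truncated to [r.lower, r.upper] (a sketch):
using MLJBase, Distributions
r = range(Float64, :alpha, lower=0.0, upper=1.0)
d = Distributions.fit(Normal, r)  # truncated Normal with mean 0.5, std 0.5
rand(d, 3)                        # three samples, all lying in [0, 1]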
Base.range — Method
r = range(model, :hyper; values=nothing)
Define a one-dimensional NominalRange object for a field hyper of model. Note that r is not directly iterable but iterator(r) is.
A nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the max_depth hyperparameter of the submodel model.atom.
r = range(model, :hyper; upper=nothing, lower=nothing,
          scale=nothing, values=nothing)
Assuming values is not specified, define a one-dimensional NumericRange object for a Real field hyper of model. Note that r is not directly iterable, but iterator(r, n) is an iterator of length n. To generate random elements from r, instead apply rand methods to sampler(r). The supported scales are :linear, :log, :logminus, :log10, :log10minus, :log2, or a callable object.
By default, the behaviour of the constructed object depends on the type of the value of the hyperparameter :hyper at model at the time of construction. To override this behaviour (for instance if model is not available) specify a type in place of model, so the behaviour is determined by the value of the specified type.
A nested hyperparameter is specified using dot notation (see above).
If scale is unspecified, it is set to :linear, :log, :log10minus, or :linear, according to whether the interval (lower, upper) is bounded, right-unbounded, left-unbounded, or doubly unbounded, respectively. Note that upper=Inf and lower=-Inf are allowed.
If values is specified, the other keyword arguments are ignored and a NominalRange object is returned (see above).
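A brief sketch of both flavors (expected values in comments):
r_num = range(Float64, :lambda, lower=1e-4, upper=1.0, scale=:log10)
iterator(r_num, 3)  # three log-spaced values: approximately [0.0001, 0.01, 1.0]
r_nom = range(Char, :letter, values=collect("abc"))
iterator(r_nom)     # ['a', 'b', 'c']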
These docs are bare-bones and auto-generated. Complete MLJ documentation is here.
For MLJBase-specific developer information, see also the README.md file.
MLJBase.CV — Type
cv = CV(; nfolds=6, shuffle=nothing, rng=nothing)
Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning.
train_test_pairs(cv, rows)
Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector. With no row pre-shuffling, the order of rows is preserved, in the sense that rows coincides precisely with the concatenation of the test vectors, in the order they are generated. The first r test vectors have length n + 1, where n, r = divrem(length(rows), nfolds), and the remaining test vectors have length n.
Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the CV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.
If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.
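For example, with six rows, three folds and no pre-shuffling (expected output as comments):
pairs = MLJBase.train_test_pairs(CV(nfolds=3), 1:6)
# [([3, 4, 5, 6], [1, 2]),
#  ([1, 2, 5, 6], [3, 4]),
#  ([1, 2, 3, 4], [5, 6])]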
MLJBase.Holdout — Type
holdout = Holdout(; fraction_train=0.7,
                    shuffle=nothing,
                    rng=nothing)
Holdout resampling strategy, for use in evaluate!, evaluate and in tuning.
train_test_pairs(holdout, rows)
Returns the pair [(train, test)], where train and test are vectors such that rows=vcat(train, test) and length(train)/length(rows) is approximately equal to fraction_train.
Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the Holdout keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.
If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is specified.
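For example (expected output as a comment):
MLJBase.train_test_pairs(Holdout(fraction_train=0.7), 1:10)
# [([1, 2, 3, 4, 5, 6, 7], [8, 9, 10])]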
MLJBase.PerformanceEvaluation — Type
PerformanceEvaluation
Type of object returned by evaluate (for models plus data) or evaluate! (for machines). Such objects encode estimates of the performance (generalization error) of a supervised model or outlier detection model.
When evaluate/evaluate! is called, a number of train/test pairs ("folds") of row indices are generated, according to the options provided, which are discussed in the evaluate! doc-string. Rows correspond to observations. The generated train/test pairs are recorded in the train_test_rows field of the PerformanceEvaluation struct, and the corresponding estimates, aggregated over all train/test pairs, are recorded in measurement, a vector with one entry for each measure (metric) recorded in measure.
When displayed, a PerformanceEvaluation object includes a value under the heading 1.96*SE, derived from the standard error of the per_fold entries. This value is suitable for constructing a formal 95% confidence interval for the given measurement. Such intervals should be interpreted with caution. See, for example, Bates et al. (2021).
Fields
These fields are part of the public API of the PerformanceEvaluation struct.
model: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.
measure: vector of measures (metrics) used to evaluate performance.
measurement: vector of measurements - one for each element of measure - aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure m is StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum()).
operation (e.g., predict_mode): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: predict, predict_mean, predict_mode, predict_median, or predict_joint.
per_fold: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.
per_observation: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for the ith observation in the fth test fold, evaluated using the mth measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure measure is repeated across all observations in a fold if StatisticalMeasures.can_report_unaggregated(measure) == true. If e has been computed with the per_observation=false option, then e.per_observation is a vector of missings.
fitted_params_per_fold: a vector containing fitted_params(mach) for each machine mach trained during resampling - one machine per train/test pair. Use this to extract the learned parameters for each individual training event.
report_per_fold: a vector containing report(mach) for each machine mach trained during resampling - one machine per train/test pair.
train_test_rows: a vector of tuples, each of the form (train, test), where train and test are vectors of row (observation) indices for training and evaluation respectively.
resampling: the resampling strategy used to generate the train/test pairs.
repeats: the number of times the resampling strategy was repeated.
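For instance, assuming an evaluation has been performed (model, X and y below are placeholders), the fields can be inspected like this:
e = evaluate(model, X, y, resampling=CV(nfolds=5), measure=[rmse, mae])
e.measurement[1]     # aggregated rmse over all five folds
e.per_fold[1]        # vector of five per-fold rmse values
e.train_test_rows[1] # (train, test) row indices for the first fold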
MLJBase.Resampler — Type
resampler = Resampler(
    model=ConstantRegressor(),
    resampling=CV(),
    measure=nothing,
    weights=nothing,
    class_weights=nothing,
    operation=predict,
    repeats=1,
    acceleration=default_resource(),
    check_measure=true,
    per_observation=true,
    logger=nothing,
)
Resampling model wrapper, used internally by the fit method of TunedModel instances and IteratedModel instances. See evaluate! for options. Not intended for use by the general user, who will ordinarily use evaluate! directly.
Given a machine mach = machine(resampler, args...) one obtains a performance evaluation of the specified model, performed according to the prescribed resampling strategy and other parameters, using data args..., by calling fit!(mach) followed by evaluate(mach).
On subsequent calls to fit!(mach), new train/test pairs of row indices are only regenerated if the resampling, repeats or cache fields of resampler have changed. The evolution of an RNG field of resampler does not constitute a change (== for MLJType objects is not sensitive to such changes; see is_same_except).
If there is a single train/test pair, then the warm-restart behavior of the wrapped model resampler.model will extend to warm-restart behaviour of the wrapper resampler, with respect to mutations of the wrapped model.
The sample weights are passed to the specified performance measures that support weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.
The sample class_weights are passed to the specified performance measures that support per-class weights for evaluation. These weights are not to be confused with any weights bound to a Resampler instance in a machine, used for training the wrapped model when supported.
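A minimal sketch of the wrapper workflow just described, assuming data X, y are already defined:
resampler = Resampler(model=ConstantRegressor(), resampling=CV(nfolds=3), measure=rmse)
mach = machine(resampler, X, y)
fit!(mach)      # performs the resampling
evaluate(mach)  # retrieve the performance evaluation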
MLJBase.StratifiedCV — Type
stratified_cv = StratifiedCV(; nfolds=6,
                               shuffle=false,
                               rng=Random.GLOBAL_RNG)
Stratified cross-validation resampling strategy, for use in evaluate!, evaluate and in tuning. Applies only to classification problems (OrderedFactor or Multiclass targets).
train_test_pairs(stratified_cv, rows, y)
Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices) where each train and test is a sub-vector of rows. The test vectors are mutually exclusive and exhaust rows. Each train vector is the complement of the corresponding test vector.
Unlike regular cross-validation, the distribution of the levels of the target y corresponding to each train and test is constrained, as far as possible, to replicate that of y[rows] as a whole.
The stratified train_test_pairs algorithm is invariant to label renaming. For example, if you run replace!(y, 'a' => 'b', 'b' => 'a') and then re-run train_test_pairs, the returned (train, test) pairs will be the same.
Pre-shuffling of rows is controlled by rng and shuffle. If rng is an integer, then the StratifiedCV keyword constructor resets it to MersenneTwister(rng). Otherwise some AbstractRNG object is expected.
If rng is left unspecified, rng is reset to Random.GLOBAL_RNG, in which case rows are only pre-shuffled if shuffle=true is explicitly specified.
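For example, with a balanced two-class target (a sketch; the exact row allocation is implementation-determined):
y = repeat(['a', 'b'], 4)
pairs = MLJBase.train_test_pairs(StratifiedCV(nfolds=2), 1:8, y)
# each of the two test folds contains two 'a' rows and two 'b' rows
length(pairs) # 2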
MLJBase.TimeSeriesCV — Type
tscv = TimeSeriesCV(; nfolds=4)
Cross-validation resampling strategy, for use in evaluate!, evaluate and tuning, when observations are chronological and not expected to be independent.
train_test_pairs(tscv, rows)
Returns an nfolds-length iterator of (train, test) pairs of vectors (row indices), where each train and test is a sub-vector of rows. The rows are partitioned sequentially into nfolds + 1 approximately equal length partitions, where the first partition is the first train set, and the second partition is the first test set. The second train set consists of the first two partitions, and the second test set consists of the third partition, and so on for each fold.
The first partition (which is the first train set) has length n + r, where n, r = divrem(length(rows), nfolds + 1), and the remaining partitions (all of the test folds) have length n.
Examples
julia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)
3-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:
 (1:4, 5:6)
 (1:6, 7:8)
 (1:8, 9:10)

julia> model = (@load RidgeRegressor pkg=MultivariateStats verbosity=0)();

julia> data = @load_sunspots;

julia> X = (lag1 = data.sunspot_number[2:end-1],
            lag2 = data.sunspot_number[1:end-2]);

julia> y = data.sunspot_number[3:end];

julia> tscv = TimeSeriesCV(nfolds=3);

julia> evaluate(model, X, y, resampling=tscv, measure=rmse, verbosity=0)
┌───────────────────────────┬───────────────┬────────────────────┐
│ _.measure                 │ _.measurement │ _.per_fold         │
├───────────────────────────┼───────────────┼────────────────────┤
│ RootMeanSquaredError @753 │ 21.7          │ [25.4, 16.3, 22.4] │
└───────────────────────────┴───────────────┴────────────────────┘
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
_.train_test_rows = [ … ]
MLJBase.evaluate! — Method
evaluate!(mach; resampling=CV(), measure=nothing, options...)
Estimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or a vector. Returns a PerformanceEvaluation object.
Available resampling strategies are CV, Holdout, StratifiedCV and TimeSeriesCV. If resampling is not an instance of one of these, then a vector of tuples of the form (train_rows, test_rows) is expected. For example, setting
resampling = [((1:100), (101:200)),
              ((101:200), (1:100))]
gives two-fold cross-validation using the first 200 rows of data.
Any measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.
Although evaluate! is mutating, mach.model and mach.args are not mutated.
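A typical call, as a sketch (assumes mach binds a supervised model to data, and uses the rmse and mae measures from StatisticalMeasures):
evaluate!(mach,
          resampling=CV(nfolds=5, shuffle=true, rng=123),
          measure=[rmse, mae],
          verbosity=0)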
Additional keyword options
rows - vector of observation indices from which both train and test folds are constructed (default is all observations).
operation/operations=nothing - one of predict, predict_mean, predict_mode, predict_median, or predict_joint, or a vector of these of the same length as measure/measures. Automatically inferred if left unspecified. For example, predict_mode will be used for a Multiclass target, if model is a probabilistic predictor but measure expects literal (point) target predictions. Operations actually applied can be inspected from the operation field of the object returned.
weights - per-sample Real weights for measures that support them (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).
class_weights - dictionary of Real per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the w in mach = machine(model, X, y, w)).
repeats::Int=1: set to a higher value for repeated (Monte Carlo) resampling. For example, if repeats = 10, then resampling = CV(nfolds=5, shuffle=true) generates a total of 50 (train, test) pairs for evaluation and subsequent aggregation.
acceleration=CPU1(): acceleration/parallelization option; can be any instance of CPU1 (single-threaded computation), CPUThreads (multi-threaded computation) or CPUProcesses (multi-process computation); default is default_resource(). These types are owned by ComputationalResources.jl.
force=false: set to true to force cold-restart of each training event.
verbosity::Int=1: logging level; can be negative.
check_measure=true: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.
per_observation=true: whether to calculate estimates for individual observations; if false the per_observation field of the returned object is populated with missings. Setting to false may reduce compute time and allocations.
logger - a logger object (see MLJBase.log_evaluation).
See also evaluate, PerformanceEvaluation.
MLJBase.log_evaluation — Method
log_evaluation(logger, performance_evaluation)
Log a performance evaluation to logger, an object specific to some logging platform, such as mlflow. If logger=nothing then no logging is performed. The method is called at the end of every call to evaluate/evaluate! using the logger provided by the logger keyword argument.
Implementations for new logging platforms
Julia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface), should overload log_evaluation(logger::LoggerType, performance_evaluation), where LoggerType is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.
MLJModelInterface.evaluate — Method
evaluate(model, data...; cache=true, options...)
Equivalent to evaluate!(machine(model, data..., cache=cache); options...). See the machine version evaluate! for the complete list of options.
Returns a PerformanceEvaluation object.
See also evaluate!.
Base.replace — Method
replace(mach::Machine, field1 => value1, field2 => value2, ...)
Private method.
Return a shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new instance Channel{Bool}(1).
The following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):
replace(mach, :args => (), :data => (), :resampled_data => (), :cache => nothing)
MLJBase.age — Method
age(mach::Machine)
Return an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.
MLJBase.ancestors — Method
ancestors(mach::Machine; self=false)
All ancestors of mach, including mach if self=true.
MLJBase.default_scitype_check_level — Function
default_scitype_check_level()
Return the current global default value for scientific type checking when constructing machines.
default_scitype_check_level(i::Integer)
Set the global default value for scientific type checking to i.
The effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:
scitype_check_level  | Inspect scitypes? | If Unknown in scitypes | If other scitype mismatch
0                    | ×                 |                        |
1 (value at startup) | ✓                 |                        | warning
2                    | ✓                 | warning                | warning
3                    | ✓                 | warning                | error
4                    | ✓                 | error                  | error
See also machine.
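For example (model, X and y below are placeholders):
default_scitype_check_level()   # 1, unless previously changed
default_scitype_check_level(3)  # scitype mismatches other than Unknown now throw errors
mach = machine(model, X, y, scitype_check_level=0)  # skip scitype checks for this machine only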
MLJBase.fit_only! — Method
MLJBase.fit_only!(
    mach::Machine;
    rows=nothing,
    verbosity=1,
    force=false,
    composite=nothing,
)
Without mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:
Ab initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.
Training update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.
No-operation. Leave existing learned parameters untouched. Do not increment mach.state.
If the model, model, bound to mach is a symbol, then instead perform the action using the true model given by getproperty(composite, model). See also machine.
Training action logic
For the action to be a no-operation, either mach.frozen == true or none of the following apply:
(i) mach has never been trained (mach.state == 0).
(ii) force == true.
(iii) The state of some other machine on which mach depends has changed since the last time mach was trained (i.e., the last time mach.state was incremented).
(iv) The specified rows have changed since the last retraining and mach.model does not have Static type.
(v) mach.model is a model and different from the last model used for training, but has the same type.
(vi) mach.model is a model but has a type different from the last model used for training.
(vii) mach.model is a symbol and (composite, mach.model) is different from the last model used for training, but has the same type.
(viii) mach.model is a symbol and (composite, mach.model) has a different type from the last model used for training.
In any of the cases (i) - (iv), (vi), or (viii), mach is trained ab initio. If (v) or (vii) is true, then a training update is applied.
To freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).
Implementation details
The data to which a machine is bound is stored in mach.args. Each element of args is either a Node object, or, in the case that concrete data was bound to the machine, it is concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. See the "Adding models for general use" section of the MLJ documentation for more on these lower-level training methods.
MLJBase.freeze! — Method
freeze!(mach)
Freeze the machine mach so that it will never be retrained (unless thawed).
See also thaw!.
MLJBase.last_model — Method
last_model(mach::Machine)
Return the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.
Returns nothing if mach has not been trained.
MLJBase.machine — Function
machine(model, args...; cache=true, scitype_check_level=1)
Construct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). This in turn enables generalization to new data using operations such as predict or transform:
using MLJModels
X, y = make_regression()

PCA = @load PCA pkg=MultivariateStats
model = PCA()
mach = machine(model, X)
fit!(mach, rows=1:50)
transform(mach, selectrows(X, 51:100)) # or transform(mach, rows=51:100)

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=1:50)
predict(mach, selectrows(X, 51:100)) # or predict(mach, rows=51:100)
Specify cache=false to prioritize memory management over speed.
When building a learning network, Node objects can be substituted for the concrete data, but no type or dimension checks are applied.
Checks on the types of training data
A model articulates its data requirements using scientific types, i.e., using the scitype function instead of the typeof function.
If scitype_check_level > 0 then the scitype of each arg in args is computed, and this is compared with the scitypes expected by the model, unless args contains Unknown scitypes and scitype_check_level < 4, in which case no further action is taken. Whether warnings are issued or errors thrown depends on the level. For details, see default_scitype_check_level, a method to inspect or change the default level (1 at startup).
Machines with model placeholders
A symbol can be substituted for a model in machine constructors to act as a placeholder for a model specified at training time. The symbol must be the field name for a struct whose corresponding value is a model, as shown in the following example:
mutable struct MyComposite
    transformer
    classifier
end

my_composite = MyComposite(Standardizer(), ConstantClassifier())

X, y = make_blobs()
mach = machine(:classifier, X, y)
fit!(mach, composite=my_composite)
The last two lines are equivalent to
mach = machine(ConstantClassifier(), X, y)
fit!(mach)
Delaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.
See also fit!, default_scitype_check_level, MLJBase.save, serializable.
MLJBase.machine — Method
machine(file::Union{String, IO})
Rebuild from a file a machine that has been serialized using the default Serialization module.
MLJBase.report — Method
report(mach)
Return the report for a machine mach that has been fit!, for example the coefficients in a linear model.
This is a named tuple and human-readable if possible.
If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)
using MLJ
@load LinearBinaryClassifier pkg=GLM
X, y = @load_crabs;
pipe = Standardizer() |> LinearBinaryClassifier()
mach = machine(pipe, X, y) |> fit!

julia> report(mach).linear_binary_classifier
(deviance = 3.8893386087844543e-7,
 dof_residual = 195.0,
 stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],
 vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)
Additional keys, machines and report_given_machine, give a list of all machines in the underlying network, and a dictionary of reports keyed on those machines.
See also fitted_params.
MLJBase.report_given_method — Method
report_given_method(mach::Machine)
Same as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.
A specialized method intended for learning network applications.
The return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc), with values the reports contributed by that method.
MLJBase.restore! — Function
restore!(mach::Machine)
Restore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state == -1. Intended for restoring deserialized machine objects to a usable form.
For an example see serializable.
MLJBase.serializable — Method
serializable(mach::Machine)
Returns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.
Any general purpose Julia serializer may be applied to the output of serializable (e.g., JLSO, BSON, JLD) but you must call restore!(mach) on the deserialized object mach before using it. See the example below.
If using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.
A machine returned by serializable is characterized by the property mach.state == -1.
Example using JLSO
using MLJ
using JLSO
Tree = @load DecisionTreeClassifier
tree = Tree()
X, y = @load_iris
mach = fit!(machine(tree, X, y))

# This machine can now be serialized
smach = serializable(mach)
JLSO.save("machine.jlso", :machine => smach)

# Deserialize and restore learned parameters to useable form:
loaded_mach = JLSO.load("machine.jlso")[:machine]
restore!(loaded_mach)

predict(loaded_mach, X)
predict(mach, X)
See also restore!, MLJBase.save.
MLJBase.thaw! — Method
thaw!(mach)
Unfreeze the machine mach so that it can be retrained.
See also freeze!.
MLJModelInterface.feature_importances — Method
feature_importances(mach::Machine)
Return a list of feature => importance pairs for a fitted machine, mach, for supported models. Otherwise return nothing.
MLJModelInterface.fitted_params — Method
fitted_params(mach)
Return the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.
This is a named tuple and human-readable if possible.
If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)
using MLJ
@load LogisticClassifier pkg=MLJLinearModels
X, y = @load_crabs;
pipe = Standardizer() |> LogisticClassifier()
mach = machine(pipe, X, y) |> fit!

julia> fitted_params(mach).logistic_classifier
(classes = CategoricalArrays.CategoricalValue{String,UInt32}["B", "O"],
 coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],
 intercept = 0.0883301599726305,)
Additional keys, machines and fitted_params_given_machine, give a list of all machines in the underlying network, and a dictionary of fitted parameters keyed on those machines.
See also report.
— MethodMLJ.save(filename, mach::Machine)
+MLJ.save(io, mach::Machine)
+
+MLJBase.save(filename, mach::Machine)
+MLJBase.save(io, mach::Machine)
Serialize the machine mach
to a file with path filename
, or to an input/output stream io
(at least IOBuffer
instances are supported) using the Serialization module.
To serialise using a different format, see serializable
.
Machines are deserialized using the machine
constructor as shown in the example below.
The implementation of
save
for machines changed in MLJ 0.18 (MLJBase 0.20). You can only restore a machine saved using older versions of MLJ using an older version.
Example
using MLJ
+Tree = @load DecisionTreeClassifier
+X, y = @load_iris
+mach = fit!(machine(Tree(), X, y))
+
+MLJ.save("tree.jls", mach)
+mach_predict_only = machine("tree.jls")
+predict(mach_predict_only, X)
+
+# using a buffer:
+io = IOBuffer()
+MLJ.save(io, mach)
+seekstart(io)
+predict_only_mach = machine(io)
+predict(predict_only_mach, X)
Maliciously constructed JLS files, like pickles, and most other general purpose serialization formats, can allow for arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.
See also serializable
, machine
.
StatsAPI.fit! — Method
fit!(mach::Machine; rows=nothing, verbosity=1, force=false, composite=nothing)
Fit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.
To attempt to fit a machine without touching any other machine, use fit_only!. For more on options and the internal logic of fitting, see fit_only!.
— Method_recursive_show(stream, object, current_depth, depth)
Generate a table of the properties of the MLJType
object, dislaying each property value by calling the method _show
on it. The behaviour of _show(stream, f)
is as follows:
f
is itself a MLJType
object, then its short form is shownand _recursive_show
generates as separate table for each of its properties (and so on, up to a depth of argument depth
).
f
is displayed as "(omitted T)" where T = typeof(f)
,unless istoobig(f)
is false (the istoobig
fall-back for arbitrary types being true
). In the latter case, the long (ie, MIME"plain/text") form of f
is shown. To override this behaviour, overload the _show
method for the type in question.
MLJBase.abbreviated — Method
To display abbreviated versions of integers.
MLJBase.color_off — Method
color_off()
Suppress color and bold output at the REPL for displaying MLJ objects.
MLJBase.color_on — Method
color_on()
Enable color and bold output at the REPL, for enhanced display of MLJ objects.
MLJBase.handle — Method
Return the abbreviated object id (as a string), or its registered handle (as a string) if this exists.
MLJBase.@constant — Macro
@constant x = value
Private method (used in testing).
Equivalent to const x = value but registers the binding thus:
MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x
Registered objects get displayed using the variable name to which they were bound in calls to show(x), etc.
WARNING: As with any const declaration, binding x to a new value of the same type is not prevented and the registration will not be updated.
MLJBase.@more — Macro
@more
Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all properties of the last REPL value.
MLJBase._permute_rows — Method
_permute_rows(obj, perm)
Internal function to return a vector or matrix with permuted rows, given the permutation perm.
MLJBase.available_name — Method
available_name(modl::Module, name::Symbol)
Function to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.
MLJBase.check_same_nrows — Method
check_same_nrows(X, Y)
Internal function to check that two objects, each a vector or a matrix, have the same number of rows.
MLJBase.chunks — Method
chunks(range, n)
Split an AbstractRange into n subranges of approximately equal length.
Example
julia> collect(chunks(1:5, 2))
2-element Array{UnitRange{Int64},1}:
 1:3
 4:5
Private method.
MLJBase.flat_values — Method
flat_values(t::NamedTuple)
View a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the original tuple.
julia> t = (X = (x = 1, y = 2), Y = 3);

julia> flat_values(t)
(1, 2, 3)
MLJBase.generate_name! — Method
generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)
Given a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case, representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.
If the snake-case representation already exists in existing_names a suitable integer is appended to the name.
If only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.
julia> existing_names = [];

julia> generate_name!(Vector{Int}, existing_names)
:vector

julia> generate_name!(Vector{Int}, existing_names)
:vector2

julia> generate_name!(AbstractFloat, existing_names)
:abstract_float

julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
:not_array

julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
:not_array2
MLJBase.guess_model_target_observation_scitype — Method
guess_model_target_observation_scitype(model)
Private method.
Try to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). Return Unknown if unable to draw reliable inference.
The observation scitype for a table is here understood as the scitype of a row converted to a vector.
MLJBase.guess_observation_scitype — Method
guess_observation_scitype(y)
Private method.
If y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.
In all other cases, Unknown.
julia> guess_observation_scitype([missing, 1, 2, 3])
Union{Missing, Count}

julia> guess_observation_scitype(rand(3, 2))
AbstractVector{Continuous}

julia> guess_observation_scitype((x=rand(3), y=rand(Bool, 3)))
AbstractVector{Union{Continuous, Count}}

julia> guess_observation_scitype((x=[missing, 1, 2], y=[1, 2, 3]))
Unknown
MLJBase.init_rng — Method
init_rng(rng)
Create an AbstractRNG from rng. If rng is a non-negative Integer, it returns a MersenneTwister random number generator seeded with rng. If rng is an AbstractRNG object it returns rng; otherwise it throws an error.
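A sketch of the three cases (init_rng is internal, so it is qualified here):
using Random
MLJBase.init_rng(123)          # a MersenneTwister seeded with 123
rng = MersenneTwister(42)
MLJBase.init_rng(rng) === rng  # true: AbstractRNG objects are passed through
MLJBase.init_rng("not an rng") # expected to throw an error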
MLJBase.observation — Method
observation(S)
Private method.
Tries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations. Return Unknown if unable to draw reliable inference.
The observation scitype for a table is here understood as the scitype of a row converted to a vector.
MLJBase.prepend — Method
MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})
For prepending symbols in expressions like :(y.w) and :(x1.x2.x3).
julia> prepend(:x, :y)
:(x.y)

julia> prepend(:x, :(y.z))
:(x.y.z)

julia> prepend(:w, ans)
:(w.x.y.z)
If the second argument is nothing, then nothing is returned.
MLJBase.recursive_getproperty — Method
recursive_getproperty(object, nested_name::Expr)
Call getproperty recursively on object to extract the value of some nested property, as in the following example:
julia> object = (X = (x = 1, y = 2), Y = 3);

julia> recursive_getproperty(object, :(X.y))
2
MLJBase.recursive_setproperty! — Method
recursive_setproperty!(object, nested_name::Expr, value)
Set a nested property of an object to value, as in the following example:
julia> mutable struct Foo
           X
           Y
       end

julia> mutable struct Bar
           x
           y
       end

julia> object = Foo(Bar(1, 2), 3)
Foo(Bar(1, 2), 3)

julia> recursive_setproperty!(object, :(X.y), 42)
42

julia> object
Foo(Bar(1, 42), 3)
MLJBase.sequence_string — Method
sequence_string(itr, n=3)
Return a "sequence" string from the first n elements generated by itr.
julia> MLJBase.sequence_string(1:10, 4)
"1, 2, 3, 4, ..."
Private method.
MLJBase.shuffle_rows — Method
shuffle_rows(X::AbstractVecOrMat,
             Y::AbstractVecOrMat;
             rng::AbstractRNG=Random.GLOBAL_RNG)
Return row-shuffled vectors or matrices, obtained by applying a common random permutation to the rows of X and Y. An optional random number generator can be specified using the rng argument.
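A small sketch (shuffle_rows is internal, so it is qualified here); the same permutation is applied to both arguments:
using Random
X = [1 2; 3 4; 5 6]
y = [10, 20, 30]
Xs, ys = MLJBase.shuffle_rows(X, y; rng=MersenneTwister(0))
# rows of X and entries of y are permuted together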
MLJBase.unwind — Method
unwind(iterators...)
Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.
Example
julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);

julia> MLJTuning.unwind(iterators...)
12×3 Array{Any,2}:
 1  "a"  "x"
 2  "a"  "x"
 1  "b"  "x"
 2  "b"  "x"
 1  "a"  "y"
 2  "a"  "y"
 1  "b"  "y"
 2  "b"  "y"
 1  "a"  "z"
 2  "a"  "z"
 1  "b"  "z"
 2  "b"  "z"