Number of Random Forests to build. In most situations, going beyond 100 does not improve results dramatically (int).
trees
Number of trees in each forest. The default is 1, which basically corresponds to an adatreeclassifier (int).
weight_thresold
Affects the weight (importance) of each new estimator by setting this initial threshold. It may be regarded as a shrinkage parameter. Needs to be between 0 and 1 (double). This is important.
max_depth
Maximum depth of the tree (double). This is important.
Objective
The objective to optimise in each split. It may be “ENTROPY”, “GINI” or “AUC”. ENTROPY (the default) almost always performs best. This is important.
row_subsample
Proportion of observations to consider (double). This is important.
max_features
Proportion of columns (features) to consider in each level (double). This is important.
cut_off_subsample
Proportion of best cut offs to consider. This controls how Extremely Randomized the tree will be (double).
feature_subselection
Proportion of columns (features) to consider for the whole tree (double).
min_leaf
Minimum weighted sum to keep after splitting a node (double).
min_split
Minimum weighted sum to split a node (double).
rounding
Digits of rounding to prevent overfitting. It could help in certain situations (double).
max_tree_size
Maximum number of nodes allowed (int).
offset
Adds a constant when calculating the objective in a split. It prevents overfitting (double).
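For illustration, and assuming the usual space-separated name:value format of a StackNet parameter file, a line for this model might look like the sketch below. The model name SomeForestClassifier is a placeholder (the real heading is not preserved in this extract) and the values are arbitrary, not tuned defaults.

SomeForestClassifier estimators:100 trees:5 weight_thresold:0.95 max_depth:12 Objective:ENTROPY row_subsample:0.9 max_features:0.3 feature_subselection:1.0 min_leaf:2.0 min_split:5.0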
Number of Random Forests to build. In most situations, going beyond 100 does not improve results dramatically (int).
trees
Number of trees in each forest. The default is 1, which basically corresponds to an adatreeclassifier (int).
shrinkage
Penalty applied to each estimator. Smaller values prevent overfitting. Needs to be between 0 and 1 (double). There is also a fairly linear negative correlation between estimators and shrinkage. This is important.
max_depth
Maximum depth of the tree (double). This is important.
Objective
The objective to optimise inside the split. It may be “RMSE” or “MAE”. Bear in mind the underlying estimators are regressors.
row_subsample
Proportion of observations to consider (double). This is important.
max_features
Proportion of columns (features) to consider in each level (double). This is important.
cut_off_subsample
Proportion of best cut-offs to consider. This controls how Extremely Randomized the tree will be. A very low value means only a few cut-offs are explored (double).
feature_subselection
Proportion of columns (features) to consider for the whole tree (double).
min_leaf
Minimum weighted sum to keep after splitting a node (double).
min_split
Minimum weighted sum to split a node (double).
rounding
Digits of rounding to prevent overfitting. It could help in certain situations (double).
max_tree_size
Maximum number of nodes allowed (int).
offset
Adds a constant when calculating the objective in a split. It prevents overfitting (double).
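As a sketch, assuming this block documents StackNet's GradientBoostingForestClassifier (the heading is not preserved here) and the usual name:value parameter-file format, a line could look like the following; the values are illustrative only.

GradientBoostingForestClassifier estimators:100 trees:1 shrinkage:0.05 max_depth:6 Objective:RMSE row_subsample:0.9 max_features:0.5 min_leaf:2.0 min_split:5.0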
Regularization value; the higher it is, the stronger the regularization (double). This is important.
l1C
L1 Regularization C value for FTRL Type (double).
Type
Can be one of “Liblinear”, “Routine”, “SGD” or “FTRL”. The default is “Liblinear”. SGD and FTRL use AdaGrad. Routine is based on matrix multiplications and the Newton-Raphson method.
RegularizationType
Can be either “L2” or “L1”. Default is “L2”. “L1” is only supported via Liblinear and FTRL. This is important.
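For example, assuming the standard StackNet name:value format and that the regularization parameter is named C (its heading is not preserved in this extract), a LogisticRegression line might look like this; the values are placeholders.

LogisticRegression Type:Liblinear RegularizationType:L2 C:0.01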
Comma-separated integers defining the hidden units in each hidden layer (and therefore the number of hidden layers). This is important
droupouts
Comma-separated floats defining the dropout in each layer (as defined by hidden). This is important
l2
Comma-separated floats defining the L2 regularization term on the weights in each layer (as defined by hidden). This is important
activation
Comma-separated strings defining the activation in each hidden layer. This is important
lr
The learning rate used. This is important
epochs
Maximum number of iterations. This is important
batch_normalization
true to add batch normalization to the layers. This is important
batch_size
Number of cases (samples) in a batch. This is important
weight_init
The distribution from which initial weights are to be drawn. Has to be RandomNormal, RandomUniform, TruncatedNormal, VarianceScaling, Orthogonal, Identity, lecun_uniform, glorot_normal, glorot_uniform, he_normal, lecun_normal or he_uniform.
optimizer
Has to be adam, adagrad, nadam, adadelta or sgd.
loss
Has to be categorical_crossentropy, categorical_hinge, logcosh or kullback_leibler_divergence.
momentum
Only applicable for optimizer=sgd. Nesterov's is on by default.
shuffle
true to enable shuffling of the training data (on each epoch).
standardize
true to standardize dense data (i.e. use_dense=true) or to apply max-absolute scaling on sparse data (use_dense=false).
use_log1p
true to transform the data matrix with log(x + 1).
validation_split
Split percentage to use for early stopping. The best epoch is determined via cross-validation after 2 consecutive worse loss estimates.
stopping_rounds
Number of consecutive rounds with a worse validation loss before training stops early.
use_dense
true to use dense data. If your data is in dense format, select true, since all files are loaded as sparse by default in the Python-based modules.
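As an illustrative sketch, assuming this block documents StackNet's KerasnnClassifier and the usual name:value format (the hidden-layer parameter is assumed to be named hidden, as referenced above; all values are placeholders):

KerasnnClassifier hidden:64,32 droupouts:0.3,0.2 activation:relu,relu lr:0.01 epochs:20 batch_size:64 optimizer:adam loss:categorical_crossentropy standardize:true use_dense:false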
PythonGenericClassifier
Users can run their own Python script as long as it is placed in lib/python/ and named PythonGenericClassifier[INDEX], where the index is a hyperparameter. See PythonGenericClassifier0.py in lib/python/ for an example.
Activation functions. Has to be one of 'Rectifier', 'Tanh', 'ExpRectifier' or 'Maxout'.
adaptive_rate
true to use the implemented adaptive learning rate algorithm (ADADELTA), which automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence.
rho
The first of two hyperparameters for ADADELTA. It is similar to momentum. This is important
epsilon
The second of two hyperparameters for ADADELTA. This is important
balance_classes
Specify whether to oversample the minority classes to balance the class distribution.
dropouts
Dropout ratios for each hidden layer, comma-separated. Has to match the length of the 'hidden' parameter. This is important
epochs
Number of iterations to train the DL model. This is important
fast_mode
true for faster convergence (but a potential loss in accuracy).
hidden
Number of hidden neurons per layer, comma-separated. The length also defines the number of hidden layers. This is important
input_dropout_ratio
Dropout ratio for the input layer.
l1
L1 regularization on the weights.
l2
L2 regularization on the weights. This is important
max_w2
A maximum on the sum of the squared incoming weights into any one neuron.
mini_batch_size
Number of cases (samples) in each mini-batch.
momentum_ramp
The momentum_ramp parameter controls the amount of learning for which momentum increases (assuming momentum_stable is larger than momentum_start).
momentum_stable
The momentum_stable parameter controls the final momentum value reached after momentum_ramp training samples.
momentum_start
The momentum_start parameter controls the amount of momentum at the beginning of training.
nesterov_accelerated_gradient
True to enable Nesterov accelerated gradient descent method.
rate
When the adaptive learning rate is disabled, the magnitude of the weight updates is determined by the user-specified learning rate (potentially annealed), and is a function of the difference between the predicted value and the target value.
rate_annealing
Learning rate annealing reduces the learning rate to “freeze” into local minima in the optimization landscape.
rate_decay
The learning rate decay parameter controls the change of learning rate across layers.
sample_rate
Proportion of rows to consider in each epoch.
shuffle
true to enable shuffling of training data (on each node).
standardize
true to standardize the input data.
weight_init
The distribution from which initial weights are to be drawn. Has to be 'UniformAdaptive', 'Uniform' or 'Normal'
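A sketch of a parameter line, assuming this block documents StackNet's H2ODeepLearningClassifier (the heading is not preserved here) and the usual name:value format; values are placeholders.

H2ODeepLearningClassifier hidden:50,50 dropouts:0.5,0.5 activation:Rectifier epochs:10 adaptive_rate:true rho:0.99 epsilon:0.00000001 standardize:true weight_init:UniformAdaptive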
Number of iterations to build the model. This is important
beta_epsilon
tolerance of the coefficients
objective_epsilon
tolerance of the objective function
balance_classes
true to oversample the minority classes to balance the class distribution.
standardize
true to standardize the input features.
OriginalLibFMClassifier
Wraps the original implementation of libFM by Steffen Rendle. This wrapper exists because internal results show that it performs better (in terms of accuracy) than StackNet's internal implementation and supports more training methods than just SGD.
This implementation may not include all libFM features, and it deliberately uses a version that had a bug(!). You can find more information about why this was chosen in the following Python wrapper for libFM. That bug made it possible to retrieve the parameters of the trained models for all training methods. These parameters are now extracted once a model has been trained, and scoring uses only these parameters (i.e. not the libFM executable).
Don't forget to acknowledge libFM if you publish results produced with this software, and take note of its GNU licence. More information can be found in libFM's repository on GitHub.
Number of hidden units to use in a sigmoidal feedforward network with nn hidden units
initial_t
Initial t value. Affects the learning rate updates.
power_t
Power on t. Affects the learning rate updates.
ftrl_alpha
FTRL alpha parameter when using ftrl. This is important
ftrl_beta
FTRL beta (stability) parameter when using ftrl. This is important
learning_rate
Learning rate for gradient-based updates.
l1
L1 regularization
l2
L2 regularization This is important
use_ftrl
true to use the ftrl optimization option (instead of adaptive). It is on by default.
make2way
if true it creates all possible 2-way interactions of all features
make3way
if true it creates all possible 3-way interactions of all features
use_dropout
when nn>0, train or test sigmoidal feedforward network using dropout.
use_meanfield
when nn>0, train or test sigmoidal feedforward network using mean field.
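A sketch of a parameter line, assuming this block documents StackNet's Vowpal Wabbit classifier wrapper (the exact model name is not preserved in this extract, so VowpalWabbitClassifier below is an assumption) and the usual name:value format; values are placeholders.

VowpalWabbitClassifier learning_rate:0.5 l1:0.00001 l2:0.00001 use_ftrl:true ftrl_alpha:0.1 ftrl_beta:1.0 make2way:false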
libffmClassifier
Wraps libffm. Note that this method requires the user either to manually provide comma-separated indices that form a field or to rely on some built-in heuristics. This is controlled by the opt parameter.
true to allow instance-wise normalization. This is important
opt
Method for determining the fields. The best way (but not the default) is to provide a list of comma-separated indices. Consider the string '1,4,7,123,546'. This would mean that column 0 is a field on its own, {1,2,3} form another field, {4,5,6} another, {7,8,...,122} another, and so on. Another possible value is 'no_order' (the default), which looks at the proportion of zeros in neighbouring columns to determine whether they form a field. The last possible value is 'order', which calculates the frequency of non-zero values for all columns and then orders them by frequency. Columns with few missing values form their own fields, while weaker columns (frequency-wise) are joined together to form fields.
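For example, using the model name given above and the usual name:value format (the normalization parameter's name is not preserved in this extract, so it is omitted here):

libffmClassifier opt:1,4,7,123,546

Using opt:no_order (the default) instead lets the built-in heuristic infer the fields automatically.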
Number of Random Forests to build. In most situations, going beyond 100 does not improve results dramatically (int).
trees
Number of trees in each forest. The default is 1, which basically corresponds to an adatreeregressor (int).
weight_thresold
Affects the weight (importance) of each new estimator by setting this initial threshold. It may be regarded as a shrinkage parameter. Needs to be positive (double). This is important.
max_depth
Maximum depth of the tree (double). This is important.
Objective
The objective to optimise in each split. It may be “RMSE” or “MAE”.
row_subsample
Proportion of observations to consider (double). This is important.
max_features
Proportion of columns (features) to consider in each level (double). This is important.
cut_off_subsample
Proportion of best cut offs to consider. This controls how Extremely Randomized the tree will be (double).
feature_subselection
Proportion of columns (features) to consider for the whole tree (double).
min_leaf
Minimum weighted sum to keep after splitting a node (double).
min_split
Minimum weighted sum to split a node (double).
rounding
Digits of rounding to prevent overfitting. It could help in certain situations (double).
max_tree_size
Maximum number of nodes allowed (int).
offset
Adds a constant when calculating the objective in a split. It prevents overfitting (double).
Number of Random Forests to build. In most situations, going beyond 100 does not improve results dramatically (int).
trees
Number of trees in each forest. The default is 1, which basically corresponds to an adatreeregressor (int).
shrinkage
Penalty applied to each estimator. Smaller values prevent overfitting. Needs to be between 0 and 1 (double). There is also a fairly linear negative correlation between estimators and shrinkage. This is important.
max_depth
Maximum depth of the tree (double). This is important.
Objective
The objective to optimise inside the split. It may be “RMSE” or “MAE”.
row_subsample
Proportion of observations to consider (double). This is important.
max_features
Proportion of columns (features) to consider in each level (double). This is important.
cut_off_subsample
Proportion of best cut-offs to consider. This controls how Extremely Randomized the tree will be. A very low value means only a few cut-offs are explored (double).
feature_subselection
Proportion of columns (features) to consider for the whole tree (double).
min_leaf
Minimum weighted sum to keep after splitting a node (double).
min_split
Minimum weighted sum to split a node (double).
rounding
Digits of rounding to prevent overfitting. It could help in certain situations (double).
max_tree_size
Maximum number of nodes allowed (int).
offset
Adds a constant when calculating the objective in a split. It prevents overfitting (double).
Regularization value; the higher it is, the stronger the regularization (double). Setting a value here basically turns this into Ridge regression. This is important.
l1C
L1 Regularization C value for FTRL Type (double).
Type
Can be one of “Routine”, “SGD” or “FTRL”. SGD and FTRL use AdaGrad. Routine is the ordinary least squares method, solved with matrix multiplications.
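For example, assuming the standard StackNet name:value format and that the regularization parameter is named C (its heading is not preserved in this extract), a LinearRegression line might look like this; the values are placeholders.

LinearRegression Type:Routine C:0.01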
Penalty applied to each estimator. Needs to be between 0 and 1 (double). This is important.
max_depth
Maximum depth of the tree (int). This is important.
Objective
Can be one of ['reg:linear', 'count:poisson', 'reg:gamma', 'rank:pairwise', 'reg:tweedie']. Note that rank:pairwise is not a regressor, but its output was more convenient for a regression method.
subsample
Proportion of observations to consider (double). This is important.
colsample_bylevel
Proportion of columns (features) to consider in each level (double).
colsample_bytree
Proportion of columns (features) to consider in each tree (double). This is important.
max_delta_step
controls optimization step (double).
gamma
controls minimum change requirements in loss to allow for a split (double).
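A sketch of a parameter line, assuming this block documents StackNet's XgboostRegressor (the heading and the name of the shrinkage parameter are not preserved here, so the latter is omitted) and the usual name:value format; values are placeholders.

XgboostRegressor Objective:reg:linear max_depth:6 subsample:0.9 colsample_bytree:0.5 colsample_bylevel:1.0 gamma:1.0 max_delta_step:0.0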
Proportion of columns (features) to consider at each level of a given tree. This is important
learn_rate
weight on each estimator. This is important
max_depth
maximum depth of the tree. This is important
ntrees
Number of trees to build. This is important
sample_rate
Proportion of rows to consider. This is important
col_sample_rate_per_tree
Proportions of columns (features) to consider within a tree.
balance_classes
whether to oversample the minority classes to balance the class distribution.
min_rows
minimum number of cases in a node.
nbins
The number of bins for the histogram to build.
tweedie_power
(Only applicable if Tweedie is specified for the distribution.) Specify the Tweedie power. The range is from 1 to 2. For a normal distribution, enter 0. For a Poisson distribution, enter 1. For a gamma distribution, enter 2. For a compound Poisson-gamma distribution, enter a value greater than 1 but less than 2.
quantile_alpha
(Only applicable if Quantile is specified for the distribution.) Specify the quantile to be used for quantile regression.
objective
The objective has to be one of [auto, gamma, gaussian, huber, laplace, poisson, quantile, tweedie].
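A sketch of a parameter line, assuming this block documents StackNet's H2OGbmRegressor (heading not preserved here) and the usual name:value format; values are placeholders.

H2OGbmRegressor ntrees:100 learn_rate:0.05 max_depth:6 sample_rate:0.9 col_sample_rate_per_tree:0.5 min_rows:5 nbins:20 objective:gaussian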
Activation functions. Has to be one of 'Rectifier', 'Tanh', 'ExpRectifier' or 'Maxout'.
adaptive_rate
true to use the implemented adaptive learning rate algorithm (ADADELTA), which automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence.
rho
The first of two hyperparameters for ADADELTA. It is similar to momentum. This is important
epsilon
The second of two hyperparameters for ADADELTA. This is important
balance_classes
Specify whether to oversample the minority classes to balance the class distribution.
dropouts
Dropout ratios for each hidden layer, comma-separated. Has to match the length of the 'hidden' parameter. This is important
epochs
Number of iterations to train the DL model. This is important
fast_mode
true for faster convergence (but a potential loss in accuracy).
hidden
Number of hidden neurons per layer, comma-separated. The length also defines the number of hidden layers. This is important
input_dropout_ratio
Dropout ratio for the input layer.
l1
L1 regularization on the weights.
l2
L2 regularization on the weights. This is important
max_w2
A maximum on the sum of the squared incoming weights into any one neuron.
mini_batch_size
Number of cases (samples) in each mini-batch.
momentum_ramp
The momentum_ramp parameter controls the amount of learning for which momentum increases (assuming momentum_stable is larger than momentum_start).
momentum_stable
The momentum_stable parameter controls the final momentum value reached after momentum_ramp training samples.
momentum_start
The momentum_start parameter controls the amount of momentum at the beginning of training.
nesterov_accelerated_gradient
True to enable Nesterov accelerated gradient descent method.
rate
When the adaptive learning rate is disabled, the magnitude of the weight updates is determined by the user-specified learning rate (potentially annealed), and is a function of the difference between the predicted value and the target value.
rate_annealing
Learning rate annealing reduces the learning rate to “freeze” into local minima in the optimization landscape.
rate_decay
The learning rate decay parameter controls the change of learning rate across layers.
sample_rate
Proportion of rows to consider in each epoch.
shuffle
true to enable shuffling of training data (on each node).
standardize
true to standardize the input data.
weight_init
The distribution from which initial weights are to be drawn. Has to be 'UniformAdaptive', 'Uniform' or 'Normal'
tweedie_power
(Only applicable if Tweedie is specified for the distribution.) Specify the Tweedie power. The range is from 1 to 2. For a normal distribution, enter 0. For a Poisson distribution, enter 1. For a gamma distribution, enter 2. For a compound Poisson-gamma distribution, enter a value greater than 1 but less than 2.
quantile_alpha
(Only applicable if Quantile is specified for the distribution.) Specify the quantile to be used for quantile regression.
objective
The objective has to be one of [auto, gamma, gaussian, huber, laplace, poisson, quantile, tweedie].
loss
The loss has to be one of [Automatic, Absolute, Huber, Quadratic, Quantile].
Proportion of columns (features) to consider within a tree.
balance_classes
whether to oversample the minority classes to balance the class distribution.
min_rows
minimum number of cases in a node.
nbins
The number of bins for the histogram to build.
tweedie_power
(Only applicable if Tweedie is specified for the distribution.) Specify the Tweedie power. The range is from 1 to 2. For a normal distribution, enter 0. For a Poisson distribution, enter 1. For a gamma distribution, enter 2. For a compound Poisson-gamma distribution, enter a value greater than 1 but less than 2.
quantile_alpha
(Only applicable if Quantile is specified for the distribution.) Specify the quantile to be used for quantile regression.
objective
The objective has to be one of [auto, gamma, gaussian, huber, laplace, poisson, quantile, tweedie].
Number of iterations to build the model. This is important
beta_epsilon
tolerance of the coefficients
objective_epsilon
tolerance of the objective function
balance_classes
true to oversample the minority classes to balance the class distribution.
standardize
true to standardize the input features.
tweedie_power
(Only applicable if Tweedie is specified for the distribution.) Specify the Tweedie power. The range is from 1 to 2. For a normal distribution, enter 0. For a Poisson distribution, enter 1. For a gamma distribution, enter 2. For a compound Poisson-gamma distribution, enter a value greater than 1 but less than 2.
quantile_alpha
(Only applicable if Quantile is specified for the distribution.) Specify the quantile to be used for quantile regression.
family
The family has to be one of [auto, gamma, gaussian, poisson, tweedie].
link
The link has to be one of [auto, log, identity, inverse, tweedie].
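A sketch of a parameter line, assuming this block documents StackNet's H2OGlmRegressor (heading not preserved here) and the usual name:value format; values are placeholders.

H2OGlmRegressor family:gaussian link:identity standardize:true beta_epsilon:0.0001 objective_epsilon:0.0001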
The (initial) learning rate used. This is important
learning_rate
Could be optimal, constant or invscaling.
loss
could be squared_loss, huber, epsilon_insensitive or squared_epsilon_insensitive.
epsilon
For huber, determines the threshold at which it becomes less important to get the prediction right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this.
l1_ratio
The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1.
penalty
The penalty (aka regularization term) to be used. Could be l2, l1 or elasticnet.
power_t
The exponent for inverse scaling learning rate [default 0.5].
shuffle
true to enable shuffling of the training data (on each iteration).
standardize
true to standardize dense data (i.e. use_dense=true) or to apply max-absolute scaling on sparse data (use_dense=false).
use_log1p
true to transform the data matrix with log(x + 1).
use_dense
true to use dense data. If your data is in dense format, select true, since all files are loaded as sparse by default in the Python-based modules.
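A sketch of a parameter line, assuming this block documents StackNet's SklearnSGDRegressor (heading not preserved here) and the usual name:value format; values are placeholders.

SklearnSGDRegressor loss:squared_loss penalty:l2 l1_ratio:0.15 learning_rate:invscaling power_t:0.25 epsilon:0.1 shuffle:true standardize:true use_dense:true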
Comma-separated integers defining the hidden units in each hidden layer (and therefore the number of hidden layers). This is important
droupouts
Comma-separated floats defining the dropout in each layer (as defined by hidden). This is important
l2
Comma-separated floats defining the L2 regularization term on the weights in each layer (as defined by hidden). This is important
activation
Comma-separated strings defining the activation in each hidden layer. This is important
lr
The learning rate used. This is important
epochs
Maximum number of iterations. This is important
batch_normalization
true to add batch normalization to the layers. This is important
batch_size
Number of cases (samples) in a batch. This is important
weight_init
The distribution from which initial weights are to be drawn. Has to be RandomNormal, RandomUniform, TruncatedNormal, VarianceScaling, Orthogonal, Identity, lecun_uniform, glorot_normal, glorot_uniform, he_normal, lecun_normal or he_uniform.
optimizer
Has to be adam, adagrad, nadam, adadelta or sgd.
loss
Has to be mean_squared_error, mean_absolute_error, mean_squared_logarithmic_error, squared_hinge, hinge, poisson.
momentum
Only applicable for optimizer=sgd. Nesterov's is on by default.
shuffle
true to enable shuffling of the training data (on each epoch).
standardize
true to standardize dense data (i.e. use_dense=true) or to apply max-absolute scaling on sparse data (use_dense=false).
use_log1p
true to transform the data matrix with log(x + 1).
validation_split
Split percentage to use for early stopping. The best epoch is determined via cross-validation after 2 consecutive worse loss estimates.
stopping_rounds
Number of consecutive rounds with a worse validation loss before training stops early.
use_dense
true to use dense data. If your data is in dense format, select true, since all files are loaded as sparse by default in the Python-based modules.
PythonGenericRegressor
Users can run their own Python script as long as it is placed in lib/python/ and named PythonGenericRegressor[INDEX], where the index is a hyperparameter. See PythonGenericRegressor0.py in lib/python/ for an example.
L2 regularization on the weights. This is important
new_tree_gain_ratio
A new tree is created when the leaf-node gain is less than this value times the estimated gain of creating a new tree. This is important
lamL1
L1 regularization on the weights.
stepsize
Step size of epsilon-greedy boosting (inactive for rgf).
min_occurrences
minimum number of occurrences for a feature to be selected.
min_sample
minimum samples in node.
max_nodes
maximum number of nodes.
loss
Type of loss. Could be LS, MODLS (modified least squares loss) or LOGISTIC.
opt
optimization method for training forest. Could be rgf or epsilon-greedy.
sparse_lamL2
L2 regularization parameter for sparse data.
min_bucket_weights
Minimum sum of data weights for each discretized value.
dense_max_buckets
Maximum bins for dense data.
sparse_max_features
You may try a different value in [1000, 10000000] for the number of features allowed.
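A sketch of a parameter line, assuming this block documents StackNet's FRGFRegressor (fast RGF; heading not preserved here) and the usual name:value format; values are placeholders.

FRGFRegressor opt:rgf loss:LS max_nodes:1000 min_sample:5 lamL1:0.0 lamL2:1.0 stepsize:0.1 new_tree_gain_ratio:1.0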
OriginalLibFMRegressor
Wraps the original implementation of libFM by Steffen Rendle. This wrapper exists because internal results show that it performs better (in terms of accuracy) than StackNet's internal implementation and supports more training methods than just SGD.
This implementation may not include all libFM features, and it deliberately uses a version that had a bug(!). You can find more information about why this was chosen in the following Python wrapper for libFM. That bug made it possible to retrieve the parameters of the trained models for all training methods. These parameters are now extracted once a model has been trained, and scoring uses only these parameters (i.e. not the libFM executable).
Don't forget to acknowledge libFM if you publish results produced with this software, and take note of its GNU licence. More information can be found in libFM's repository on GitHub.