
easytidymodels

The goal of easytidymodels is to make running analyses in R with the tidymodels framework both easier and more reproducible. It wraps the tidymodels packages so that, once your data preprocessing is done, each model runs in one line of code and automatically tunes every hyperparameter it offers.

If you are not familiar with tidymodels, I recommend starting with the official tidymodels documentation and tutorials.

For more detail on how the functions in this package work, check the reference page, read the vignettes on this site, or call help on the function of interest in R. Below is a brief overview of the package's workflow.

Installation

You can install easytidymodels like this:

# install.packages("devtools")
devtools::install_github("amanda-park/easytidymodels")

Preparing Data for Analysis

There are three main functions to prepare your data for analysis:

  • trainTestSplit lets you split data into training and testing sets, with the ability to stratify on a variable and split based on a point in time.
  • cvFolds splits your data into cross-validation folds to allow the model’s hyperparameters to be tuned.
  • createRecipe does some basic data preprocessing on your dataset. NOTE: I recommend calling recipe() and creating a recipe object specific to your dataset’s needs, as every dataset will require its own preprocessing prior to analysis.
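
A minimal sketch of these three steps is below. The exact argument names (e.g. responseVar) and the returned element names (e.g. splits$train) are assumptions for illustration; check each function's help page (?trainTestSplit, ?cvFolds, ?createRecipe) for the actual interface.

library(easytidymodels)

# Toy data; "Species" stands in for your response variable
df <- iris
resp <- "Species"

# Split into training and testing sets (argument names assumed)
splits <- trainTestSplit(df, responseVar = resp)
train_df <- splits$train
test_df <- splits$test

# Cross-validation folds for hyperparameter tuning
folds <- cvFolds(train_df)

# Basic preprocessing recipe; for real projects, build your own
# recipe with recipes::recipe() tailored to your dataset
rec <- createRecipe(train_df, responseVar = resp)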

Classification Functions

The binary classification machine learning models available are as follows:

  • XGBoost (function xgBinaryClassif)
  • Logistic Regression (function logRegBinary)
  • K-Nearest Neighbors (function knnClassif)
  • Support Vector Machine (function svmClassif)

The multiclass classification models available are as follows:

  • XGBoost (function xgMultiClassif)
  • Multinomial Regression (function logRegMulti)
  • K-Nearest Neighbors (function knnClassif)
  • Support Vector Machine (function svmClassif)

Each of these models automatically tunes the appropriate hyperparameters for the model. You can also choose which evaluation metric the tuning optimizes. The available metrics are as follows:

  • Balanced Accuracy (Average of Sensitivity and Specificity, call “bal_accuracy”)
  • Mean Log Loss (Call “mn_log_loss”)
  • ROC AUC (Area Under the Receiver Operating Characteristic curve, call “roc_auc”)
  • MCC (Matthews Correlation Coefficient, call “mcc”)
  • Kappa (Normalized Accuracy, call “kap”)
  • Sensitivity (Call “sens”)
  • Specificity (Call “spec”)
  • Precision (Call “precision”)
  • Recall (Call “recall”)

Save the model output to an object; the function returns a list whose elements (accessed with $) include:

  • Confusion matrix on training data
  • Accuracy evaluation on training data
  • Confusion matrix on testing data
  • Accuracy evaluation on testing data
  • Description of final model chosen
  • A tuned version of the model (in case you want to try model stacking or refit the optimal model under a different evaluation metric)
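
A hedged sketch of this workflow, continuing from the data-preparation example above. The argument names and returned element names here are assumptions for illustration; see the help page for each function (e.g. ?xgBinaryClassif) for its actual interface.

# Fit and tune an XGBoost binary classifier, optimizing ROC AUC
# (argument and element names below are assumed, not confirmed)
xgClass <- xgBinaryClassif(
  recipe = rec,
  folds = folds,
  train = train_df,
  test = test_df,
  evalMetric = "roc_auc"
)

# Access the returned list elements with $
xgClass$trainConfusionMatrix  # confusion matrix on training data
xgClass$testConfusionMatrix   # confusion matrix on testing data
xgClass$finalModel            # description of the final model chosen
xgClass$tunedModel            # tuned model, e.g. for stacking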

Regression Functions

The regression functions available are as follows:

  • Random Forest (function rfRegress)
  • XGBoost (function xgRegress)
  • Linear Regression (function linearRegress)
  • MARS (function marsRegress)
  • K-Nearest Neighbor Regression (function knnRegress)
  • Support Vector Machine Regression (function svmRegress)

These models likewise let you optimize hyperparameters for a specific evaluation metric. The available metrics are as follows:

  • RMSE (Root Mean Squared Error, call “rmse”)
  • MAE (Mean Absolute Error, call “mae”)
  • RSQ (R-Squared, call “rsq”)
  • MASE (Mean Absolute Scaled Error, call “mase”)
  • CCC (Concordance Correlation Coefficient, call “ccc”)
  • IIC (Index of Ideality of Correlation, call “iic”)
  • Huber Loss (call “huber_loss”)

Save the model output to an object; the function returns a list whose elements (accessed with $) include:

  • Predictions on training data
  • RMSE and MAE evaluation on training data
  • Predictions on testing data
  • RMSE and MAE evaluation on testing data
  • Description of final model chosen
  • A tuned version of the model (in case you want to try model stacking or refit the optimal model under a different evaluation metric)
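
A similarly hedged sketch for regression, reusing the objects from the data-preparation example (here your response variable would be numeric). As before, the argument and element names are assumptions; check ?rfRegress for the actual interface.

# Fit and tune a random forest regression, optimizing RMSE
# (argument and element names below are assumed, not confirmed)
rfReg <- rfRegress(
  recipe = rec,
  folds = folds,
  train = train_df,
  test = test_df,
  evalMetric = "rmse"
)

rfReg$testPredictions  # predictions on testing data
rfReg$testEval         # RMSE and MAE evaluation on testing data
rfReg$tunedModel       # tuned model, e.g. for stacking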