A collection of custom R functions for data science, designed to make your work easier.
varselector() lets you quickly run fast-forward, lasso, ridge, and elastic net regressions, either separately or together. It compares the resulting models using RMSE and MAE and makes a final recommendation for variable selection.
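The comparison step can be pictured with a minimal base-R sketch. It computes RMSE and MAE on the same held-out data for each candidate model and then ranks the models (lower is better for both metrics). The helper names rmse() and mae(), the model labels, and the toy numbers are hypothetical. varselector's actual decision rule is not documented here and may differ, for example when the two metrics disagree.

```r
# Hypothetical helpers: root mean squared error and mean absolute error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Toy held-out outcomes and predictions from two hypothetical models
actual <- c(3, 5, 7, 9)
preds  <- list(
  lasso = c(2.5, 5.5, 7, 9.5),
  ridge = c(3, 4, 8, 9)
)

# One row per model, one column per metric
scores <- data.frame(
  model = names(preds),
  RMSE  = vapply(preds, rmse, numeric(1), actual = actual),
  MAE   = vapply(preds, mae,  numeric(1), actual = actual),
  row.names = NULL
)

# Rank by RMSE (lower is better); in this toy data lasso wins on both metrics
scores[order(scores$RMSE), ]
```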
Alongside the final recommendation, you also get all the information used to reach it. The fitted models are saved to your environment in case you need them later.
For the full documentation, run ?dreemstat::varselector in R.
df: A data frame that has not been split into training and test sets. It should already be subsetted to the columns you want in the model, so drop IDs and any other unwanted columns first. | e.g., df = select(analysis_df, -id)
y: A string giving the name of the dependent variable exactly as it appears in df. | e.g., y = 'clicks'
cv: A trainControl object specifying the cross-validation (CV) scheme, such as the number of folds. See the caret documentation for details: ?caret::train and ?caret::trainControl | e.g., cv = trainControl(method = "cv", number = 5)
lambda: A numeric vector of lambda values to try when tuning the ridge and lasso models, passed through the tuneGrid parameter of caret::train(). | e.g., lambda = c(seq(0.1, 2, by = 0.1), seq(2, 5, 0.5), seq(5, 25, 5))
alpha: A numeric vector of alpha values to try when tuning the elastic net model, passed through the tuneGrid parameter of caret::train(). Ridge regression fixes alpha at 0 and lasso fixes it at 1; the elastic net varies alpha systematically to find the best balance between the two. | e.g., alpha = seq(0.00, 1, 0.1)
model_id: A string used to name the model objects that varselector creates. This is useful when varselector is called in a loop: change model_id on every iteration (e.g., 1, 2, 3, 4, ...) so that the generated model object names stay unique.
mode: Which regression to run. The default, 'all', runs all four methods.
Abbreviations: 'ffr' = fast-forward regression; 'rr' = ridge regression; 'lr' = lasso regression; 'enr' = elastic net regression
split: A number between 0.01 and 0.99 giving the proportion of df used for training when the function splits it into training and test sets. The default, 0.80, means an 80% training / 20% test split. | e.g., split = 0.75
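The df and y arguments can be illustrated in base R. This minimal sketch drops an ID column (the base-R equivalent of select(analysis_df, -id)) and turns the y string into a `clicks ~ .` formula, which regresses the outcome on every remaining column. The toy data frame is made up, and whether varselector builds its model formula this way is an assumption.

```r
# Toy data frame with an ID column that should not enter the model
analysis_df <- data.frame(
  id     = 1:6,
  clicks = c(10, 12, 15, 19, 22, 26),
  spend  = c(1, 2, 3, 4, 5, 6),
  ads    = c(2, 2, 3, 3, 4, 5)
)

# Base-R equivalent of select(analysis_df, -id)
df <- analysis_df[, setdiff(names(analysis_df), "id")]

# y must match a column name in df exactly
y <- "clicks"
stopifnot(y %in% names(df))

# Build "clicks ~ .": the outcome against all remaining columns
f <- reformulate(termlabels = ".", response = y)
fit <- lm(f, data = df)
coef(fit)
```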
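To see what a cv object such as trainControl(method = "cv", number = 5) asks for, here is a minimal base-R sketch of 5-fold cross-validation: each row is assigned to one of five folds, and each fold is held out once while a model is fit on the rest. The built-in mtcars data and the lm() model are purely illustrative. caret builds its folds differently (train() uses createFolds(), which stratifies on the outcome), so this is not caret's exact procedure.

```r
set.seed(42)
k <- 5
data_cv <- mtcars

# Balanced random fold labels: each row gets one label from 1..k
folds <- sample(rep(seq_len(k), length.out = nrow(data_cv)))

cv_rmse <- numeric(k)
for (i in seq_len(k)) {
  train_df <- data_cv[folds != i, ]  # fit on the other k - 1 folds
  test_df  <- data_cv[folds == i, ]  # evaluate on the held-out fold
  fit <- lm(mpg ~ wt + hp, data = train_df)
  cv_rmse[i] <- sqrt(mean((test_df$mpg - predict(fit, newdata = test_df))^2))
}

# Cross-validated RMSE: the average over the k held-out folds
mean(cv_rmse)
```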
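The lambda and alpha vectors become tuning grids. caret's "glmnet" method tunes over two columns, alpha and lambda, and glmnet's penalty is λ[(1 − α)/2 · ‖β‖₂² + α · ‖β‖₁], so α = 0 gives ridge and α = 1 gives lasso. The sketch below builds the three grids with expand.grid(), assuming varselector constructs them roughly this way (an assumption; the README only says these values go into tuneGrid).

```r
# Example vectors from the argument descriptions
lambda <- c(seq(0.1, 2, by = 0.1), seq(2, 5, 0.5), seq(5, 25, 5))
alpha  <- seq(0.00, 1, 0.1)

# Ridge and lasso fix alpha and tune lambda only
ridge_grid <- expand.grid(alpha = 0, lambda = lambda)
lasso_grid <- expand.grid(alpha = 1, lambda = lambda)

# Elastic net tunes every alpha/lambda combination
enet_grid <- expand.grid(alpha = alpha, lambda = lambda)

c(ridge = nrow(ridge_grid), lasso = nrow(lasso_grid), enet = nrow(enet_grid))
# ridge = 32, lasso = 32, enet = 352
```

Because each row is a candidate evaluated under cross-validation, the elastic net grid grows multiplicatively with the lengths of the two vectors.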
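The purpose of model_id is easiest to see in a loop. This minimal sketch uses assign() and paste0() to give each iteration's model a unique name. The "lasso_model_" prefix, the lm() stand-in model, and the random subsamples of mtcars are hypothetical, since the README does not document how varselector names its objects.

```r
set.seed(1)
for (i in 1:3) {
  model_id <- as.character(i)  # changes on every iteration
  rows <- sample(nrow(mtcars), 20)
  fit <- lm(mpg ~ wt, data = mtcars[rows, ])
  # Unique object name per iteration, e.g. "lasso_model_1"
  assign(paste0("lasso_model_", model_id), fit)
}

ls(pattern = "^lasso_model_")
```

If model_id stayed the same across iterations, each pass would overwrite the previous model object.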
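The mode argument maps to a set of methods. Here is a minimal sketch of how such an argument could be validated with match.arg(); resolve_mode() is a hypothetical helper, not part of dreemstat.

```r
# Hypothetical helper, not part of dreemstat
resolve_mode <- function(mode = "all") {
  methods <- c("ffr", "rr", "lr", "enr")
  if (identical(mode, "all")) {
    return(methods)                   # run all four regressions
  }
  match.arg(mode, choices = methods)  # errors on an unknown code
}

resolve_mode()       # returns c("ffr", "rr", "lr", "enr")
resolve_mode("enr")  # returns "enr"
```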
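A minimal base-R sketch of a split = 0.80 train/test split, using the built-in mtcars data for illustration. varselector's internal splitting code is not documented, so this is only an approximation; caret::createDataPartition() is an alternative that stratifies on the outcome.

```r
set.seed(123)
split <- 0.80
stopifnot(split >= 0.01, split <= 0.99)  # documented valid range

df <- mtcars
train_idx <- sample(nrow(df), size = floor(split * nrow(df)))
train_df  <- df[train_idx, ]   # about 80% of rows for fitting
test_df   <- df[-train_idx, ]  # remaining rows held out for evaluation

c(train = nrow(train_df), test = nrow(test_df))  # train = 25, test = 7
```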
Coming soon