Datasets are limited to one-hot encoded files, and analysis is limited to binary classification of cross-sectional data.
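As a rough illustration of the expected input format, the sketch below shows what one-hot encoding does to a categorical column. It is written in plain Python purely for illustration and is not part of the app; the function `one_hot_encode`, the column names, and the toy rows are all hypothetical.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    # Sort the categories so the generated column names are deterministic.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical toy data: "churned" is the binary target, "plan" is categorical.
rows = [
    {"plan": "basic", "age": 34, "churned": 0},
    {"plan": "premium", "age": 51, "churned": 1},
    {"plan": "basic", "age": 27, "churned": 1},
]
encoded_rows = one_hot_encode(rows, "plan")
```

After encoding, the `plan` column is replaced by the indicator columns `plan_basic` and `plan_premium`, and the binary target column is left untouched.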
Try it at: https://austinlim.shinyapps.io/DataScienceAnalyticsApp/
During an internship, I had the privilege of learning to use an automated machine learning platform called DataRobot. I was inspired by how its easy-to-use, point-and-click interface could still build powerful and sophisticated predictive models. I wanted to build an application that simplifies data analysis in the same vein: users upload a dataset, choose the model settings they want, and with the press of a button the model is built and evaluated for them.

- Shiny
- DT
- DataExplorer
- 14 different classification models, including:
  - logistic regression
  - XGBoost
  - AdaBoost
  - random forest
  - isolation forest
  - SVM
  - KNN
  - association rule mining
- This application is built for classification tasks. An extension could handle other types of tasks, such as regression and anomaly detection.
- This application only accepts data that has already been pre-processed. A complementary project could provide point-and-click pre-processing, similar to DataRobot's sister platform Paxata.
- This application is limited to cross-sectional analysis. Applying cross-sectional methods to time-series data can produce inaccurate results, for example because observations are not independent over time, so an extension could implement time-series models.
- An extension could add further features, such as feature selection with the Boruta package and resampling methods like SMOTE to deal with class imbalance.
- Another helpful feature would be a feature-importance analysis for each variable, which would help users understand their model results rather than treating the model as a black box.
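One model-agnostic way to approach the feature-importance idea is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The sketch below is a minimal illustration in plain Python, not the app's implementation. The functions `accuracy` and `permutation_importance`, the stand-in classifier `toy_model`, and the synthetic data are all assumptions made for the example.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only feature j shuffled.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted binary classifier: it uses only feature 0.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


# Synthetic data: two numeric features, labels produced by the toy model itself.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
```

Because `toy_model` ignores the second feature, shuffling it leaves accuracy unchanged and its importance is zero, while shuffling the first feature produces a clear drop. With a real model, `predict` would wrap the fitted classifier's prediction function, and the importances would be best computed on held-out data.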