Application for the automation of modelling, evaluation and tuning of machine learning models for classification.

austinlimjingzhe/datascienceandanalysis-app

Data Science and Analysis Application

Summary

An attempt at creating an easy-to-use application for modelling, evaluation and tuning of machine learning models for classification without the need for any coding knowledge.

Datasets are limited to one-hot encoded files, and analysis is limited to binary classification of cross-sectional data.

Try it at: https://austinlim.shinyapps.io/DataScienceAnalyticsApp/

Background

During an internship, I had the privilege of learning to use an automated machine learning platform called DataRobot. I was inspired by how its easy-to-use, point-and-click user interface could still build incredibly powerful and sophisticated models for prediction. Hence, I wanted to make an application that simplifies data analysis in the same vein: users upload a dataset, choose the settings of the model they want to build and, with the press of a button, the model is built and evaluated for them.

Tools and Frameworks:

  1. Shiny
  2. DT
  3. DataExplorer
  4. 14 different classification models:
     logistic regression
     xgboost
     adaboost
     randomforest
     isolationforests
     svm
     knn
     association rule mining and more

Features

1. Interactive Data Table

Users will be able to specify the type of separators and headings and even inspect the data row by row.
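As a rough illustration of what the separator and heading options control, here is a minimal sketch in Python. The app itself is built with Shiny, so this is not its actual code; `read_table` and the placeholder column names `V1`, `V2`, ... are hypothetical and exist only for this example.

```python
import csv
import io


def read_table(text, sep=",", header=True):
    """Parse delimited text into (column_names, rows).

    Hypothetical helper for illustration; not part of the app.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return [], []
    if header:
        # First row holds the column names.
        return rows[0], rows[1:]
    # No header row: generate placeholder names V1, V2, ...
    names = [f"V{i + 1}" for i in range(len(rows[0]))]
    return names, rows


# A semicolon-separated file with a header row.
sample = "age;smoker;target\n34;1;0\n51;0;1\n"
names, rows = read_table(sample, sep=";", header=True)
print(names)
for row in rows:
    print(row)
```

Choosing the wrong separator would leave each line as a single column, which is why inspecting the table row by row after upload is useful.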

2. Automated EDA Reporting

Users will receive an automated report on the basic statistics, missingness, histograms and correlation matrix of their data almost instantly.

3. Easy-to-use Model Building and Hyperparameter Tuning

Users will be able to specify the model they want to build and its target variable, as well as tweak the model's hyperparameters, with the click of a button.

4. Model Leaderboards

Built models are immediately added to the model leaderboards, and the models can be sorted by accuracy, AUC, F1, TPR (true positive rate) or FPR (false positive rate).
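For readers unfamiliar with these scores, the sketch below shows one common way to compute them for a binary classifier and to sort a leaderboard by a chosen metric. It is written in Python purely for illustration and is not the app's implementation. The function names, the 0.5 threshold, and the toy labels and probabilities are all assumptions made for this example.

```python
def score_model(y_true, y_prob, threshold=0.5):
    """Compute leaderboard metrics from true labels (0/1) and predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall / sensitivity
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0

    # AUC as the share of (positive, negative) pairs ranked correctly; ties count half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else 0.0

    return {"accuracy": accuracy, "auc": auc, "f1": f1, "tpr": tpr, "fpr": fpr}


def leaderboard(results, sort_by="accuracy"):
    """Sort model scores so the best model comes first (lowest first for fpr)."""
    descending = sort_by != "fpr"
    return sorted(results, key=lambda r: r[sort_by], reverse=descending)


# Toy data, invented for this example.
y_true = [1, 0, 1, 0, 1, 0]
predictions = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.3, 0.1],
    "model_b": [0.6, 0.7, 0.4, 0.3, 0.8, 0.2],
}
results = [{"model": name, **score_model(y_true, probs)} for name, probs in predictions.items()]
board = leaderboard(results, sort_by="accuracy")
for entry in board:
    print(entry)
```

Note that FPR is the one metric here where lower is better, so the sketch sorts it in ascending order.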

5. Prediction Making

Users can upload their own CSV files containing the data they want predictions for, and download the predictions once they are done.

Discussion

This project has room for improvement, such as:
  1. This application is built for classification tasks. An extension could handle other types of tasks, such as regression and anomaly detection.
  2. This application can only take in data that has already been pre-processed. A complementary project could provide point-and-click pre-processing, similar to DataRobot's sister platform Paxata.
  3. This application is limited to cross-sectional analysis. Studies have shown that applying cross-sectional methods to time-series data may give inaccurate results, so an extension could implement time-series models.
  4. An extension could add other features, such as feature selection using the Boruta package and resampling methods like SMOTE to deal with class imbalance.
  5. Another helpful feature would be an analysis of each variable's feature importance, which would help users understand their model results instead of treating the model as a black box.

There may be other areas for improvement that I may not have considered. I would love to get into a discussion on what they are and how I can implement them.
