
This project applies supervised machine learning models to predict credit risk and compares algorithm effectiveness in an unbalanced classification problem.


JeffZimmerman/Credit_Risk_Analysis


Supervised Machine Learning and Credit Risk Analysis

Overview

This project applied six machine learning models to credit risk, an inherently unbalanced classification problem that requires distinguishing a large number of good (low-risk) loans from a much smaller number of risky (high-risk) loans. The data comes from LendingClub, a peer-to-peer (P2P) lending company, and shows the class imbalance typical of loan data. Although the imbalance cannot be overcome completely, each algorithm mitigates it in its own way: the first four methods resample the training data, and the two ensemble classifiers train each of their learners on a class-balanced subset.
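Balanced accuracy, the headline metric in the results, is the average of the recall on each class, so unlike plain accuracy it is not inflated by the majority class. The sketch below illustrates this with a synthetic dataset from scikit-learn's `make_classification` as a hypothetical stand-in for the LendingClub data (label 1 plays the "high risk" role) and a baseline that always predicts "low risk". It is an illustration under those assumptions, not the project's code.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: weights=[0.99, 0.01] makes about 1% of rows
# class 1 ("high risk"), before make_classification's default label noise.
X, y = make_classification(
    n_samples=10_000, n_features=10, weights=[0.99, 0.01], random_state=1
)
print(Counter(y.tolist()))

# stratify=y keeps the same class ratio in the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1
)

# Baseline that always predicts the majority ("low risk") class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
y_pred = baseline.predict(X_test)

acc = accuracy_score(y_test, y_pred)
bal_acc = balanced_accuracy_score(y_test, y_pred)
print(f"accuracy={acc:.3f}, balanced accuracy={bal_acc:.3f}")
```

The baseline scores well above 95% accuracy while catching no high-risk loans at all, yet its balanced accuracy is exactly 50% (recall 1.0 on low risk, 0.0 on high risk), which is why balanced accuracy is the more informative score here.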

Results

  • Naive Random Oversampling: This algorithm produced a balanced accuracy score of 66%, with precision of 99% and recall of 60%.

[Screenshot: Naive Random Oversampling results]
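Naive random oversampling duplicates randomly chosen minority-class rows (sampling with replacement) until every class is as large as the biggest one. The from-scratch sketch below shows the idea on hypothetical toy data; `random_oversample` is an illustrative name, and imbalanced-learn's `RandomOverSampler(...).fit_resample(X, y)` is the usual library equivalent. Resampling should be applied to the training split only, so the test set keeps the real class ratio.

```python
from collections import Counter

import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows (with replacement) until
    every class matches the size of the largest class."""
    rng = np.random.default_rng(seed)
    counts = Counter(y.tolist())
    target = max(counts.values())
    X_parts, y_parts = [X], [y]
    for label, count in counts.items():
        if count < target:
            rows = np.flatnonzero(y == label)
            extra = rng.choice(rows, size=target - count, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.vstack(X_parts), np.concatenate(y_parts)


# Hypothetical toy data: 8 "low risk" (0) rows and 2 "high risk" (1) rows.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_res, y_res = random_oversample(X, y)
print(Counter(y_res.tolist()))  # 8 rows of each class
```

Every added row is an exact copy of an existing high-risk row, so the model sees the same few examples many times.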

  • SMOTE Oversampling: This option produced results similar to Naive Random Oversampling, with the same balanced accuracy score of 66% and precision of 99%, but a higher recall of 69%.

[Screenshot: SMOTE Oversampling results]
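SMOTE differs from naive oversampling in that it creates new synthetic minority rows rather than exact copies: each one is placed at a random point on the line segment between a minority row and one of its k nearest minority-class neighbours. The sketch below implements only that interpolation step on hypothetical toy data; `smote_samples` is an illustrative name, and imbalanced-learn's `SMOTE` (default `k_neighbors=5`) is the library implementation.

```python
import numpy as np


def smote_samples(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a random
    minority row and one of its k nearest minority-class neighbours."""
    if k >= len(X_min):
        raise ValueError("k must be smaller than the number of minority rows")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)  # starting rows
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # partner rows
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Hypothetical "high risk" rows at the corners of a unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_samples(X_min, n_new=6, k=2)
print(synthetic.round(2))
```

With `k=2`, each corner's nearest neighbours are the two adjacent corners, so every synthetic point lands on an edge of the square: new, but always between existing minority rows.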

  • ClusterCentroids Undersampling: This method again produced a balanced accuracy score of 66% and precision of 99%, but a much lower recall of 40%.

[Screenshot: ClusterCentroids Undersampling results]
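ClusterCentroids works in the opposite direction: it undersamples the majority class by running K-means on it and keeping only the cluster centroids, with K equal to the number of minority rows. The two-class sketch below uses scikit-learn's `KMeans`; the function name and data are hypothetical, and imbalanced-learn's `ClusterCentroids` is the library version. Replacing many real majority rows with a few centroids discards information, a trade-off inherent to undersampling.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def cluster_centroids(X, y, majority=0, seed=0):
    """Two-class sketch: replace the majority-class rows with K-means
    centroids, where K is the number of minority-class rows."""
    X_maj = X[y == majority]
    X_rest, y_rest = X[y != majority], y[y != majority]
    k = len(y_rest)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_maj)
    X_res = np.vstack([km.cluster_centers_, X_rest])  # centroids + minority rows
    y_res = np.concatenate([np.full(k, majority), y_rest])
    return X_res, y_res


# Hypothetical data: 200 class-0 rows around (0, 0), 10 class-1 rows around (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 200 + [1] * 10)

X_res, y_res = cluster_centroids(X, y)
print(Counter(y_res.tolist()))  # 10 rows of each class
```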

  • SMOTEENN Combination Sampling: This method produced a lower balanced accuracy score of 54%, with precision of 99% and a mediocre recall of 58%.

[Screenshot: SMOTEENN Combination Sampling results]
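SMOTEENN combines both directions: SMOTE oversampling followed by Edited Nearest Neighbours (ENN) cleaning, which deletes rows whose label disagrees with their nearest neighbours and so thins out the region where the classes overlap. The sketch below implements only the cleaning step with scikit-learn's `NearestNeighbors`, using a simple majority-vote rule on hypothetical toy data. imbalanced-learn's `SMOTEENN` runs both steps, and its selection rule and defaults may differ from this version.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def edited_nearest_neighbours(X, y, k=3):
    """Remove rows whose label disagrees with most of their k nearest
    neighbours (majority rule; the row itself is not counted)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)  # column 0 is each row itself (no duplicate rows assumed)
    neighbour_labels = y[idx[:, 1:]]
    agreeing = (neighbour_labels == y[:, None]).sum(axis=1)
    keep = agreeing > k / 2
    return X[keep], y[keep]


# Hypothetical data: a class-0 cluster, a class-1 cluster, and one class-1
# row sitting inside the class-0 cluster (an overlapping, noisy sample).
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],  # class 0
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],  # class 1
    [0.05, 0.05],                                    # class 1, inside class 0
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(y), "->", len(y_clean))  # 9 -> 8
```

Only the stray class-1 row is removed; rows whose neighbourhood agrees with their own label survive.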

  • Balanced Random Forest Classifier: This algorithm achieved a higher balanced accuracy score of 79%, with precision of 99% and a high recall of 87%.

[Screenshot: Balanced Random Forest Classifier results]
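A balanced random forest handles the imbalance inside the ensemble instead of in a separate resampling step: each tree is grown on a bootstrap sample that draws equally many rows from every class. The sketch below builds that from scikit-learn decision trees on a synthetic `make_classification` dataset (a hypothetical stand-in for the loan data, label 1 as "high risk"). The helper names are illustrative, and imbalanced-learn's `BalancedRandomForestClassifier` is the library implementation, whose sampling details may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_balanced_forest(X, y, n_trees=51, seed=0):
    """Grow each tree on a balanced bootstrap: draw, with replacement, as many
    rows from every class as the smallest class contains."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()
    trees = []
    for _ in range(n_trees):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=True)
            for c in classes
        ])
        # max_features="sqrt": random feature subset at each split, as in a
        # standard random forest.
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(2**31 - 1))
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_vote(trees, X):
    """Majority vote for 0/1 labels (an odd tree count avoids ties)."""
    votes = np.stack([tree.predict(X) for tree in trees])
    return (votes.mean(axis=0) > 0.5).astype(int)


X, y = make_classification(
    n_samples=5_000, n_features=10, weights=[0.97, 0.03], random_state=2
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=2
)

forest = fit_balanced_forest(X_train, y_train)
score = balanced_accuracy_score(y_test, predict_vote(forest, X_test))
print(f"balanced accuracy: {score:.2f}")
```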

  • Easy Ensemble AdaBoost Classifier: This approach had the best overall results, with the highest balanced accuracy score (93%), precision of 99%, and the highest recall (94%).

[Screenshot: Easy Ensemble AdaBoost Classifier results]
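Easy Ensemble trains several AdaBoost classifiers, each on all of the minority rows plus an equally sized random subset of the majority rows, and combines their predictions. Because each subset draws different majority rows, the ensemble uses more of the majority class than any single undersampled model would. The sketch below uses scikit-learn's `AdaBoostClassifier` on synthetic data (label 1 as "high risk"); the helper names are hypothetical, and imbalanced-learn's `EasyEnsembleClassifier` is the library version, which may sample differently (for example, with bootstrap samples).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split


def fit_easy_ensemble(X, y, n_subsets=10, seed=0):
    """Train one AdaBoost model per balanced subset: every minority row plus an
    equal-sized random draw (without replacement) from the majority class."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    models = []
    for i in range(n_subsets):
        subset = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, subset])
        booster = AdaBoostClassifier(n_estimators=50, random_state=i)
        models.append(booster.fit(X[idx], y[idx]))
    return models


def predict_easy_ensemble(models, X, threshold=0.5):
    """Average the models' high-risk probabilities, then apply a threshold."""
    proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (proba >= threshold).astype(int)


X, y = make_classification(
    n_samples=5_000, n_features=10, weights=[0.97, 0.03], random_state=2
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=2
)

models = fit_easy_ensemble(X_train, y_train)
score = balanced_accuracy_score(y_test, predict_easy_ensemble(models, X_test))
print(f"balanced accuracy: {score:.2f}")
```

Averaging probabilities (soft voting) is one reasonable way to combine the models; lowering `threshold` trades precision for recall on the high-risk class.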

Summary

This project applied six different algorithms to predicting credit risk, and all six were hampered by the severe imbalance between the number of good loans and the number of risky ones. The imbalance shows in the very low precision every model achieved on high-risk loans, in contrast to the 99% precision reported for each model above. The ClusterCentroids approach fared the worst of the six, with the lowest recall (40%), and the two ensemble approaches scored the best. Given this dataset, the Easy Ensemble AdaBoost Classifier is the clear recommendation among these models, since it achieved the highest balanced accuracy (93%) and the highest recall (94%).
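One way an averaged precision near 99% can coexist with very low precision on the high-risk class is base rates: when high-risk loans are rare, even a small false-positive rate among the many low-risk loans swamps the true positives. The arithmetic below uses hypothetical counts (17,000 low-risk and 100 high-risk test loans, not the project's data) and a support-weighted average of the kind many classification reports print.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


# Hypothetical test set.
n_low, n_high = 17_000, 100

# Hypothetical model: catches 90% of high-risk loans (recall 0.90) but also
# flags 5% of low-risk loans as high risk.
tp_high = 90                    # high-risk loans correctly flagged
fn_high = n_high - tp_high      # high-risk loans missed (predicted low risk): 10
fp_high = n_low * 5 // 100      # low-risk loans wrongly flagged: 850
tn_low = n_low - fp_high        # low-risk loans correctly passed: 16,150

prec_high = precision(tp_high, fp_high)  # 90 / 940
prec_low = precision(tn_low, fn_high)    # 16,150 / 16,160

# Support-weighted average across the two classes.
avg_prec = (n_low * prec_low + n_high * prec_high) / (n_low + n_high)

print(f"high-risk precision: {prec_high:.3f}")  # 0.096
print(f"low-risk precision:  {prec_low:.3f}")   # 0.999
print(f"weighted average:    {avg_prec:.3f}")   # 0.994
```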
