Supervised Machine Learning and Credit Risk Analysis

Overview

This project applied six machine learning models to credit risk, an inherently imbalanced classification problem: a large number of good loans must be distinguished from a much smaller number of risky ones. The data comes from LendingClub, a peer-to-peer (P2P) lending company, and shows the class imbalance typical of loan data. Although the imbalance cannot be overcome completely, each algorithm offers some benefit through its own way of resampling the data.
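
Because good loans vastly outnumber risky ones, plain accuracy can look excellent even for a useless model, which is why the results are reported as balanced accuracy. The sketch below uses made-up counts (990 low-risk, 10 high-risk loans) and a hypothetical model that approves every loan. Balanced accuracy is computed as the mean of per-class recall, which matches how scikit-learn's `balanced_accuracy_score` defines it in its default (unadjusted) form.

```python
# Hypothetical test set (illustrative counts, not the LendingClub data):
# 990 low-risk loans (label 0) and 10 high-risk loans (label 1).
y_true = [0] * 990 + [1] * 10

# A degenerate "model" that labels every loan as low risk.
y_pred = [0] * len(y_true)


def recall(y_true, y_pred, label):
    """Share of samples whose true label is `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally regardless of size."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in labels) / len(labels)


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy:          {accuracy:.2f}")                           # 0.99
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")  # 0.50
```

The model never catches a single risky loan, yet its plain accuracy is 99%; balanced accuracy exposes this by dropping to 50%, the level of random guessing on two classes.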

Results

Naive Random Oversampling: This algorithm produced a balanced accuracy score of 66%, with precision of 99% and recall of 60%.
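
Naive random oversampling simply duplicates randomly chosen minority-class rows until the classes are the same size. The from-scratch sketch below illustrates the idea in plain Python; it is not the project's code (imbalanced-learn offers this technique as `RandomOverSampler`), and the function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class (sampling with
    replacement) until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))  # copy so duplicates are independent
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 low-risk rows (0) and 2 high-risk rows (1), two features each.
X = [[i, i * 2] for i in range(8)] + [[50, 60], [55, 65]]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({0: 8, 1: 8})
```

Only the training split should be resampled this way; the test set keeps its natural imbalance so that the reported metrics reflect real conditions.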

SMOTE Oversampling: This method produced the same balanced accuracy score as Naive Random Oversampling, 66%, with precision of 99% and a somewhat higher recall of 69%.
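
SMOTE (Synthetic Minority Over-sampling Technique) creates new minority samples instead of copying existing ones: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a simplified, hypothetical illustration in plain Python, not the library implementation (imbalanced-learn's `SMOTE` uses `k_neighbors=5` by default; `k=2` here suits the tiny toy set).

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical high-risk rows with two numeric features each.
minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
new_points = smote(minority, n_new=5)
for p in new_points:
    print([round(v, 2) for v in p])  # each lies between two of the original rows
```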

ClusterCentroids Undersampling: This method again produced a balanced accuracy score of 66% and precision of 99%, but a much lower recall of 40%.
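
ClusterCentroids undersamples the majority class by running k-means on it and replacing its rows with the resulting cluster centroids, so the majority class shrinks to the size of the minority class. The sketch below assumes a binary problem and uses a hand-written Lloyd's k-means on hypothetical 2-D toy data; imbalanced-learn's `ClusterCentroids` uses scikit-learn's `KMeans` by default instead.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm returning k centroids of `points`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroids_undersample(X, y, majority_label):
    """Binary case only: replace the majority rows with as many k-means
    centroids as there are minority rows."""
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    minority = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    centroids = kmeans(majority, k=len(minority))
    X_res = centroids + [x for x, _ in minority]
    y_res = [majority_label] * len(centroids) + [lab for _, lab in minority]
    return X_res, y_res


# Hypothetical toy data: six low-risk rows in two tight groups, two high-risk rows.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5], [6, 6]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = cluster_centroids_undersample(X, y, majority_label=0)
print(X_res[:2])  # two centroids, one per low-risk group
print(y_res)      # [0, 0, 1, 1]
```

Each centroid stands in for a whole group of good loans, so the balanced training set is small and much of the majority-class detail is discarded.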

SMOTEENN Combination Sampling: This method produced a lower balanced accuracy score of 54%, with precision of 99% and a mediocre recall of 58%.
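
SMOTEENN combines oversampling and cleaning: it first oversamples with SMOTE and then cleans the result with Edited Nearest Neighbours (ENN), which removes samples whose neighbours belong to another class. The sketch below shows only the ENN cleaning step on hypothetical toy data, using a simple majority vote among the k = 3 nearest neighbours; imbalanced-learn's `EditedNearestNeighbours` has its own selection rule and defaults, which may differ from this simplification.

```python
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Keep a sample only if the majority label among its k nearest
    neighbours (excluding itself) matches its own label."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # (squared distance, label) for every other sample, nearest first.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), y[j])
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


# Hypothetical data after an oversampling step: [0.5, 0.5] is a high-risk (1)
# point sitting inside the low-risk (0) cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10], [10, 11], [11, 10]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_clean, y_clean = edited_nearest_neighbours(X, y)
print(X_clean)  # [0.5, 0.5] has been removed
```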

Balanced Random Forest Classifier: This algorithm achieved a higher balanced accuracy score of 79%, with precision of 99% and a high recall of 87%.
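
A balanced random forest handles the imbalance inside the ensemble: each tree is grown on a bootstrap sample in which the classes are balanced, rather than on a plain bootstrap of the skewed data. The sketch below shows only that sampling step, following one common formulation (draw, with replacement, as many rows from each class as the minority class contains); tree fitting is omitted, and the function name and data are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(X, y, seed=None):
    """Draw one balanced bootstrap sample: for every class, draw with
    replacement as many rows as the smallest class contains."""
    rng = random.Random(seed)
    by_class = {}
    for x, lab in zip(X, y):
        by_class.setdefault(lab, []).append(x)
    n = min(len(rows) for rows in by_class.values())
    X_s, y_s = [], []
    for lab, rows in by_class.items():
        for _ in range(n):
            X_s.append(rng.choice(rows))
            y_s.append(lab)
    return X_s, y_s


# Hypothetical data: 95 low-risk rows (0) and 5 high-risk rows (1), one feature each.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

# Each tree of the forest would be fit on its own balanced sample.
for tree in range(3):
    X_s, y_s = balanced_bootstrap(X, y, seed=tree)
    print(tree, Counter(y_s))  # 5 low-risk and 5 high-risk rows per sample
```

Because every tree sees a different random subset of the good loans, the forest as a whole still draws on much of the majority class while each individual tree trains on balanced data.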

Easy Ensemble AdaBoost Classifier: This approach had the best overall results, with the highest balanced accuracy score of 93%, precision of 99%, and the highest recall of 94%.
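
Easy Ensemble trains several learners, each on a different balanced subset made of all minority rows plus an equal-sized random sample of majority rows, and combines their predictions; in the Easy Ensemble AdaBoost Classifier each of those learners is an AdaBoost model. To keep the sketch short and dependency-free, it swaps AdaBoost for a trivial nearest-centroid learner and combines the learners by majority vote. All function names and data are hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Stand-in base learner (the real method uses AdaBoost): one mean vector per class."""
    groups = {}
    for x, lab in zip(X, y):
        groups.setdefault(lab, []).append(x)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)] for lab, rows in groups.items()}


def nearest_centroid_predict(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def easy_ensemble_fit(X, y, minority_label, n_subsets=5, seed=0):
    """Train one base learner per balanced subset: all minority rows plus an
    equal-sized random sample (without replacement) of majority rows."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]
    models = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, len(minority))
        Xs = minority + [x for x, _ in drawn]
        ys = [minority_label] * len(minority) + [lab for _, lab in drawn]
        models.append(nearest_centroid_fit(Xs, ys))
    return models


def easy_ensemble_predict(models, x):
    """Majority vote over the ensemble members."""
    votes = [nearest_centroid_predict(m, x) for m in models]
    return max(set(votes), key=votes.count)


# Hypothetical data: 40 low-risk rows on a small grid, 4 high-risk rows far away.
X = [[i % 5, i // 5] for i in range(40)] + [[20, 20], [21, 20], [20, 21], [21, 21]]
y = [0] * 40 + [1] * 4

models = easy_ensemble_fit(X, y, minority_label=1, n_subsets=5)
print(easy_ensemble_predict(models, [1, 1]))    # 0 (low risk)
print(easy_ensemble_predict(models, [20, 21]))  # 1 (high risk)
```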

Summary

This project applied six different algorithms to predicting credit risk, and all six were hampered by the severe imbalance between the number of good loans and the number of risky ones. This shows in the very low precision for identifying high-risk loans across all models; the uniform 99% precision reported in the results mainly reflects the far more numerous low-risk loans. ClusterCentroids had the lowest recall (40%) and SMOTEENN the lowest balanced accuracy (54%), while the two ensemble approaches scored best. With the highest balanced accuracy (93%) and recall (94%), the Easy Ensemble AdaBoost Classifier is the clear recommendation among these models for this dataset.
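
How can every model report 99% precision while precision on high-risk loans is very low? A support-weighted average, like the weighted-average row of scikit-learn's `classification_report`, is dominated by the far larger low-risk class. The arithmetic below uses hypothetical confusion-matrix counts, not this project's results.

```python
# Hypothetical counts (not the project's results): 1,000 low-risk and
# 10 high-risk loans in the test set; "positive" means flagged as high risk.
tp = 8    # high-risk loans correctly flagged
fn = 2    # high-risk loans missed (predicted low risk)
fp = 92   # low-risk loans wrongly flagged as high risk
tn = 908  # low-risk loans correctly predicted low risk

precision_high = tp / (tp + fp)  # 8 / 100
precision_low = tn / (tn + fn)   # 908 / 910
support_high = tp + fn           # 10 high-risk loans
support_low = tn + fp            # 1,000 low-risk loans

# Support-weighted average precision across the two classes.
weighted_precision = (precision_high * support_high + precision_low * support_low) / (
    support_high + support_low
)

print(f"high-risk precision: {precision_high:.2f}")      # 0.08
print(f"low-risk precision:  {precision_low:.4f}")       # 0.9978
print(f"weighted precision:  {weighted_precision:.2f}")  # 0.99
```

Only 8 of the 100 loans flagged as risky are actually risky, yet the weighted precision still rounds to 99% because the high-risk class contributes just 10 of the 1,010 samples to the average.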

JeffZimmerman/Credit_Risk_Analysis
