
Six different techniques are employed to train and evaluate models with unbalanced classes. Algorithms are used to predict credit risk. Performance of these different models is compared and recommendations are suggested based on results.


eric-blankinshp/Credit_Risk_Analysis_Supervised_ML


Credit_Risk_Analysis

Overview

The purpose of this challenge is to apply machine learning to predict credit risk. Using a data set from LendingClub, the data is oversampled with the RandomOverSampler and SMOTE algorithms and undersampled with the ClusterCentroids algorithm. The SMOTEENN algorithm is used for combined over- and undersampling. BalancedRandomForestClassifier and EasyEnsembleClassifier, two ensemble classifiers that reduce bias, are also used to predict credit risk.
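To make the resampling idea concrete, here is a minimal standard-library sketch of random oversampling and random undersampling. It is an illustration only, not the code used in this repository: the function names and the toy data are hypothetical, and `random_undersample` is a simpler stand-in for ClusterCentroids, which replaces majority-class rows with cluster centroids rather than dropping rows at random.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data (hypothetical): 95 low-risk rows and 5 high-risk rows.
rows = list(range(100))
labels = ["low_risk"] * 95 + ["high_risk"] * 5

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

In either case only the training split should be resampled, so that the test set keeps the original class imbalance.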

Results

Balanced Random Forest Classifier

BRFC_accuracy_score

  • The balanced accuracy score for the Balanced Random Forest Classifier is ~79%. Although it was not the best performer, it falls within the realistic range of scores.
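For readers unfamiliar with the metric, balanced accuracy is the mean of the recall on each class, which keeps the majority class from dominating the score. The sketch below uses hypothetical confusion-matrix counts, not this project's results, chosen only to show why plain accuracy is misleading on unbalanced data.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Average of recall on the positive class and recall on the negative class."""
    recall_positive = tp / (tp + fn)
    recall_negative = tn / (tn + fp)
    return (recall_positive + recall_negative) / 2


def plain_accuracy(tp, fn, tn, fp):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical counts: 100 high-risk loans (positive) and 17,000 low-risk loans.
some_model = dict(tp=70, fn=30, tn=15000, fp=2000)

# A useless model that labels every loan low-risk.
always_low_risk = dict(tp=0, fn=100, tn=17000, fp=0)

for name, counts in [("some_model", some_model), ("always_low_risk", always_low_risk)]:
    print(name,
          "accuracy:", round(plain_accuracy(**counts), 3),
          "balanced accuracy:", round(balanced_accuracy(**counts), 3))
```

The always-low-risk model gets a plain accuracy above 99% but a balanced accuracy of only 50%, which is why balanced accuracy is the score reported here.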

BRFC_classification_report

  • High-risk has a precision score of 4% and an F1 score of 7%. Both are very low, reflecting a large number of false positives (low-risk loans flagged as high-risk).
  • Low-risk has a precision score of 100% and an F1 score of 95%. These are excellent: very few high-risk loans were mistaken for low-risk.
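The link between false positives, precision, and F1 can be sketched with a few lines of arithmetic. The counts below are hypothetical, chosen only to show how a rare class ends up with low precision even when most of its members are caught. They are not taken from the classification reports in this repository.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical high-risk counts: 70 caught, 30 missed, 1,680 low-risk loans
# wrongly flagged as high-risk (false positives).
precision, recall, f1 = precision_recall_f1(tp=70, fp=1680, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it stays close to the smaller of the two, so a precision of a few percent keeps F1 in the single digits regardless of recall.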

Adaboost

Adaboost_balanced_accuracy_score

  • The balanced accuracy score for the AdaBoost-based EasyEnsembleClassifier is ~92%. This was the best result of all the tests.

Adaboost_classification_report

  • High-risk has a precision score of 7% and an F1 score of 14%. While slightly better, it will still produce a large number of false positives.
  • Low-risk had much better results, with a precision score of 100% and a F1 score of 97%.

Naive Random Oversampling

NRO_balanced_accuracy_score

  • A balanced accuracy of ~66% is below what is considered good.

NRO_classification_report

  • High-risk has a precision of 1% and an F1 of 2%. Once again these are very low, reflecting a high number of false positives.
  • Low-risk has a precision of 100%, but an F1 of only 80%.

SMOTE Oversampling

SMOTE_balanced_accuracy_score

  • The SMOTE oversampling balanced accuracy score was about the same at ~66%, which is not considered a good score.

SMOTE_classification_report

  • High-risk's precision and F1 scores are once again very low: 1% for precision and 2% for F1.
  • Low-risk scores are better, at 100% precision and 80% F1.

Undersampling

Undersampling_balanced_accuracy_score

  • Undersampling produced the worst balanced accuracy score at just ~53%.

Undersampling_classification_report

  • High-risk precision and F1 scores are both very low at 1%.
  • Low-risk performed better, with a precision score of 100% and an F1 of 62%.

Combination (Over and Under) Sampling

SMOTEENN_balanced_accuracy_score

  • The combination sampling also underperformed, with a balanced accuracy score of ~62%.

SMOTEENN_classification_report

  • High-risk also did poorly, with a precision of 1% and an F1 of 2%.
  • Low-risk was better, with a precision of 100% and an F1 of 72%.

Summary

Of all the models tested, the AdaBoost classifier performed the best, with a very high balanced accuracy score of ~92%. Its low-risk precision was also very high, so few high-risk applicants were mistaken for low-risk. This is ideal when screening for credit risk, because approving many loans for high-risk applicants could result in many not being paid back. However, the low precision for the high-risk class means a large number of false positives: low-risk applicants flagged as high-risk. This could cost the bank a lot in missed revenue. My suggestion would be to look for a better option.
