
Credit_Risk_Analysis

Overview

The purpose of this challenge is to apply machine learning to predict credit risk. Using a data set from LendingClub, the data is oversampled with the RandomOverSampler and SMOTE algorithms and undersampled with the ClusterCentroids algorithm. The SMOTEENN algorithm is used for a combination of over- and undersampling. Finally, the BalancedRandomForestClassifier and EasyEnsembleClassifier ensemble models are used to reduce bias when predicting credit risk.

Results

Balanced Random Forest Classifier

BRFC_accuracy_score

  • The balanced accuracy score for the Balanced Random Forest Classifier is ~79%. Although it was not the best performer, it falls within a realistic range of scores.

BRFC_classification_report

  • High-risk has a precision score of 4% and an F1 score of 7%. Both of these are very low, meaning a large number of false positives.
  • Low-risk has a precision score of 100% and an F1 score of 95%. These are excellent; very few high-risk applicants were mistaken for low-risk.

Adaboost

Adaboost_balanced_accuracy_score

  • The balanced accuracy score is ~92%. This was the best result of all the tests.

Adaboost_classification_report

  • High-risk has a precision score of 7% and an F1 score of 14%. While slightly better, this will still produce a large number of false positives.
  • Low-risk had much better results, with a precision score of 100% and an F1 score of 97%.

Naive Random Oversampling

NRO_balanced_accuracy_score

  • A balanced accuracy of ~66% is below what is considered good.

NRO_classification_report

  • High-risk has a precision of 1% and an F1 of 2%. Once again these are very low, producing a high number of false positives.
  • Low-risk has a precision of 100%, but an F1 of only 80%.

SMOTE Oversampling

SMOTE_balanced_accuracy_score

  • The SMOTE oversampling balanced accuracy score was about the same at ~66%, which is not considered a good score.

SMOTE_classification_report

  • High-risk's precision and F1 scores are once again very low: 1% for precision and 2% for F1.
  • Low-risk scores are better at 100% precision and 80% F1.

Undersampling

Undersampling_balanced_accuracy_score

  • Undersampling produced the worst balanced accuracy score at just ~53%.

Undersampling_classification_report

  • High-risk precision and F1 scores are both very low at 1%.
  • Low-risk performed better, with a precision score of 100% and an F1 of 62%.

Combination (Over and Under) Sampling

SMOTEENN_balanced_accuracy_score

  • The combination sampling also underperformed, with a balanced accuracy score of ~62%.

SMOTEENN_classification_report

  • High-risk also did poorly, with a precision of 1% and an F1 of 2%.
  • Low-risk was better, with a precision of 100% and an F1 of 72%.

Summary

Of all the tests performed, the AdaBoost classifier performed the best. Its balanced accuracy score was very high at ~92%, and its low-risk precision was also very high, leaving few high-risk applicants mistaken for low-risk. This is ideal when screening for credit risk, since approving many loans for high-risk applicants could result in many not being paid back. However, the low scores for high-risk still allow a large number of false positives: creditworthy applicants flagged as high-risk, which could cost the bank a lot in missed revenue. My suggestion would be to keep looking for a better option.