
Credit_Risk_Analysis

Overview

The purpose of this challenge is to apply machine learning to predict credit risk. Using a data set from LendingClub, the data is oversampled with the RandomOverSampler and SMOTE algorithms and undersampled with the ClusterCentroids algorithm. The SMOTEENN algorithm is used for a combination of over- and undersampling. Finally, the BalancedRandomForestClassifier and EasyEnsembleClassifier ensemble models are used to reduce bias when predicting credit risk.

Results

Balanced Random Forest Classifier

BRFC_accuracy_score

  • The balanced accuracy score for the Balanced Random Forest Classifier is ~79%. Although it was not the best performer, it falls within a realistic range of scores.

BRFC_classification_report

  • High-risk has a precision score of 4% and an F1 score of 7%. Both of these are very low, meaning a large number of false positives.
  • Low-risk has a precision score of 100% and an F1 score of 95%. These are excellent; very few high-risk applicants were mistaken for low-risk.

Adaboost

Adaboost_balanced_accuracy_score

  • The balanced accuracy score is ~92%. This was the best result of all the tests.

Adaboost_classification_report

  • High-risk has a precision score of 7% and an F1 score of 14%. While slightly better, this will still produce a large number of false positives.
  • Low-risk had much better results, with a precision score of 100% and an F1 score of 97%.

Naive Random Oversampling

NRO_balanced_accuracy_score

  • A balanced accuracy of ~66% is below what is considered good.

NRO_classification_report

  • High-risk has a precision of 1% and an F1 of 2%. Once again these are very low, producing a high number of false positives.
  • Low-risk has a precision of 100%, but an F1 of only 80%.

SMOTE Oversampling

SMOTE_balanced_accuracy_score

  • The SMOTE oversampling balanced accuracy score was about the same at ~66%, which is not considered a good score.

SMOTE_classification_report

  • High-risk's precision and F1 scores are once again very low: 1% for precision and 2% for F1.
  • Low-risk scores are better at 100% precision and 80% F1.

Undersampling

Undersampling_balanced_accuracy_score

  • Undersampling produced the worst balanced accuracy score at just ~53%.

Undersampling_classification_report

  • High-risk precision and F1 scores are both very low at 1%.
  • Low-risk performed better, with a precision score of 100% and an F1 of 62%.

Combination (Over and Under) Sampling

SMOTEENN_balanced_accuracy_score

  • The combination sampling also underperformed, with a balanced accuracy score of ~62%.

SMOTEENN_classification_report

  • High-risk also did poorly, with a precision of 1% and an F1 of 2%.
  • Low-risk was better, with a precision of 100% and an F1 of 72%.

Summary

Of all the tests performed, the AdaBoost classifier performed the best. Its balanced accuracy score was very high at ~92%, and its low-risk precision was also very high, leaving few high-risk applicants mistaken for low-risk. This is ideal when screening for credit risk, since approving many loans for high-risk applicants could result in many not being paid back. However, the low scores for high-risk still allow a large number of false positives: creditworthy applicants flagged as high-risk, which could cost the bank a lot in missed revenue. My suggestion would be to keep looking for a better option.