# Determining Credit Risk

**Note:** In the tables below, 0 represents low credit risk and 1 represents high credit risk.

## Purpose

The aim of this project is to determine which classification technique produces the best results when applied to the task of predicting credit risk.

## Method

The following classification models were applied to the data, and their performance statistics were documented (a minimal setup sketch, shared by all of the examples below, follows this list):

1. Resampling techniques
   1. Oversampling: Naïve Random Oversampling
   2. Oversampling: SMOTE Oversampling
   3. Undersampling
   4. Combination Over- and Undersampling
2. Ensemble learners
   1. Balanced Random Forest Classifier
   2. Easy Ensemble AdaBoost Classifier
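
All of the experiments share the same scaffolding: load and split the data, fit a model, and report balanced accuracy, a confusion matrix, and an imbalanced classification report on the validation set. The sketch below shows a minimal version of that scaffolding using scikit-learn and imbalanced-learn; the file name and column names are hypothetical placeholders, not confirmed details of this project.

```python
# Minimal scaffolding sketch. The CSV path and column names are hypothetical;
# the helper prints the three statistics documented for each model below.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from imblearn.metrics import classification_report_imbalanced

df = pd.read_csv("loan_data.csv")                   # hypothetical dataset
X = pd.get_dummies(df.drop(columns="loan_status"))  # hypothetical target column
y = df["loan_status"]                               # 0 = low risk, 1 = high risk
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

def report(model, X_test, y_test):
    """Print the statistics documented for each model in this README."""
    y_pred = model.predict(X_test)
    print("Balanced accuracy:", balanced_accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
    print(classification_report_imbalanced(y_test, y_pred))
```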

## Resampling Techniques

### Oversampling

#### Naïve Random Oversampling

- Balanced accuracy score (validation): 0.835
- Confusion matrix:

  ```text
  [[14502  2602]
   [   18    83]]
  ```

- Imbalanced classification report:

  |             | pre  | rec  | spe  | f1   | geo  | iba  | sup   |
  |-------------|------|------|------|------|------|------|-------|
  | 0           | 1.00 | 0.85 | 0.82 | 0.92 | 0.83 | 0.70 | 17104 |
  | 1           | 0.03 | 0.82 | 0.85 | 0.06 | 0.83 | 0.69 | 101   |
  | avg / total | 0.99 | 0.85 | 0.82 | 0.91 | 0.83 | 0.70 | 17205 |
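
A minimal sketch of how this result is typically produced with imbalanced-learn, reusing the scaffolding above; pairing the resampler with a `LogisticRegression` classifier is an assumption for illustration, not a confirmed detail of this project.

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.linear_model import LogisticRegression

# Duplicate minority-class rows at random until both classes are equal in size.
X_res, y_res = RandomOverSampler(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(random_state=1).fit(X_res, y_res)
report(model, X_test, y_test)
```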

#### SMOTE Oversampling

- Balanced accuracy score (validation): 0.841
- Confusion matrix:

  ```text
  [[14883  2221]
   [   19    82]]
  ```

- Imbalanced classification report:

  |             | pre  | rec  | spe  | f1   | geo  | iba  | sup   |
  |-------------|------|------|------|------|------|------|-------|
  | 0           | 1.00 | 0.87 | 0.81 | 0.93 | 0.84 | 0.71 | 17104 |
  | 1           | 0.04 | 0.81 | 0.87 | 0.07 | 0.84 | 0.70 | 101   |
  | avg / total | 0.99 | 0.87 | 0.81 | 0.92 | 0.84 | 0.71 | 17205 |
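
The same pattern with `SMOTE`, which synthesizes new minority-class samples by interpolating between nearest neighbours instead of duplicating rows (same assumptions as above):

```python
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

# Create synthetic high-risk samples by interpolating between neighbours.
X_res, y_res = SMOTE(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(random_state=1).fit(X_res, y_res)
report(model, X_test, y_test)
```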

### Undersampling

- Balanced accuracy score (validation): 0.824
- Confusion matrix:

  ```text
  [[13121  3983]
   [   12    89]]
  ```

- Imbalanced classification report:

  |             | pre  | rec  | spe  | f1   | geo  | iba  | sup   |
  |-------------|------|------|------|------|------|------|-------|
  | 0           | 1.00 | 0.77 | 0.88 | 0.87 | 0.82 | 0.67 | 17104 |
  | 1           | 0.02 | 0.88 | 0.77 | 0.04 | 0.82 | 0.68 | 101   |
  | avg / total | 0.99 | 0.77 | 0.88 | 0.86 | 0.82 | 0.67 | 17205 |
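
This README does not name the specific undersampler, so the sketch below assumes `ClusterCentroids`, a common imbalanced-learn choice that replaces majority-class samples with k-means cluster centroids:

```python
from imblearn.under_sampling import ClusterCentroids
from sklearn.linear_model import LogisticRegression

# Shrink the low-risk class by replacing it with k-means cluster centroids.
X_res, y_res = ClusterCentroids(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(random_state=1).fit(X_res, y_res)
report(model, X_test, y_test)
```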

### Combination (Over and Under) Sampling

- Balanced accuracy score (validation): 0.841
- Confusion matrix:

  ```text
  [[14723  2381]
   [   18    83]]
  ```

- Imbalanced classification report:

  |             | pre  | rec  | spe  | f1   | geo  | iba  | sup   |
  |-------------|------|------|------|------|------|------|-------|
  | 0           | 1.00 | 0.86 | 0.82 | 0.92 | 0.84 | 0.71 | 17104 |
  | 1           | 0.03 | 0.82 | 0.86 | 0.06 | 0.84 | 0.70 | 101   |
  | avg / total | 0.99 | 0.86 | 0.82 | 0.92 | 0.84 | 0.71 | 17205 |
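
For combined over- and undersampling, `SMOTEENN` (SMOTE followed by Edited Nearest Neighbours cleaning) is the usual imbalanced-learn tool and is assumed in this sketch:

```python
from imblearn.combine import SMOTEENN
from sklearn.linear_model import LogisticRegression

# Oversample with SMOTE, then drop noisy samples via Edited Nearest Neighbours.
X_res, y_res = SMOTEENN(random_state=1).fit_resample(X_train, y_train)
model = LogisticRegression(random_state=1).fit(X_res, y_res)
report(model, X_test, y_test)
```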

## Ensemble Learners

### BalancedRandomForestClassifier

- Balanced accuracy score (validation): 0.792
- Confusion matrix:

  ```text
  [[14212  2904]
   [   22    67]]
  ```

- Imbalanced classification report:

  |             | pre  | rec  | spe  | f1   | geo  | iba  | sup   |
  |-------------|------|------|------|------|------|------|-------|
  | 0           | 1.00 | 0.83 | 0.75 | 0.91 | 0.79 | 0.63 | 17116 |
  | 1           | 0.02 | 0.75 | 0.83 | 0.04 | 0.79 | 0.62 | 89    |
  | avg / total | 0.99 | 0.83 | 0.75 | 0.90 | 0.79 | 0.63 | 17205 |
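
The ensemble learners resample internally, so no separate resampling step is needed. A minimal sketch, reusing the scaffolding above (the `n_estimators=100` setting is an assumption):

```python
from imblearn.ensemble import BalancedRandomForestClassifier

# Each tree trains on a bootstrap sample balanced by random undersampling.
model = BalancedRandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)
report(model, X_test, y_test)
```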

### EasyEnsembleClassifier

- Balanced accuracy score (validation): 0.902
- Confusion matrix:

  ```text
  [[15506  1610]
   [    9    80]]
  ```

- Imbalanced classification report:

  |             | pre  | rec  | spe  | f1   | geo  | iba  | sup   |
  |-------------|------|------|------|------|------|------|-------|
  | 0           | 1.00 | 0.91 | 0.90 | 0.95 | 0.90 | 0.81 | 17116 |
  | 1           | 0.05 | 0.90 | 0.91 | 0.09 | 0.90 | 0.81 | 89    |
  | avg / total | 0.99 | 0.91 | 0.90 | 0.95 | 0.90 | 0.81 | 17205 |
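
The corresponding sketch for the Easy Ensemble AdaBoost Classifier, which bags AdaBoost learners each trained on a balanced bootstrap sample (`n_estimators=100` is again an assumption):

```python
from imblearn.ensemble import EasyEnsembleClassifier

# A bag of AdaBoost learners, each fit on a balanced bootstrap sample.
model = EasyEnsembleClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)
report(model, X_test, y_test)
```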

## Analysis

### Accuracy

| Classification method                 | Balanced accuracy |
|---------------------------------------|-------------------|
| BalancedRandomForestClassifier        | 0.792             |
| Undersampling                         | 0.824             |
| Naïve Random Oversampling             | 0.835             |
| SMOTE Oversampling                    | 0.841             |
| Combination (Over and Under) Sampling | 0.841             |
| EasyEnsembleClassifier                | 0.902             |

The table above lists the classification methods sorted by balanced accuracy, in ascending order. The worst score is achieved by undersampling, which is not surprising given that the technique shrinks the majority class down to the size of the very small minority class, discarding most of the training data. The best score is achieved by the Easy Ensemble AdaBoost Classifier.
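
Balanced accuracy is the unweighted mean of per-class recall, so each score can be checked directly against its confusion matrix, e.g. for the EasyEnsembleClassifier:

```python
# Balanced accuracy = mean of per-class recall, taken from the
# EasyEnsembleClassifier confusion matrix above.
recall_low_risk = 15506 / 17116   # correctly predicted low-risk / all low-risk
recall_high_risk = 80 / 89        # correctly predicted high-risk / all high-risk
print((recall_low_risk + recall_high_risk) / 2)  # ~0.902
```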

### F1 Score

| Classification method                 | F1 (high-risk class) |
|---------------------------------------|----------------------|
| Undersampling                         | 0.04                 |
| BalancedRandomForestClassifier        | 0.04                 |
| Naïve Random Oversampling             | 0.06                 |
| Combination (Over and Under) Sampling | 0.06                 |
| SMOTE Oversampling                    | 0.07                 |
| EasyEnsembleClassifier                | 0.09                 |

The table above lists the classification methods with their F1 scores for the high-risk class, again in ascending order. Confirming the balanced-accuracy finding, the best F1 score is also achieved by the Easy Ensemble AdaBoost Classifier.

## Conclusion

Of the methods tested, the Easy Ensemble AdaBoost Classifier achieves the best results, with the highest balanced accuracy (0.902) and the highest F1 score on the high-risk class (0.09).