This project involves the analysis and development of machine learning models to predict credit risk using historical lending data. The goal is to build models that can classify loans as either healthy (class 0) or high-risk (class 1). The analysis includes data preprocessing, model training, and evaluation.
- Credit_Risk/: Folder containing project files.
- credit_risk_classification.ipynb: Jupyter Notebook with the main code for the project.
- lending_data.csv: Dataset containing historical lending data.
- final_report.md: Detailed report providing insights into the analysis and results.
- README.md: Project documentation and overview.
- Python 3.x
- Jupyter Notebook
- Required Python libraries: numpy, pandas, scikit-learn, imbalanced-learn
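Before opening the notebook, a quick check like the one below can confirm that the libraries listed above are importable. This is only a sketch: the `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the project. It relies on the fact that scikit-learn is imported as `sklearn` and imbalanced-learn as `imblearn`.

```python
from importlib.util import find_spec

# Package name (as installed) -> module name (as imported).
# Illustrative mapping; the project itself does not ship this helper.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are importable.")
```

Any package reported as missing needs to be installed before running the notebook.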
- Clone the repository:
  git clone https://github.com/nardyjh/credit-risk-classification.git
- Open and run the Jupyter Notebook:
  jupyter notebook Credit_Risk/credit_risk_classification.ipynb
- Follow the instructions in the notebook to execute the analysis.
The project includes the development and comparison of two machine learning models.
- Original Logistic Regression Model:
  - Accuracy: 99.18%
  - Precision (Class 1): 85%
  - Recall (Class 1): 91%
  - F1-Score (Class 1): 88%
- Logistic Regression Model with Resampled Data:
  - Accuracy: 99.38%
  - Precision (Class 1): 84%
  - Recall (Class 1): 99%
  - F1-Score (Class 1): 91%
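The comparison above can be sketched roughly as follows. This is not the notebook's code. It runs on synthetic, imbalanced data from `make_classification` instead of lending_data.csv, so its scores will not match the figures reported here. The oversampling step uses `sklearn.utils.resample` as a stand-in so the example depends only on numpy and scikit-learn; imbalanced-learn's `RandomOverSampler` fills the same role. The class ratio, feature count, `max_iter`, and `random_state` values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the lending data (assumption: about 3% high-risk loans).
X, y = make_classification(
    n_samples=5000, n_features=7, n_informative=5, n_redundant=2,
    weights=[0.97, 0.03], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1
)


def evaluate(model, X_eval, y_eval):
    """Accuracy plus precision, recall, and F1 for the high-risk class (1)."""
    pred = model.predict(X_eval)
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision_1": precision_score(y_eval, pred, pos_label=1, zero_division=0),
        "recall_1": recall_score(y_eval, pred, pos_label=1, zero_division=0),
        "f1_1": f1_score(y_eval, pred, pos_label=1, zero_division=0),
    }


# Model 1: logistic regression on the original (imbalanced) training data.
original = LogisticRegression(max_iter=1000, random_state=1).fit(X_train, y_train)

# Model 2: randomly oversample class 1 in the training split only,
# until both classes have the same number of rows.
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=1
)
X_res = np.vstack([majority, minority_upsampled])
y_res = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_upsampled), dtype=int),
])
resampled = LogisticRegression(max_iter=1000, random_state=1).fit(X_res, y_res)

# Both models are scored on the same untouched test split.
original_scores = evaluate(original, X_test, y_test)
resampled_scores = evaluate(resampled, X_test, y_test)
for name, scores in [("original", original_scores), ("resampled", resampled_scores)]:
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Only the training split is resampled; the test split keeps its natural class balance so that both models are judged on the same realistic data.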
For a more detailed report and insights, refer to the final_report.md file.
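The logistic regression model trained on resampled data outperformed the original model overall: accuracy rose from 99.18% to 99.38%, and for high-risk loans (class 1) recall rose from 91% to 99% and F1-score from 88% to 91%, while precision dipped slightly from 85% to 84%.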
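The class 1 F1-scores reported in this README can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; `f1_from` is an illustrative helper name, and the inputs are the rounded percentages listed in this README, so the result is only accurate to about two decimal places.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded class 1 metrics as reported in this README.
reported = {
    "original": {"precision": 0.85, "recall": 0.91, "f1": 0.88},
    "resampled": {"precision": 0.84, "recall": 0.99, "f1": 0.91},
}

for name, m in reported.items():
    computed = f1_from(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {m['f1']:.2f}")
```

Both computed values round to the reported F1-scores, which shows how the large gain in recall outweighs the one-point drop in precision.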
The dataset was generated by edX Boot Camps LLC and is intended for educational purposes only. University of Toronto.