Loan Default Prediction Pipeline

Overview

This Python script (loan_checker.py) implements an end-to-end machine learning pipeline that predicts the likelihood of loan default from borrower data. It reads data from CSV-like files, cleans and visualizes it, balances the classes, selects and scales features, trains a Decision Tree classifier, evaluates the model, predicts outcomes for new loan requests, and presents the results in an interactive command-line interface.

The pipeline demonstrates common steps in a data science workflow, including data preprocessing, exploratory visualization, model building, and basic deployment via a CLI.

Features

Data Loading: Reads borrower data from specified CSV-like files (credit_risk_train.csv, loan_requests.csv).
Data Cleaning: Removes records with missing values or an unrealistic age (90 or older). Reports the number of records removed and the number of missing values per column.
Data Visualization: Generates plots using matplotlib to explore:
- Age distribution of defaulters vs. non-defaulters (Histograms).
- Home ownership status among defaulters vs. non-defaulters (Pie Chart).
Class Balancing: Addresses class imbalance in the training data by performing simple undersampling of the majority class (non-defaulters).
Feature Engineering & Selection: Selects specific features (loan_amnt, person_income, cb_person_cred_hist_length) and scales numerical features using StandardScaler.
Model Training: Trains a DecisionTreeClassifier using scikit-learn on the prepared training data.
Model Evaluation: Assesses the trained model's performance on a held-out test set using:
- Accuracy Score.
- Classification Report (Precision, Recall, F1-score).
- Confusion Matrix.
Prediction: Uses the trained model to predict default status for new loan requests from loan_requests.csv.
Interactive Display: Presents the borrower details and predictions using a custom Carousel class, allowing the user to navigate back and forth through the records via the command line.
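
The cleaning and class-balancing steps listed above can be sketched in plain Python. This is a minimal illustration, not the code in loan_checker.py: the function names clean and undersample, the person_age column name, and the fixed random seed are assumptions made for the example. Rows are represented as dictionaries of strings, as a CSV reader would produce them.

```python
import random


def clean(rows, max_age=90):
    """Drop rows with any empty field or an age of max_age or more."""
    kept = []
    for row in rows:
        if any(value in ("", None) for value in row.values()):
            continue  # missing value
        if int(row["person_age"]) >= max_age:
            continue  # unrealistic age
        kept.append(row)
    return kept


def undersample(rows, label="loan_status", seed=0):
    """Randomly shrink every class to the size of the smallest one."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label], []).append(row)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical records, invented for this example.
rows = [
    {"person_age": "25", "person_income": "52000", "loan_status": "0"},
    {"person_age": "31", "person_income": "61000", "loan_status": "0"},
    {"person_age": "47", "person_income": "78000", "loan_status": "0"},
    {"person_age": "123", "person_income": "40000", "loan_status": "0"},
    {"person_age": "38", "person_income": "", "loan_status": "1"},
    {"person_age": "29", "person_income": "30000", "loan_status": "1"},
]
cleaned = clean(rows)
balanced = undersample(cleaned)
print(len(rows), len(cleaned), len(balanced))  # prints: 6 4 2
```

Undersampling discards majority-class records, so it trades training data for balance; that is usually acceptable when the majority class is large.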

Input Data Files (Required)

This script requires the following files to be present in the same directory:

credit_risk_train.csv: Contains the historical training data with borrower information and known loan outcomes (loan_status). Expected to be comma-separated with a header row.
loan_requests.csv: Contains new loan applicant data for prediction. Expected to be comma-separated with a header row similar to the training data (excluding loan_status).
carousel.py: Contains the definition for the Carousel class used in the interactive display.
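
As a rough illustration of the expected file layout, the sketch below parses a comma-separated file with a header row using the standard csv module. The README does not say how loan_checker.py actually loads its data, so load_rows and the two sample rows are assumptions; only the column names loan_amnt, person_income, cb_person_cred_hist_length, and loan_status come from this document, and person_age is assumed.

```python
import csv
import io

# Hypothetical two-row sample; the real credit_risk_train.csv has more rows
# and may have more columns.
SAMPLE = """person_age,person_income,loan_amnt,cb_person_cred_hist_length,loan_status
25,52000,8000,3,0
41,,12000,12,1
"""


def load_rows(handle):
    """Return the rows of a comma-separated file as dicts keyed by header name."""
    return list(csv.DictReader(handle))


rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]["loan_amnt"])            # values arrive as strings
print(rows[1]["person_income"] == "")  # a missing value is an empty string
```

To read a file on disk instead of the in-memory sample, open it with open("credit_risk_train.csv", newline="") and pass the handle to load_rows.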

Requirements

Python 3.x
matplotlib
scikit-learn
The custom carousel.py file.

Installation

Place Files: Ensure loan_checker.py, credit_risk_train.csv, loan_requests.csv, and carousel.py are in the same directory.

Install Libraries: Open your terminal or command prompt and run:

pip install matplotlib scikit-learn
# or pip3 install matplotlib scikit-learn

Usage

Navigate: Open your terminal or command prompt and navigate to the directory containing all the required files.

Run the script:

python loan_checker.py
# or python3 loan_checker.py

Observe Output:
- The script will first print logs related to data cleaning, balancing, and model evaluation metrics.
- Plots generated during the visualization step will be displayed sequentially. You may need to close each plot window to proceed.
- Predictions for borrowers in loan_requests.csv will be printed.
- You will be prompted to press Enter to start the interactive carousel display.

Interact with Carousel:
- Use 1 to move to the next borrower, 2 to move to the previous borrower, and 0 to exit the carousel interface.
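
The navigation described above could be backed by a circular doubly linked list. The sketch below is one possible shape for such a structure, not the contents of carousel.py: the Node class, the method names, the wrap-around behaviour, and the run helper (which takes a list of choices instead of reading the keyboard) are all assumptions.

```python
class Node:
    """One record in the ring, linked to its neighbours."""

    def __init__(self, data):
        self.data = data
        self.next = self
        self.prev = self


class Carousel:
    """Circular doubly linked list with a movable cursor."""

    def __init__(self):
        self.current = None

    def add(self, data):
        """Append a record at the end of the ring; the cursor stays put."""
        node = Node(data)
        if self.current is None:
            self.current = node
            return
        last = self.current.prev
        last.next = node
        node.prev = last
        node.next = self.current
        self.current.prev = node

    def move_next(self):
        self.current = self.current.next
        return self.current.data

    def move_prev(self):
        self.current = self.current.prev
        return self.current.data


def run(carousel, choices):
    """Apply menu choices (1 = next, 2 = previous, 0 = exit); return records shown."""
    shown = [carousel.current.data]
    for choice in choices:
        if choice == "1":
            shown.append(carousel.move_next())
        elif choice == "2":
            shown.append(carousel.move_prev())
        elif choice == "0":
            break
    return shown


carousel = Carousel()
for name in ("borrower A", "borrower B", "borrower C"):
    carousel.add(name)
visited = run(carousel, ["1", "1", "1", "2", "0", "1"])
print(visited)
```

In an interactive session the list of choices would be replaced by repeated calls to input(), with the loop ending when the user enters 0.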

Code Structure

The script is organized into functions responsible for different pipeline stages:

createDataFrame(): Loads data.
dataCleaning(): Cleans the training data.
dataVisualisation(): Generates plots.
classBalancing(): Undersamples the majority class.
featureSelection(): Selects and scales features.
modelTraining(): Trains the Decision Tree.
modelEvaluation(): Evaluates the model.
borrowerPrediction(): Predicts on new data and populates the carousel.
displayBorrower(): Formats and prints the current borrower's info.
clear(): Clears the console screen.
interface(): Handles user interaction with the carousel.
main(): Orchestrates the execution of the entire pipeline.
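
The scaling, training, and evaluation stages can be sketched with scikit-learn as follows. This is a hedged illustration rather than the script's implementation: the data is synthetic, the labelling rule is invented, and the use of train_test_split to create the held-out set is an assumption (the script may take its test set from elsewhere).

```python
import random

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced training data (not real borrower records).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    loan_amnt = rng.uniform(1_000, 30_000)
    person_income = rng.uniform(20_000, 120_000)
    cb_person_cred_hist_length = rng.randint(1, 25)
    X.append([loan_amnt, person_income, cb_person_cred_hist_length])
    # Toy labelling rule: default when the loan exceeds 30% of income.
    y.append(1 if loan_amnt > 0.3 * person_income else 0)

# Hold out a quarter of the records for evaluation (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training data only, then reuse it for the test data.
scaler = StandardScaler().fit(X_train)
model = DecisionTreeClassifier(random_state=0)
model.fit(scaler.transform(X_train), y_train)

y_pred = model.predict(scaler.transform(X_test))
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])

print(f"accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, zero_division=0))
print(matrix)
```

The same fitted scaler would also be applied to the rows of loan_requests.csv before calling model.predict, so that new requests are scaled consistently with the training data.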

License

MIT License

Author

Andrew Obwocha
