Charity Fundation

Overview

A funding association Alphabet Soup is considering whether client organizations they funded would be successful if making new investments on them. To predict the successfulness of its fundation, a binary classifier was created with machine learning and neural network. For this purpose, more than 34,000 organizations funding history record were retrieved to build the neural network model. The model building including following steps:

Preprocess data for a Neural network model.
Compile, train, evaluate the model and save the model weights.
Optimize the model.

Result

Data Preprocessing

Dataset

EIN and NAME-------------------------------- Identification columns
APPLICATION_TYPE---------------------------- Alphabet Soup application type
AFFILIATION -------------------------------- Affiliated sector of industry
CLASSIFICATION ----------------------------- Government organization classification
USE_CASE ----------------------------------- Use case for funding
ORGANIZATION ------------------------------- Organization type
STATUS ------------------------------------- Active status
INCOME_AMT --------------------------------- Income classification
SPECIAL_CONSIDERATIONS --------------------- Special consideration for application
ASK_AMT ------------------------------------ Funding amount requested
IS_SUCCESSFUL ------------------------------ Was the money used effectively

Target variable: IS_SUCCESSFUL. It stands for whether the applicants were successful or not. Binary values: 1-successful, 0-unsuccessful.
Unrelated variables: EIN and NAME. These are ID columns, can be removed from the input data.
Features: APPLICATION_TYPE, AFFILIATION, CLASSIFICATION, USE_CASE, ORGANIZATION, STATUS, INCOME_AMT, SPECIAL_CONSIDERATIONS, ASK_AMT. These variables potentionally affect applicant's use of funds.
- Categorical features:
  - APPLICATION_TYPE -----> 17 unique values, bucketing into 9. Then encoded.
  - AFFILIATION -----> 6 unique values, Encoded.
  - CLASSIFICATION -----> 17 unique values, bucketing into 6. Then encoded.
  - USE_CASE -----> 5 unique values, Encoded.
  - ORGANIZATION -----> 4 unique values, Encoded.
  - INCOME_AMT -----> 9 unique values, Encoded.
  - SPECIAL_CONSIDERATIONS -----> 2 unique values, binary variable, Y / N.
Numerical features:
- STATUS binary varibale. 1 / 0.
- ASK_AMT continuous values.
Split data into training data and testing data.
Feature scaling Use standard scaler to scale the data to make sure all the feature contributes approximately proportionately to the final distance.

Compiling, Training, and Evaluating the Model

Create Neural Network Model Input layer: 44 trained sample features. First hidden layer: 80 neurons. Activation function:Relu. Second hidden layer: 30 neurons. Activation function:Relu. output layer: 1 target varible. Binary classification. epochs: 50
Save callback files: The model's weights were saved by every 5 epochs.
Model Loss and Accuracy:

Model Optimization

Target model performance: 75%

Proprocessing Optimization Attempts:

Rebucketing APPLICATION_TYPE and CLASSIFICATION.[Considering getting more data back for precision.]
Keep only active STATUS. [Considering only active status affect future funding.]
Bucketing AFFILIATION. [Considering AFFILIATION column is not normal distributed. Data is heavily skewed.]
Binning ASK_AMT. [Considering over three quaters of requesting amount is 5000, and others are scatter over 5000 in a wide range.]
Changing INCOME_AMT to numeric data. [Considering to get more numerical features.]

The accuracy of first epochs are as follows:

  Proprocessing Optimization Attempts: 
- original: loss: 0.5701 - accuracy: 0.7221 
- bucketing application_accounts < 100: loss: 0.5694 - accuracy: 0.7227 √
- bucketing classification_counts < 150: loss: 0.5708 - accuracy: 0.7220 X
- bucketing classification_counts < 500: loss: 0.5728 - accuracy: 0.7203 X
- filtering active status: loss: 0.6929 - accuracy: 0.5582 X
- bucketing affiliation_counts < 50: loss: 0.5705 - accuracy: 0.7206 X
- binning and encoding ASK_AMT: loss: 0.5727 - accuracy: 0.7195 X
- changing INCOME_AMT to numeric data: loss: 0.5727 - accuracy: 0.7232 √

Proprocessing Optimization Attempts:
- Adding one more hidden layer. [Considering to allow neurons to train on activated input values for deep learning.]
- Using a different activation function. [Considering tanh function expands the data range between -1 and 1.]
- Adding neurons. [Considering to get more information from input data.]
- Changing batch size. [Considering to get more information from input data.]
- Adding additional epochs. [Considering to scan more or less samples for each epoch.]
The accuracy of first epochs are as follows:
```
Model Optimization Attempts: 
- adding hidden layer: loss: 0.5726 - accuracy: 0.7238 √
- changing activation function to tanh: loss: 0.5685 - accuracy: 0.7243 √
- adding neurons to hidden layers: loss: 0.5704 - accuracy: 0.7250 √
- decreasing batch size: loss: 0.5744 - accuracy: 0.7222 X
- increasing batch size: loss: 0.5433 - accuracy: 0.7392 √
- adding epochs: loss: 0.5438 - accuracy: 0.7398 √
```
Final model performance

Summary

Unfortunately, target was not achieved. Even if final model performance slightly improved. The loss and accruary was unstable.
Most of columns are not in normal distribution. Standard Scaler is not the best choice to do feature scaling.
Hyperparameter need to be determined and finetuned.
Since raw data are labeled data, output of the model is a class, this can be considered as a supervised learning classification problem. There are over 34000 applicants in record, which is a large dataset. But only have 10 features. It is more suitable to work with random forest algorithm. Comparing to neural network, random forest saves time and codes. It also have a big advantage on ranking the importance of features.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Screenshots		Screenshots
checkpoints		checkpoints
checkpoints_optimization		checkpoints_optimization
.DS_Store		.DS_Store
.Rhistory		.Rhistory
AlphabetSoupCharity.h5		AlphabetSoupCharity.h5
AlphabetSoupCharity.ipynb		AlphabetSoupCharity.ipynb
AlphabetSoupCharity_Optimization.h5		AlphabetSoupCharity_Optimization.h5
AlphabetSoupCharity_Optimization.ipynb		AlphabetSoupCharity_Optimization.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Charity Fundation

Overview

Result

Data Preprocessing

Compiling, Training, and Evaluating the Model

Model Optimization

Summary

About

Releases

Packages

Languages

CelineWW/Charity_Foundation_Neural_Network

Folders and files

Latest commit

History

Repository files navigation

Charity Fundation

Overview

Result

Data Preprocessing

Compiling, Training, and Evaluating the Model

Model Optimization

Summary

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages