A simple pipeline for training neural networks in Python to classify Iris flowers from their petal and sepal dimensions (it can also be used on any other multiclass classification dataset). It implements two neural network architectures, along with the code to load the data, train and optimize these networks, and classify the dataset. main.py contains the core logic of this pipeline. Run it with the following command, where the YAML file contains all the hyper-parameters:
$: python main.py --config configs/config_file.yaml
There are three predefined config files under ./configs. Two of them hold the default hyper-parameters for the two models (Softmax Regression and 2-layer MLP). Do NOT modify the values in these config files. The third config file, config_exp.yaml, is for your hyper-parameter tuning experiments, and you are free to modify the hyper-parameter values in it. The script trains a model for the number of epochs specified in the config file. At the end of each epoch, the script evaluates the model on the validation set. After training completes, the script evaluates the best model on the test data.
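The epoch loop described above (train, check validation at the end of each epoch, keep the best model, test it once) can be sketched as follows. This is a minimal sketch, not the code in main.py. The names `train_with_validation`, `train_one_epoch` and `evaluate` are hypothetical. The sketch also assumes that "best" means highest validation accuracy and that a model can be snapshotted with `copy.deepcopy`.

```python
import copy
from typing import Any, Callable, Tuple


def train_with_validation(
    model: Any,
    num_epochs: int,
    train_one_epoch: Callable[[Any], None],
    evaluate: Callable[[Any, str], float],
) -> Tuple[Any, float, float]:
    """Hypothetical helper: train for num_epochs, keep a copy of the model with
    the best validation accuracy, then evaluate that copy once on the test split."""
    best_model, best_val_acc = None, float("-inf")
    for _ in range(num_epochs):
        train_one_epoch(model)            # one pass over the training batches
        val_acc = evaluate(model, "val")  # validation check at the end of the epoch
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            # Snapshot: later epochs keep mutating `model` in place.
            best_model = copy.deepcopy(model)
    test_acc = evaluate(best_model, "test")  # the test split is used only once
    return best_model, best_val_acc, test_acc


# Toy usage: the "model" is a counter and validation accuracy peaks at epoch 2.
val_curve = {1: 0.60, 2: 0.85, 3: 0.75}
toy = {"epochs_seen": 0}


def toy_train(m):
    m["epochs_seen"] += 1


def toy_eval(m, split):
    if split == "val":
        return val_curve[m["epochs_seen"]]
    return 0.80 if m["epochs_seen"] == 2 else 0.0


best, best_val, test_acc = train_with_validation(toy, 3, toy_train, toy_eval)
print(best["epochs_seen"], best_val, test_acc)  # 2 0.85 0.8
```

Evaluating on the test set only after training ends keeps the test data out of every training and model-selection decision.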
Python and dependencies
We work with Python 3. If you do not have a Python distribution installed yet,
we recommend installing Anaconda (or Miniconda) with Python 3. We provide environment.yaml, which lists the libraries needed to set up the environment
for this implementation. You can use it to create a conda environment:
$: conda env create -f environment.yaml
If you already have your own Python development environment, please refer to this file to find necessary libraries.
The Iris dataset (iris_train.csv and iris_test.csv) is in the ./data folder.
1.1 Data Preparation
To keep the choice of hyper-parameters from "overfitting" the test data, it is common practice to split the training dataset into actual training data and validation data
and to tune hyper-parameters based on results on the validation data. Additionally, in deep learning, training data is often fed to models in batches, which speeds up training and gives less noisy gradient estimates than single examples.
In our pipeline, we first load all of the data and then split the training set into training and validation portions. The first 80% of the training set becomes our training data and the remaining 20% becomes our validation data. The training, validation, and test data are all organized into batches, and the training data is grouped into a different combination of batches in each epoch.
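The split and batching steps above can be sketched with NumPy. The 80/20 "first rows" split follows the text. Shuffling the training rows once per epoch is an assumption about how "different combinations of batches" are produced. The function names `train_val_split` and `generate_batches` are hypothetical.

```python
import numpy as np


def train_val_split(data, labels, train_ratio=0.8):
    """Use the first train_ratio of the rows for training and the rest for validation."""
    n_train = int(len(data) * train_ratio)
    return (data[:n_train], labels[:n_train]), (data[n_train:], labels[n_train:])


def generate_batches(data, labels, batch_size, rng=None):
    """Split (data, labels) into batches. If rng is given, rows are shuffled first
    (an assumption) so each epoch groups the examples differently."""
    order = np.arange(len(data))
    if rng is not None:
        order = rng.permutation(len(data))
    batched_data, batched_labels = [], []
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        batched_data.append(data[idx])
        batched_labels.append(labels[idx])
    return batched_data, batched_labels


X = np.arange(20, dtype=float).reshape(10, 2)  # 10 toy examples, 2 features
y = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
(train_X, train_y), (val_X, val_y) = train_val_split(X, y)
print(train_X.shape, val_X.shape)  # (8, 2) (2, 2)

rng = np.random.default_rng(0)
for epoch in range(2):
    batches_X, batches_y = generate_batches(train_X, train_y, batch_size=3, rng=rng)
    print([len(b) for b in batches_X])  # [3, 3, 2]
```

The split is taken before any shuffling, so validation rows never leak into training batches.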
We now implement two networks from scratch: a simple softmax regression and a two-layer multi-layer perceptron (MLP).
Definitions of these classes can be found in ./models. Weights of each model will be randomly initialized upon construction and stored in a weight dictionary.
Meanwhile, a corresponding gradient dictionary is also created and initialized to zeros.
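A weight dictionary and its zero-initialized gradient twin might look like this for the two-layer MLP. The key names (`W1`, `b1`, `W2`, `b2`), the small-Gaussian initialization and the hidden size of 16 are assumptions for illustration, not values from ./models.

```python
import numpy as np


def init_two_layer_weights(input_size, hidden_size, num_classes, std=1e-3, seed=0):
    """Hypothetical initializer: small random Gaussian weights, zero biases, and a
    gradient dictionary with the same keys and shapes, filled with zeros."""
    rng = np.random.default_rng(seed)
    weights = {
        "W1": std * rng.standard_normal((input_size, hidden_size)),
        "b1": np.zeros(hidden_size),
        "W2": std * rng.standard_normal((hidden_size, num_classes)),
        "b2": np.zeros(num_classes),
    }
    gradients = {name: np.zeros_like(w) for name, w in weights.items()}
    return weights, gradients


# Iris: 4 input features, 3 classes; the hidden size 16 is an arbitrary choice.
weights, gradients = init_two_layer_weights(4, 16, 3)
print(weights["W1"].shape, weights["W2"].shape)        # (4, 16) (16, 3)
print(all(not g.any() for g in gradients.values()))    # True
```

Using matching keys in both dictionaries lets an optimizer loop over `weights` and look up the corresponding gradient by name.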
Each model has only one public method, forward, which takes a batch of data and the corresponding labels and returns the loss and accuracy on that batch.
It also computes the gradients of all of the model's weights from the training batch (even though the method is called forward!).
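To show what "forward also fills the gradients" means, here is a hedged sketch of a Softmax Regression model: a fully-connected layer, then ReLU, then softmax and cross-entropy. The class name `SoftmaxRegressionSketch` is hypothetical, and leaving out a bias term is an assumption made to keep it short.

```python
import numpy as np


class SoftmaxRegressionSketch:
    """Hypothetical model: FC layer (no bias, an assumption) -> ReLU -> softmax
    -> cross-entropy. forward() returns (loss, accuracy) and, as a side effect,
    stores dLoss/dW1 in self.gradients."""

    def __init__(self, input_size=4, num_classes=3, std=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = {"W1": std * rng.standard_normal((input_size, num_classes))}
        self.gradients = {"W1": np.zeros((input_size, num_classes))}

    def forward(self, X, y):
        n = X.shape[0]
        z = X @ self.weights["W1"]          # (N, C) linear scores
        a = np.maximum(0, z)                # ReLU
        shifted = a - a.max(axis=1, keepdims=True)
        exp = np.exp(shifted)
        probs = exp / exp.sum(axis=1, keepdims=True)
        loss = float(-np.mean(np.log(probs[np.arange(n), y])))
        accuracy = float(np.mean(probs.argmax(axis=1) == y))

        # Backward pass. For softmax + cross-entropy, dLoss/da = (probs - one_hot) / N.
        d_a = probs.copy()
        d_a[np.arange(n), y] -= 1.0
        d_a /= n
        d_z = d_a * (z > 0)                 # ReLU passes gradient only where z > 0
        self.gradients["W1"] = X.T @ d_z    # (D, C), same shape as W1
        return loss, accuracy


X = np.array([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])  # two Iris-like rows
y = np.array([0, 2])
model = SoftmaxRegressionSketch()
loss, acc = model.forward(X, y)
print(model.gradients["W1"].shape)  # (4, 3)
```

A central-difference gradient check on the loss is a simple way to confirm that a backward pass like this matches the forward computation.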
2.1 Utility Functions
There are a few useful methods defined in ./_base_network.py that are shared by both models.
(a) Activation Functions.
Two activation functions are used in this project: ReLU and Sigmoid.
Both functions and their derivatives are implemented in ./_base_network.py (i.e., sigmoid, sigmoid_dev, ReLU, and ReLU_dev).
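The four functions named above can be written in a few lines of NumPy. The names come from the text. The single-array signatures and the choice to use derivative 0 at exactly x = 0 for ReLU are assumptions of this sketch, not necessarily what ./_base_network.py does.

```python
import numpy as np


def sigmoid(X):
    """Elementwise logistic function 1 / (1 + exp(-X))."""
    return 1.0 / (1.0 + np.exp(-X))


def sigmoid_dev(X):
    """Derivative of the sigmoid: s * (1 - s) with s = sigmoid(X)."""
    s = sigmoid(X)
    return s * (1.0 - s)


def ReLU(X):
    """Elementwise max(0, X)."""
    return np.maximum(0, X)


def ReLU_dev(X):
    """1 where X > 0, else 0 (the derivative at exactly 0 is taken as 0 here)."""
    return (X > 0).astype(float)


x = np.array([-2.0, 0.0, 3.0])
print(ReLU(x))           # [0. 0. 3.]
print(ReLU_dev(x))       # [0. 0. 1.]
print(sigmoid_dev(0.0))  # 0.25
```

Writing sigmoid_dev in terms of sigmoid's own output avoids a second exponential. In a backward pass the forward activation can be reused directly, as in a1 * (1 - a1).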
(b) Loss Functions.
The loss function used in this project is Cross-Entropy Loss. Both the softmax function and the Cross-Entropy Loss computation are implemented in ./_base_network.py.
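A numerically stable softmax and the batch-averaged cross-entropy loss can be sketched as below. These functions assume (N, C) score arrays and integer class labels. The 1e-12 clip is an added safeguard, not something the text specifies.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax of an (N, C) score matrix. Subtracting each row's max
    leaves the result unchanged but keeps exp() from overflowing."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy_loss(probs, labels):
    """Mean of -log p(true class) over the batch; labels are integer class ids."""
    n = probs.shape[0]
    true_class_probs = probs[np.arange(n), labels]
    # Clip (an added safeguard) so an underflowed 0 gives a large finite loss, not inf.
    return float(-np.mean(np.log(np.clip(true_class_probs, 1e-12, 1.0))))


scores = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]])
labels = np.array([0, 1])
probs = softmax(scores)  # each row sums to 1
loss = cross_entropy_loss(probs, labels)

# With all-equal scores every class gets probability 1/3, so the loss is log(3).
print(np.isclose(cross_entropy_loss(softmax(np.zeros((2, 3))), np.array([0, 2])), np.log(3)))  # True
```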
(c) Accuracy.
We are also interested in how well our model does on a given batch of data.
Therefore, we have implemented the compute_accuracy method in ./_base_network.py to compute the accuracy on a given batch.
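Batch accuracy is the fraction of rows whose highest-probability class matches the label. The sketch below reuses the compute_accuracy name from the text, but its signature (probabilities plus integer labels) is an assumption.

```python
import numpy as np


def compute_accuracy(probs, labels):
    """Fraction of rows whose arg-max class equals the integer label."""
    predictions = np.argmax(probs, axis=1)  # ties go to the lowest class index
    return float(np.mean(predictions == labels))


probs = np.array([
    [0.7, 0.2, 0.1],  # predicts 0
    [0.1, 0.3, 0.6],  # predicts 2
    [0.2, 0.5, 0.3],  # predicts 1
    [0.4, 0.4, 0.2],  # tie between 0 and 1 -> argmax picks 0
])
labels = np.array([0, 2, 0, 1])
print(compute_accuracy(probs, labels))  # 0.5
```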
2.2 Model Implementation
The Softmax Regression model is composed of a fully-connected layer followed by a ReLU activation. The two-layer MLP is composed of two fully-connected layers with a Sigmoid activation in between. In both models, the output is passed through the softmax function before the loss is computed!
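The two-layer MLP described above (fully-connected, Sigmoid, fully-connected, softmax, cross-entropy) can be sketched as a single function that returns the loss, the accuracy and the gradients. This is a minimal sketch, assuming the hypothetical keys `W1`, `b1`, `W2`, `b2` and a mean loss over the batch. It is not the repository's class.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def two_layer_forward(weights, X, y):
    """Hypothetical sketch: FC -> sigmoid -> FC -> softmax -> cross-entropy.
    Returns (loss, accuracy, gradients) for one batch."""
    n = X.shape[0]
    z1 = X @ weights["W1"] + weights["b1"]   # (N, H) hidden pre-activations
    a1 = sigmoid(z1)
    z2 = a1 @ weights["W2"] + weights["b2"]  # (N, C) class scores
    shifted = z2 - z2.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = float(-np.mean(np.log(probs[np.arange(n), y])))
    accuracy = float(np.mean(probs.argmax(axis=1) == y))

    # Backward pass, layer by layer.
    d_z2 = probs.copy()
    d_z2[np.arange(n), y] -= 1.0
    d_z2 /= n                                # dLoss/dz2 = (probs - one_hot) / N
    grads = {"W2": a1.T @ d_z2, "b2": d_z2.sum(axis=0)}
    d_a1 = d_z2 @ weights["W2"].T
    d_z1 = d_a1 * a1 * (1.0 - a1)            # sigmoid'(z1) = a1 * (1 - a1)
    grads["W1"] = X.T @ d_z1
    grads["b1"] = d_z1.sum(axis=0)
    return loss, accuracy, grads


rng = np.random.default_rng(0)
weights = {
    "W1": 0.1 * rng.standard_normal((4, 8)), "b1": np.zeros(8),
    "W2": 0.1 * rng.standard_normal((8, 3)), "b2": np.zeros(3),
}
X = rng.standard_normal((5, 4))
y = np.array([0, 1, 2, 1, 0])
loss, acc, grads = two_layer_forward(weights, X, y)
print(sorted(grads))  # ['W1', 'W2', 'b1', 'b2']
```

Each gradient has the same shape as its weight, so a gradient dictionary keyed like the weight dictionary can be updated name by name.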
We use an optimizer to update the models' weights. An optimizer is initialized with a specific learning rate and a regularization coefficient.
Before updating the model weights, the optimizer applies L2 regularization to the model. We also implemented a vanilla SGD optimizer.
NOTE: Regularization is NOT applied on bias terms!
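A vanilla SGD step with L2 regularization that skips bias terms can be sketched as below. This sketch makes three assumptions: the penalty is (reg / 2) * ||W||^2, so its gradient is reg * W; bias keys start with "b"; and the class name `SGDSketch` is hypothetical.

```python
import numpy as np


class SGDSketch:
    """Hypothetical vanilla SGD optimizer with L2 regularization on weights only."""

    def __init__(self, learning_rate=1e-2, reg=1e-3):
        self.learning_rate = learning_rate
        self.reg = reg

    def apply_regularization(self, weights, gradients):
        # Gradient of (reg / 2) * ||W||^2 is reg * W. Keys starting with "b"
        # are treated as biases (an assumed naming convention) and skipped.
        for name in weights:
            if not name.startswith("b"):
                gradients[name] += self.reg * weights[name]

    def update(self, weights, gradients):
        """Regularize first, then take one gradient step; both edit arrays in place."""
        self.apply_regularization(weights, gradients)
        for name in weights:
            weights[name] -= self.learning_rate * gradients[name]


w = {"W1": np.ones((2, 2)), "b1": np.ones(2)}
g = {"W1": np.zeros((2, 2)), "b1": np.full(2, 0.25)}
SGDSketch(learning_rate=0.5, reg=0.5).update(w, g)
print(w["W1"][0, 0], w["b1"][0])  # 0.75 0.875
```

Even with a zero data gradient, W1 shrinks toward zero (weight decay), while b1 moves only by its own gradient.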
It is always good practice to monitor the training process through learning curves.
Our training method in main.py records the averaged loss and accuracy of the model on both the training and validation data at the end
of each epoch and plots these curves.
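Recording per-epoch averages and plotting them can be sketched as follows. The helper names `new_history`, `record_epoch` and `plot_curves` are hypothetical. Averaging per-batch values without weighting them by batch size is a simplifying assumption. Matplotlib is imported only inside the plotting function, so the bookkeeping part runs without it.

```python
from statistics import mean


def new_history():
    """Empty per-epoch history for the four tracked curves."""
    return {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}


def record_epoch(history, train_losses, train_accs, val_losses, val_accs):
    """Append the (unweighted) mean of this epoch's per-batch metrics."""
    history["train_loss"].append(mean(train_losses))
    history["train_acc"].append(mean(train_accs))
    history["val_loss"].append(mean(val_losses))
    history["val_acc"].append(mean(val_accs))


def plot_curves(history, path="learning_curves.png"):
    """Save side-by-side loss and accuracy curves (requires matplotlib)."""
    import matplotlib.pyplot as plt

    epochs = range(1, len(history["train_loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, key in ((ax_loss, "loss"), (ax_acc, "acc")):
        ax.plot(epochs, history["train_" + key], label="train")
        ax.plot(epochs, history["val_" + key], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(key)
        ax.legend()
    fig.savefig(path)
    plt.close(fig)


history = new_history()
record_epoch(history, [1.0, 0.5], [0.5, 0.7], [0.9], [0.6])
print(history["train_loss"])  # [0.75]
```

A validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting in these curves.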