Implement ResNet from scratch and train it on the CIFAR-10, Tiny ImageNet, and ImageNet datasets.
- Construct ResNet56 and train the network on the CIFAR-10 dataset to obtain 93.79% accuracy, replicating the result of the original ResNet paper on CIFAR-10.
- Use ResNet and train the network on the Tiny ImageNet Visual Recognition Challenge to claim a top position on the leaderboard.
- Python 3.6
- OpenCV 4.0.0
- Keras 2.2.4 for ResNet on CIFAR-10 and 2.1.0 for the rest
- TensorFlow 1.13.0
- CUDA Toolkit 10.0
- cuDNN 7.4.2
- scikit-learn 0.20.1
- imutils
- NumPy
Details about the CIFAR-10 dataset can be found here.
The ResNet implementation can be found in resnet.py (check here) under the pipeline/nn/conv/ directory. The inputs to the model include the image dimensions (height, width, and depth), the number of classes, the number of stages, the number of filters, the regularization coefficient, the batch normalization coefficient and momentum, and a dataset name argument (defaults to cifar10). The ResNet in this project is built from pre-activation residual modules with bottlenecks.
Figure 1 shows an example of the pre-activation residual module; this project uses the pre-activation + bottleneck variant. Table 1 shows the ResNet56 architecture for CIFAR-10. Unlike the original ResNet, which uses 7x7 filters with a stride of 2 in the first convolution layer, the ResNet for CIFAR-10 uses 3x3 filters with a stride of 1 because of the small input dimensions of CIFAR-10 (32x32x3). For details about the architecture of ResNet56 for CIFAR-10, check here.
Figure 1: Pre-activation residual module (reference).
Table 1: ResNet56 for CIFAR-10
layer name | output size | 56-layer |
---|---|---|
conv1 | 32 x 32 x 64 | 3 x 3, 64, stride 1 |
conv2_x | 32 x 32 x 64 | [1 x 1, 16; 3 x 3, 16; 1 x 1, 64] x 9 |
conv3_x | 16 x 16 x 128 | [1 x 1, 32; 3 x 3, 32; 1 x 1, 128] x 9 |
conv4_x | 8 x 8 x 256 | [1 x 1, 64; 3 x 3, 64; 1 x 1, 256] x 9 |
avg pool | 1 x 1 x 256 | 8 x 8, stride 1 |
linear | 256 | |
softmax | 10 | |
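For illustration, below is a minimal sketch of one pre-activation bottleneck block written with the Keras functional API. The helper name `residual_module` and its arguments (`reduce`, `reg`, etc.) are assumptions for this sketch, not necessarily the exact API in resnet.py.

```python
from keras.layers import Activation, BatchNormalization, Conv2D, add
from keras.regularizers import l2

def residual_module(x, filters, stride=(1, 1), reduce=False, reg=1e-4):
    # Pre-activation bottleneck: (BN -> ReLU -> conv) three times.
    # The two inner convs use filters // 4 channels (the bottleneck).
    shortcut = x

    # 1x1 conv; no bias because BatchNormalization already adds a shift.
    bn1 = BatchNormalization()(x)
    act1 = Activation("relu")(bn1)
    conv1 = Conv2D(filters // 4, (1, 1), use_bias=False,
                   kernel_regularizer=l2(reg))(act1)

    # 3x3 conv; carries the stride when the block downsamples.
    bn2 = BatchNormalization()(conv1)
    act2 = Activation("relu")(bn2)
    conv2 = Conv2D(filters // 4, (3, 3), strides=stride, padding="same",
                   use_bias=False, kernel_regularizer=l2(reg))(act2)

    # Final 1x1 conv restores the full channel count.
    bn3 = BatchNormalization()(conv2)
    act3 = Activation("relu")(bn3)
    conv3 = Conv2D(filters, (1, 1), use_bias=False,
                   kernel_regularizer=l2(reg))(act3)

    # Project the shortcut when spatial size or channel count changes.
    if reduce:
        shortcut = Conv2D(filters, (1, 1), strides=stride, use_bias=False,
                          kernel_regularizer=l2(reg))(act1)

    # The identity (or projected) shortcut is added back to the branch.
    return add([conv3, shortcut])
```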
The resnet_cifar10.py (check here) is responsible for training the baseline model using the "ctrl+c" method: I can start training with an initial learning rate (and associated set of hyperparameters), monitor training, and quickly adjust the learning rate based on the results as they come in. The TrainingMonitor callback is responsible for plotting the loss and accuracy curves of the training and validation sets, and the EpochCheckpoint callback is responsible for saving the model every 5 epochs.
The resnet_cifar10_decay.py (check here) switches the method from "ctrl+c" to learning rate decay to train the network. The TrainingMonitor callback is again responsible for plotting the loss and accuracy curves of the training and validation sets, and the LearningRateScheduler callback is responsible for the learning rate decay.
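The exact decay function is not spelled out here; one common choice that LearningRateScheduler can drive is polynomial decay, sketched below (INIT_LR and MAX_EPOCHS are assumed values for illustration).

```python
from keras.callbacks import LearningRateScheduler

INIT_LR = 1e-1     # assumed initial learning rate
MAX_EPOCHS = 120   # assumed total epoch budget (experiment 3 uses 120)

def poly_decay(epoch):
    # Linearly anneal the learning rate toward 0 over MAX_EPOCHS (power = 1).
    power = 1.0
    return INIT_LR * (1 - epoch / float(MAX_EPOCHS)) ** power

callbacks = [LearningRateScheduler(poly_decay)]
```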
Here are the details of the two callback classes:
The trainingmonitor.py (check here) under the pipeline/callbacks/ directory creates a TrainingMonitor callback that is called at the end of every epoch when training a network. The monitor constructs a plot of the training loss and accuracy. Applying such a callback during training enables us to babysit the training process and spot overfitting early, allowing us to abort the experiment and continue tuning parameters.
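A condensed sketch of such a monitor is shown below; the repo's version likely differs in details (for example, it may also serialize the history to JSON), so treat this as illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # write plots to disk; no display required
import matplotlib.pyplot as plt
from keras.callbacks import BaseLogger

class TrainingMonitor(BaseLogger):
    def __init__(self, fig_path):
        super(TrainingMonitor, self).__init__()
        self.fig_path = fig_path  # where the loss/accuracy plot is saved

    def on_train_begin(self, logs={}):
        self.H = {}  # metric name -> list of per-epoch values

    def on_epoch_end(self, epoch, logs={}):
        # Record every logged metric (loss, acc, val_loss, val_acc, ...).
        for (k, v) in logs.items():
            self.H.setdefault(k, []).append(float(v))

        # Redraw the full history so the figure is always current.
        plt.figure()
        for (k, v) in self.H.items():
            plt.plot(range(len(v)), v, label=k)
        plt.xlabel("Epoch #")
        plt.legend()
        plt.savefig(self.fig_path)
        plt.close()
```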
The EpochCheckpoint.py (check here) helps store individual checkpoints for ResNet so that we do not have to retrain the network from the beginning. The model is saved every 5 epochs.
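In sketch form, such a checkpointing callback can be as simple as the following (the class internals are assumptions; the filename pattern mirrors the training commands below):

```python
import os
from keras.callbacks import Callback

class EpochCheckpoint(Callback):
    def __init__(self, output_path, every=5, start_at=0):
        super(EpochCheckpoint, self).__init__()
        self.output_path = output_path
        self.every = every         # save every N epochs
        self.int_epoch = start_at  # lets resumed runs keep their numbering

    def on_epoch_end(self, epoch, logs={}):
        self.int_epoch += 1
        if self.int_epoch % self.every == 0:
            # Serialize the full model (architecture + weights + optimizer).
            path = os.path.join(self.output_path,
                                "epoch_{}.hdf".format(self.int_epoch))
            self.model.save(path, overwrite=True)
```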
We can use the following command to train the model from the beginning:
python resnet_cifar10.py --checkpoints output/checkpoints
If we restart training in the middle of the epochs (simply replace {epoch_number_you_want_to_start} with a number):
python resnet_cifar10.py --checkpoints output/checkpoints --model output/checkpoints/epoch_{epoch_number_you_want_to_start}.hdf --start_epoch {the_epoch_number_you_want_to_start}
For learning rate decay, just use the following command:
python resnet_cifar10_decay.py --model output/resnet_cifar10.hdf5 --output output
The details about the challenge and dataset can be found here.
The tiny_imagenet_config.py (check here) under the config/ directory stores all relevant configurations for the project, including the paths to the input images, the total number of class labels, information on the training, validation, and testing splits, the paths to the HDF5 datasets, and the paths to output models, plots, etc.
For details about how to build the HDF5 file for the Tiny ImageNet dataset, check build_tiny_imagenet.py in this repo.
The meanpreprocessor.py (check here) under the pipeline/preprocessing/ directory subtracts the mean red, green, and blue pixel intensities computed across the training set, which is a form of data normalization. Mean subtraction is used to reduce the effects of lighting variations during classification.
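The core of such a preprocessor is only a few lines; here is a minimal sketch (the constructor argument names are assumed):

```python
import cv2

class MeanPreprocessor:
    def __init__(self, r_mean, g_mean, b_mean):
        # Per-channel means, computed over the training set only.
        self.r_mean = r_mean
        self.g_mean = g_mean
        self.b_mean = b_mean

    def preprocess(self, image):
        # OpenCV stores images in BGR channel order.
        (B, G, R) = cv2.split(image.astype("float32"))
        R -= self.r_mean
        G -= self.g_mean
        B -= self.b_mean
        return cv2.merge([B, G, R])
```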
The simplepreprocessor.py (check here) under the pipeline/preprocessing/ directory defines a class that resizes images. This class is used to ensure that each input image has dimensions of 64x64x3.
The imagetoarraypreprocessor.py (check here) under the pipeline/preprocessing/ directory defines a class that converts images into Keras-compatible arrays.
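Together, the three classes form a small per-image pipeline. Assuming the class names match their file names (an assumption for this sketch), the wiring might look like:

```python
# Hypothetical wiring; the actual scripts pass these preprocessors
# into the HDF5 dataset generator described below.
sp = SimplePreprocessor(64, 64)                # resize to 64x64
mp = MeanPreprocessor(r_mean, g_mean, b_mean)  # means from the training set
iap = ImageToArrayPreprocessor()               # to a Keras-ordered array

for p in (sp, mp, iap):
    image = p.preprocess(image)
```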
Table 2 shows the architecture of the ResNet for Tiny ImageNet. This ResNet uses 5x5 filters with a stride of 1 for the first convolution layer because of the small input dimensions of Tiny ImageNet (64x64x3). For details about the architecture of the ResNet for Tiny ImageNet, check here.
Table 2: ResNet for Tiny ImageNet.
layer name | output size | 41-layer |
---|---|---|
conv1 | 64 x 64 x 64 | 5 x 5, 64, stride 1 |
zero padding | 66 x 66 x 64 | 1 x 1, stride 1 |
max pool | 32 x 32 x 64 | 3 x 3, stride 2 |
conv2_x | 32 x 32 x 128 | [1 x 1, 32; 3 x 3, 32; 1 x 1, 128] x 3 |
conv3_x | 16 x 16 x 256 | [1 x 1, 64; 3 x 3, 64; 1 x 1, 256] x 4 |
conv4_x | 8 x 8 x 512 | [1 x 1, 128; 3 x 3, 128; 1 x 1, 512] x 6 |
avg pool | 1 x 1 x 512 | 8 x 8, stride 1 |
linear | 512 | |
softmax | 200 | |
The ResNet for Tiny ImageNet can also be found in resnet.py (check here). Remember to change the dataset name argument to tiny_imagenet.
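For illustration only, a build call consistent with Table 2 might look like the following; the argument names and exact signature in resnet.py are assumptions, not the repo's confirmed API.

```python
# Hypothetical call: stages (3, 4, 6) and filters (64, 128, 256, 512)
# are read off Table 2; everything else is an assumed argument name.
model = ResNet.build(width=64, height=64, depth=3, classes=200,
                     stages=(3, 4, 6), filters=(64, 128, 256, 512),
                     dataset="tiny_imagenet")
```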
I use the "ctrl+c" method to train the model as a baseline. With this method, I can start training with an initial learning rate (and associated set of hyperparameters), monitor training, and quickly adjust the learning rate based on the results as they come in.
The train.py (check here) is responsible for training the baseline model. The TrainingMonitor callback is responsible for plotting the loss and accuracy curves of the training and validation sets, and the EpochCheckpoint callback is responsible for saving the model every 5 epochs.
After getting a sense of the baseline model, I switch to the learning rate decay method to re-train the model. The train_decay.py (check here) changes the method from "ctrl+c" to learning rate decay. The TrainingMonitor callback is again responsible for plotting the loss and accuracy curves of the training and validation sets, and the LearningRateScheduler callback is responsible for the learning rate decay.
The rank_accuracy.py (check here) measures the rank-1 and rank-5 accuracy of the model on the testing set.
There are some helper classes for the training process, including:
The EpochCheckpoint.py (check here) helps store individual checkpoints for ResNet so that we do not have to retrain the network from the beginning. The model is saved every 5 epochs.
The hdf5datasetgenerator.py (check here) under the pipeline/io/ directory yields batches of images and labels from an HDF5 dataset. This class facilitates working with datasets that are too big to fit into memory.
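A stripped-down sketch of such a generator is shown below; the dataset key names ("images", "labels") and the constructor arguments are assumptions for illustration.

```python
import h5py
import numpy as np
from keras.utils import np_utils

class HDF5DatasetGenerator:
    def __init__(self, db_path, batch_size, preprocessors=None, classes=200):
        self.db = h5py.File(db_path, "r")
        self.num_images = self.db["labels"].shape[0]
        self.batch_size = batch_size
        self.preprocessors = preprocessors or []
        self.classes = classes

    def generator(self):
        while True:  # loop forever; Keras bounds it via steps_per_epoch
            for i in np.arange(0, self.num_images, self.batch_size):
                # Only this slice is read from disk, keeping memory flat.
                images = self.db["images"][i:i + self.batch_size]
                labels = np_utils.to_categorical(
                    self.db["labels"][i:i + self.batch_size], self.classes)

                # Apply the per-image preprocessing chain.
                processed = []
                for image in images:
                    for p in self.preprocessors:
                        image = p.preprocess(image)
                    processed.append(image)

                yield (np.array(processed), labels)
```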
The ranked.py (check here) under the pipeline/utils/ directory contains a helper function to measure both the rank-1 and rank-5 accuracy when the model is evaluated on the testing set.
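The rank computation itself reduces to sorting each prediction vector; a minimal sketch (function name assumed):

```python
import numpy as np

def rank5_accuracy(preds, labels):
    # preds: (N, num_classes) probabilities; labels: (N,) integer classes.
    rank1, rank5 = 0.0, 0.0
    for (p, gt) in zip(preds, labels):
        # Class indices sorted by confidence, highest first.
        p = np.argsort(p)[::-1]
        if gt in p[:5]:
            rank5 += 1
        if gt == p[0]:
            rank1 += 1
    return (rank1 / float(len(labels)), rank5 / float(len(labels)))
```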
We can use the following command to train the model from the beginning:
python train.py --checkpoints output/checkpoints
If we restart training in the middle of the epochs (simply replace {epoch_number_you_want_to_start} with a number):
python train.py --checkpoints output/checkpoints --model output/checkpoints/epoch_{epoch_number_you_want_to_start}.hdf --start_epoch {the_epoch_number_you_want_to_start}
For learning rate decay, just use the following command:
python train_decay.py --model output/resnet_tinyimagenet_decay.hdf5
To evaluate the network on the testing set, use the following command:
python rank_accuracy.py
In this experiment, I use the original number of filters of ResNet for CIFAR-10 from He et al. (reference), which is (16, 32, 64) for the residual modules.
I use the "ctrl+c" method with the learning rate schedule shown in Table 3. The SGD optimizer with a momentum of 0.9 is used.
Table 3: Learning rate schedule for experiment 1.
Epoch | Learning Rate |
---|---|
1 - 50 | 1e-1 |
51 - 75 | 1e-2 |
76 - 85 | 1e-3 |
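For reference, Table 3 expressed as a schedule function (in the "ctrl+c" workflow these drops were applied manually by restarting from checkpoints, not by a scheduler):

```python
def step_schedule(epoch):
    # Piecewise-constant schedule from Table 3 (Keras epochs are 0-indexed).
    if epoch < 50:
        return 1e-1
    elif epoch < 75:
        return 1e-2
    return 1e-3
```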
Figure 2 shows the loss and accuracy curves of the training and validation sets, and Figure 3 shows the evaluation of the network, which indicates 88.18% accuracy. This accuracy is quite similar to what MiniVGG obtains, according to this repo.
Figure 2: Plot of training and validation loss and accuracy for experiment 1.
Figure 3: Evaluation of the network, indicating 88.16% accuracy, for experiment 1.
After experiment 1, I decide to add more filters to the conv layers so that the network can learn richer features. Thus, I change the number of filters from (16, 16, 32, 64) to (64, 64, 128, 256).
Figure 4 shows the loss and accuracy curves of the training and validation sets for experiment 2, and Figure 5 shows the evaluation of the network, which indicates 93.22% accuracy.
Figure 4: Plot of training and validation loss and accuracy for experiment 2.
Figure 5: Evaluation of the network, indicating 93.22% accuracy, for experiment 2.
For experiment 3, I switch the method from "ctrl+c" to learning rate decay. The number of filters is still (64, 64, 128, 256), and I increase the total number of epochs to 120.
Figure 6 shows the loss and accuracy curves of the training and validation sets for experiment 3, and Figure 7 shows the evaluation of the network, which indicates 93.39% accuracy.
Figure 6: Plot of training and validation loss and accuracy for experiment 3.
Figure 7: Evaluation of the network, indicating 93.39% accuracy, for experiment 3.
For experiment 4, I still use the learning rate decay method but increase the number of epochs to 150.
Figure 8 shows the loss and accuracy curves of the training and validation sets for experiment 4, and Figure 9 shows the evaluation of the network, which indicates 93.79% accuracy.
Figure 8: Plot of training and validation loss and accuracy for experiment 4.
Figure 9: Evaluation of the network, indicating 93.79% accuracy, for experiment 4.
I obtain 93.79% accuracy, successfully replicating the work of He et al. (reference) on the CIFAR-10 dataset.
For experiment 1, I use the "ctrl+c" method with the learning rate schedule shown in Table 4. The SGD optimizer with a momentum of 0.9 is used.
Table 4: Learning rate schedule for experiment 1.
Epoch | Learning Rate |
---|---|
1 - 30 | 1e-1 |
31 - 50 | 1e-2 |
51 - 70 | 1e-3 |
Figure 10 shows the loss and accuracy curves of the training and validation sets for experiment 1, and Figure 11 shows the evaluation of the network, indicating 57.27% rank-1 accuracy. Clearly, though, overfitting occurs, especially between epochs 40 and 70, and it grows more severe as the epochs increase.
Figure 10: Plot of training and validation loss and accuracy for experiment 1.
Figure 11: Evaluation of the network, indicating 57.27% rank-1 accuracy, for experiment 1.
For experiment 2, I switch the method from "ctrl+c" to learning rate decay.
Figure 12 shows the loss and accuracy curves of the training and validation sets for experiment 2, and Figure 13 shows the evaluation of the network, indicating 57.93% rank-1 accuracy.
Figure 12: Plot of training and validation loss and accuracy for experiment 2.
Figure 13: Evaluation of the network, indicating 57.93% rank-1 accuracy, for experiment 2.
With this rank-1 accuracy, I can claim #5 on the leaderboard of the Tiny ImageNet Visual Recognition Challenge.