Classification of X-Ray images using Machine Learning

Keywords: Python, TensorFlow, Machine Learning, Deep Learning, Convolutional Neural Network, Image Classification, CoAtNet, AWS SageMaker

Overview

This project implements a CNN-based model that classifies chest X-Ray images into three classes: pneumonia, normal and Covid-19 positive. Automating this step can lessen the diagnostic burden on healthcare systems. We built the CoAtNet model with TensorFlow and added a few layers to adapt it to this 3-class classification problem. The model reaches a testing accuracy of 89%, which is higher than that of the baseline ResNet model (85%). This result suggests that CoAtNet has potential for X-Ray image classification in Covid-19 detection.

Background and References

The main motivation of this project is to use machine learning tools to classify medical images in order to support diagnosis. This is a classic machine learning application, and we believe an algorithm that can distinguish between common pneumonia and coronavirus disease makes it even more practical. The Covid-19 pandemic is overwhelming and has a huge impact on everyone, especially medical personnel. Chest X-Ray imaging is cost-effective and therefore very useful for screening. However, it is not as sensitive as chest CT for pulmonary diseases, so accurate manual diagnosis can be challenging, especially when there are many positive cases every day. An accurate and robust classification system could make this process much more efficient.

The training and testing data are lung X-Ray images belonging to three classes: Covid-19, normal and pneumonia. They come from a GitHub dataset. The authors built this meta dataset by gathering images from several Covid-19 chest X-Ray and radiography databases, as well as some RSNA pneumonia and RSNA international Covid-19 databases. They already split the data into two folders, one for training (around 30,000 images) and the other for testing (around 400 images). Two csv files list the name and label of each image, which facilitates our work. The model we use is CoAtNet, a family of hybrid models that combines the strengths of Transformers and convolutional networks. When pre-trained with 13M images from ImageNet-21K, the CoAtNet model achieves 88.56% top-1 accuracy on ImageNet. In this project, we use TensorFlow to implement the CoAtNet architecture and classify the X-Ray images.

The idea of this project comes from a Kaggle project. One shortcoming of that project is the lack of dropout in the neural network architecture, a regularization technique that helps prevent overfitting on the training data. In addition, one class has 3 times as many training samples as another, which makes the classification imbalanced. We applied the idea to the Covid dataset to turn it into a 3-class project. We used the ResNet model from Keras as a baseline, and implemented the CoAtNet model to further improve performance. Compared with the original work, we addressed the class imbalance and used the input-adaptive weighting of attention so that the model learns the relationships between different elements of the input.

Detailed Description

Methods

First, to experiment with our 3-class case, we applied ResNet to the training images. Our ResNet model is built on the ResNet50 V2 base model, followed by a global average pooling layer, a dense layer with 128 units and the ReLU activation function, and a final dense layer with 3 units and the softmax activation function, which is commonly used for multi-class classification tasks. On top of this, we incorporated batch normalization to reduce internal covariate shift and added dropout for regularization. Class weights were also assigned to mitigate the potential impact of imbalanced classes and reduce variance. During model training, we used further regularization strategies, namely early stopping and learning rate reduction based on the validation loss, to prevent overfitting and to avoid stagnation in learning. The results of this model serve as the baseline for comparison and form our risk mitigation plan. Both the ResNet and CoAtNet models were first implemented and tested with Google Colab, where we used free GPU resources to make sure the models ran successfully. The final versions of these models were to be trained on the whole dataset using AWS SageMaker, for which we created an ml.p3.2xlarge instance with 20 GB of EBS storage.
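The class weight assignment mentioned above can be illustrated with a minimal sketch. It assumes the common "balanced" heuristic, weight = n_samples / (n_classes * class_count); the README does not state which formula was actually used, and the label counts below are hypothetical placeholders rather than the real dataset sizes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count).

    Rare classes get weights above 1 and frequent classes below 1, so every
    class contributes the same total weight to the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical label list (assumption): Covid-19 is the largest class.
labels = ["covid"] * 6000 + ["normal"] * 3000 + ["pneumonia"] * 2000
weights = balanced_class_weights(labels)

for label in sorted(weights):
    print(f"{label}: {weights[label]:.3f}")
```

In Keras, a dictionary of this kind, keyed by integer class index instead of label name, can be passed to `model.fit` through its `class_weight` argument.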

[Image: ResNet]

Secondly, the ResNet model is also used to roughly test our input dataset. The GitHub dataset includes enough images (more than 30,000) and the images themselves need no cleaning. However, the training data is imbalanced: the number of Covid-19 images is about twice the number of normal images and three times the number of pneumonia images. This may result in high accuracy on the training data but relatively low accuracy on the testing data. To make our model generalize better, we used the resample function from the sklearn package to downsample the normal and Covid-19 classes. This left about 16,000 images, which is still enough for training. Training ResNet on the balanced data shows whether balancing reduces overfitting, and lets us adjust the number of input images from each class to get a better testing accuracy before feeding them to our final model.
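A minimal sketch of the downsampling step follows. To stay dependency-free it uses `random.sample` from the standard library as a stand-in for sklearn's `resample` (which plays the same role when called with `replace=False` and `n_samples` set to the target size). The helper name, file names and class sizes are hypothetical; only the 2x and 3x class ratios are taken from the text.

```python
import random
from collections import Counter, defaultdict


def downsample_to_smallest(samples, seed=0):
    """Balance (filename, label) pairs by sampling every class, without
    replacement, down to the size of the smallest class."""
    by_label = defaultdict(list)
    for filename, label in samples:
        by_label[label].append((filename, label))

    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)  # avoid feeding the model one class at a time
    return balanced


# Hypothetical file list (assumption) with the class ratios described above.
samples = (
    [(f"covid_{i}.png", "covid") for i in range(600)]
    + [(f"normal_{i}.png", "normal") for i in range(300)]
    + [(f"pneumonia_{i}.png", "pneumonia") for i in range(200)]
)
balanced = downsample_to_smallest(samples)
print(Counter(label for _, label in balanced))
```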

Thirdly, the main task is to implement the CoAtNet model. The key idea of this model is to hybridize convolutional networks and self-attention models like Transformers in order to obtain both good generalization and high model capacity. To combine them effectively, the authors used MBConv blocks, which are based on depthwise convolution, since depthwise convolution can be effectively merged into attention layers. An MBConv block (also called an Inverted Residual Block) is a type of residual block used in image models that has an inverted structure for efficiency reasons. It was originally proposed for the MobileNetV2 CNN architecture [Mark Sandler, 2019]. A traditional residual block has a wide -> narrow -> wide structure in terms of the number of channels, while an MBConv block follows a narrow -> wide -> narrow approach, hence the inversion.
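The difference between the two layouts can be made concrete with a small weight-count sketch. It is a simplification under stated assumptions: biases, normalization layers and any squeeze-and-excitation module are ignored, and the expansion factor of 4, the reduction factor of 4, the 3x3 kernel and the channel widths are illustrative choices, not values taken from this project.

```python
def mbconv_weights(channels, expansion=4, kernel=3):
    """Inverted residual (MBConv): narrow -> wide -> narrow."""
    hidden = channels * expansion
    expand = channels * hidden             # 1x1 conv widens the channels
    depthwise = kernel * kernel * hidden   # one k x k filter per channel
    project = hidden * channels            # 1x1 conv narrows them again
    return [channels, hidden, channels], expand + depthwise + project


def bottleneck_weights(channels, reduction=4, kernel=3):
    """Traditional residual bottleneck: wide -> narrow -> wide."""
    hidden = channels // reduction
    reduce = channels * hidden                   # 1x1 conv narrows
    spatial = kernel * kernel * hidden * hidden  # full k x k convolution
    restore = hidden * channels                  # 1x1 conv widens again
    return [channels, hidden, channels], reduce + spatial + restore


mb_shape, mb_count = mbconv_weights(64)
bn_shape, bn_count = bottleneck_weights(256)
print("MBConv    ", mb_shape, mb_count)
print("Bottleneck", bn_shape, bn_count)
```

Both blocks work with a 256-channel intermediate or outer width here, but the depthwise convolution keeps the spatial step of the MBConv block cheap even though it operates on the widest tensor.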

[Image: MBConv]

In addition, the authors found that simply stacking convolutional and attention layers achieves better generalization and capacity. They constructed a network of 5 stages (S0 to S4). The building blocks are the Convolution block (C), the MBConv block (M) and the Transformer block (T). Different stage layouts, numbers of blocks (L) and hidden dimensions (D, the number of channels) form a family of CoAtNet models. In this project, we experimented with CoAtNet-0 to CoAtNet-4. They all use the C-M-M-T-T structure, and the differences between them are summarized in Table 1.

[Image: CoAtNet]

Finally, we evaluated our models in three ways. First, we measured performance with the testing accuracy, learning curves and confusion matrices. Second, we recorded the minimum number of epochs needed to reach the maximum accuracy, which reflects the quality of the model. Third, we compared the training and testing accuracy to detect overfitting. Our goals were to achieve at least 85% testing accuracy, to have the training accuracy converge within 20 epochs, and to keep the difference between training and testing accuracy small.
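These evaluation criteria can be sketched in a few lines of plain Python. The helper names, the toy predictions and the per-epoch histories are hypothetical, and the 10-point limit on the train/test gap is an assumption, since the text only asks for the difference to be "small". The final accuracies (87%/89% and 96%/85%) are the ones reported in this README.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def meets_goals(train_acc_per_epoch, test_acc,
                min_test_acc=0.85, max_epochs=20, max_gap=0.10):
    """Check the three goals: test accuracy, epochs to peak, train/test gap."""
    epochs_to_peak = train_acc_per_epoch.index(max(train_acc_per_epoch)) + 1
    gap = abs(train_acc_per_epoch[-1] - test_acc)
    return (test_acc >= min_test_acc
            and epochs_to_peak <= max_epochs
            and gap <= max_gap)


classes = ["covid", "normal", "pneumonia"]
# Toy labels and predictions (assumption), only to exercise the helpers.
y_true = ["covid"] * 4 + ["normal"] * 3 + ["pneumonia"] * 3
y_pred = ["covid"] * 4 + ["normal", "normal", "pneumonia",
                          "pneumonia", "pneumonia", "normal"]
cm = confusion_matrix(y_true, y_pred, classes)
print(cm, accuracy(cm))

# Hypothetical histories ending at the final accuracies reported here.
coatnet_ok = meets_goals([0.60, 0.75, 0.86, 0.87], test_acc=0.89)
resnet_ok = meets_goals([0.70, 0.90, 0.96], test_acc=0.85)
print(coatnet_ok, resnet_ok)
```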

Experiments and Results

We began with the same parameters as the Kaggle project and only achieved 60% training accuracy and 50% testing accuracy. We first changed the image size with the image data generator. Interestingly, the training accuracy increased substantially when we changed the image size from (224,224,3) to (300,300,3). We then adjusted the learning rate to further improve the ResNet model. The original learning rate was 1e-4, and increasing it brought no improvement. However, decreasing it to 1e-5 produced a large increase in both training and testing accuracy. Up to this point, we had run the model in Colab with only 3,000 balanced training images, so we moved to SageMaker on AWS to train on the whole dataset. With more images, the model achieved higher accuracy than in Colab. Although we had 30,000 training images, we retained only 16,000 after balancing the data by downsampling. After adjusting the learning rate in SageMaker, we settled on a model with 96% training accuracy and 85% testing accuracy after 12 epochs. It is very likely that this model is overfitting, but the results were good, so we decided to move on to CoAtNet.

The following plots display how the training and validation loss and accuracy evolve over the epochs. The validation curves (red) do not follow the training curves (blue), which indicates overfitting. The confusion matrix (see supplementary) shows the per-class performance of the ResNet model, which performs well on the Covid-19 class but is less accurate on the other two.

[Images: LC1 (learning curves), CM1 (confusion matrix)]

After building the basic CoAtNet model in TensorFlow according to the publication, we first added a global average pooling layer to average over the spatial axes. We then used a Dense layer with ReLU activation to reduce the output space to 128 dimensions. After this, we added a dropout layer with a rate of 0.2. The final layer is a Dense layer with 3 units and softmax activation. CoAtNet was originally published to solve the 1000-class ImageNet classification problem, so we had to change the number of output classes to 3. With these changes, the model can perform the 3-class classification task.
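The layers described above could be stacked as in the following Keras sketch. It is not the project's actual code: the tiny convolutional backbone is a hypothetical stand-in for the CoAtNet feature extractor, and the Adam optimizer and categorical cross-entropy loss are assumptions. The layer sizes, the dropout rate, the (224,224,3) input shape and the 5e-6 learning rate come from this README.

```python
import numpy as np
import tensorflow as tf


def add_classification_head(backbone, num_classes=3, dropout_rate=0.2):
    """Stack GAP -> Dense(128, ReLU) -> Dropout -> Dense(softmax)
    on top of a convolutional feature extractor."""
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)


# Hypothetical stand-in backbone (assumption): in the project this would be
# the CoAtNet model without its original 1000-class classifier.
inputs = tf.keras.Input(shape=(224, 224, 3))
features = tf.keras.layers.Conv2D(8, 3, strides=2, activation="relu")(inputs)
backbone = tf.keras.Model(inputs, features)

model = add_classification_head(backbone)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-6),  # optimizer assumed
    loss="categorical_crossentropy",                         # loss assumed
    metrics=["accuracy"],
)

# Forward pass on a dummy batch to confirm the 3-class probability output.
probs = model(np.zeros((2, 224, 224, 3), dtype="float32"), training=False).numpy()
print(probs.shape)
```

Keeping the head in a separate function means the same layers can be attached to either backbone, which keeps the ResNet baseline and the CoAtNet model comparable.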

The paper introduced 8 CoAtNet models, from CoAtNet-0 to CoAtNet-7. We tested CoAtNet-0 to CoAtNet-4 and chose the best one. CoAtNet-0 turned out to have the best accuracy. The other variants have more blocks (L) and channels (D), as shown in Table 1, so we had to reduce the batch size to stay within the RAM limits, and this gave poor results. We therefore only considered CoAtNet-0 from then on. Unfortunately, SageMaker did not support the latest version of TensorFlow (2.8), so we could not run our code on AWS. As a result, the final training and testing accuracies were obtained with only around 8,000 images (around 3,000 after downsampling). The input shape of (224,224,3) gave better results this time, and a learning rate of 5e-6, much lower than the one used for ResNet, gave the best training result. The training and validation curves follow similar trends here, so we do not consider this model to be overfitting; it appears to learn patterns that generalize beyond the training data. The final training accuracy is 87% and the testing accuracy is 89%, which is better than the testing accuracy of ResNet. The confusion matrix shows that it correctly classifies more images than ResNet. In conclusion, we trained the CoAtNet-0 model with around 3,000 images for 15 epochs with a learning rate of 5e-6 and achieved a higher testing accuracy than the ResNet model. The results could be even better if we trained it with more images.

[Images: LC2 (learning curves), CM2 (confusion matrix), Table2]

Conclusion

To conclude, we implemented ResNet and CoAtNet with TensorFlow and successfully classified X-Ray images from the Covid-19, pneumonia and normal classes. The CoAtNet-0 model was trained with 3,367 images for 15 epochs with a learning rate of 5e-6, achieving a testing accuracy of 89% without signs of overfitting. This is better than the 85% testing accuracy of the baseline ResNet model. While working on this project, we learned how to apply published models to new fields and modify them according to our requirements. Others may find this project valuable when building CNN architectures for image classification. In terms of social impact, the results of this project could help reduce the pressure on the healthcare system. A higher classification accuracy could help determine the risk of patients quickly and slow the pandemic. It could also help address the long-standing health care technology deficiency in patient matching and disease identification. Regarding ethical considerations, the X-Ray images are effectively anonymized by removing identifying marks, so patients cannot be identified from them. The duty of confidence does not apply to anonymous information, so the privacy of the patients is not compromised.