Project README: Pneumonia Detection from Chest X-Ray Images

This project detects pneumonia from chest X-ray images using three classification approaches: two based on Logistic Regression (LR) and one based on a Convolutional Neural Network (CNN).

1. Introduction

The rise of artificial intelligence and machine learning in healthcare has enabled increasingly efficient medical image analysis tools. Chest X-ray is the most common method to detect pneumonia. This project implements several automated detection methods to help healthcare professionals refine their diagnoses.

2. Project Objective

This project seeks to classify lung X-ray images into two categories:

- “Normal” images (no pneumonia)
- “Pneumonia” images (presence of pneumonia)

To achieve this, we propose three distinct approaches:

- Logistic Regression on a single column containing the combined extracted features (Gabor, DCT, Fourier, PHOG).
- Logistic Regression on multiple columns, where each column holds one feature type (Gabor, DCT, Fourier, PHOG).
- A Convolutional Neural Network (CNN) that learns directly from the raw images.

3. Medical Context: Pneumonia

Pneumonia is a respiratory infection that affects the lungs, typically caused by bacteria, viruses, or fungi. Chest X-rays are essential for diagnosis, but reading them can be complex and time-consuming. Automating detection helps reduce diagnosis time and improve accuracy.

4. Methodology

Data Collection

- 5,200 X-ray images in total: 1,324 normal images and 3,876 images showing pneumonia.
- The images were provided by a professor and cover various types of pneumonia (bacterial, viral, etc.).

Image Preprocessing

- Histogram equalization to improve image contrast.
- Noise reduction (filtering) to remove unwanted artifacts.
- Dimension normalization so that all samples share the same image size.
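
The three preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the project's own code: it uses plain NumPy instead of OpenCV (which the project lists among its libraries), and the function names, the 3x3 mean filter, and the 128x128 target size are assumptions chosen for the example.

```python
import numpy as np


def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Spread the intensity histogram of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                  # CDF at the darkest intensity present
    denom = max(int(cdf[-1] - cdf_min), 1)     # guard against flat images
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[img]                            # apply the lookup table per pixel


def mean_filter3(img: np.ndarray) -> np.ndarray:
    """Reduce noise with a 3x3 mean filter (edges are replicated)."""
    h, w = img.shape
    padded = np.pad(img.astype(np.float32), 1, mode="edge")
    acc = np.zeros((h, w), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return np.round(acc / 9.0).astype(np.uint8)


def resize_nearest(img: np.ndarray, size: tuple) -> np.ndarray:
    """Resize to (rows, cols) with nearest-neighbour sampling."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols]


def preprocess(img: np.ndarray, size: tuple = (128, 128)) -> np.ndarray:
    """Equalize, denoise, then normalize dimensions, in the order listed above."""
    return resize_nearest(mean_filter3(equalize_histogram(img)), size)
```

With OpenCV installed, `cv2.equalizeHist`, `cv2.blur`, and `cv2.resize` would be the usual counterparts of these three helpers.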

Feature Extraction (for LR)

We use four types of descriptors to represent the images:

- Gabor Filter: highlights textures and their orientations.
- Fourier Transform: analyzes the image in the frequency domain to detect periodic patterns.
- Discrete Cosine Transform (DCT): concentrates most of the image information in a few low-frequency coefficients, which reduces spatial redundancy and gives a compact description of intensity variations.
- Pyramid Histogram of Oriented Gradients (PHOG): describes contours with histograms of gradient orientations computed at several spatial resolutions.
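
To make the descriptors more concrete, here is a hedged NumPy sketch of three of them (Gabor, Fourier, DCT). PHOG is omitted to keep the example short. The kernel parameters, the number of retained coefficients, and all function names are assumptions for illustration; `core/ImageFeatureExtractor.py` may compute these features differently.

```python
import numpy as np


def gabor_kernel(size=15, sigma=3.0, theta=0.0, wavelength=8.0):
    """Real Gabor kernel: a Gaussian envelope times an oriented cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)


def gabor_features(img, thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Mean absolute filter response per orientation (circular convolution via FFT)."""
    f_img = np.fft.fft2(img)
    feats = []
    for theta in thetas:
        f_kernel = np.fft.fft2(gabor_kernel(theta=theta), s=img.shape)
        response = np.fft.ifft2(f_img * f_kernel).real
        feats.append(np.abs(response).mean())
    return np.array(feats)


def fourier_features(img, k=4):
    """Log-magnitudes of a k x k block of coefficients around the zero frequency."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    block = spectrum[cy - k // 2:cy + k // 2, cx - k // 2:cx + k // 2]
    return np.log1p(np.abs(block)).ravel()


def dct_matrix(n):
    """Unnormalized DCT-II basis: M[k, i] = cos(pi * (2i + 1) * k / (2n))."""
    i = np.arange(n)
    return np.cos(np.pi * (2 * i[None, :] + 1) * i[:, None] / (2 * n))


def dct_features(img, k=4):
    """Top-left k x k block of the 2-D DCT-II (lowest spatial frequencies)."""
    h, w = img.shape
    coeffs = dct_matrix(h) @ img @ dct_matrix(w).T
    return coeffs[:k, :k].ravel()


def extract_features(img):
    """Concatenate the three descriptors into one feature vector."""
    img = np.asarray(img, dtype=np.float64)
    return np.concatenate([gabor_features(img), fourier_features(img), dct_features(img)])
```

With the default parameters, `extract_features` returns 4 Gabor values, 16 Fourier values, and 16 DCT values for any grayscale image of at least 4x4 pixels.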

These features are either:

- Combined into a single column (single_feature.csv), or
- Separated into four columns, one per feature type (multi_features.csv).
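
The difference between the two layouts can be illustrated with a short sketch. The column names (`image`, `label`, `features`, `gabor`, `dct`, `fourier`, `phog`) and the choice of storing each vector as a JSON-encoded list are assumptions made for this example; the actual schema of single_feature.csv and multi_features.csv may differ.

```python
import csv
import io
import json

FEATURE_KEYS = ("gabor", "dct", "fourier", "phog")


def single_column_row(name, label, feats):
    """One 'features' cell holding all descriptors concatenated in a fixed order."""
    combined = [v for key in FEATURE_KEYS for v in feats[key]]
    return {"image": name, "label": label, "features": json.dumps(combined)}


def multi_column_row(name, label, feats):
    """One cell per descriptor type, so each can be inspected or weighted separately."""
    row = {"image": name, "label": label}
    for key in FEATURE_KEYS:
        row[key] = json.dumps(feats[key])
    return row


def to_csv(rows):
    """Serialize a list of row dictionaries to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical feature values for one image.
example = {"gabor": [0.1, 0.2], "dct": [1.0], "fourier": [2.0, 3.0], "phog": [4.0]}
single_csv = to_csv([single_column_row("img_0001.png", 1, example)])
multi_csv = to_csv([multi_column_row("img_0001.png", 1, example)])
```

Reading a cell back with `json.loads` recovers the original list of numbers in either layout.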

Classification (LR and CNN)

Logistic Regression:
- lr_single_feature.ipynb: training and evaluation using a single column of combined features.
- lr_multi_features.ipynb: training and evaluation using separate feature columns (Gabor, DCT, Fourier, PHOG).

Convolutional Neural Network:
- tensorflow-cnn.ipynb: building and training a CNN model from raw images.
- my_model-v4-bestOne.h5: the saved best CNN model, used for inference or evaluation.

5. Project Structure

DETECT_PNEUMONIE/     
├── core/
│   ├── ImageFeatureExtractor.py         # CSV filling using feature extraction
│   └── PneumoniaDetectorApp.py          # Tkinter main app
├── data/
│   ├── multi_features.csv               # Gabor, DCT, Fourier, PHOG features in separate columns
│   └── single_feature.csv               # Combined features in a single column
├── metrics/                             # Metrics for the project's models
│   ├── cnn                              # CNN metrics
│   └── logistic_regression              # Logistic regression metrics
│       ├── multi_features               # Multi-feature model metrics
│       └── single_feature               # Single-feature model metrics
├── model_training/
│   ├── cnn/
│   │   └── tensorflow-cnn.ipynb         # Notebook for CNN training
│   └── logistic_regression/
│       ├── lr_multi_features.ipynb      # Notebook for multi-column LR
│       └── lr_single_feature.ipynb      # Notebook for single-column LR
├── models/
│   ├── logistic_regression_model.pkl     # single feature LR model
│   ├── logreg-v3.pkl                     # Multiple features LR model
│   └── my_model-v4-bestOne.h5            # Best trained CNN model
├── utils/
│   ├── RemplissageCSV.ipynb              # CSV filling using feature extraction
│   ├── Renamepic.ipynb                   # Bulk renaming script for images
│   └── .gitignore
├── environment.yml                       # Conda environment file
├── main.py                               # project entry point
├── README.md                             # This README document
└── test.ipynb                            # Miscellaneous test notebook

6. Prerequisites and Installation

Install Anaconda or Miniconda (highly recommended), or make sure Python 3.7+ is available.

Clone this repository:

git clone https://github.com/your-repo/detect_pneumonie.git
cd detect_pneumonie

Create an environment from the environment.yml file:
```
conda env create -f environment.yml
conda activate detect_pneumonie_env
```
Or manually install the dependencies listed in environment.yml.
Verify that you have:
- Python 3.7+
- TensorFlow (used by the CNN notebook)
- NumPy, SciPy, scikit-learn, OpenCV, etc.

7. Usage

Detection Methods

Logistic Regression (multi or single feature)
- Open either lr_single_feature.ipynb or lr_multi_features.ipynb in a Jupyter environment.
- Run the cells to:
  1. Load the CSV data.
  2. Train the LR model.
  3. Evaluate performance (accuracy, recall, F1, confusion matrix).
Convolutional Neural Network (CNN)
- Open tensorflow-cnn.ipynb.
- Run the notebook to:
  1. Load the images.
  2. Build, compile, and train the CNN.
  3. Evaluate performance on a test set.
- The final model is saved in my_model-v4-bestOne.h5.
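
The Logistic Regression steps listed above (load, train, evaluate) can be sketched as follows. This is an illustration under stated assumptions, not the notebooks' code: it trains a logistic regression with plain NumPy gradient descent instead of scikit-learn, and it uses synthetic 4-dimensional features in place of the CSV data. The learning rate, epoch count, and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # predicted probability of class 1
        w -= lr * (X.T @ (p - y)) / n   # gradient of the mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)        # gradient w.r.t. the bias
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label 1 (pneumonia) when the predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Synthetic stand-in for the feature CSV: 0 = normal, 1 = pneumonia.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 4)),
               rng.normal(1.0, 1.0, size=(200, 4))])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Shuffle, then hold out 20% of the samples for evaluation.
order = rng.permutation(len(y))
X, y = X[order], y[order]
split = int(0.8 * len(y))
w, b = train_logistic_regression(X[:split], y[:split])
accuracy = float(np.mean(predict(X[split:], w, b) == y[split:]))
```

Lowering `threshold` in `predict` trades more false positives for fewer false negatives, which can matter when missed pneumonia cases are the main concern.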

Running via `main.py`

main.py serves as the entry point to run the complete pipeline or launch a quick test interface.
- For example:
```
python main.py
```
- Options may vary depending on your implementation in PneumoniaDetectorApp.py.

Utility Scripts

- RemplissageCSV.ipynb: demonstrates how to fill in the CSV files by extracting features from images.
- Renamepic.ipynb: renames images in bulk (useful for dataset organization).

8. Performance Evaluation

Model performance is evaluated using:

- Accuracy, recall, and F1-score.
- A confusion matrix to analyze true positives, false positives, true negatives, and false negatives.

The goal is to minimize false negatives (pneumonia cases classified as normal) while achieving high overall accuracy, so that the tool is reliable in a medical setting.
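
As a worked example of these metrics, the sketch below derives accuracy, recall, precision, and F1 from the four confusion-matrix counts. The function names and the toy labels are hypothetical and serve only to illustrate the formulas; label 1 stands for "Pneumonia".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1          # a pneumonia case predicted as normal
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, precision, and F1 computed from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Toy labels: 6 pneumonia cases and 4 normal cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
# TP=5, FP=1, TN=3, FN=1, so accuracy = 8/10 and recall = 5/6.
```

Recall is the metric most directly tied to false negatives: each missed pneumonia case increases FN and lowers TP / (TP + FN).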

9. Conclusion and Future Work

This project demonstrates how different machine learning approaches (Logistic Regression and CNN) can be applied to pneumonia detection in chest X-rays. Results show that:

- Logistic Regression is a simple, fast model that performs well when given properly engineered features.
- Convolutional Neural Networks (CNNs) often yield better performance, provided there is sufficient data and computational power for training.

Future Work:

- Use more diverse data from various sources (hospitals, research centers).
- Enhance the CNN model (additional layers, fine-tuning of pretrained models, etc.).
- Extend the user interface beyond the current Tkinter desktop app (for example, with a web API) to ease integration into medical workflows.

10. Authors and Credits

Project Team: Chawki Belhadid, Samir Akram OUNIS.
Data: 5200 X-ray images.
Libraries: TensorFlow, scikit-learn, NumPy, OpenCV, etc.
