Clinically-Inspired Hierarchical Multi-Label Classification of Chest X-rays with a Penalty-Based Loss Function

This project implements an efficient, single-model hierarchical classifier for chest X-ray (CXR) image analysis, grounded in clinical insights. By leveraging deep learning techniques, the model predicts multiple pathology labels with high accuracy, while offering visual explanations and uncertainty estimation for each prediction. This clinically-informed hierarchical approach enhances interpretability and aligns with diagnostic workflows, addressing limitations in traditional classification models.


Key Features

  • Dataset: Trained on the CheXpert dataset, using VisualCheXbert labels, which agree more closely with what is actually visible in the images than report-derived labels.
  • Custom Loss Function: Implements a penalty-based hierarchical binary cross-entropy loss to enforce clinically relevant label relationships.
  • Model Architecture: Developed on DenseNet121, a deep convolutional neural network known for its efficient feature propagation and superior performance in medical imaging tasks.
  • Uncertainty Quantification: Uses Monte Carlo sampling to report a mean and variance for each prediction.
  • Explainability: Generates Class Activation Map (CAM) heatmaps via the Grad-CAM method to visualize model attention, enhancing interpretability for clinical applications.
  • Multi-view Support: Predicts pathologies from both frontal and lateral view CXRs.
  • API Integration: The model is deployed using an async FastAPI server, providing a lightweight, scalable inference service with JSON input/output.

Table of Contents

  1. Quickstart
  2. Demo
  3. Project Structure
  4. Installation and Usage
  5. API
  6. Training the Model
  7. Configurations
  8. Dataset
  9. Citation
  10. Contributing
  11. Acknowledgements
  12. License

Quickstart

Run a basic inference on CXR images in just a few steps:

  1. Clone the repository and install dependencies:
    git clone https://github.com/the-mercury/CIHMLC.git
    
    cd CIHMLC
    
    pip install -r requirements.txt
  2. Start the server and make a prediction:
    uvicorn src.cxr_inference_app:app --host 0.0.0.0 --port 8000
     
    curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cxr_base64": "<base64-encoded-image>"}'

For detailed setup instructions and Docker deployment, see Installation and Usage.

Demo

Here’s an example of a CAM heatmap generated by the model:

Sample Heatmap

(All the CAM heatmaps generated by the default model for this image can be found here)

Ground Truth (GT) Pathologies:

  • Atelectasis
  • Cardiomegaly
  • Edema
  • Enlarged Cardiomediastinum
  • Lung Opacity
  • Pleural Effusion

For a detailed walkthrough of the inference process, refer to the API section.
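
Heatmaps like the one above follow the standard Grad-CAM recipe: the gradients of a pathology's predicted score are average-pooled over the spatial dimensions of the last convolutional feature maps and used to weight those maps. The sketch below illustrates the general technique for a Keras DenseNet121; it is not necessarily the repository's exact implementation (see src/helpers/cam.py), and the layer name is the one used by tf.keras.applications.DenseNet121:

import tensorflow as tf

def grad_cam(model, image, class_index, conv_layer_name="conv5_block16_concat"):
    # Sub-model exposing the last conv feature maps alongside the predictions
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image[None, ...])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_maps)       # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))        # average-pool the gradients
    cam = tf.reduce_sum(conv_maps * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                            # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalize to [0, 1]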

Project Structure

.
├── data/                                               # Directory for CheXpert dataset and label files
│   ├── CheXpert/                                       # Patient images
│   ├── train_labels.csv                                # Training labels
│   └── val_labels.csv                                  # Validation labels
│
├── docker/                                             # Docker configuration files
│   ├── Dockerfile
│   └── docker-compose.yml
│
├── fresh_models/                                       # Trained model checkpoints
│   └── model_name.keras
│
├── logs/                                               # Logs, heatmaps, and training metrics
│   ├── heatmaps/                                       # Generated CAM visualizations
│   └── tensorboard/                                    # TensorBoard logs for visualization
│
├── src/                                                # Source code
│   ├── helpers/                                        # Utility scripts
│   │   └── cam.py                                      # Visualization
│   ├── data/                                           # Data loaders
│   │   └── chexpert_data_loader.py
│   ├── config.py                                       # Configuration file
│   ├── cxr_inference_app.py                            # FastAPI application for model inference
│   ├── hierarchical_binary_cross_entropy.py            # Custom loss function
│   └── train.py                                        # Script to train the model
│
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt

(For a complete structure, refer to the repository)

Installation and Usage

Prerequisites

Ensure you have the following installed:

  • Python 3.9+
  • Docker
  • Docker Compose

Inference

  • Running with Docker

    1. Clone the repository:
      git clone https://github.com/the-mercury/CIHMLC.git
      cd CIHMLC
    2. Build and run the Docker containers:
      cd docker
      docker compose up --build   # Make sure Docker and Docker Compose are installed
  • Running without Docker (Custom Configuration)

    1. Clone the repository:
      git clone https://github.com/the-mercury/CIHMLC.git
      cd CIHMLC
    2. Install the requirements:
       pip install -r requirements.txt
    3. To start the FastAPI prediction service:
      uvicorn src.cxr_inference_app:app --host [IP] --port [port_num] --workers [num_workers]
    4. Make a prediction:
       curl -X POST "http://[IP]:[port_num]/predict" -H "Content-Type: application/json" -d '{"cxr_base64": "<base64-encoded-image>"}'

Note: Replace <base64-encoded-image> with the actual Base64 data. The cxr_base64 field must contain the Base64-encoded string of the CXR image; you can produce one with Python's base64 module:

import base64

# Read the CXR image and encode its raw bytes as a Base64 string
with open("path_to_cxr_image.jpg", "rb") as img_file:
    base64_string = base64.b64encode(img_file.read()).decode("utf-8")

print(base64_string)  # Use this string as the "cxr_base64" value in the API request

API

The API exposes a /predict endpoint to make predictions on CXR images. The request format is as follows:

Request Format

curl -X POST "http://[IP]:[port]/predict" -H "Content-Type: application/json" -d '{"cxr_base64": "<base64-encoded-image>"}'

Replace [IP]:[port] with your server's address and port. The request body is:

{
  "cxr_base64": "<base64-encoded-image>"
}

Response Format

{
  "success": true,
  "heatmap": {
    "Atelectasis": "<base64-encoded-heatmap>",
    "Cardiomegaly": "<base64-encoded-heatmap>",
    ...
  },
  "prediction_mean": {
    "Atelectasis": 0.5,
    "Cardiomegaly": 0.8,
    ...
  },
  "prediction_variance": {
    "Atelectasis": 0.03,
    "Cardiomegaly": 0.05,
    ...
  },
  "inference_duration": 20
}
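
The prediction_mean and prediction_variance fields come from the Monte Carlo uncertainty estimation noted under Key Features: the network is run multiple times with stochastic behavior enabled, and the per-label mean and variance of the sampled outputs are returned. A minimal sketch of the general idea, assuming MC dropout (the repository's exact sampling scheme may differ):

import numpy as np

def mc_predict(model, image, n_samples=20):
    # Stochastic forward passes: training=True keeps dropout active at inference
    samples = np.stack(
        [model(image[None, ...], training=True).numpy()[0] for _ in range(n_samples)]
    )
    return samples.mean(axis=0), samples.var(axis=0)  # per-label mean and variance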

* The CAM heatmaps are also stored in the /logs/heatmaps/[model_name] directory as .png files.
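
Putting the pieces together, an end-to-end client might look like this. It assumes the server from the Quickstart is running on localhost:8000 and that the requests package is installed; the output filenames are illustrative:

import base64
import requests

# Encode the CXR image as Base64
with open("path_to_cxr_image.jpg", "rb") as img_file:
    cxr_base64 = base64.b64encode(img_file.read()).decode("utf-8")

# Call the /predict endpoint
result = requests.post(
    "http://localhost:8000/predict",
    json={"cxr_base64": cxr_base64},
).json()

# Report per-pathology mean/variance and save each heatmap as a PNG
for label, mean in result["prediction_mean"].items():
    print(f"{label}: mean={mean:.3f}, var={result['prediction_variance'][label]:.3f}")
for label, heatmap_b64 in result["heatmap"].items():
    with open(f"{label}_heatmap.png", "wb") as f:
        f.write(base64.b64decode(heatmap_b64))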

Training the Model

To train the model, execute the following steps:

  1. Clone the repository:
    git clone https://github.com/the-mercury/CIHMLC.git
    cd CIHMLC
  2. Install the requirements:
     pip install -r requirements.txt
  3. Start the training:
    python src/train.py

NOTE:

  • The model is trained according to the settings in config.py, and newly trained models are stored in the fresh_models/[model_name] directory, including checkpoints for the best AUROC and the best loss.
  • To replace the default model, move the newly trained model to the src/assets/models directory and rename it, or update the model directory and name settings in config.py.
  • Training logs are saved in the /logs/tensorboard directory and can be monitored with TensorBoard:
    tensorboard --logdir=logs/tensorboard
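
For intuition, the penalty-based hierarchical binary cross-entropy (implemented in src/hierarchical_binary_cross_entropy.py) combines a standard BCE term with a penalty whenever a child label is predicted with higher probability than its parent, discouraging clinically inconsistent outputs. The sketch below illustrates the general idea; the parent map is a small illustrative subset of the CheXpert hierarchy, and the repository's exact formulation may differ:

import tensorflow as tf

# Illustrative parent index per label (-1 = no parent); a subset, not the full hierarchy.
# Labels: 0 Enlarged Cardiomediastinum, 1 Cardiomegaly, 2 Lung Opacity, 3 Edema
PARENT = tf.constant([-1, 0, -1, 2])

def hierarchical_bce(y_true, y_pred, penalty_weight=1.0):
    bce = tf.keras.losses.binary_crossentropy(y_true, y_pred)
    has_parent = tf.cast(PARENT >= 0, y_pred.dtype)
    parent_pred = tf.gather(y_pred, tf.maximum(PARENT, 0), axis=-1)
    # Penalize any child probability that exceeds its parent's probability
    violation = tf.nn.relu(y_pred - parent_pred) * has_parent
    return bce + penalty_weight * tf.reduce_mean(violation, axis=-1)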

Configurations

The configuration is managed through the Config class in src/config.py. Key parameters include:

  • Device settings
  • Project-specific settings
  • Model architecture and training settings
  • Data paths and preprocessing options
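
The exact fields live in src/config.py; as a rough sketch of the pattern (the field names and defaults below are hypothetical, not the repository's actual values):

from dataclasses import dataclass

@dataclass
class Config:
    # Device settings
    device: str = "GPU:0"
    # Model architecture and training settings
    backbone: str = "DenseNet121"
    image_size: int = 224
    batch_size: int = 32
    learning_rate: float = 1e-4
    # Data paths and preprocessing options
    train_labels_csv: str = "data/train_labels.csv"
    val_labels_csv: str = "data/val_labels.csv"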

Dataset

This project uses the CheXpert dataset together with the VisualCheXbert labels. For more details on these resources, see the following publications:

Note: The CheXpert dataset requires registration and approval from the authors. Follow this link for access.

    @inproceedings{irvin2019chexpert,
      title={CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison},
      author={Irvin, Jeremy and Rajpurkar, Pranav and Ko, Michael and Yu, Yifan and Ciurea-Ilcus, Silviana and Chute, Chris and Marklund, Henrik and Haghgoo, Behzad and Ball, Robyn and Shpanskaya, Katie and others},
      booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
      volume={33},
      pages={590--597},
      year={2019}
    }

    @inproceedings{jain2021visualchexbert,
      title={VisualCheXbert: Addressing the Discrepancy Between Radiology Report Labels and Image Labels},
      author={Jain, Saahil and Smit, Akshay and Truong, Steven QH and Nguyen, Chanh DT and Huynh, Minh-Thanh and Jain, Mudit and Young, Victoria A and Ng, Andrew Y and Lungren, Matthew P and Rajpurkar, Pranav},
      booktitle={Proceedings of the Conference on Health, Inference, and Learning (CHIL)},
      year={2021}
    }

Citation

If you find this work useful, please cite the paper below; the full text is available on arXiv.

    @article{asadi2025cihmlc,
        title={Clinically-Inspired Hierarchical Multi-Label Classification of Chest X-rays with a Penalty-Based Loss Function},
        author={Asadi, Mehrdad and Sodoké, Komi and Gerard, Ian J. and Kersten-Oertel, Marta},
        journal={arXiv preprint arXiv:2502.03591},
        year={2025},
        pages={1--9},
        doi={10.48550/arXiv.2502.03591},
        url={https://doi.org/10.48550/arXiv.2502.03591},
    }

Contributing

Contributions are welcome!

To get involved, please follow these steps:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b my-feature-branch
  3. Commit your changes:
    git commit -am 'Add new feature'
  4. Push to the branch:
    git push origin my-feature-branch
  5. Submit a pull request for review.

Acknowledgements

Special thanks to the Stanford ML Group for the CheXpert dataset and to the creators of VisualCheXbert.

License

This project is licensed under the MIT License. See the LICENSE file for more details.