🧑🏻‍🔬 GL-LCM: Global-Local Latent Consistency Models for Fast High-Resolution Bone Suppression in Chest X-Ray Images
This repository is a PyTorch implementation of our paper "GL-LCM: Global-Local Latent Consistency Models for Fast High-Resolution Bone Suppression in Chest X-Ray Images".
We propose the Global-Local Latent Consistency Model (GL-LCM), a novel framework for fast high-resolution bone suppression in CXR images based on Latent Consistency Models (LCMs). Our key contributions are summarized as follows:
- 🕐 The GL-LCM architecture facilitates effective bone suppression while retaining texture details, achieved through dual-path sampling in the latent space combined with global-local fusion in the pixel space.
- 🕑 GL-LCM significantly improves inference efficiency, requiring only about 10% of the inference time of current diffusion-based methods, which makes it more suitable for clinical applications.
- 🕒 We introduce Local-Enhanced Guidance (LEG) to mitigate potential boundary artifacts and detail blurring in local-path sampling, without additional training.
- 🕓 Extensive experiments on both the self-collected SZCH-X-Rays dataset and the public JSRT dataset demonstrate the exceptional performance and efficiency of GL-LCM.
Overview of GL-LCM framework. (a) Lung segmentation in the pixel space, (b) Dual-path sampling in the latent space, and (c) Global-local fusion in the pixel space.
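For readers skimming the code, here is a minimal, self-contained sketch of how these three stages fit together. All function names (`segment_lungs`, `encode`, `decode`, `lcm_sample`, `poisson_fuse`) are hypothetical placeholders with toy implementations, not this repository's actual API.

```python
import torch

def segment_lungs(cxr: torch.Tensor) -> torch.Tensor:
    """Hypothetical lung segmentation: returns a binary mask (1 inside the lungs)."""
    return (cxr > cxr.mean()).float()

def encode(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for the VQGAN encoder."""
    return x

def decode(z: torch.Tensor) -> torch.Tensor:
    """Stand-in for the VQGAN decoder."""
    return z

def lcm_sample(cond: torch.Tensor, steps: int = 4) -> torch.Tensor:
    """Stand-in for few-step latent consistency sampling conditioned on `cond`."""
    return cond

def poisson_fuse(global_img, local_img, mask):
    """Stand-in for gradient-domain (Poisson) fusion; here a simple masked blend."""
    return mask * local_img + (1 - mask) * global_img

cxr = torch.rand(1, 1, 1024, 1024)                 # input chest X-ray
mask = segment_lungs(cxr)                          # (a) lung segmentation in pixel space
z_global = lcm_sample(encode(cxr))                 # (b) global path over the whole image
z_local = lcm_sample(encode(cxr * mask))           # (b) local path over the lung region
soft_tissue = poisson_fuse(decode(z_global), decode(z_local), mask)  # (c) fusion
print(soft_tissue.shape)                           # torch.Size([1, 1, 1024, 1024])
```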
Performance comparison on the SZCH-X-Rays dataset:

Method | BSR (%)↑ | MSE (10⁻³)↓ | PSNR↑ | LPIPS↓ |
---|---|---|---|---|
Universal Method | ||||
VAE | 91.281 ± 3.088 | 1.169 ± 1.059 | 30.018 ± 2.007 | 0.237 ± 0.047 |
VQ-VAE | 94.485 ± 2.407 | 0.645 ± 0.596 | 32.600 ± 2.071 | 0.137 ± 0.029 |
VQGAN | 94.330 ± 3.402 | 0.923 ± 2.478 | 32.096 ± 2.420 | 0.083 ± 0.020 |
Task-Specific Method | ||||
Gusarev et al. | 94.142 ± 2.666 | 1.028 ± 2.201 | 31.369 ± 2.385 | 0.156 ± 0.031 |
MCA-Net | 95.442 ± 2.095 | 0.611 ± 0.435 | 32.689 ± 1.939 | 0.079 ± 0.018 |
ResNet-BS | 94.508 ± 1.733 | 0.646 ± 0.339 | 32.265 ± 1.635 | 0.107 ± 0.022 |
Wang et al. | 89.767 ± 6.079 | 1.080 ± 0.610 | 29.963 ± 1.378 | 0.072 ± 0.016 |
BS-Diff | 92.428 ± 3.258 | 0.947 ± 0.510 | 30.627 ± 1.690 | 0.212 ± 0.041 |
BS-LDM | 94.159 ± 2.751 | 0.701 ± 0.293 | 31.953 ± 1.969 | 0.070 ± 0.018 |
GL-LCM (Ours) | 95.611 ± 1.529 | 0.512 ± 0.293 | 33.347 ± 1.829 | 0.056 ± 0.015 |
Performance comparison on the JSRT dataset:

Method | BSR (%)↑ | MSE (10⁻³)↓ | PSNR↑ | LPIPS↓ |
---|---|---|---|---|
Universal Method | ||||
VAE | 85.646 ± 9.327 | 1.224 ± 0.749 | 29.814 ± 2.364 | 0.155 ± 0.032 |
VQ-VAE | 86.445 ± 8.881 | 0.986 ± 0.596 | 30.712 ± 2.273 | 0.062 ± 0.017 |
VQGAN | 86.594 ± 8.916 | 1.002 ± 0.606 | 30.635 ± 2.255 | 0.061 ± 0.017 |
Task-Specific Method | ||||
Gusarev et al. | 89.283 ± 8.288 | 0.821 ± 0.570 | 31.700 ± 2.594 | 0.100 ± 0.024 |
MCA-Net | 86.887 ± 9.825 | 0.876 ± 0.625 | 31.577 ± 2.905 | 0.057 ± 0.017 |
ResNet-BS | 88.782 ± 8.905 | 0.960 ± 0.661 | 31.021 ± 2.576 | 0.060 ± 0.016 |
Wang et al. | 89.679 ± 9.477 | 1.013 ± 0.655 | 30.681 ± 2.431 | 0.075 ± 0.015 |
BS-Diff | 88.707 ± 8.859 | 1.003 ± 0.655 | 30.765 ± 2.504 | 0.154 ± 0.037 |
BS-LDM | 89.322 ± 9.562 | 0.783 ± 0.632 | 32.307 ± 3.231 | 0.058 ± 0.017 |
GL-LCM (Ours) | 90.056 ± 10.635 | 0.746 ± 0.680 | 32.951 ± 3.799 | 0.052 ± 0.015 |
Comparison of model size and inference efficiency:

Method | Sampler | Sampling Steps | Parameters | Inference Time (s) |
---|---|---|---|---|
BS-Diff | DDPM | 1000 | 254.7M | 108.86 |
BS-LDM | DDPM | 1000 | 421.3M | 84.62 |
GL-LCM (Ours) | LCM | 50 | 436.9M | 8.54 |
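If you want to reproduce a wall-clock measurement like the one in this table, a generic GPU-synchronized timing pattern such as the following can be used; the dummy workload below is only a stand-in for an actual GL-LCM sampling call.

```python
import time
import torch

def time_inference(sample_fn, n_runs: int = 10) -> float:
    """Average wall-clock time of sample_fn() over n_runs, GPU-synchronized."""
    sample_fn()                              # warm-up run, excluded from timing
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        sample_fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs

# Dummy workload standing in for a GL-LCM sampling call.
def dummy_sample():
    x = torch.rand(1, 4, 128, 128)
    return (x * 2).sum()

print(f"average inference time: {time_inference(dummy_sample):.4f} s")
```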
In the qualitative comparison figure, a pseudo-color zoomed-in view is shown in the bottom-right corner, and green arrows mark the boundary artifacts.
Guidance Method | SZCH-X-Rays PSNR↑ | SZCH-X-Rays LPIPS↓ | JSRT PSNR↑ | JSRT LPIPS↓ |
---|---|---|---|---|
Vanilla Guidance | 32.777 ± 2.091 | 0.058 ± 0.016 | 32.296 ± 3.454 | 0.073 ± 0.020 |
CFG | 32.315 ± 1.717 | 0.068 ± 0.013 | 32.613 ± 3.604 | 0.070 ± 0.015 |
LEG (Ours) | 33.347 ± 1.829 | 0.056 ± 0.015 | 32.951 ± 3.799 | 0.052 ± 0.015 |
Fusion Strategy | SZCH-X-Rays PSNR↑ | SZCH-X-Rays LPIPS↓ | JSRT PSNR↑ | JSRT LPIPS↓ |
---|---|---|---|---|
✗ | 31.360 ± 2.079 | 0.091 ± 0.020 | 31.638 ± 3.078 | 0.074 ± 0.021 |
α-Fusion | 29.781 ± 1.522 | 0.181 ± 0.021 | 31.784 ± 3.043 | 0.092 ± 0.013 |
AE Fusion | 30.850 ± 1.806 | 0.141 ± 0.028 | 31.835 ± 3.075 | 0.061 ± 0.017 |
Poisson Fusion (Ours) | 33.347 ± 1.829 | 0.056 ± 0.015 | 32.951 ± 3.799 | 0.052 ± 0.015 |
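For intuition about the Poisson fusion step, gradient-domain blending can be illustrated with OpenCV's `seamlessClone`; this is only a conceptual stand-in and not necessarily how the fusion is implemented in this repository.

```python
import cv2
import numpy as np

# Illustrative Poisson (gradient-domain) blending of a local lung-region result
# into a global result, using OpenCV's seamlessClone as a stand-in.
global_img = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)  # global-path output
local_img = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)   # local-path output
lung_mask = np.zeros((512, 512), dtype=np.uint8)
lung_mask[128:384, 96:416] = 255                                    # toy lung mask

# Center of the mask region, required by seamlessClone (x, y order).
ys, xs = np.nonzero(lung_mask)
center = (int(xs.mean()), int(ys.mean()))

fused = cv2.seamlessClone(local_img, global_img, lung_mask, center, cv2.NORMAL_CLONE)
print(fused.shape)
```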
- Linux
- Python >= 3.7
- NVIDIA GPU (memory >= 32 GB) + CUDA + cuDNN
Pre-trained weights:
- VQGAN - SZCH-X-Rays
- UNet - SZCH-X-Rays
- VQGAN - JSRT
- UNet - JSRT
The original JSRT dataset and processed JSRT dataset are located at https://drive.google.com/file/d/1RkiU85FFfouWuKQbpD7Pc7o3aZ7KrpYf/view?usp=sharing and https://drive.google.com/file/d/1o-T5l2RKdT5J75eBsqajqAuHPfZnzPhj/view?usp=sharing, respectively.
Three paired CXR and DES soft-tissue images from SZCH-X-Rays are provided for testing, organized as follows:
└─data
├─ CXR
│ ├─ 0.png
│ ├─ 1.png
│ └─ 2.png
└─ BS
├─ 0.png
├─ 1.png
└─ 2.png
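A minimal PyTorch dataset for reading such paired CXR/BS images might look like the sketch below; it is illustrative only and does not mirror the repository's actual data-loading code.

```python
import os
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset

class PairedCXRDataset(Dataset):
    """Loads paired CXR / bone-suppressed (BS) images from the layout above."""

    def __init__(self, root: str = "data"):
        self.cxr_dir = os.path.join(root, "CXR")
        self.bs_dir = os.path.join(root, "BS")
        self.names = sorted(os.listdir(self.cxr_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        cxr = np.array(Image.open(os.path.join(self.cxr_dir, name)).convert("L"))
        bs = np.array(Image.open(os.path.join(self.bs_dir, name)).convert("L"))
        # Scale to [0, 1] and add a channel dimension.
        cxr = torch.from_numpy(cxr).float().unsqueeze(0) / 255.0
        bs = torch.from_numpy(bs).float().unsqueeze(0) / 255.0
        return cxr, bs

# Usage: dataset = PairedCXRDataset("data"); cxr, bs = dataset[0]
```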
To perform lung segmentation during data preparation, please use lungSegmentation.ipynb.
To install the required dependencies, run:
pip install -r requirements.txt
To evaluate the VQGAN reconstruction for visualization, run the following command:
python vq-gan_eval.py
To evaluate GL-LCM, run the following command:
python batch_lcm_eval.py
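Conceptually, the speed-up reported above comes from few-step consistency sampling, which alternates between predicting a clean latent and re-noising it to a lower noise level. The sketch below shows this generic multistep consistency sampling loop with a toy model; it is not the implementation in `batch_lcm_eval.py`.

```python
import torch

def consistency_multistep_sample(consistency_fn, shape, sigmas):
    """Generic multistep consistency sampling: start from noise, predict the clean
    sample, then re-noise to successively lower noise levels and refine.
    `consistency_fn(x, sigma)` is a placeholder for a trained consistency model."""
    x = torch.randn(shape) * sigmas[0]
    x0 = consistency_fn(x, sigmas[0])            # one-step estimate of the clean latent
    for sigma in sigmas[1:]:
        x = x0 + sigma * torch.randn_like(x0)    # re-noise to the next (lower) level
        x0 = consistency_fn(x, sigma)            # refine the estimate
    return x0

# Toy stand-in for a trained consistency model.
dummy_model = lambda x, sigma: x / (1.0 + sigma)
latent = consistency_multistep_sample(dummy_model, (1, 4, 64, 64), [80.0, 10.0, 1.0, 0.1])
print(latent.shape)
```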
If you want to train the model yourself, first split the whole dataset into training, validation, and test sets by running the following command:
python dataSegmentation.py
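For reference, a dataset split of this kind typically looks like the sketch below; it is an illustrative example under assumed directory names, not the actual logic of `dataSegmentation.py`.

```python
import os
import random
import shutil

def split_dataset(cxr_dir="data/CXR", bs_dir="data/BS", out_root="data_split",
                  ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly split paired files into train/val/test, keeping CXR/BS pairs together."""
    names = sorted(os.listdir(cxr_dir))
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * ratios[0])
    n_val = int(len(names) * ratios[1])
    splits = {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],
    }
    for split, split_names in splits.items():
        for sub, src_dir in (("CXR", cxr_dir), ("BS", bs_dir)):
            dst_dir = os.path.join(out_root, split, sub)
            os.makedirs(dst_dir, exist_ok=True)
            for name in split_names:
                shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))

# split_dataset()  # uncomment to run on the layout shown above
```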
Then, you can run the following command to train the VQGAN model:
python vq-gan_train.py
After VQGAN training finishes, use the saved VQGAN checkpoint to train the noise estimator network of GL-LCM by running the following command:
python lcm_train.py
To compute the evaluation metrics, including BSR, MSE, PSNR, and LPIPS, run the following command:
python metrics.py
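As a quick sanity check outside `metrics.py`, MSE, PSNR, and LPIPS can be computed with standard packages as sketched below (BSR follows the definition in the paper and is not reproduced here); this is an illustrative snippet, not the repository's metric code, and the `lpips` and `scikit-image` packages may need to be installed separately.

```python
import numpy as np
import torch
import lpips                                   # pip install lpips
from skimage.metrics import peak_signal_noise_ratio

pred = np.random.rand(512, 512).astype(np.float32)  # predicted soft-tissue image in [0, 1]
gt = np.random.rand(512, 512).astype(np.float32)    # ground-truth DES soft-tissue image

mse = float(np.mean((pred - gt) ** 2))
psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)

# LPIPS expects 3-channel tensors scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="alex")
to_lpips = lambda x: torch.from_numpy(x).repeat(3, 1, 1).unsqueeze(0) * 2 - 1
lpips_val = loss_fn(to_lpips(pred), to_lpips(gt)).item()

print(f"MSE: {mse:.6f}  PSNR: {psnr:.3f}  LPIPS: {lpips_val:.4f}")
```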