Skip to content

Centralized resource for breast imaging and histopathology datasets (Ultrasound, DBT, Mammography, MRI). Includes use cases, characteristics, and access info—supporting development of diagnostic tools, training algorithms, and advancing breast cancer research.

Notifications You must be signed in to change notification settings

hugofigueiras/Breast-Cancer-Imaging-Datasets

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 

Repository files navigation

Breast-Cancer-Imaging-Datasets

This repository serves as a centralized resource listing various breast imaging and pathology datasets commonly used in academic research, clinical training, and machine learning applications. The goal is to provide detailed summaries of key characteristics, usage instructions, and guidance on how to obtain these datasets, all in one place. If you know any more datasets, and want to contribute, please submit a pull request and I'll happily approve it.

Introduction

This repository provides a curated list of breast imaging and histopathology datasets, aiming to streamline access for researchers, clinicians, and students. Here, the datasets are separated by each modality for easier understanding. This repository is composed of 35 publicly available datasets.

Below is a histogram showing the number of datasets in each modality, and pie chart representations that visualize the distribution of datasets by modality.

Histogram of Datasets by Modality:

Histogram of Datasets by Modality

Pie Chart of Datasets Distribution by Modality:

Pie Chart of Datasets Distribution

Table of Contents

Datasets

Ultrasound

Dataset Subjects Nº Samples Format Size Year Cite Access data
Breast Ultrasound Images (BUSI) 600 780 PNG 204MB 2020 Dataset of breast ultrasound images Download here
Breast Lesions USG 256 522 PNG 66.67MB 2024 Curated benchmark dataset for ultrasound based breast lesion analysis Download here
UDIAT Breast Ultrasound Dataset B 163 163 N/A N/A 2017 Automated Breast Ultrasound Lesions Detection Using Convolutional Neural Networks Resquest Permission
OASBUD 78 200 Matlab 296.8MB 2017 Open access database of raw ultrasonic signals acquired from malignant and benign breast lesions Downnload here
BUS Synthetic Dataset 0 500 PNG 9.7MB 2023 PDF-UNet: A semi-supervised method for segmentation of breast tumor images using a U-shaped pyramid-dilated network Download here

Summaries:

  • BUSI: Small (around 500×500 px) ultrasound images suitable for classification of benign vs. malignant lesions and segmentation tasks.
  • Breast Lesions USG: Ultrasound images capturing various lesions; ideal for lesion detection, classification, and segmentation.
  • UDIAT Dataset B: Ultrasound scans for lesion analysis; can be used to develop detection and classification methods.
  • OASBUD: Provides raw ultrasound signals, enabling advanced signal processing, segmentation, and classification methods.
  • BUS Synthetic Dataset: Synthetic ultrasound images generated for model training and data augmentation, useful in classification and segmentation tasks.

Digital Breast Tomosynthesis (DBT)

Dataset Subjects Nº Samples Format Size Year Cite Download
Breast Cancer Screening DBT 5060 22 032 DICOM 1.63TB 2024 A Data Set and Deep Learning Algorithm for the Detection of Masses and Architectural Distortions in Digital Breast Tomosynthesis Images Download here
EA1141 1444 500 DICOM 2.82TB 2023 Abbreviated Breast MRI and Digital Tomosynthesis Mammography in Screening Women With Dense Breasts (EA1141) (Version 1) (dataset) Download here
VICTRE 2994 2994 DICOM 1.03TB 2019 The VICTRE Trial: Open-Source, In-Silico Clinical Trial for Evaluating Digital Breast Tomosynthesis Download here

Summaries:

  • Breast Cancer Screening DBT: High-resolution DBT volumes suitable for lesion detection and 3D reconstruction tasks.
  • EA1141: High-quality DBT data, also associated with abbreviated MRI; supports multimodal analysis, lesion detection, and screening optimization.
  • VICTRE: Simulated DBT data for evaluating algorithms in a controlled setting, useful for CAD development and comparative studies.

Mammography

Dataset Subjects Nº Samples Format Size Year Cite Download
CBIS-DDSM 1566 6 671 DICOM 161.51GB 2017 A curated mammography data set for use in computer-aided detection and diagnosis research Download here
CMMD 1775 3 728 DICOM 2021 22.86GB The Chinese Mammography Database (CMMD): An online mammography database with biopsy confirmed types for machine diagnosis of breast Download here
CDD-CESM 326 2 006 JPEG 1.5GB 2021 Categorized Digital Database for Low energy and Subtracted Contrast Enhanced Spectral Mammography images (Dataset) , Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research , The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository Download here
VinDr-Mammo 5000 20 0000 DICOM N/A 2022 A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography Download here
INBreast 115 410 N/a N/A 2012 INbreast: toward a full-field digital mammographic database Contact the authors
MIAS N/A 322 PGM 1.5GB 2015 Mammographic Image Analysis Society (MIAS) database v1.21 Download here
Breast Tumor Mammography Dataset for Computer Vision N/A 3 383 JPG 103.49MB 2024 N/A Download here

Summaries:

  • CBIS-DDSM: A large set of annotated mammograms, excellent for classification, detection of calcifications, and mass segmentation tasks.
  • CMMD: Mammograms from a Chinese cohort, useful for cross-population studies, lesion detection, and classification.
  • CDD-CESM: Contrast-enhanced spectral mammography images supporting advanced analysis of vascularized lesions, aiding classification and differentiation tasks.
  • VinDr-Mammo: Large-scale dataset with diverse annotations for robust AI model training in detection and classification.
  • INBreast: High-quality full-field digital mammograms with detailed annotations, ideal for algorithm benchmarking.
  • MIAS: Classic mammography dataset widely used for initial model training, testing basic classification and detection algorithms.
  • Breast Tumor Mammography Dataset: A smaller dataset well-suited for entry-level experiments in tumor detection and basic classification.

MRI

Dataset Subjects Nº Samples Format Size Year Cite Download
ACRIN-6667 984 984 DICOM 199.59GB 2021 ACRIN-Contralateral-Breast-MR (ACRIN 6667) (Data set) Download here
ACRIN-6698 385 385 DICOM 1.94TB 2021 ACRIN 6698/I-SPY2 Breast DWI (Data set) Download here
ISPY1 222 222 DICOM 78.36GB 2016 Multi-center breast DCE-MRI data and segmentations from patients in the I-SPY 1/ACRIN 6657 trials Download here
ISPY2 719 719 DICOM 4.16TB 2022 I-SPY 2 Breast Dynamic Contrast Enhanced MRI Trial (ISPY2) (Version 1) (Data set) , ACRIN 6698/I-SPY2 Breast DWI (Data set) Download here
Duke Breast Cancer MRI 922 922 DICOM 368.89GB 2022 A machine learning approach to radiogenomics of breast cancer: a study of 922 subjects and 529 DCE-MRI features Download here
Breast Cancer Patients MRI's 700 700 JPG 201.4MB 2021 N/A Download here
Breast MRI NACT Pilot 64 64 DICOM 19.51GB 2023 Single site breast DCE-MRI data and segmentations from patients undergoing neoadjuvant chemotherapy (Version 3) (Data set) Download here
QIN Breast DCE-MRI 10 10 DICOM 15.9GB 2019 Variations of dynamic contrast-enhanced magnetic resonance imaging in evaluation of breast cancer therapy response: a multicenter data analysis challenge (QIN Breast DCE-MRI) (Version 2) (Data set) Download here
QIN-BREAST 67 67 DICOM 11.41GB 2020 Data From QIN-BREAST (Version 2) (Data set) Download here
QIN-BREAST-02 13 13 DICOM 4.19GB 2019 Data from QIN-BREAST-02(Dataset) Download here
Advanced MRI Breast Lesions 632 632 DICOM 646GB 2024 Standard and Delayed Contrast-Enhanced MRI of Malignant and Benign Breast Lesions with Histological and Clinical Supporting Data (Advanced-MRI-Breast-Lesions) (Version 2) (dataset) Download here
BREAST DIAGNOSIS 88 88 DICOM 60.87GB 2011 BREAST-DIAGNOSIS (Data set) Download here

Summaries:

  • ACRIN-6667 & ACRIN-6698: Rich MRI datasets for assessing neoadjuvant chemotherapy response, ideal for detection, segmentation of lesions, and longitudinal analysis.
  • ISPY1 & ISPY2: Multiparametric MRI for evaluating early response to therapy; supports predictive modeling, segmentation, and classification of treatment outcomes.
  • Duke Breast Cancer MRI: High-quality DCE-MRI scans enabling lesion characterization, radiogenomics analysis, and segmentation tasks.
  • Breast Cancer Patients MRI’s: JPG-format MRI slices suited for basic classification and proof-of-concept tasks.
  • Breast MRI NACT Pilot: Focused on patients undergoing neoadjuvant chemotherapy, enabling treatment response analysis and lesion segmentation.
  • QIN (Breast DCE-MRI, QIN-BREAST, QIN-BREAST-02): Small, high-quality sets for benchmarking quantitative imaging biomarkers, segmentation, and modeling treatment response.
  • Advanced MRI Breast Lesions: Large, detailed dataset for evaluating complex MRI models, including advanced lesion segmentation and classification.
  • BREAST DIAGNOSIS: DCE-MRI aimed at diagnostic feature extraction, supporting classification and lesion characterization tasks.

Histopathology

Dataset Subjects Nº Samples Format Size Year Cite Download
Post NAT BRCA 54 54 SVS 42.3GB 2019 Assessment of Residual Breast Cancer Cellularity after Neoadjuvant Chemotherapy using Digital Pathology (Data set) Download here
Breast Histopathology Images 162 162 PNG 1.6GB 2016 Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases Download here
BreakHis 82 7,909 PNG N/A 2016 A Dataset for Breast Cancer Histopathological Image Classification Download here
Breast Cancer Cell Segmentation N/A 58 TIFF 159.82MB 2019 Evaluation and benchmark for biological image segmentation Download here
BCSS 25 151 RGB N/A 2019 Structured crowdsourcing enables convolutional segmentation of histology images Download here
TUPAC16 500 N/A SVS 848GB 2016 Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge Download here
CAMELYON 200 1 399 TIFF N/A 2018 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset Download here
BACH N/A 400 SVS & TIFF 18.23GB 2022 Bach: Grand challenge on breast cancer histology images Download here

Summaries:

  • Post NAT BRCA: High-resolution WSIs post-neoadjuvant therapy, ideal for quantifying residual disease, segmentation, and treatment response analysis.
  • Breast Histopathology Images: Smaller PNG patches well-suited for basic classification and validation of histopathology models.
  • BreakHis: Large-scale histopathology dataset, excellent for classification and detection of tumor subtypes at various magnifications.
  • Breast Cancer Cell Segmentation: Focused on cell-level segmentation tasks, useful for training models that identify and count cells in tissues.
  • BCSS: Annotated crowdsourced dataset enabling segmentation and classification tasks at the tissue level.
  • TUPAC16: Whole-slide images for quantifying tumor proliferation; used in classification and mitosis detection challenges.
  • CAMELYON: WSIs for lymph node metastasis detection and segmentation tasks.
  • BACH: Balanced dataset of four histopathological classes (normal, benign, in situ, invasive) suitable for classification and region-of-interest detection.

Contributing & Contact

  • Contributions:
    Suggestions for new datasets, updates, or corrections are welcome. Please open an issue or submit a pull request.

  • Contact:
    For specific dataset access, follow the provided links or contact dataset maintainers directly. For repository-related questions or inquiries, feel free to email me at [email protected].

About

Centralized resource for breast imaging and histopathology datasets (Ultrasound, DBT, Mammography, MRI). Includes use cases, characteristics, and access info—supporting development of diagnostic tools, training algorithms, and advancing breast cancer research.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published