DataPath is an open-source computational pathology toolbox designed to support researchers and regulatory scientists in the preparation and analysis of whole slide images (WSIs). Developed to streamline and standardize WSI workflows, DataPath offers modular tools for stain normalization, color harmonization, tissue registration, and data stratification. By ensuring consistency and quality across diverse histopathology datasets, it facilitates the creation of AI-ready datasets suitable for robust algorithm development, validation, and evaluation. For more information, please contact: [email protected].
The package includes the following modules:
- WSI Handler: Includes functions and classes for general WSI analysis, such as reading whole slide images, extracting subregions, and visualizing thumbnails.
- Annotation Extraction: Includes several functions for extracting annotated ROIs.
- Patch Extraction: Assists pathologists and developers in extracting image patches from a whole slide image's region of interest.
- Color Normalization: Implements multiple methods (Macenko, Vahadane, Reinhard, and Histogram Matching) to normalize staining variability across slides, ensuring consistency in downstream AI analysis.
- WSI Tissue Registration: Provides a classical feature-based registration algorithm (ORB) to align serial or cross-stained tissue sections, with support for homography and similarity transforms.
- Stratification: Offers tools to split and visualize datasets based on metadata for balanced train/val/test splits and reproducible AI model training.
- WSI Duplicate Detection: Identifies duplicate whole slide images in a dataset, helping AI model developers understand the dataset before using it for training.
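As a rough illustration of one of the color normalization methods listed above, histogram matching can be sketched with NumPy alone. This is not DataPath's implementation: the function names `match_channel` and `match_histograms_rgb` are hypothetical, the images are random arrays standing in for real patches, and the approach shown is plain per-channel CDF matching in RGB.

```python
import numpy as np


def match_channel(source, reference):
    """Map the intensities of one channel so its CDF follows the reference CDF."""
    # Unique intensities, their positions in the flattened image, and their counts.
    src_values, src_inverse, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both channels.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_inverse].reshape(source.shape)


def match_histograms_rgb(source, reference):
    """Apply CDF matching independently to each channel of an 8-bit RGB image."""
    channels = [
        match_channel(source[..., c], reference[..., c])
        for c in range(source.shape[-1])
    ]
    matched = np.stack(channels, axis=-1)
    return np.clip(np.rint(matched), 0, 255).astype(np.uint8)


# Hypothetical stand-ins for a dark source patch and a brighter reference patch.
rng = np.random.default_rng(0)
source = rng.integers(0, 128, size=(64, 64, 3), dtype=np.uint8)
reference = rng.integers(100, 256, size=(64, 64, 3), dtype=np.uint8)
normalized = match_histograms_rgb(source, reference)
print(source.mean(), normalized.mean(), reference.mean())
```

Matching each channel independently is simple but can shift hues; stain-aware methods such as Macenko or Vahadane work on estimated stain vectors instead.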
Code Documentation: Please refer to the code documentation, and email [email protected] if you have any questions.
To set up the DataPath environment, first clone this repository and navigate to the project directory:
git clone https://github.com/DIDSR/DataPath.git
cd DataPath
Create a virtual environment and install dependencies from the provided requirements.txt:
python3 -m venv datapath_env
source datapath_env/bin/activate
pip install -r requirements.txt
Tested Environment:
- Linux (Ubuntu 22.04 LTS recommended)
- Python 3.10+
Some key dependencies include:
numpy==2.1.2
opencv-python==4.11.0.86
scikit-image==0.25.2
scikit-learn==1.6.1
matplotlib==3.10.1
pyfeats==1.0.1
mahotas==1.4.18
torch==2.5.1
torchvision==0.20.1
(See requirements.txt for the full list.)
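After installation, it can be useful to confirm that the environment matches the pinned versions. The following is a minimal sketch using only the Python standard library; the `PINNED` dictionary copies the versions listed above, and the helper name `check_pins` is hypothetical, not part of DataPath.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the key dependencies listed above.
PINNED = {
    "numpy": "2.1.2",
    "opencv-python": "4.11.0.86",
    "scikit-image": "0.25.2",
    "scikit-learn": "1.6.1",
    "matplotlib": "3.10.1",
    "pyfeats": "1.0.1",
    "mahotas": "1.4.18",
    "torch": "2.5.1",
    "torchvision": "0.20.1",
}


def check_pins(pins):
    """Return {package: (expected, installed, matches)} for each pinned package."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


for name, (expected, installed, ok) in check_pins(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Some builds report a local version suffix (for example a CUDA tag on torch), which this exact string comparison would flag as a mismatch.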
Several Jupyter notebooks and scripts are provided to quickly familiarize you with the capabilities and usage of DataPath:
- WSI Handler
- Annotation Extraction
- Patch Extraction
- Color Normalization
- WSI Tissue Registration
- Stratification
- WSI Duplicate Detection
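To give a flavor of what metadata-based stratification involves, here is a minimal standard-library sketch of a stratified train/val/test split. It does not use or reproduce DataPath's API: the function `stratified_split`, the slide identifiers, and the `tumor`/`normal` labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split slide IDs into train/val/test while preserving label proportions.

    labels: dict mapping slide ID -> stratum label (e.g. a diagnosis or site).
    ratios: fractions for train, val, and test; test receives the remainder.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    # Group slide IDs by stratum, in a deterministic order.
    by_label = defaultdict(list)
    for slide_id, label in sorted(labels.items()):
        by_label[label].append(slide_id)

    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        n_train = round(ratios[0] * len(ids))
        n_val = round(ratios[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits


# Hypothetical metadata: 10 tumor slides and 10 normal slides.
metadata = {f"slide_{i:02d}": ("tumor" if i < 10 else "normal") for i in range(20)}
splits = stratified_split(metadata)
for name, ids in splits.items():
    print(name, len(ids))
```

When several slides come from the same patient, the split would normally be done at the patient level to avoid leakage between train and test. With scikit-learn installed, `train_test_split` with its `stratify` argument is a common alternative for two-way splits.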
For any inquiries, suggestions, or collaborative opportunities, please contact Seyed Kahaki either via this GitHub repo or via email ([email protected]).
The enclosed tool is part of the Catalog of Regulatory Science Tools, which provides a peer-reviewed resource for stakeholders to use where standards and qualified Medical Device Development Tools (MDDTs) do not yet exist. These tools do not replace FDA-recognized standards or MDDTs. This catalog collates a variety of regulatory science tools that the FDA’s Center for Devices and Radiological Health’s (CDRH) Office of Science and Engineering Labs (OSEL) developed. These tools use the most innovative science to support medical device development and patient access to safe and effective medical devices. If you are considering using a tool from this catalog in your marketing submissions, note that these tools have not been qualified as Medical Device Development Tools and the FDA has not evaluated the suitability of these tools within any specific context of use. You may request feedback or meetings for medical device submissions as part of the Q-Submission Program. For more information about the Catalog of Regulatory Science Tools, email [email protected].
• RST Reference Number: RSTXXXX.01
• Date of Publication: XX/XX/XXXX
• Recommended Citation:
U.S. Food and Drug Administration. (2024). DataPath: A Whole Slide Image Processing Tool for AI-Ready Dataset Preparation (RSTXXXX.01). https://cdrh-rst.fda.gov/TBD
