Merge pull request #11 from TeamNCMC/generalize-preprocess

Pre-processing scripts
TeamNCMC · Jan 14, 2025 · 71344df · 71344df
2 parents 16629f3 + c76079e
commit 71344df

Showing 6 changed files with 253 additions and 126 deletions.
diff --git a/docs/guide-create-pyramids.md b/docs/guide-create-pyramids.md
@@ -10,6 +10,9 @@ This script is standalone, eg. it does not rely on the `cuisto` package. But ins
 
 `pyramid-creator` moved to a standalone package that you can find [here](https://github.com/TeamNCMC/pyramid-creator#pyramid_creator) with [installation](https://github.com/TeamNCMC/pyramid-creator#install) and [usage](https://github.com/TeamNCMC/pyramid-creator#usage) instructions.
 
+!!! info
+    You might also have to pre-process your images if they contain debris or other artifacts. Check the [pre-processing guide](tips-preprocessing.md).
+
 ## Installation
 You will find instructions on the dedicated project page over at [Github](https://github.com/TeamNCMC/pyramid-creator#pyramid_creator).
 

diff --git a/docs/tips-preprocessing.md b/docs/tips-preprocessing.md
@@ -0,0 +1,96 @@
+# Image pre-processing
+
+Preparing slides before image acquisition can be a tedious task : some slices may be flipped (either upside-down or left/right), placed too close to each other (so that part of a neighboring slice is visible in the image), or placed too close to the slide edge.
+In such cases, the image might need to be cleaned so that only the actual slice is visible.
+
+## Pre-processing scripts
+Two scripts are provided in `scripts/preprocessing` to this end. They require the images to first be exported from the microscope software to standard image files with metadata (e.g. [OME-TIFF](tips-formats.md#metadata) files).
+
+The process is then :
+
+1. Split each channel into single-channel images,
+1. Automatically detect the brain contour in the specified target channel,
+1. Save the resulting brain mask as an image,
+1. Apply the mask to all channels and save the resulting cleaned images,
+1. Manually review the masks; if a mask is not satisfactory, manually edit the corresponding single-channel image in ImageJ,
+1. Rerun the brain contour detection and re-apply the masks to all channels,
+1. Merge the cleaned channels into a multi-channel, pyramidal OME-TIFF image ready to be used in QuPath.
+
+The first script, `preprocess_split_channels.py`, handles steps 1-6; `preprocess_merge_channels.py` takes care of the last step.
+
+!!! info
+    The reason we need to split channels is to get images that can be easily opened in third-party software such as ImageJ for convenient editing.
+
+## Usage
+First and foremost, export the images from the microscope software to OME-TIFF. For Zeiss ZEN, have a look at [this guide](guide-create-pyramids.md#export-czi-to-ome-tiff). Say the images were exported to a directory called `~/input_directory/`.
+
+### Split channels and find brain mask
+Copy the script `preprocess_split_channels.py` located in `scripts/preprocessing` to your computer. Read the options at the top of the script and edit them as needed.
+
+In particular, the `TASKS` dictionary sets which actions are to be performed.
+
+This script will :
+
+1. (if `move=True`) Move images from `~/input_directory` to `~/images/merged_original/`. The files will be renamed depending on the options set in the script header. The `IN_PREFIX` parameter allows the slice number to be parsed. The `OUT_PREFIX` is the prefix of the renamed image and all subsequent use.
+
+    ??? Example
+        ZEN exported images named : `A1A4_s1.ome.tiff`, `A1A4_s2.ome.tiff`, ...  
+        Setting `IN_PREFIX` to `"_s"` and `OUT_PREFIX` to `"animalid_"` will result in the image being moved from `~/input_directory/A1A4_s1.ome.tiff` to `~/images/animalid_001.ome.tiff`, and so on. The `images` folder name is customizable but will always be in the parent directory of `input_directory`.
+
+2. (if `split=True`) While moving and renaming the image, it will also read the actual image data, and split each channel into separate single-channel images. The image files will have the same name and are stored in `~/images/ch01`, `~/images/ch02`... folders.
+3. (if `clean=True`) The parameter `DETECTION_CHANNEL` sets which channel will be used to find the brain contour. The corresponding single-channel file is read, [brain detection](#brain-contour-detection) is performed, and the resulting mask is saved in `~/images/masks`. Since the image is already loaded, the mask is also applied directly to it, and the cleaned, masked image is saved in `~/images/chXX_cleaned`, where `XX` corresponds to `DETECTION_CHANNEL`.
+
+    ??? Info
+        If the mask image file already exists, the image is skipped. Likewise, if `overwrite_cleaned` is turned off (i.e. set to `False`) and an image with the same name already exists in the `chXX_cleaned` folders, it will be skipped.
+
+4. The mask is subsequently applied to all other channels in the same manner : cleaned images have the same name as the renamed original file and are stored in their respective `chXX_cleaned` folders.
+5. Visually assess the quality of the masks stored in `~/images/masks/`. Previews are generated in the `previews` folder. If they are satisfactory, skip to the [next section](#merge-channels).
+
+If for some images the mask is not satisfactory, note down their names and :
+
+1. Delete the mask file (not the preview !).
+2. Delete the corresponding cleaned images in each channel.
+3. Open ImageJ, drag & drop the corresponding single-channel original image from the channel used for detection.
+4. Manually edit it so that the brain slice is easily detected. This means deleting the bits not part of the slice, usually when those bits are close to the slice itself. One could for instance use the `Freehand selections` tool, select the parts to remove and hit ++del++.
+5. Save the image (++ctrl+s++), overwriting the original.
+6. Repeat for each unsatisfactory mask.
+7. Back to the script, turn off `reformat` and `split` in `TASKS`, since that's already done. Only the missing masks will be computed, and only the missing images from the `chXX_cleaned` folders will be written (unless `overwrite_cleaned` is set to `True`).
+
+??? Example
+    Automatic brain contour detection failed for `animalid_012.tiff`.  
+    I delete `~/images/masks/animalid_012.tiff`. I also delete `~/images/ch01_cleaned/animalid_012.tiff`, `~/images/ch02_cleaned/animalid_012.tiff` and `~/images/ch03_cleaned/animalid_012.tiff`.  
+    I drag & drop `~/images/ch01/animalid_012.tiff` in ImageJ, draw the brain contour manually with the Freehand selections tool, invert the selection, hit ++del++, and save the image, overwriting it.  
+    Finally, I edit the script, setting `reformat=False` and `split=False` in `TASKS`, and re-run the script. Only one mask will be computed and applied.
+
+Now, we only have to merge all the channels back to single pyramidal OME-TIFF images ready to be used in QuPath.
+
+### Merge channels
+Copy the `preprocess_merge_channels.py` script to your computer.
+
+This one is more straightforward :
+
+1. Fill in the input directory. This is where the script can find the `chXX_cleaned` folders, `~/images/` in the example above.
+2. Fill in the output directory. This could be for instance `~/images/merged_cleaned/`.
+3. Fill in the `CHANNELS` parameter. This is a dictionary setting the name and color of each channel. The order is important : it needs to match the order of the `chXX_cleaned` folders.
+
+    ??? Example
+        The first channel (`ch01_cleaned`) corresponds to the NISSL staining imaged in the CFP channel, the second channel (`ch02_cleaned`) corresponds to the EGFP channel. `CHANNELS` would then look like : `{"CFP": (0, 0, 255), "EGFP": (0, 255, 0)}`.
+
+4. Fill in the pyramid and tile options. The default values should work fine for most use cases.
+5. Run the script. Images in `OUTPUT_DIRECTORY` are ready to be added to a QuPath project !
+
+!!! danger Important
+    The pixel size is read from the OME-TIFF files and propagated through the pre-processing steps to the final images, so make sure it is correct when exporting the files from the microscope software.
+
+### Brain contour detection
+The algorithm to detect the brain contour is defined in the function `find_brain_mask()` in the `preprocess_split_channels.py` script. All the parameters are customizable in the `DETECTION_PARAMETERS` variable.
+In a nutshell :
+
+1. Zeroes are replaced with a fixed background value (`bkg`). This accounts for parts removed manually in ImageJ : the image background is high compared to the zeros introduced by that operation, and edge detection would otherwise be sub-optimal.
+2. The image is downsampled (`downscale`) for performance -- the full resolution is not needed.
+3. Edge filter with the Canny algorithm (using `cannysigma` and `cannythresh`), implemented in [scikit-image](https://scikit-image.org/docs/stable/api/skimage.feature.html#skimage.feature.canny).
+4. Morphological closing (dilation followed by erosion) to keep only "big" objects, using `closeradius`.
+5. Fill the holes.
+6. Keep only the biggest remaining object.
+7. Resize the mask to the original image resolution.
+
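
For illustration, the seven detection steps listed at the end of `tips-preprocessing.md` could be sketched roughly as below. This is not the `find_brain_mask()` added by this PR (that function is not shown in this excerpt). The function name, the default values, the `(low, high)` form of `cannythresh` and the intensity normalization before Canny are all assumptions; only the parameter names come from the guide.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import canny
from skimage.transform import rescale, resize


def find_brain_mask_sketch(
    image,
    bkg=100.0,  # hypothetical default
    downscale=0.1,  # hypothetical default
    cannysigma=1.0,  # hypothetical default
    cannythresh=(0.1, 0.2),  # assumed to be (low, high)
    closeradius=3,  # hypothetical default
):
    """Return a boolean mask of the biggest object found in a 2D image."""
    img = np.asarray(image, dtype=float).copy()

    # 1. replace zeros (e.g. parts erased in ImageJ) with a background value
    img[img == 0] = bkg

    # normalize to [0, 1] so the Canny thresholds are scale-independent
    # (an assumption of this sketch, not a documented step)
    span = img.max() - img.min()
    if span == 0:
        return np.zeros(img.shape, dtype=bool)
    img = (img - img.min()) / span

    # 2. downsample for performance
    small = rescale(img, downscale, anti_aliasing=True)

    # 3. Canny edge filter
    edges = canny(
        small,
        sigma=cannysigma,
        low_threshold=cannythresh[0],
        high_threshold=cannythresh[1],
    )

    # 4. morphological closing with a disk-shaped footprint
    yy, xx = np.ogrid[-closeradius : closeradius + 1, -closeradius : closeradius + 1]
    footprint = (yy**2 + xx**2) <= closeradius**2
    closed = ndi.binary_closing(edges, structure=footprint)

    # 5. fill the holes
    filled = ndi.binary_fill_holes(closed)

    # 6. keep only the biggest connected object
    labels, nlabels = ndi.label(filled)
    if nlabels == 0:
        return np.zeros(img.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # ignore the background label
    biggest = labels == sizes.argmax()

    # 7. resize the mask back to the original resolution (nearest neighbor)
    mask = resize(biggest.astype(float), img.shape, order=0, anti_aliasing=False)
    return mask > 0.5
```

Applying the mask to a channel would then amount to something like `cleaned = np.where(mask, channel, 0)`, though how the script actually writes cleaned images is not visible here.
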
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -40,6 +40,7 @@ nav:
     - tips-formats.md
     - tips-qupath.md
     - tips-brain-contours.md
+    - tips-preprocessing.md
   - main-configuration-files.md
   - Examples:
     - main-using-notebooks.md

diff --git a/scripts/preprocessing/preprocess_invert_orientation.py b/scripts/preprocessing/preprocess_invert_orientation.py
@@ -1,7 +1,9 @@
-"""Simple script to change the order of files.
+"""
+Simple script to change the order of files.
 
-Used to transform file names (xxx_001.tiff) to reverse their order, to go from
-caudo-rostral to rostro-caudal.
+Used to transform file names (xxx_001.tiff) to reverse their order, eg.
+for a 30 image stack, xxx_001.tiff becomes xxx_030.tiff, xxx_030.tiff becomes
+xxx_001.tiff, and so on.
 
 """
 
@@ -12,22 +14,27 @@
 input_directory = "/path/to/directory"  # path to tiff files
 output_directory = "/path/to/directory/new"  # output directory, must be different
 file_extension = ".ome.tiff"  # file extension with dots
-file_prefix = "mouse0_"  # full prefix before the numbering digits
+file_prefix = "animal0_"  # full prefix before the numbering digits
 ndigits = 3  # number of digits for numbering (both inputs and outputs)
 dry_run = True  # if True, do not actually rename the files
 
+# list available files
 list_files = [
     filename
     for filename in os.listdir(input_directory)
     if filename.startswith(file_prefix) & filename.endswith(file_extension)
 ]
 
+# count files
 nfiles = len(list_files)
+# reverse indices
 new_numbers = np.arange(nfiles, 0, -1)
 
+# create output directory if necessary
 if not os.path.isdir(output_directory):
     os.mkdir(output_directory)
 
+# loop over images, build new name and rename
 for oldi, newi in enumerate(new_numbers):
     old_name = f"{file_prefix}{str(oldi + 1).zfill(ndigits)}{file_extension}"
     new_name = f"{file_prefix}{str(newi).zfill(ndigits)}{file_extension}"
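
As a side note on the renaming logic in `preprocess_invert_orientation.py`, the old-to-new mapping can be sketched without touching the file system. The helper name `reversed_name_pairs` is hypothetical and only mirrors the f-strings above; the actual rename or copy call is outside the shown hunk and is not reproduced.

```python
def reversed_name_pairs(file_prefix, nfiles, ndigits=3, file_extension=".ome.tiff"):
    """Return (old_name, new_name) pairs that reverse the numbering 1..nfiles."""
    pairs = []
    for old_index in range(1, nfiles + 1):
        # the first file takes the last number, the last file takes the first
        new_index = nfiles - old_index + 1
        old_name = f"{file_prefix}{str(old_index).zfill(ndigits)}{file_extension}"
        new_name = f"{file_prefix}{str(new_index).zfill(ndigits)}{file_extension}"
        pairs.append((old_name, new_name))
    return pairs


pairs = reversed_name_pairs("xxx_", 30, file_extension=".tiff")
print(pairs[0])  # ('xxx_001.tiff', 'xxx_030.tiff')
print(pairs[-1])  # ('xxx_030.tiff', 'xxx_001.tiff')
```

The mapping also shows why the output directory has to differ from the input one: renaming in place would write `xxx_030.tiff` over a file that has not been processed yet.
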

diff --git a/scripts/preprocessing/preprocess_merge_channels.py b/scripts/preprocessing/preprocess_merge_channels.py
@@ -1,16 +1,17 @@
 """
 Script for preprocessing.
-Merge channels found in wdir/Stack_RIP/ch*_cleaned, and create pyramidal OME-TIFF.
-Specify options at the top of file.
-`CHANNELS` must be ordered as the channels in the Stack_RIP directory.
+To be used after preprocess_split_channels.py and manual review of brain masks.
+
+This script merges channels found in input_dir/ch*_cleaned, and creates pyramidal
+OME-TIFF ready to be used in QuPath.
 
-Double check channel names and colors.
+Specify options at the top of file.
+`CHANNELS` must be ordered as the channels in the input_dir directory.
 
-Credits to Christoph Gohlke, see
-https://forum.image.sc/t/creating-a-multi-channel-pyramid-ome-tiff-with-tiffwriter-in-python/76424/4
+Double check channel names and colors, and run the script.
 
 author : Guillaume Le Goc ([email protected])
-version : 2024.11.19
+version : 2025.1.14
 
 """
 
@@ -24,13 +25,18 @@
 from tqdm import tqdm
 
 # --- Parameters
-EXPID = "animal0"
+# where to find chXX_cleaned folders
+INPUT_DIRECTORY = r"E:\projects\histo\data\GN121\images"
+# where to save merged images
+OUTPUT_DIRECTORY = os.path.join(INPUT_DIRECTORY, "merged_cleaned")
 
 # channels settings : dict mapping channel name to an RGB color. The order must be the
-# same as the channels order in the Stack_RIP directory.
+# same as the channels order in the input directory.
 CHANNELS = {
     "CFP": (0, 0, 255),
     "EGFP": (0, 255, 0),
+    "DsRed": (255, 0, 0),
+    "Cy5": (255, 0, 255),
 }
 
 # pyramidal ome-tiff settings
@@ -41,10 +47,8 @@
 
 IN_EXT = "tiff"
 
-# working directory
-WDIR = "path/to/data"
-
 
+# --- Functions
 def rgb_to_int(rgb):
     """Convert RGB color tuple to integer for OME-TIFF specs.
     Alpha channel is set to 0.
@@ -188,9 +192,10 @@ def im_downscale(img, downfactor, **kwargs):
 
 
 def process_directory(
-    expid: str,
-    levels: tuple,
+    input_directory: str,
+    output_directory: str,
     channels: dict,
+    levels: tuple,
 ):
     """
     Merge TIFF stacks representing different channels and create pyramidal OME-TIFF.
@@ -208,22 +213,16 @@ def process_directory(
 
     """
     # --- Preparation
-    wdir = os.path.abspath(WDIR)
-
-    # build directories names
-    inpdir = os.path.join(wdir, expid, "images")
-    outdir = os.path.join(wdir, expid, "images", "merged_cleaned_pyramid")
-
     # create directory if it does not exist
-    if not os.path.isdir(outdir):
-        os.makedirs(outdir)
+    if not os.path.isdir(output_directory):
+        os.makedirs(output_directory)
 
     # list channel directories
     chandirslist = [
-        os.path.join(inpdir, directory)
-        for directory in os.listdir(inpdir)
+        os.path.join(input_directory, directory)
+        for directory in os.listdir(input_directory)
         if (
-            os.path.isdir(os.path.join(wdir, expid, "images", directory))
+            os.path.isdir(os.path.join(input_directory, directory))
             and directory.startswith("ch")
             and directory.endswith("cleaned")
         )
@@ -256,7 +255,9 @@ def process_directory(
     pbar = tqdm(imgslist)
     for imgfile in pbar:
         # build output image name
-        imgout = os.path.join(outdir, os.path.splitext(imgfile)[0] + ".ome.tiff")
+        imgout = os.path.join(
+            output_directory, os.path.splitext(imgfile)[0] + ".ome.tiff"
+        )
 
         if os.path.isfile(imgout):
             continue
@@ -308,7 +309,8 @@ def process_directory(
 # --- Call
 if __name__ == "__main__":
     process_directory(
-        EXPID,
-        LEVELS,
+        INPUT_DIRECTORY,
+        OUTPUT_DIRECTORY,
         CHANNELS,
+        LEVELS,
     )
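
The body of `rgb_to_int()` is not part of the shown hunks. As a rough illustration of what its docstring describes, the sketch below assumes the OME convention of storing a channel color as RGBA packed into a signed 32-bit integer, with alpha set to 0 as the docstring states. The name `rgb_to_ome_int` is hypothetical and this may differ from the script's actual implementation.

```python
def rgb_to_ome_int(rgb, alpha=0):
    """Pack an (R, G, B) tuple and an alpha value into a signed 32-bit integer."""
    r, g, b = rgb
    # bytes() validates that each component is in 0..255
    return int.from_bytes(bytes([r, g, b, alpha]), byteorder="big", signed=True)


# channel settings copied from the CHANNELS dict in the diff above
CHANNELS = {
    "CFP": (0, 0, 255),
    "EGFP": (0, 255, 0),
    "DsRed": (255, 0, 0),
    "Cy5": (255, 0, 255),
}

for name, color in CHANNELS.items():
    print(name, rgb_to_ome_int(color))
# CFP 65280
# EGFP 16711680
# DsRed -16777216
# Cy5 -16711936
```

Colors with a red component of 128 or more come out negative because the red byte occupies the most significant bits of the signed integer.
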