Merge branch 'main' into tcatley/dna-width

AFM-SPM · Nov 18, 2024 · c97bee4
2 parents 3ed5d53 + 4008835
commit c97bee4

Showing 49 changed files with 5,934 additions and 436 deletions.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -0,0 +1,30 @@
+# TopoStats Pull Requests
+
+Please provide a descriptive summary of the changes your Pull Request introduces.
+
+The [Software Development](https://afm-spm.github.io/TopoStats/main/contributing.html#software-development) section of
+the Contributing Guidelines may be useful if you are unfamiliar with linting, pre-commit, docstrings and testing.
+
+**NB** - This header should be replaced with the description, but please complete the checklist below or give a short
+description of why a particular item is not relevant.
+
+---
+
+Before submitting a Pull Request please check the following.
+
+- [ ] Existing tests pass.
+- [ ] Documentation has been updated and builds. Remember to update `configuration.md`, `usage.md`, and relevant
+      processing sections under `advanced.md`.
+- [ ] Pre-commit checks pass.
+- [ ] New functions/methods have typehints and docstrings.
+- [ ] New functions/methods have tests which check the intended behaviour is correct.
+
+## Optional
+
+### `topostats/default_config.yaml`
+
+If adding options to `topostats/default_config.yaml` please ensure the following.
+
+- [ ] There is a comment adjacent to the option explaining what it is and the valid values.
+- [ ] A check is made in `topostats/validation.py` to ensure entries are valid.
+- [ ] The option has been added to the relevant sub-parser in `topostats/entry_point.py`.
diff --git a/.gitignore b/.gitignore
@@ -39,6 +39,7 @@ pytest-debug.ini
 
 # Documentation
 _build/
+!docs/_static/images/**
 
 # MacOS
 .DS_Store

diff --git a/.markdownlint-cli2.yaml b/.markdownlint-cli2.yaml
@@ -10,9 +10,10 @@ config:
       - div
       - br
 
-# Globs
 globs:
   - "**/*.md"
 
-# Fix any fixable errors
+ignores:
+  - "tmp/"
+
 fix: true
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0 # Use the ref you want to point at
+    rev: v5.0.0 # Use the ref you want to point at
     hooks:
       - id: check-case-conflict
       - id: check-docstring-first
@@ -19,26 +19,26 @@ repos:
         types: [python, yaml, markdown]
 
   - repo: https://github.com/DavidAnson/markdownlint-cli2
-    rev: v0.13.0
+    rev: v0.14.0
     hooks:
       - id: markdownlint-cli2
         args: []
 
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.17.0
+    rev: v3.19.0
     hooks:
       - id: pyupgrade
         args: [--py38-plus]
 
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
-    rev: v0.6.3
+    rev: v0.7.2
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix, --show-fixes]
 
   - repo: https://github.com/psf/black-pre-commit-mirror
-    rev: 24.8.0
+    rev: 24.10.0
     hooks:
       - id: black
         types: [python]
@@ -47,7 +47,7 @@ repos:
       - id: black-jupyter
 
   - repo: https://github.com/adamchainz/blacken-docs
-    rev: 1.18.0
+    rev: 1.19.1
     hooks:
       - id: blacken-docs
         additional_dependencies:
@@ -64,7 +64,7 @@ repos:
       - id: prettier
 
   - repo: https://github.com/kynan/nbstripout
-    rev: 0.7.1
+    rev: 0.8.0
     hooks:
       - id: nbstripout
 

diff --git a/README.md b/README.md
@@ -62,12 +62,13 @@ topostats process
 ```
 
 If you have your own YAML configuration file (see [Usage : Configuring
-TopoStats](https://afm-spm.github.io/TopoStats/main/usage.html#configuring_topostats)) then invoke `topostats process`
-and use the argument for `--config <config_file>.yaml` that points to your file.
+TopoStats](https://afm-spm.github.io/TopoStats/main/usage.html#configuring_topostats)) then invoke `topostats` with
+the `--config <config_file>.yaml` argument pointing to your file, followed by the TopoStats module you want to run,
+e.g. `process`.
 
 ```bash
 # Edit and save my_config.yaml then run TopoStats with this configuration file
-topostats process --config my_config.yaml
+topostats --config my_config.yaml process
 ```
 
 The configuration file is validated before analysis begins and if there are problems you will see error messages that
@@ -76,13 +77,6 @@ are hopefully useful in resolving the error(s) in your modified configuration.
 You can generate a sample configuration file using the `topostats create-config` argument which writes the default
 configuration to the file `./config.yaml` (i.e. in the current directory). This will _not_ run any analyses.
 
-**NB** - This feature is only available in versions > v2.0.0 as it was introduced after v2.0.0 was released. In older
-version > 2.0.0 and <= 2.1.2 you can use the older `run_topostats --create-config` option.
-
-```bash
-run_topostats --create-config-file config.yaml
-```
-
 ### Notebooks
 
 Example Jupyter Notebooks have been developed that show how to use TopoStats package interactively which is useful when

diff --git a/docs/_static/images/flattening/flattening_align_rows.png b/docs/_static/images/flattening/flattening_align_rows.png
diff --git a/docs/_static/images/flattening/flattening_binary_mask.png b/docs/_static/images/flattening/flattening_binary_mask.png
diff --git a/docs/_static/images/flattening/flattening_final_flattened_image.png b/docs/_static/images/flattening/flattening_final_flattened_image.png
diff --git a/docs/_static/images/flattening/flattening_height_zeroing.png b/docs/_static/images/flattening/flattening_height_zeroing.png
diff --git a/docs/_static/images/flattening/flattening_pipeline.png b/docs/_static/images/flattening/flattening_pipeline.png
diff --git a/docs/_static/images/flattening/flattening_raw_afm_image.png b/docs/_static/images/flattening/flattening_raw_afm_image.png
diff --git a/docs/_static/images/flattening/flattening_scar_removed.png b/docs/_static/images/flattening/flattening_scar_removed.png
diff --git a/docs/_static/images/flattening/flattening_scarred_image.png b/docs/_static/images/flattening/flattening_scarred_image.png
diff --git a/docs/_static/images/flattening/flattening_tilt_removal.png b/docs/_static/images/flattening/flattening_tilt_removal.png
diff --git a/docs/_static/images/flattening/flattening_tilt_removal_full_zrange.png b/docs/_static/images/flattening/flattening_tilt_removal_full_zrange.png
diff --git a/docs/_static/images/flattening/gaussian_sizes.png b/docs/_static/images/flattening/gaussian_sizes.png
diff --git a/docs/_static/images/flattening/tilt_removed_with_mask.png b/docs/_static/images/flattening/tilt_removed_with_mask.png
diff --git a/docs/_static/images/grain_finding/grain_finding_grain_thresholds.png b/docs/_static/images/grain_finding/grain_finding_grain_thresholds.png
diff --git a/docs/_static/images/grain_finding/grain_finding_minicircles.png b/docs/_static/images/grain_finding/grain_finding_minicircles.png
diff --git a/docs/_static/images/grain_finding/grain_finding_size_thresholding.png b/docs/_static/images/grain_finding/grain_finding_size_thresholding.png
diff --git a/docs/_static/images/grain_finding/grain_finding_tidy_borders.png b/docs/_static/images/grain_finding/grain_finding_tidy_borders.png
diff --git a/docs/_static/images/grain_finding/grain_finding_unet_example.png b/docs/_static/images/grain_finding/grain_finding_unet_example.png
diff --git a/docs/_static/images/grain_finding/grain_finding_unet_multi_class_example.png b/docs/_static/images/grain_finding/grain_finding_unet_multi_class_example.png
diff --git a/docs/_static/images/thresholding/thresholding_histogram.png b/docs/_static/images/thresholding/thresholding_histogram.png
diff --git a/docs/advanced.md b/docs/advanced.md
@@ -2,6 +2,9 @@
 
 You can read more detailed information about the methods implemented in TopoStats in the pages below.
 
+- [Flattening](advanced/flattening.md)
+- [Grain Finding](advanced/grain_finding.md)
+- [Thresholding](advanced/thresholding.md)
 - [Disordered Tracing](advanced/disordered_tracing.md)
 - [Nodestats](advanced/nodestats.md)
 - [Ordered Tracing](advanced/ordered_tracing.md)

diff --git a/docs/advanced/flattening.md b/docs/advanced/flattening.md
@@ -0,0 +1,137 @@
+# Flattening
+
+Flattening is the process of taking a raw AFM image and removing the image artefacts that are introduced by
+scanning probe microscopy (SPM) and AFM imaging. The corrections include, but are not limited to, row alignment to
+correct for the raster scanning motion, and polynomial flattening of the surface to correct for piezoelectric bowing.
+For surface-based samples, such as DNA on mica, this results in an image where the background mica is flat and the
+sample is clearly visible resting on the surface.
+
+Here is a raw, unprocessed AFM image:
+
+![raw AFM image](../_static/images/flattening/flattening_raw_afm_image.png)
+
+You can see there is a large tilt in the image from the bottom right to the top left, as well as lots of horizontal
+banding throughout the rows in the image. These artefacts are removed during the flattening process in TopoStats,
+known as `Filters`.
+
+## At a Glance - Removing AFM Imaging Artefacts
+
+Images are processed by:
+
+- Row alignment (make each row median the same height)
+- Tilt & polynomial removal (fit a plane and quadratic polynomial to the image and subtract)
+- Scar removal (remove long, thin, bright streaks in the data)
+- Zero the average height (lower the image by the mean height) to make the background roughly centred at zero nm
+- Masking (detect objects on the surface and flatten the image again, ignoring the data on the surface)
+- Secondary flattening (re-process the data using the mask to tell us where the background is, and zero the data using
+  the mean of the background mask)
+- Gaussian filter (to smooth pixel differences / high-gain noise)
+
+![flattening pipeline](../_static/images/flattening/flattening_pipeline.png)
+
+## Row alignment
+
+The first step in the flattening process is **row alignment**. Row alignment is a process that adjusts the height of
+each row of the image so that they all share the same median height value. This "median" value is set by
+`row_alignment_quartile`, where the default of 0.5 gives the median, but it can be adjusted depending on how much of
+the data is considered background. This gets rid of some of the horizontal banding and produces an image where the
+rows are aligned, but the image still has a clear tilt.
+
+![row alignment](../_static/images/flattening/flattening_align_rows.png)
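
A minimal NumPy sketch of the row alignment idea described above, not the TopoStats implementation. The function name `align_rows` and the choice of the whole-image quantile as the common target height are assumptions made for illustration.

```python
import numpy as np


def align_rows(image: np.ndarray, quantile: float = 0.5) -> np.ndarray:
    """Shift each row so that its chosen quantile matches that of the whole image."""
    # Per-row quantile (0.5 is the median).
    row_values = np.quantile(image, quantile, axis=1)
    # Common target height (an assumption: the same quantile of the full image).
    target = np.quantile(image, quantile)
    return image - (row_values - target)[:, np.newaxis]


# Four noisy rows, each with a different height offset (horizontal banding).
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 64)) + np.array([[0.0], [5.0], [-3.0], [1.5]])
aligned = align_rows(raw)
row_medians = np.median(aligned, axis=1)  # all rows now share one median
```

Lowering `quantile` below 0.5 would align rows on lower pixels, which may help when more than half of a row is covered by sample rather than background.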
+
+## Tilt removal
+
+After row alignment, tilt removal is applied. This is a simple process of fitting a plane to the image and subtracting
+it, resulting in a mostly flat image. However, as you can see in the following images, it's not perfect: low regions
+or "shadows" still exist on rows with lots of non-background data.
+Two images are provided here, one with the full z-range and one with an adjusted height range (z-range) that shows
+the remaining artefacts better.
+
+![tilt_removal_full_zrange](../_static/images/flattening/flattening_tilt_removal_full_zrange.png)
+
+![tilt removal_better_viewing](../_static/images/flattening/flattening_tilt_removal.png)
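
Plane fitting and subtraction can be sketched with an ordinary least-squares fit. This is an illustrative example only; the function name `remove_tilt` is hypothetical, and the fit here uses every pixel, as in the first flattening pass.

```python
import numpy as np


def remove_tilt(image: np.ndarray) -> np.ndarray:
    """Fit z = a*x + b*y + c by least squares and subtract the fitted plane."""
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols]
    # One row per pixel: [x, y, 1].
    design = np.column_stack([x.ravel(), y.ravel(), np.ones(image.size)])
    coeffs, *_ = np.linalg.lstsq(design, image.ravel(), rcond=None)
    plane = (design @ coeffs).reshape(image.shape)
    return image - plane


# A perfectly tilted, featureless surface flattens to (numerically) zero.
y, x = np.mgrid[0:32, 0:32]
tilted = 0.3 * x - 0.1 * y + 2.0
flattened = remove_tilt(tilted)
```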
+
+## Polynomial removal
+
+After the tilt, we remove the polynomial trends. In some images there is also quadratic, or occasionally cubic, bowing.
+We remove this by fitting a two-dimensional quadratic polynomial to the image (in the horizontal direction) and
+subtracting it from the image. We then do the same for a nonlinear polynomial (`z = a*x*y`) to eliminate “saddle”
+trends in the data. We could do all of these at the same time, but we like to be able to see the iterative
+differences.
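
The two fits above can be sketched as follows. This is a sketch under stated assumptions rather than the TopoStats code: the function names are hypothetical, the quadratic is fitted to the median profile along the horizontal axis, and the `a*x*y` term is fitted with `x` and `y` centred on the image.

```python
import numpy as np


def remove_quadratic(image: np.ndarray) -> np.ndarray:
    """Subtract a quadratic in x fitted to the column-wise median profile."""
    x = np.arange(image.shape[1])
    profile = np.median(image, axis=0)
    coeffs = np.polyfit(x, profile, deg=2)
    return image - np.polyval(coeffs, x)[np.newaxis, :]


def remove_saddle(image: np.ndarray) -> np.ndarray:
    """Subtract a least-squares fit of z = a*x*y (x and y centred on the image)."""
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols]
    xy = (x - x.mean()) * (y - y.mean())
    # Closed-form least-squares solution for the single coefficient a.
    a = np.sum(xy * image) / np.sum(xy * xy)
    return image - a * xy


# Synthetic surface with quadratic bowing plus a saddle trend.
y, x = np.mgrid[0:32, 0:32]
bowed = 0.01 * (x - 16) ** 2 + 0.002 * (x - 15.5) * (y - 15.5)
flattened = remove_saddle(remove_quadratic(bowed))
```

Running the steps one after another, as here, mirrors the choice described above of keeping each correction separately inspectable.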
+
+## Scar removal (optional)
+
+We then optionally run scar removal on the image. This is a special function that detects scars: long, thin, bright or
+dark streaks in the data, caused by physical problems in the AFM process. Candidate scars are found using the
+`threshold_low` and `threshold_high` parameters, which identify large height changes between rows, and are then
+filtered by `max_scar_width` and `min_scar_length`, both given in pixels. We are using a different image here as an
+example since our lovely minicircles.spm image doesn’t have any scars.
+
+![scarred image](../_static/images/flattening/flattening_scarred_image.png)
+
+![scar removed](../_static/images/flattening/flattening_scar_removed.png)
+
+**Note that scar removal can distort data, and it’s best to take data without scars if you can.**
+
+## Zero the average height
+
+![height zeroing](../_static/images/flattening/flattening_height_zeroing.png)
+
+We then lower the image by its mean height, which causes the background of the image to be roughly centred at zero nm.
+If this function is provided with a foreground mask, as in the second iteration of flattening, it zeroes the data
+using only the background pixels.
+Data zeroing is important since the raw AFM heights are relative, and these processing steps can shift the background
+height away from zero, so this makes it easier to obtain comparable height metrics.
+
+## Masking
+
+Now consider that all the processing we have done has assumed that every pixel of the image is background. We assumed
+that there were no objects on the surface messing up our fitting and row alignment. If there were a large amount of
+DNA on one side of the image, then the fitted slope would be affected by it, and the image would be flattened poorly.
+
+Because of this, once we have done our initial flattening, we detect our objects on the surface, and then flatten the
+image again! But this time, ignoring the data on the surface, and only considering the background.
+
+How do we do that?
+Well first, we need to find the data on the surface. We do this by thresholding.
+The type of threshold (standard deviation - `std_dev`, absolute - `absolute`, Otsu - `otsu`) and the threshold values
+are set in the config file (have a look!). Any pixels that are below the threshold are considered
+background (sample surface). Any pixels that are above the threshold are considered to be data (useful sample objects).
+This binary classification allows us to make a binary mask showing where the foreground data is and where the
+background is.
+
+For more information on thresholding and how to set it, see the [thresholding](thresholding.md) page.
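
As a rough illustration of masking and background-only zeroing, here is a small NumPy sketch. It assumes that a standard deviation threshold means "mean plus a multiple of the standard deviation"; the function names and the `multiplier` parameter are hypothetical and not taken from TopoStats.

```python
import numpy as np


def std_dev_mask(image: np.ndarray, multiplier: float = 1.0) -> np.ndarray:
    """Return True where a pixel is above mean + multiplier * standard deviation."""
    threshold = image.mean() + multiplier * image.std()
    return image > threshold


def zero_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Subtract the mean height of the unmasked (background) pixels."""
    return image - image[~mask].mean()


image = np.full((20, 20), 0.5)  # flat background sitting at 0.5 nm
image[5:8, 5:15] = 2.5          # a raised object covering 30 pixels
mask = std_dev_mask(image)
zeroed = zero_background(image, mask)
```

Because the mean is taken over background pixels only, the object no longer drags the zero level upwards, which is the point of the second flattening pass.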
+
+Here is the binary mask for minicircle.spm:
+
+![tilt_removed_with_mask](../_static/images/flattening/tilt_removed_with_mask.png)
+
+So you can see how all the interesting foreground (high) regions are now masked in white, and the background is in
+black.
+
+This allows TopoStats to use only the background (black pixels) in its calculations for slope removal, row alignment
+etc.
+
+So we re-do all the previous processing, but with this new useful binary mask to guide us.
+
+## Secondary flattening
+
+After re-processing the data using the mask to tell us where the background is, we get a better, more accurately
+flattened image. We can see the "shadows" on rows with lots of data have now been flattened correctly.
+
+From here, we can go on to do things like finding our objects of interest (grains) and get stats about them.
+
+![secondary flattening](../_static/images/flattening/flattening_final_flattened_image.png)
+
+## Gaussian filter
+
+Finally, we apply a Gaussian filter to the image to smooth height differences and remove high-gain noise. This gives
+you smoother data but will start to blur out important features if you apply it too strongly. The default strength is a
+sigma of 1.0, but you can adjust this in the config file under `gaussian_size`. The `gaussian_mode` parameter
+controls how values at the border are handled, see
+[skimage.filters.gaussian](https://scikit-image.org/docs/stable/api/skimage.filters.html#skimage.filters.gaussian)
+for more details.
+
+Here are some examples of different Gaussian sizes:
+
+![gaussian_sizes](../_static/images/flattening/gaussian_sizes.png)
diff --git a/docs/advanced/grain_finding.md b/docs/advanced/grain_finding.md
@@ -0,0 +1,99 @@
+# Grain finding
+
+## At a Glance - Identifying Objects of Interest
+
+TopoStats automatically tries to find grains (objects of interest) in your AFM images. There are several steps to this.
+
+- **Height thresholding**: We find grains based on their height in the image.
+- **Remove edge grains**: We remove grains that intersect the image border.
+- **Size thresholding**: We remove grains that are too small or too large.
+- **Optional: U-Net mask improvement**: We can use a U-Net to improve the mask of each grain.
+
+## Height thresholding
+
+Grain finding is the process of detecting useful objects in your AFM images. This might be DNA, proteins, holes in a
+surface or ridges on a surface.
+In the standard operation of TopoStats, the way we find objects is based on a height threshold. This means that we
+detect where things are based on how high up they are.
+
+For example, with our example minicircles.spm image, we have DNA that is poking up from the sample surface, represented
+by bright regions in the image, alongside impurities and proteins, also above the surface:
+
+![minicircles image](../_static/images/grain_finding/grain_finding_minicircles.png)
+
+If we want to select the DNA, then we can take only the regions of the image that are above a certain height
+threshold (standard deviation - `std_dev`, absolute - `absolute`, otsu - `otsu`).
+
+Here are several thresholds to show you what happens as we increase the absolute height threshold:
+
+![height thresholds](../_static/images/grain_finding/grain_finding_grain_thresholds.png)
+
+Notice that the amount of data decreases, until we are only left with the very highest points.
+
+The aim is to choose a threshold that keeps the data you want, while removing the background and other low objects
+that you don’t want to include.
+So in this example, a threshold of 0.5 would be best, since it keeps the DNA while removing the background.
+
+There are lots of objects in this mask that we don't want to analyse, but we can remove those using area thresholds in
+the next steps. These objects have been detected because, while they are small, they are still high up and above the
+background.
+
+For more information on the types of thresholding, and how to set them, see the [thresholding](thresholding.md) page.
+
+## Remove edge grains
+
+Some grains may intersect the image border. Accurate statistics cannot be calculated for such a grain, since it is
+not fully in the image. Because of this, we have the option of removing grains that intersect the image border with
+the `remove_edge_intersecting_grains` flag in the config file.
+
+Here is a before and after example of removing edge grains:
+
+![tidy_borders](../_static/images/grain_finding/grain_finding_tidy_borders.png)
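
A minimal sketch of removing border-touching grains, assuming the grains have already been labelled so that each grain has its own positive integer and the background is 0. The function name `remove_edge_grains` is hypothetical.

```python
import numpy as np


def remove_edge_grains(labelled: np.ndarray) -> np.ndarray:
    """Set to 0 every labelled grain with at least one pixel on the image border."""
    border = np.concatenate(
        [labelled[0, :], labelled[-1, :], labelled[:, 0], labelled[:, -1]]
    )
    edge_labels = np.unique(border[border > 0])
    cleaned = labelled.copy()
    cleaned[np.isin(cleaned, edge_labels)] = 0
    return cleaned


labelled = np.zeros((8, 8), dtype=int)
labelled[0:3, 0:2] = 1  # touches the top and left borders
labelled[3:6, 3:6] = 2  # fully inside the image
labelled[6:8, 5:8] = 3  # touches the bottom and right borders
cleaned = remove_edge_grains(labelled)
```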
+
+## Size thresholding
+
+In our thresholded image, you will notice that we have a lot of small grains that we do not want to analyse in our
+image. We can get rid of those with size thresholding. This is where TopoStats will remove grains based on their area,
+leaving only the right size of molecules. You will need to play around with the thresholds to get the right results.
+
+You can set the size threshold using the `absolute_area_threshold` in the config file. This sets the minimum and
+maximum area of the grains that you want to keep, in nanometers squared. For example, if you want to keep grains that
+are between 10 nm² and 100 nm², you would set `absolute_area_threshold` to `[10, 100]`.
+
+![size_thresholding](../_static/images/grain_finding/grain_finding_size_thresholding.png)
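
To make the units concrete, the sketch below converts grain pixel counts to nm² and keeps only grains inside the `[10, 100]` range from the example above. It assumes a pre-labelled image (background 0) and a hypothetical `px_to_nm` scale giving the side length of one pixel in nanometers; neither the function nor the parameter name comes from TopoStats.

```python
import numpy as np


def area_threshold(labelled: np.ndarray, px_to_nm: float, bounds: list) -> np.ndarray:
    """Keep only grains whose area in nm^2 lies within [lower, upper]."""
    lower, upper = bounds
    pixel_counts = np.bincount(labelled.ravel())
    areas_nm2 = pixel_counts * px_to_nm**2
    keep = (areas_nm2 >= lower) & (areas_nm2 <= upper)
    keep[0] = False  # label 0 is background, never a grain
    return np.where(keep[labelled], labelled, 0)


labelled = np.zeros((12, 12), dtype=int)
labelled[0, 0] = 1        # 1 pixel   ->   4 nm^2 (too small)
labelled[2:5, 2:5] = 2    # 9 pixels  ->  36 nm^2 (kept)
labelled[6:12, 6:12] = 3  # 36 pixels -> 144 nm^2 (too large)
filtered = area_threshold(labelled, px_to_nm=2.0, bounds=[10, 100])
```

Note that the same grain changes area in nm² if the image is acquired at a different pixel size, so thresholds tuned for one scan size may need revisiting for another.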
+
+## Optional: U-Net mask improvement
+
+As an additional optional step, each grain that reaches this stage can be improved by using a U-Net to mask the grain
+again. This requires a U-Net model path to be supplied in the config file.
+
+TopoStats takes the bounding box of each grain, makes it square, and passes it to the trained U-Net model, which
+predicts a better mask that then replaces the original mask.
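
One way to make a bounding box square is to grow its shorter side around the box centre and then shift it back inside the image. This is only a sketch of that idea: the function name, the `(min_row, min_col, max_row, max_col)` layout with exclusive maxima, and the square `image_size` are assumptions, and TopoStats may handle padding differently.

```python
def square_bounding_box(bbox: tuple, image_size: int) -> tuple:
    """Expand a bounding box to a square that stays inside a square image."""
    min_row, min_col, max_row, max_col = bbox  # max values are exclusive
    side = min(max(max_row - min_row, max_col - min_col), image_size)
    # Centre the square on the original box.
    row0 = (min_row + max_row - side) // 2
    col0 = (min_col + max_col - side) // 2
    # Shift the square back inside the image if it overhangs an edge.
    row0 = min(max(row0, 0), image_size - side)
    col0 = min(max(col0, 0), image_size - side)
    return row0, col0, row0 + side, col0 + side


# A wide, short box near the top of a 32 x 32 image.
squared = square_bounding_box((2, 10, 6, 30), image_size=32)
```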
+
+Here is an example comparing absolute height thresholding to U-Net masking for one of our projects. The white boxes
+indicate regions where the height threshold performs poorly and is improved by the U-Net mask.
+
+![unet_example](../_static/images/grain_finding/grain_finding_unet_example.png)
+
+### Multi-class masking
+
+TopoStats supports masking with multiple classes. This means that you could use a U-Net to mask DNA and proteins
+separately.
+
+This requires a U-Net that has been trained on multiple classes.
+
+Here is an example of multi-class masking using a U-Net which was used for one of our projects.
+
+![multi_class_unet_example](../_static/images/grain_finding/grain_finding_unet_multi_class_example.png)
+
+## Technical details
+
+### Details: Multi-class masking
+
+Multi-class masking is implemented by having each image's mask be a tensor of shape N x N x C, where N is the image
+size and C is the number of classes. Each class is a binary mask, where 1 is the class and 0 is not the class.
+The first channel is background, where 1 is background and 0 is not background. The rest of the channels
+are arbitrary and defined by how the U-Net was trained; however, we conventionally recommend that the first class
+be for DNA (if applicable) and the next classes for other objects.
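
The N x N x C layout can be illustrated by building such a tensor from a map of integer class indices. This sketch assumes the classes are mutually exclusive per pixel, which the text above does not require; the function name `to_one_hot` is hypothetical.

```python
import numpy as np


def to_one_hot(class_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Convert an N x N map of class indices into an N x N x C binary tensor."""
    return (class_map[..., np.newaxis] == np.arange(num_classes)).astype(np.uint8)


class_map = np.zeros((6, 6), dtype=int)  # 0 = background
class_map[1:3, 1:5] = 1                  # class 1, e.g. DNA
class_map[4, 2:4] = 2                    # class 2, e.g. protein
tensor = to_one_hot(class_map, num_classes=3)

background = tensor[..., 0]  # 1 where background, 0 elsewhere
dna = tensor[..., 1]         # 1 where DNA, 0 elsewhere
```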