Merge branch 'main' into tcatley/dna-width
ns-rse authored Nov 18, 2024
2 parents 3ed5d53 + 4008835 commit c97bee4
Showing 49 changed files with 5,934 additions and 436 deletions.
30 changes: 30 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,30 @@
# TopoStats Pull Requests

Please provide a descriptive summary of the changes your Pull Request introduces.

The [Software Development](https://afm-spm.github.io/TopoStats/main/contributing.html#software-development) section of
the Contributing Guidelines may be useful if you are unfamiliar with linting, pre-commit, docstrings and testing.

**NB** - This header should be replaced with the description, but please complete the checklist below or give a short
description of why a particular item is not relevant.

---

Before submitting a Pull Request please check the following.

- [ ] Existing tests pass.
- [ ] Documentation has been updated and builds. Remember to update `configuration.md`, `usage.md`, and relevant
processing sections under `advanced.md`.
- [ ] Pre-commit checks pass.
- [ ] New functions/methods have typehints and docstrings.
- [ ] New functions/methods have tests which check the intended behaviour is correct.

## Optional

### `topostats/default_config.yaml`

If adding options to `topostats/default_config.yaml` please ensure:

- [ ] There is a comment adjacent to the option explaining what it is and the valid values.
- [ ] A check is made in `topostats/validation.py` to ensure entries are valid.
- [ ] Add the option to the relevant sub-parser in `topostats/entry_point.py`.
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -39,6 +39,7 @@ pytest-debug.ini

# Documentation
_build/
!docs/_static/images/**

# MacOS
.DS_Store
5 changes: 3 additions & 2 deletions .markdownlint-cli2.yaml
Expand Up @@ -10,9 +10,10 @@ config:
- div
- br

# Globs
globs:
- "**/*.md"

# Fix any fixable errors
ignores:
- "tmp/"

fix: true
14 changes: 7 additions & 7 deletions .pre-commit-config.yaml
@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0 # Use the ref you want to point at
rev: v5.0.0 # Use the ref you want to point at
hooks:
- id: check-case-conflict
- id: check-docstring-first
Expand All @@ -19,26 +19,26 @@ repos:
types: [python, yaml, markdown]

- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: v0.13.0
rev: v0.14.0
hooks:
- id: markdownlint-cli2
args: []

- repo: https://github.com/asottile/pyupgrade
rev: v3.17.0
rev: v3.19.0
hooks:
- id: pyupgrade
args: [--py38-plus]

- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.6.3
rev: v0.7.2
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix, --show-fixes]

- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.8.0
rev: 24.10.0
hooks:
- id: black
types: [python]
Expand All @@ -47,7 +47,7 @@ repos:
- id: black-jupyter

- repo: https://github.com/adamchainz/blacken-docs
rev: 1.18.0
rev: 1.19.1
hooks:
- id: blacken-docs
additional_dependencies:
Expand All @@ -64,7 +64,7 @@ repos:
- id: prettier

- repo: https://github.com/kynan/nbstripout
rev: 0.7.1
rev: 0.8.0
hooks:
- id: nbstripout

14 changes: 4 additions & 10 deletions README.md
Expand Up @@ -62,12 +62,13 @@ topostats process
```

If you have your own YAML configuration file (see [Usage : Configuring
TopoStats](https://afm-spm.github.io/TopoStats/main/usage.html#configuring_topostats)) then invoke `topostats process`
and use the argument for `--config <config_file>.yaml` that points to your file.
TopoStats](https://afm-spm.github.io/TopoStats/main/usage.html#configuring_topostats)) then invoke `topostats`
and use the argument for `--config <config_file>.yaml` that points to your file with an associated module of
TopoStats e.g. `process`.

```bash
# Edit and save my_config.yaml then run TopoStats with this configuration file
topostats process --config my_config.yaml
topostats --config my_config.yaml process
```

The configuration file is validated before analysis begins and if there are problems you will see error messages that
Expand All @@ -76,13 +77,6 @@ are hopefully useful in resolving the error(s) in your modified configuration.
You can generate a sample configuration file using the `topostats create-config` argument which writes the default
configuration to the file `./config.yaml` (i.e. in the current directory). This will _not_ run any analyses.

**NB** - This feature is only available in versions > v2.0.0 as it was introduced after v2.0.0 was released. In older
version > 2.0.0 and <= 2.1.2 you can use the older `run_topostats --create-config` option.

```bash
run_topostats --create-config-file config.yaml
```

### Notebooks

Example Jupyter Notebooks have been developed that show how to use the TopoStats package interactively which is useful when
Binary file added docs/_static/images/flattening/gaussian_sizes.png
3 changes: 3 additions & 0 deletions docs/advanced.md
Expand Up @@ -2,6 +2,9 @@

You can read more detailed information about the methods implemented in TopoStats in the pages below.

- [Flattening](advanced/flattening.md)
- [Grain Finding](advanced/grain_finding.md)
- [Thresholding](advanced/thresholding.md)
- [Disordered Tracing](advanced/disordered_tracing.md)
- [Nodestats](advanced/nodestats.md)
- [Ordered Tracing](advanced/ordered_tracing.md)
137 changes: 137 additions & 0 deletions docs/advanced/flattening.md
@@ -0,0 +1,137 @@
# Flattening

Flattening is the process of taking a raw AFM image and removing the artefacts introduced by scanning probe
microscopy (SPM) and AFM imaging. The corrections include, but are not limited to, row alignment to compensate for the
raster scanning motion and polynomial flattening to compensate for piezoelectric bowing.
For surface-based samples, such as DNA on mica, this results in an image where the background mica is flat and the
sample is clearly visible resting on the surface.

Here is a raw, unprocessed AFM image:

![raw AFM image](../_static/images/flattening/flattening_raw_afm_image.png)

You can see there is a large tilt in the image from the bottom right to the top left, as well as lots of horizontal
banding throughout the rows in the image. These artefacts are removed
during the flattening process, which in TopoStats is known as `Filters`.

## At a Glance - Removing AFM Imaging Artefacts

Images are processed by:

- Row alignment (make each row median the same height)
- Tilt & polynomial removal (fit a plane and quadratic polynomial to the image and subtract)
- Scar removal (remove long, thin, bright streaks in the data)
- Zero the average height (lower the image by the mean height) to make the background roughly centred at zero nm
- Masking (detect objects on the surface and flatten the image again, ignoring the data on the surface)
- Secondary flattening (re-process the data using the mask to tell us where the background is, and zero the data using
the mean of the background mask)
- Gaussian filter (to smooth pixel differences / high-gain noise)

![flattening pipeline](../_static/images/flattening/flattening_pipeline.png)

## Row alignment

The first step in the flattening process is **row alignment**. Row alignment adjusts the height of
each row of the image so that all rows share the same median height value. The quantile used is set by
`row_alignment_quartile`: the default of 0.5 is the median, but it can be adjusted depending on how much of the data
is considered background. This gets rid of some of the horizontal
banding and produces an image where the rows are aligned, but the image still has a clear tilt.

![row alignment](../_static/images/flattening/flattening_align_rows.png)
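
The row alignment step described above can be sketched with NumPy. This is a minimal illustration, not the TopoStats implementation: the function name `align_rows` and its `mask` argument are hypothetical, and it assumes each row is shifted so that its chosen quantile sits at zero.

```python
import numpy as np


def align_rows(image, quantile=0.5, mask=None):
    """Shift every row so that its chosen quantile sits at zero height.

    `mask` is an optional boolean array where True marks foreground pixels
    that should be ignored when measuring each row's level (hypothetical
    argument, used here to illustrate the masked second pass).
    """
    background = np.array(image, dtype=float)  # copy, so the input is untouched
    if mask is not None:
        background[mask] = np.nan  # excluded from the quantile calculation
    # One level per row; quantile=0.5 is the row median.
    row_levels = np.nanquantile(background, quantile, axis=1)
    return image - row_levels[:, np.newaxis]


# Synthetic example: a flat surface where each scan line has a random offset.
rng = np.random.default_rng(0)
offsets = rng.normal(scale=5.0, size=(64, 1))
raw = offsets + rng.normal(scale=0.1, size=(64, 64))
aligned = align_rows(raw)
```

Lowering `quantile` below 0.5 makes the alignment follow the lower part of each row, which may help when more than half of a row is covered by sample.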

## Tilt removal

After row alignment, tilt removal is applied. This is a simple process of fitting a plane to the image and subtracting
it, resulting in a mostly flat image. However, as you can see in the following images, it's not perfect: there are
still low regions or "shadows" on rows with lots of non-background data.
Two images are provided here, one with the full z-range and one with an adjusted height range (z-range) that shows
the remaining artefacts more clearly.

![tilt_removal_full_zrange](../_static/images/flattening/flattening_tilt_removal_full_zrange.png)

![tilt removal_better_viewing](../_static/images/flattening/flattening_tilt_removal.png)
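
As a rough illustration of plane fitting and subtraction, the sketch below uses an ordinary least-squares fit over all (or only background) pixels. It is an assumption about one reasonable way to do this, not a description of how TopoStats fits the plane; `remove_tilt` and its `mask` argument are hypothetical names.

```python
import numpy as np


def remove_tilt(image, mask=None):
    """Fit z = a*x + b*y + c by least squares and subtract the fitted plane.

    `mask` is an optional boolean array; True pixels are left out of the fit.
    """
    rows, cols = np.indices(image.shape)
    keep = np.ones(image.shape, dtype=bool) if mask is None else ~mask
    # Design matrix with one row per pixel used in the fit: [x, y, 1].
    design = np.column_stack([cols[keep], rows[keep], np.ones(keep.sum())])
    coeffs, *_ = np.linalg.lstsq(design, image[keep], rcond=None)
    plane = coeffs[0] * cols + coeffs[1] * rows + coeffs[2]
    return image - plane


# Synthetic example: a perfectly tilted surface becomes flat after subtraction.
rows, cols = np.indices((32, 48))
tilted = 0.05 * cols - 0.02 * rows + 3.0
flattened = remove_tilt(tilted)
```

Note that this sketch also removes the constant offset `c`, so the fitted pixels end up centred on zero.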

## Polynomial removal

After the tilt, we remove the polynomial trends. Some images also have quadratic, or occasionally cubic, bowing.
We remove this by fitting a two-dimensional quadratic polynomial to the image (in the horizontal
direction) and subtracting it from the image. We then do the same for a nonlinear polynomial (`z = a*x*y`) to eliminate
“saddle” trends in the data. We could do all of these at the same time, but we like to be able to see the iterative
differences.
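
The two fits described above might look something like the following sketch. The function names are hypothetical, and the choices made here (fitting the quadratic to the column medians, fitting `z = a*x*y` with uncentred pixel coordinates) are assumptions for illustration only.

```python
import numpy as np


def remove_quadratic(image):
    """Fit a quadratic along the horizontal direction and subtract it."""
    x = np.arange(image.shape[1])
    column_medians = np.median(image, axis=0)  # one height per column
    coeffs = np.polyfit(x, column_medians, deg=2)
    return image - np.polyval(coeffs, x)[np.newaxis, :]


def remove_saddle(image):
    """Fit z = a*x*y by least squares and subtract it."""
    rows, cols = np.indices(image.shape)
    xy = (rows * cols).astype(float)
    # Closed-form least-squares solution for a single coefficient.
    a = np.sum(xy * image) / np.sum(xy * xy)
    return image - a * xy


# Synthetic examples: pure bowing and a pure saddle are both removed.
rows, cols = np.indices((32, 48))
bowed = 0.01 * (cols - 20.0) ** 2
saddle = 0.002 * rows * cols
unbowed = remove_quadratic(bowed)
unsaddled = remove_saddle(saddle)
```

Running the fits one after another, as here, makes it easy to inspect the image after each stage.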

## Scar removal (optional)

We then optionally run scar removal on the image. This is a special function that detects scars: long, thin, bright or
dark streaks in the data, caused by physical problems in the AFM process. Candidate scars are found using
`threshold_low` and `threshold_high`, which identify large height changes between rows, and are then filtered using
`max_scar_width` and `min_scar_length`, both given in pixels. We are using a different image here as an example since
our lovely minicircles.spm image doesn’t have any scars.

![scarred image](../_static/images/flattening/flattening_scarred_image.png)

![scar removed](../_static/images/flattening/flattening_scar_removed.png)

**Note that scar removal can distort data, and it’s best to take data without scars if you can.**

## Zero the average height

![height zeroing](../_static/images/flattening/flattening_height_zeroing.png)

We then lower the image by its mean height, which causes the background of the image to be roughly centred at zero nm.
If this function is given a foreground mask, as in the second iteration of flattening, it calculates the mean from
the background pixels only.
Zeroing the data is important since the raw AFM heights are relative and these processing steps can shift the
background height away from zero, so this makes it easier to obtain comparable height metrics.
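
A minimal sketch of this zeroing step, following the description above (subtracting the mean, optionally of the background only). The function name `zero_average_height` and its signature are assumptions for illustration.

```python
import numpy as np


def zero_average_height(image, mask=None):
    """Subtract the mean height; with a mask, use background pixels only.

    `mask` is a boolean array where True marks foreground (sample) pixels.
    """
    background = image if mask is None else image[~mask]
    return image - background.mean()


# Synthetic example: background at 2 nm with a 5 nm tall object on it.
image = np.full((20, 20), 2.0)
image[5:8, 5:8] = 5.0
mask = image > 3.0

zeroed_all = zero_average_height(image)          # object biases the mean
zeroed_background = zero_average_height(image, mask)  # background sits at 0
```

Without the mask the object pulls the mean upwards, so the background ends up slightly below zero; with the mask the background is exactly zero in this toy case.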

## Masking

Now consider that all the processing we have done so far has assumed that every pixel of the image is background. We
assumed that there were no objects on the surface to disrupt our fitting and row alignment. If there is a large amount
of DNA on one side of the image, then the fitted slope will be affected by it and the image will be flattened poorly.

Because of this, once we have done our initial flattening, we detect the objects on the surface and then flatten the
image again, this time ignoring the data on the surface and only considering the background.

How do we do that?
First, we need to find the data on the surface. We do this by thresholding.
The type of threshold (standard deviation - `std_dev`, absolute - `absolute`, Otsu - `otsu`) and the threshold values
are set in the config file (have a look!). Any pixels that are below the threshold are considered
background (sample surface). Any pixels that are above the threshold are considered to be data (useful sample objects).
This binary classification allows us to make a binary mask showing which pixels are foreground data and which are
background.

For more information on thresholding and how to set it, see the [thresholding](thresholding.md) page.
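
The thresholding described above can be illustrated with a short sketch. The function `foreground_mask` is hypothetical, the Otsu method is omitted for brevity, and the `std_dev` rule used here (mean plus a multiple of the standard deviation) is an assumption rather than a statement of how TopoStats computes it.

```python
import numpy as np


def foreground_mask(image, method="std_dev", value=1.0):
    """Return a boolean mask that is True where pixels are above the threshold."""
    if method == "std_dev":
        # Assumed rule: threshold = mean + value * standard deviation.
        threshold = image.mean() + value * image.std()
    elif method == "absolute":
        threshold = value  # a fixed height
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return image > threshold


# Synthetic example: a flat background with one 3x3 object that is 2 nm tall.
image = np.zeros((20, 20))
image[5:8, 5:8] = 2.0

mask_std = foreground_mask(image, method="std_dev", value=1.0)
mask_abs = foreground_mask(image, method="absolute", value=1.0)
```

A mask like this is what the second flattening pass would use to leave foreground pixels out of its fits.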

Here is the binary mask for minicircle.spm:

![tilt_removed_with_mask](../_static/images/flattening/tilt_removed_with_mask.png)

So you can see how all the interesting foreground (high) regions are now masked in white, and the background is in
black.

This allows TopoStats to use only the background (black pixels) in its calculations for slope removal, row alignment
etc.

So we re-do all the previous processing, but with this new useful binary mask to guide us.

## Secondary flattening

After re-processing the data using the mask to tell us where the background is, we get a better, more accurately
flattened image. We can see the "shadows" on rows with lots of data have now been flattened correctly.

From here, we can go on to do things like finding our objects of interest (grains) and get stats about them.

![secondary flattening](../_static/images/flattening/flattening_final_flattened_image.png)

## Gaussian filter

Finally, we apply a Gaussian filter to the image to smooth height differences and remove high-gain noise. This gives
you smoother data
but will start to blur out important features if you apply it too strongly. The default strength is a sigma of 1.0, but
you can adjust this in the config file under `gaussian_size`. The `gaussian_mode` parameter determines how values at
the border are handled; see
[skimage.filters.gaussian](https://scikit-image.org/docs/stable/api/skimage.filters.html#skimage.filters.gaussian)
for more details.

Here are some examples of different Gaussian sizes:

![gaussian_sizes](../_static/images/flattening/gaussian_sizes.png)
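
To get a feel for the effect of the filter strength, the sketch below smooths synthetic noise with several sigma values. It swaps in `scipy.ndimage.gaussian_filter` rather than the `skimage.filters.gaussian` function linked above, purely to keep the example's dependencies small; both accept `sigma` and `mode` arguments. The sigma values here are in pixels, which is an assumption of this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic example: pure pixel noise standing in for a noisy height map.
rng = np.random.default_rng(1)
noisy = rng.normal(size=(64, 64))

# Smooth with increasing strength; mode controls how borders are padded.
smoothed = {
    sigma: gaussian_filter(noisy, sigma=sigma, mode="nearest")
    for sigma in (0.5, 1.0, 2.0)
}

for sigma, result in smoothed.items():
    print(f"sigma={sigma}: height standard deviation {result.std():.3f}")
```

Larger sigma values suppress more noise but would also blur any genuine fine features in a real image.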
99 changes: 99 additions & 0 deletions docs/advanced/grain_finding.md
@@ -0,0 +1,99 @@
# Grain finding

## At a Glance - Identifying Objects of Interest

TopoStats automatically tries to find grains (objects of interest) in your AFM images. There are several steps to this.

- **Height thresholding**: We find grains based on their height in the image.
- **Remove edge grains**: We remove grains that intersect the image border.
- **Size thresholding**: We remove grains that are too small or too large.
- **Optional: U-Net mask improvement**: We can use a U-Net to improve the mask of each grain.

## Height thresholding

Grain finding is the process of detecting useful objects in your AFM images. This might be DNA, proteins, holes in a
surface or ridges on a surface.
In the standard operation of TopoStats, the way we find objects is based on a height threshold. This means that we
detect where things are based on how high up they are.

For example, with our example minicircles.spm image, we have DNA that is poking up from the sample surface, represented
by bright regions in the image, alongside impurities and proteins, also above the surface:

![minicircles image](../_static/images/grain_finding/grain_finding_minicircles.png)

If we want to select the DNA, then we can take only the regions of the image that are above a certain height
threshold (standard deviation - `std_dev`, absolute - `absolute`, otsu - `otsu`).

Here are several thresholds to show you what happens as we increase the absolute height threshold:

![height thresholds](../_static/images/grain_finding/grain_finding_grain_thresholds.png)

Notice that the amount of data decreases, until we are only left with the very highest points.

The aim is to choose a threshold that keeps the data you want, while removing the background and other low objects
that you don’t want to include.
So in this example, a threshold of 0.5 would be best, since it keeps the DNA while removing the background.

There are lots of objects in this mask that we don't want to analyse, but we can remove those using area thresholds in
the next steps. These objects have been detected because, while they are small, they are still high up and above the
background.

For more information on the types of thresholding, and how to set them, see the [thresholding](thresholding.md) page.
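
The effect of raising an absolute height threshold can be reproduced on synthetic data. This is only a sketch: the image is made up, and `pixels_above` is a hypothetical helper, not part of TopoStats.

```python
import numpy as np


def pixels_above(image, threshold):
    """Count the pixels that an absolute height threshold would keep."""
    return int(np.count_nonzero(image > threshold))


# Synthetic example: noisy background with a 2 nm tall strand across it.
rng = np.random.default_rng(2)
image = rng.normal(scale=0.1, size=(128, 128))
image[40:44, 20:100] += 2.0

thresholds = (0.25, 0.5, 1.0, 2.0)
counts = [pixels_above(image, t) for t in thresholds]
for threshold, count in zip(thresholds, counts):
    print(f"threshold {threshold:.2f} keeps {count} pixels")
```

As the threshold rises the count can only stay the same or fall, which matches the shrinking masks in the figure above.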

## Remove edge grains

Some grains may intersect the image border. Accurate statistics cannot be calculated for these grains, since they are
not fully in the image. Because of this, we have the option of removing any grains that
intersect the image border with the `remove_edge_intersecting_grains` flag in the config file.

Here is a before and after example of removing edge grains:

![size_thresholding](../_static/images/grain_finding/grain_finding_tidy_borders.png)
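
One way to remove border-touching grains is to label connected regions and drop any label that appears on the outermost rows or columns. The sketch below does this with `scipy.ndimage.label`; the function name `remove_edge_grains` is hypothetical and this is not necessarily how TopoStats implements the step.

```python
import numpy as np
from scipy import ndimage


def remove_edge_grains(mask):
    """Drop every connected region of `mask` that touches the image border."""
    mask = np.asarray(mask, dtype=bool)
    labels, _ = ndimage.label(mask)  # 0 is background, 1..n are grains
    border = np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]
    )
    edge_labels = np.unique(border[border > 0])
    return mask & ~np.isin(labels, edge_labels)


# Synthetic example: one grain on the border, one in the middle.
mask = np.zeros((10, 10), dtype=bool)
mask[0:2, 0:3] = True   # touches the top-left border
mask[4:6, 4:6] = True   # fully inside the image
cleaned = remove_edge_grains(mask)
```

Only the interior grain survives, since the other one cannot be measured in full.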

## Size thresholding

In our thresholded image, you will notice that we have a lot of small grains that we do not want to analyse in our
image. We can get rid of those with size thresholding. This is where TopoStats will remove grains based on their area,
leaving only the right size of molecules. You will need to play around with the thresholds to get the right results.

You can set the size threshold using the `absolute_area_threshold` in the config file. This sets the minimum and
maximum area of the grains that you want to keep, in nanometers squared. For example, if you want to keep grains that
are between 10 nm² and 100 nm², you would set `absolute_area_threshold` to `[10, 100]`.

![size_thresholding](../_static/images/grain_finding/grain_finding_size_thresholding.png)
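
The area filter can be sketched as follows. Because the thresholds are in nm², pixel counts have to be converted using the pixel size; the parameter name `pixel_to_nm_scaling` and the function `size_threshold` are assumptions made for this illustration.

```python
import numpy as np
from scipy import ndimage


def size_threshold(mask, pixel_to_nm_scaling, absolute_area_threshold):
    """Keep only grains whose area in nm^2 lies within [lower, upper]."""
    lower, upper = absolute_area_threshold
    labels, _ = ndimage.label(mask)
    # Pixel count per label; index 0 is the background, so skip it.
    areas_px = np.bincount(labels.ravel())[1:]
    areas_nm2 = areas_px * pixel_to_nm_scaling**2
    keep = np.flatnonzero((areas_nm2 >= lower) & (areas_nm2 <= upper)) + 1
    return np.isin(labels, keep)


# Synthetic example with 2 nm pixels, so each pixel covers 4 nm^2.
mask = np.zeros((20, 20), dtype=bool)
mask[1, 1] = True          # 1 px   ->   4 nm^2 (too small)
mask[4:6, 4:6] = True      # 4 px   ->  16 nm^2 (kept)
mask[10:16, 10:16] = True  # 36 px  -> 144 nm^2 (too large)
filtered = size_threshold(mask, pixel_to_nm_scaling=2.0, absolute_area_threshold=[10, 100])
```

In practice you would inspect the areas of a few known good grains first and set the bounds around them.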

## Optional: U-Net mask improvement

As an additional optional step, each grain that reaches this stage can be improved by using a U-Net to mask the grain
again. This requires a U-Net model path to be supplied in the config file.

The bounding box of each grain is made square and passed to a trained U-Net model,
which predicts a better mask that then replaces the original mask.
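
Making a grain's bounding box square could be done along these lines. This is a hedged sketch: `square_bounding_box` is a hypothetical helper, and the strategy of centring the square on the grain and shifting it back inside the image is an assumption, not a description of the TopoStats code.

```python
import numpy as np


def square_bounding_box(grain_mask):
    """Return (row_start, row_stop, col_start, col_stop) of a square crop."""
    rows, cols = np.nonzero(grain_mask)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    side = max(r1 - r0, c1 - c0)
    height, width = grain_mask.shape
    if side > min(height, width):
        raise ValueError("grain is too large for a square crop inside the image")
    # Centre the square on the grain, then shift it back inside the image.
    r0 = min(max(r0 - (side - (r1 - r0)) // 2, 0), height - side)
    c0 = min(max(c0 - (side - (c1 - c0)) // 2, 0), width - side)
    return int(r0), int(r0 + side), int(c0), int(c0 + side)


# Synthetic example: a grain that is 2 pixels tall and 6 pixels wide.
mask = np.zeros((10, 10), dtype=bool)
mask[2:4, 1:7] = True
r0, r1, c0, c1 = square_bounding_box(mask)
crop = mask[r0:r1, c0:c1]  # square crop that would be passed to the model
```

The crop always contains the whole grain, and its two sides have equal length, which is what a network expecting square inputs needs.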

Here is an example comparing absolute height thresholding to U-Net masking for one of our projects. The white boxes
indicate regions where the height threshold performs poorly and is improved by the U-Net mask.

![unet_example](../_static/images/grain_finding/grain_finding_unet_example.png)

### Multi-class masking

TopoStats supports masking with multiple classes. This means that you could use a U-Net to mask DNA and proteins
separately.

This requires a U-Net that has been trained on multiple classes.

Here is an example of multi-class masking using a U-Net which was used for one of our projects.

![multi_class_unet_example](../_static/images/grain_finding/grain_finding_unet_multi_class_example.png)

## Technical details

### Details: Multi-class masking

Multi-class masking is implemented by having each image be a tensor of shape N x N x C, where N is the image size
and C is the number of classes. Each class is a binary mask, where 1 is the class and 0 is not the class.
The first channel is background, where 1 is background and 0 is not background. The rest of the channels
are arbitrary and defined by how the U-Net was trained; however, by convention we recommend that the first class
be for DNA (if applicable) and the next classes for other objects.
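
To make the N x N x C layout concrete, the sketch below builds such a tensor from an integer label image in which every pixel belongs to exactly one class. The helper `to_multi_class_tensor` and the class numbering (0 = background, 1 = DNA, 2 = protein) are assumptions for illustration only.

```python
import numpy as np


def to_multi_class_tensor(class_labels, num_classes):
    """Convert an N x N integer class image to an N x N x C stack of binary masks."""
    channels = np.arange(num_classes)
    # Broadcasting compares each pixel's class against every channel index.
    return (class_labels[..., np.newaxis] == channels).astype(np.uint8)


# Synthetic example: 0 = background, 1 = DNA, 2 = protein (assumed numbering).
class_labels = np.zeros((8, 8), dtype=int)
class_labels[2:4, 1:7] = 1
class_labels[5:7, 5:7] = 2

tensor = to_multi_class_tensor(class_labels, num_classes=3)
background = tensor[..., 0]  # 1 where nothing was detected
dna = tensor[..., 1]         # 1 where the DNA class was assigned
```

Each channel can then be treated as an ordinary binary grain mask by the later processing steps.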