diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/alignment-result.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/alignment-result.png new file mode 100644 index 000000000..9b543f750 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/alignment-result.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-1.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-1.png new file mode 100644 index 000000000..af904ebb2 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-1.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-2.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-2.png new file mode 100644 index 000000000..b9ef724cc Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-2.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-3.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-3.png new file mode 100644 index 000000000..f16b0e008 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-3.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members-2.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members-2.png new file mode 100644 index 000000000..821ab0f6a Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members-2.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members.png new file mode 100644 index 000000000..d81c2b9b3 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-of-the-other-taxa.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-of-the-other-taxa.png new file mode 100644 index 000000000..3beb1270a Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/distance-of-the-other-taxa.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/figtree-software.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/figtree-software.png new file mode 100644 index 000000000..bc236ae86 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/figtree-software.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/help.html b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/help.html new file mode 100644 index 000000000..99057f0ac --- /dev/null +++ b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/help.html @@ -0,0 +1,198 @@ + + + + Alignment - Hierachical aligner (GC) + + + + + + +

Hierachical aligner GC (or Hierarchical Clustering aligner)

+ +

Description

+ +

+ This method aligns detected peaks in different samples through a match score. This score is calculated + based on the mass spectrum and retention time of each peak and ranges of tolerance stipulated in the parameters + setup dialog. +

+ +

General considerations

+ +

GeneralAlgorithm: Agglomerative / hierarchical clustering approaches

+ + How + They Work
+ Given a set of N items to be clustered, and an N*N distance (or similarity) + matrix, the basic process of hierarchical clustering (defined by S.C. Johnson in 1967 - https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html#johnson) is this:

+
    +
  1. Start by assigning + each item to a cluster, so that if you have N items, you now have N clusters, + each containing just one item. Let the distances (similarities) between the + clusters the same as the distances (similarities) between the items they contain.
  2. +
  3. Find the closest (most similar) + pair of clusters and merge them into a single cluster, so that now you have + one cluster less.
  4. +
  5. Compute distances (similarities) + between the new cluster and each of the old clusters.
  6. +
  7. Repeat steps 2 and 3 until all + items are clustered into a single cluster of size N. (*)
  8. +
+

Step 3 can be done + in different ways, which is what distinguishes single-linkage from + complete-linkage and average-linkage clustering.
+ In single-linkage clustering (also called the connectedness + or minimum method), we consider the distance between one cluster and + another cluster to be equal to the shortest distance from any member + of one cluster to any member of the other cluster. If the data consist of similarities, + we consider the similarity between one cluster and another cluster to be equal + to the greatest similarity from any member of one cluster to any member + of the other cluster.
+ In complete-linkage clustering (also called the diameter or + maximum method), we consider the distance between one cluster and another + cluster to be equal to the greatest distance from any member of one cluster + to any member of the other cluster.
+ In average-linkage clustering, we consider the distance between one + cluster and another cluster to be equal to the average distance from + any member of one cluster to any member of the other cluster.
+ A variation on average-link clustering is the UCLUS method of R. + D'Andrade (1978) - https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html#dandrade - which uses the median distance, which is much more + outlier-proof than the average distance.

+

This kind of hierarchical + clustering is called agglomerative because it merges clusters iteratively. + There is also a divisive hierarchical clustering which does the reverse + by starting with all objects in one cluster and subdividing them into smaller + pieces. Divisive methods are not generally available, and rarely have been applied.

+

(*) Of course there + is no point in having all the N items grouped in a single cluster but, once + you have got the complete hierarchical tree, if you want k clusters you just + have to cut the k-1 longest links.

+

+ + +
+ +

>> Available linkage criterion are:

+ + + +

The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.

+

Some commonly used linkage criteria between two sets of observations A and B are:

+ +

where d is the chosen metric.

+ +

Linkage methods:

+


+ + + +


+ +

MZmine GC: Specific considerations

+ +

>> How distance matrix between all pairs of observations is obtained

+ +
    +
  1. Compute a similarity score based on chemical likelihood only
  2. +
  3. Compute a similarity score based on RT likelihood only
  4. +
  5. Determine a combined weighted (=mixture) score based on a mixture of the above two scores
  6. +
+ + Giving:
+ +

1. Chemical similarity measurement (chemSimScore)

+ +
+

2. Retention time score

+ Retention time score is normalized relatively to the RT window tolerance provided by the user (rtScore = (1.0d - rtDiff / rtMaxDiff))
+
+

3. Mixture score

+ The final mixture score (score = (chemSimScore * mzWeight) + (rtScore * rtWeight)).
+
+ +
+ +

A square matrix is generated by comparing all the scores, for all the pairs of peaks.

+ + +

>> Getting clusters from rooted binary tree

+ +The algorithm methods described above all lead to a unique binary tree where all items are clustered into a single cluster of size N (number of leafs). +
+A final step is required to get the very final clusters list. A "cutoff" based on two criterion allows to split the single cluster into appropriate subgroups: + +
    +
  1. A cluster cannot contain more than one leaf per sample
  2. +
  3. A cluster cannot contain two leafs for which the distance (or mixture similarity score) is to low
  4. +
+ +

Example with a 3 samples binary tree

+ + + + +
+ +

Method parameters

+ +
+
Peak list name
+
Name of the new aligned peak list
+
m/z tolerance
+
This value sets the range, in terms of m/z, for possible peaks to be + aligned. Maximum allowed m/z difference
+
Weight for m/z
+
This is the assigned weight for m/z difference at the moment of match score calculation between peak rows. + In case of perfectly matching m/z values the score receives the complete weight.
+
Retention time tolerance type
+
Maximum RT difference can be defined either using absolute or relative value
+
Absolute RT tolerance
+
Maximum allowed absolute RT difference
+
Relative RT tolerance
+
Maximum allowed relative RT difference
+
Weight for RT
+
This is the assigned weight for RT difference at the moment of match score calculation between peak rows. + In case of perfectly matching RT values the score receives the complete weight.
+ +
Export dendrogram as TXT
+
Results in CDT + GTR files (See bellow how to visualize those files).
+
Dendrogram output text filename
+
Name of the resulting TXT file to write the clustering resulting dendrogram to. If the file already exists, it will be overwritten.
+ +
+ +
+
+
+ +

+ New aligned peak list showing peaks from 3 different samples
+
+ +

+ +

+ Clustering result can be exported into CDT+GTR format
+ The latter can be then browsed using common applications such as TreeView - https://sourceforge.net/projects/jtreeview/.
+
+
+

+ +
+ +

+

Inspiration taken from

+ +

+ + + diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/join_alignment_preview.jpg b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/join_alignment_preview.jpg new file mode 100644 index 000000000..67b521895 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/join_alignment_preview.jpg differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria-0.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria-0.png new file mode 100644 index 000000000..4ba6cf610 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria-0.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria.png new file mode 100644 index 000000000..205a23cf1 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkages.jpg b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkages.jpg new file mode 100644 index 000000000..477706d42 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkages.jpg differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/mixture-similarity-score.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/mixture-similarity-score.png new file mode 100644 index 000000000..fc70f942c Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/mixture-similarity-score.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/neightbor-joining.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/neightbor-joining.png new file mode 100644 index 000000000..f41eaec45 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/neightbor-joining.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/q-matrix.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/q-matrix.png new file mode 100644 index 000000000..3d30ae966 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/q-matrix.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/single-complete-average-2.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/single-complete-average-2.png new file mode 100644 index 000000000..da4068569 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/single-complete-average-2.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/single-complete-average.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/single-complete-average.png new file mode 100644 index 000000000..dc43b4840 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/single-complete-average.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/splitting-binary-tree.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/splitting-binary-tree.png new file mode 100644 index 000000000..268d36c48 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/splitting-binary-tree.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid-2.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid-2.png new file mode 100644 index 000000000..ae61fe956 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid-2.png differ diff --git a/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid.png b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid.png new file mode 100644 index 000000000..b8d2e15d4 Binary files /dev/null and b/src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid.png differ