This repository has been archived by the owner on May 5, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 131
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #423 from golgauth/master-2
Added help for 'Hierarchical Aligner GC'.
- Loading branch information
Showing
21 changed files
with
198 additions
and
0 deletions.
There are no files selected for viewing
Binary file added
BIN
+43.9 KB
...mzmine/modules/peaklistmethods/alignment/hierarchical/help/alignment-result.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+15.6 KB
...et/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+37 KB
...et/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+34.1 KB
...et/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/chem-sim-3.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+3.98 KB
...eaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+9.99 KB
.../peaklistmethods/alignment/hierarchical/help/distance-from-the-pair-members.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+6.13 KB
...ules/peaklistmethods/alignment/hierarchical/help/distance-of-the-other-taxa.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+140 KB
...mzmine/modules/peaklistmethods/alignment/hierarchical/help/figtree-software.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
198 changes: 198 additions & 0 deletions
198
src/main/java/net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/help.html
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
<!doctype html public "-//w3c//dtd html 4.0 transitional//en"> | ||
<html> | ||
<head> | ||
<title>Alignment - Hierachical aligner (GC)</title> | ||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> | ||
<link rel="stylesheet" type="text/css" href="/net/sf/mzmine/desktop/impl/helpsystem/HelpStyles.css"> | ||
</head> | ||
<body> | ||
|
||
|
||
<h1>Hierachical aligner GC (or <b>Hierarchical Clustering aligner)</b></h1> | ||
|
||
<h2>Description</h2> | ||
|
||
<p> | ||
This method aligns detected peaks in different samples through a match score. This score is calculated | ||
based on the mass spectrum and retention time of each peak and ranges of tolerance stipulated in the parameters | ||
setup dialog. | ||
</p> | ||
|
||
<h2>General considerations</h2> | ||
|
||
<h3>GeneralAlgorithm: Agglomerative / hierarchical clustering approaches</h3> | ||
|
||
<em>How | ||
They Work</em></font><font face="Times New Roman, Times, serif"><br> | ||
Given a set of N items to be clustered, and an N*N distance (or similarity) | ||
matrix, the basic process of hierarchical clustering (defined by S.C. Johnson in 1967 - <a href="#">https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html#johnson</a>) is this: </font></p> | ||
<ol> | ||
<li><font align="justify" face="Times New Roman, Times, serif">Start by assigning | ||
each item to a cluster, so that if you have N items, you now have N clusters, | ||
each containing just one item. Let the distances (similarities) between the | ||
clusters the same as the distances (similarities) between the items they contain.</font></li> | ||
<li><font face="Times New Roman, Times, serif">Find the closest (most similar) | ||
pair of clusters and merge them into a single cluster, so that now you have | ||
one cluster less.</font></li> | ||
<li><font face="Times New Roman, Times, serif">Compute distances (similarities) | ||
between the new cluster and each of the old clusters.</font></li> | ||
<li><font face="Times New Roman, Times, serif">Repeat steps 2 and 3 until all | ||
items are clustered into a single cluster of size N. (*)</font></li> | ||
</ol> | ||
<p align="justify"><font face="Times New Roman, Times, serif">Step 3 can be done | ||
in different ways, which is what distinguishes <em>single-linkage</em> from | ||
<em>complete-linkage</em> and <em>average-linkage</em> clustering.<br> | ||
In <em>single-linkage</em> clustering (also called the <em>connectedness</em> | ||
or <em>minimum</em> method), we consider the distance between one cluster and | ||
another cluster to be equal to the <u>shortest</u> distance from any member | ||
of one cluster to any member of the other cluster. If the data consist of similarities, | ||
we consider the similarity between one cluster and another cluster to be equal | ||
to the <u>greatest</u> similarity from any member of one cluster to any member | ||
of the other cluster.<br> | ||
In <em>complete-linkage</em> clustering (also called the <em>diameter</em> or | ||
<em>maximum</em> method), we consider the distance between one cluster and another | ||
cluster to be equal to the <u>greatest</u> distance from any member of one cluster | ||
to any member of the other cluster.<br> | ||
In <em>average-linkage</em> clustering, we consider the distance between one | ||
cluster and another cluster to be equal to the <u>average</u> distance from | ||
any member of one cluster to any member of the other cluster.<br> | ||
A variation on average-link clustering is the UCLUS method of R. | ||
D'Andrade (1978) - <a href="#">https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html#dandrade</a> - which uses the <u>median</u> distance, which is much more | ||
outlier-proof than the average distance.</font></p> | ||
<p align="justify"><font face="Times New Roman, Times, serif">This kind of hierarchical | ||
clustering is called <em>agglomerative</em> because it merges clusters iteratively. | ||
There is also a <em>divisive</em> hierarchical clustering which does the reverse | ||
by starting with all objects in one cluster and subdividing them into smaller | ||
pieces. Divisive methods are not generally available, and rarely have been applied.</font></p> | ||
<p align="justify"><font face="Times New Roman, Times, serif">(*) Of course there | ||
is no point in having all the N items grouped in a single cluster but, once | ||
you have got the complete hierarchical tree, if you want k clusters you just | ||
have to cut the k-1 longest links.</font></p> | ||
<p></p> | ||
|
||
|
||
<hr> | ||
|
||
<h3>>> Available linkage criterion are:</h3> | ||
<ul> | ||
<li>Single</li> | ||
<li>Complete</li> | ||
<li>Average</li> | ||
</ul> | ||
|
||
|
||
<p>The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations.</p> | ||
<p>Some commonly used linkage criteria between two sets of observations <i>A</i> and <i>B</i> are:</p> | ||
<img src="linkage-criteria-0.png" width="905" height="155" alt=""> | ||
<p>where <i>d</i> is the chosen metric. </p> | ||
|
||
<h4>Linkage methods:</h4> | ||
<p><img src="single-complete-average-2.png" width="400" height="400" alt=""><p><br> | ||
|
||
|
||
|
||
<hr> | ||
|
||
<h2>MZmine GC: Specific considerations</h2> | ||
|
||
<h3>>> How distance matrix between all pairs of observations is obtained</h3> | ||
|
||
<ol> | ||
<li>Compute a similarity score based on chemical likelihood only</li> | ||
<li>Compute a similarity score based on RT likelihood only</li> | ||
<li>Determine a combined weighted (=mixture) score based on a mixture of the above two scores</li> | ||
</ol> | ||
|
||
Giving:<br> | ||
|
||
<h4>1. Chemical similarity measurement (<i>chemSimScore</i>)</h4> | ||
<img src="chem-sim-3.png" width="675" height="232" alt=""> | ||
<br> | ||
<h4>2. Retention time score</h4> | ||
Retention time score is normalized relatively to the RT window tolerance provided by the user (<i><b>rtScore = (1.0d - rtDiff / rtMaxDiff)</b></i>)<br> | ||
<br> | ||
<h4>3. Mixture score</h4> | ||
The final mixture score (<i><b>score = (chemSimScore * mzWeight) + (rtScore * rtWeight)</b></i>).<br> | ||
<br> | ||
<img src="mixture-similarity-score.png" width="671" height="175" alt=""> | ||
<br> | ||
|
||
<h4>A square matrix is generated by comparing all the scores, for all the pairs of peaks.</h4> | ||
|
||
|
||
<h3>>> Getting clusters from rooted binary tree</h3> | ||
|
||
The algorithm methods described above all lead to a unique binary tree where all items are clustered into a single cluster of size N (number of leafs). | ||
<br> | ||
A final step is required to get the very final clusters list. A "cutoff" based on two criterion allows to split the single cluster into appropriate subgroups: | ||
|
||
<ol> | ||
<li>A cluster cannot contain more than one leaf per sample</li> | ||
<li>A cluster cannot contain two leafs for which the distance (or mixture similarity score) is to low</li> | ||
</ol> | ||
|
||
<h4>Example with a 3 samples binary tree</h4> | ||
|
||
<img src="splitting-binary-tree.png" width="953" height="614" alt=""> | ||
|
||
|
||
<hr> | ||
|
||
<h2>Method parameters</h2> | ||
|
||
<dl> | ||
<dt>Peak list name</dt> | ||
<dd>Name of the new aligned peak list</dd> | ||
<dt>m/z tolerance</dt> | ||
<dd>This value sets the range, in terms of m/z, for possible peaks to be | ||
aligned. Maximum allowed m/z difference</dd> | ||
<dt>Weight for m/z</dt> | ||
<dd>This is the assigned weight for m/z difference at the moment of match score calculation between peak rows. | ||
In case of perfectly matching m/z values the score receives the complete weight.</dd> | ||
<dt>Retention time tolerance type</dt> | ||
<dd>Maximum RT difference can be defined either using absolute or relative value</dd> | ||
<dt>Absolute RT tolerance</dt> | ||
<dd>Maximum allowed absolute RT difference</dd> | ||
<dt>Relative RT tolerance</dt> | ||
<dd>Maximum allowed relative RT difference</dd> | ||
<dt>Weight for RT</dt> | ||
<dd>This is the assigned weight for RT difference at the moment of match score calculation between peak rows. | ||
In case of perfectly matching RT values the score receives the complete weight.</dd> | ||
|
||
<dt>Export dendrogram as TXT</dt> | ||
<dd>Results in CDT + GTR files (See bellow how to visualize those files).</dd> | ||
<dt>Dendrogram output text filename</dt> | ||
<dd>Name of the resulting TXT file to write the clustering resulting dendrogram to. If the file already exists, it will be overwritten.</dd> | ||
|
||
</dl> | ||
|
||
<br> | ||
<hr align="center" width="50"> | ||
<br> | ||
|
||
<p> | ||
<b>New aligned peak list showing peaks from 3 different samples</b><br> | ||
<br> | ||
<img src="alignment-result.png" width="1292" height="624" alt=""> | ||
</p> | ||
|
||
<p> | ||
<b>Clustering result can be exported into CDT+GTR format</b><br> | ||
The latter can be then browsed using common applications such as TreeView - <a href="#" name="TreeView" title="TreeView">https://sourceforge.net/projects/jtreeview/</a>.<br> | ||
<br> | ||
<br> | ||
</p> | ||
|
||
<hr> | ||
|
||
<p> | ||
<h2>Inspiration taken from</h2> | ||
<ul> | ||
<li>An Optimal Peak Alignment For Comprehensive Two-Dimensional Gas Chromatography Mass Spectrometry Using Mixture Similarity Measure - 2013 [Seongho Kim, Aiqin Fang, Bing Wang, Jaesik Jeong, Xiang Zhang] - | ||
<a href="#">https://louisville.edu/faculty/x0zhan17/papers/2011_mspa_bioinfor</a></li> | ||
<li>WEKA hierarchical clustering - <a href="#">http://www.cs.waikato.ac.nz/ml/weka/</a></li> | ||
</ul> | ||
</p> | ||
|
||
</body> | ||
</html> |
Binary file added
BIN
+115 KB
.../modules/peaklistmethods/alignment/hierarchical/help/join_alignment_preview.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+27.1 KB
...mine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria-0.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+48.8 KB
...mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkage-criteria.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+26.8 KB
.../net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/linkages.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+23.3 KB
...odules/peaklistmethods/alignment/hierarchical/help/mixture-similarity-score.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+30.3 KB
...zmine/modules/peaklistmethods/alignment/hierarchical/help/neightbor-joining.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+8.05 KB
.../net/sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/q-matrix.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+165 KB
...dules/peaklistmethods/alignment/hierarchical/help/single-complete-average-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+212 KB
...modules/peaklistmethods/alignment/hierarchical/help/single-complete-average.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+67.8 KB
...e/modules/peaklistmethods/alignment/hierarchical/help/splitting-binary-tree.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+154 KB
.../mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+205 KB
...sf/mzmine/modules/peaklistmethods/alignment/hierarchical/help/ward-centroid.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.