Update main.tex
TahiriNadia authored Jun 4, 2024
1 parent 3e902dd commit adcadee
Showing 1 changed file with 15 additions and 23 deletions.
38 changes: 15 additions & 23 deletions papers/Gagnon_Kebe_Tahiri/main.tex
@@ -39,18 +39,18 @@ \subsection{Data pre-processing}
Subsequently, we calculated the sample variance of each selected numeric attribute in order to eliminate those with zero or low variance, retaining only attributes with a variance of at least 0.1:

\begin{equation}
S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
\end{equation}

where $S^2$ is the variance of the sample, $x_i$ represents each value in the dataset, $\bar{x}$ the average of all values in the dataset, and $n$ the number of values in the dataset.
where \( S^2 \) is the variance of the sample, \( x_i \) represents each value in the dataset, \( \bar{x} \) the average of all values in the dataset, and \( n \) the number of values in the dataset.

Of the previously selected numerical attributes, only salinity was removed ($S^2 = 0.02146629$). This selection of attributes and data resulted in a data table containing 62 rows ($n=62$) and 18 columns (number of attributes).
Of the previously selected numerical attributes, only salinity was removed (\( S^2 = 0.02146629 \)). This selection of attributes and data resulted in a data table containing 62 rows (\( n=62 \)) and 18 columns (number of attributes).
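
The variance filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name \texttt{low\_variance\_columns}, the dictionary layout and the toy values are assumptions made for the example, and only the 0.1 cut-off comes from the text. \texttt{statistics.variance} uses the same \( n-1 \) denominator as the equation above.

```python
import statistics

def low_variance_columns(table, cutoff=0.1):
    """Return the names of numeric columns whose sample variance is below cutoff.

    `table` maps a column name to a list of numeric values (hypothetical layout).
    statistics.variance divides by n - 1, matching the sample variance S^2.
    """
    return [name for name, values in table.items()
            if statistics.variance(values) < cutoff]

# Hypothetical toy values, NOT the IceAGE measurements.
table = {
    "salinity": [34.9, 35.0, 35.1, 34.8],
    "depth": [300.0, 1200.0, 2500.0, 800.0],
}

removed = low_variance_columns(table)
kept = [name for name in table if name not in removed]
print("removed:", removed)
print("kept:", kept)
```

With these toy values only the salinity column falls under the cut-off, which mirrors the outcome reported for the real data.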

From the IceAGE database, 14 attributes were selected. These consist of the geographical coordinates such as longitude (decimal) and latitude (decimal) taken at the beginning (see Figure \ref{fig:fig1}a and \ref{fig:fig1}b) and at the end of sampling. The increase in latitude, in particular, has been highlighted by several studies as being linked to the decline of marine biodiversity on a global scale \citep{lambshead_latitudinal_2000, gage_diversity_2004}. These geographic data are divided into five sectors across the seas around Iceland: the Denmark Strait ($n=28$), the Iceland Basin ($n=15$), the Irminger Basin ($n=12$), the Norwegian Sea ($n=4$) and the Norwegian Basin ($n=3$). For the environmental attributes in this database, we included the depth (m) at the beginning (see Figure \ref{fig:fig1}c) and end of sampling as well as the temperature ($\degree C$) (see Figure \ref{fig:fig1}d) and oxygen concentration (mg/L) (see Figure \ref{fig:fig1}e) of the water depending on the depth at which the specimens were sampled. These properties of water bodies are drivers of deep-sea biodiversity and biogeography with oxygen being a limiting factor for living organisms \citep{keeling_ocean_2010}. In addition to these contributions, the increase in depth \citep{rex_global_2006, costello_marine_2017} as well as the decrease in water temperature at depth \citep{lambshead_latitudinal_2000} are also factors in the loss of marine biodiversity on a global scale. Meteorological parameters such as wind speed (m/s) (see Figure \ref{fig:fig1}f) and wind direction at the beginning and end of sampling were also included in our data given the contribution of wind to the restructuring of the benthic ecosystem through water transport \citep{waga_recent_2020, saeedi_environmental_2022}. 
The wind direction at the start of sampling consists of six orientations: South-West ($n=22$), South ($n=15$), North-East ($n=9$), South-South-East ($n=9$), North-West ($n=5$) and East ($n=2$); while the one at the end of sampling is made up of eight orientations: South ($n=15$), South-West ($n=15$), North-East ($n=9$), West-South-West ($n=7$), South-East ($n=6$), North-North-West ($n=5$), South-South-East ($n=3$) and East ($n=2$). In addition, we have included the sedimentary characteristics of the sampling sites, which influence the distribution of Cumacea \citep{uhlir_adding_2021} and, for the purposes of this study, fall into six categories: mud ($n=30$), sandy mud ($n=15$), sand ($n=9$), forams ($n=3$), muddy sand ($n=3$) and gravel ($n=2$).
From the IceAGE database, 14 attributes were selected. These consist of the geographical coordinates such as longitude (decimal) and latitude (decimal) taken at the beginning (see Figure \ref{fig:fig1}a and \ref{fig:fig1}b) and at the end of sampling. The increase in latitude, in particular, has been highlighted by several studies as being linked to the decline of marine biodiversity on a global scale \citep{lambshead_latitudinal_2000, gage_diversity_2004}. These geographic data are divided into five sectors across the seas around Iceland: the Denmark Strait (\( n=28 \)), the Iceland Basin (\( n=15 \)), the Irminger Basin (\( n=12 \)), the Norwegian Sea (\( n=4 \)) and the Norwegian Basin (\( n=3 \)). For the environmental attributes in this database, we included the depth (m) at the beginning (see Figure \ref{fig:fig1}c) and end of sampling as well as the temperature (\( \degree C \)) (see Figure \ref{fig:fig1}d) and oxygen concentration (mg/L) (see Figure \ref{fig:fig1}e) of the water depending on the depth at which the specimens were sampled. These properties of water bodies are drivers of deep-sea biodiversity and biogeography with oxygen being a limiting factor for living organisms \citep{keeling_ocean_2010}. In addition to these contributions, the increase in depth \citep{rex_global_2006, costello_marine_2017} as well as the decrease in water temperature at depth \citep{lambshead_latitudinal_2000} are also factors in the loss of marine biodiversity on a global scale. Meteorological parameters such as wind speed (m/s) (see Figure \ref{fig:fig1}f) and wind direction at the beginning and end of sampling were also included in our data given the contribution of wind to the restructuring of the benthic ecosystem through water transport \citep{waga_recent_2020, saeedi_environmental_2022}. 
The wind direction at the start of sampling consists of six orientations: South-West (\( n=22 \)), South (\( n=15 \)), North-East (\( n=9 \)), South-South-East (\( n=9 \)), North-West (\( n=5 \)) and East (\( n=2 \)); while the one at the end of sampling is made up of eight orientations: South (\( n=15 \)), South-West (\( n=15 \)), North-East (\( n=9 \)), West-South-West (\( n=7 \)), South-East (\( n=6 \)), North-North-West (\( n=5 \)), South-South-East (\( n=3 \)) and East (\( n=2 \)). In addition, we have included the sedimentary characteristics of the sampling sites, which influence the distribution of Cumacea \citep{uhlir_adding_2021} and, for the purposes of this study, fall into six categories: mud (\( n=30 \)), sandy mud (\( n=15 \)), sand (\( n=9 \)), forams (\( n=3 \)), muddy sand (\( n=3 \)) and gravel (\( n=2 \)).

In the BOLD Systems database, taxonomic ranks such as family, genus, and species of the sampled Cumacea were included in our data. These are composed of seven families of Cumacea: Diastylidae ($n=21$), Lampropidae ($n=13$), Leuconidae ($n=12$), Nannastacidae ($n=7$), Bodotriidae ($n=4$), Ceratocumatidae ($n=3$) and Pseudocumatidae ($n=2$). A total of 21 species of Cumacea are found in our sample (see Figure \ref{fig:fig2}).
In the BOLD Systems database, taxonomic ranks such as family, genus, and species of the sampled Cumacea were included in our data. These are composed of seven families of Cumacea: Diastylidae (\( n=21 \)), Lampropidae (\( n=13 \)), Leuconidae (\( n=12 \)), Nannastacidae (\( n=7 \)), Bodotriidae (\( n=4 \)), Ceratocumatidae (\( n=3 \)) and Pseudocumatidae (\( n=2 \)). A total of 21 species of Cumacea are found in our sample (see Figure \ref{fig:fig2}).

The habitat and water mass of the sampling points are the only attributes that were taken directly via Table 1 of \citep{uhlir_adding_2021}. Thus, the definitions of water bodies described by \citep{hansen_north_2000, brix2010distribution, ostmann_marine_2014} were used as a reference for the GIN seas around Iceland: Arctic Polar Water (APW, $n=15$), Iceland Sea Overflow Water (ISOW, $n=15$), North Atlantic Water (NAW, $n=9$), Arctic Polar Water/Norwegian Sea Arctic Intermediate Water (APW/NSAIW, $n=7$), warm Norwegian Sea Deep Water (NSDWw, $n=8$), Labrador Sea Water (LSW, $n=3$), cold Norwegian Sea Deep Water (NSDWc, $n=3$) and Norwegian Sea Arctic Intermediate Water (NSAIW, $n=2$) (see Figure \ref{fig:fig3}). In terms of habitat, we considered the three categories used in \citep{uhlir_adding_2021}: deep sea ($n=38$), shelf ($n=15$) and slope ($n=9$) (see Figure \ref{fig:fig4}).
The habitat and water mass of the sampling points are the only attributes that were taken directly via Table 1 of \citep{uhlir_adding_2021}. Thus, the definitions of water bodies described by \citep{hansen_north_2000, brix2010distribution, ostmann_marine_2014} were used as a reference for the GIN seas around Iceland: Arctic Polar Water (APW, \( n=15 \)), Iceland Sea Overflow Water (ISOW, \( n=15 \)), North Atlantic Water (NAW, \( n=9 \)), Arctic Polar Water/Norwegian Sea Arctic Intermediate Water (APW/NSAIW, \( n=7 \)), warm Norwegian Sea Deep Water (NSDWw, \( n=8 \)), Labrador Sea Water (LSW, \( n=3 \)), cold Norwegian Sea Deep Water (NSDWc, \( n=3 \)) and Norwegian Sea Arctic Intermediate Water (NSAIW, \( n=2 \)) (see Figure \ref{fig:fig3}). In terms of habitat, we considered the three categories used in \citep{uhlir_adding_2021}: deep sea (\( n=38 \)), shelf (\( n=15 \)) and slope (\( n=9 \)) (see Figure \ref{fig:fig4}).

In order to better interpret the relationships and evolutionary responses of benthic species, genetic data are needed \citep{wilson_speciation_1987, uhlir_adding_2021}. The aligned DNA sequence of the mitochondrial 16S rRNA gene region of each sample will be included in our analyses. Thus, we consider 62 of the 306 aligned DNA sequences that were used for the phylogenetic analyses of \citep{uhlir_adding_2021}. Since some of the specimens in our sample have duplicated, or even quadrupled, aligned DNA sequences that differ by one to two nucleotides, we retained the longest aligned DNA sequence for each specimen. Figures \ref{fig:fig1}, \ref{fig:fig2}, \ref{fig:fig5} and \ref{fig:fig6} were produced with Python 3.11, while Figures \ref{fig:fig3} and \ref{fig:fig4} were produced with RStudio Desktop 4.3.2.

@@ -101,11 +101,9 @@ \section{Metrics}\label{metrics}
\subsection{Robinson-Foulds Distance (RF Distance)}\label{RF}
The Robinson-Foulds distance measures the dissimilarity between two phylogenetic trees. It counts the number of splits (bipartitions) that are present in one tree but not the other (see \autoref{lst:robinsonFoulds}).

\begin{equation}
\text{RF}(T_1, T_2) = | \Sigma(T_1) \Delta \Sigma(T_2) |
\end{equation}
\[ \text{RF}(T_1, T_2) = | \Sigma(T_1) \Delta \Sigma(T_2) | \]

where $\Sigma(T_1)$ and $\Sigma(T_2)$ are the sets of splits in trees $T_1$ and $T_2$.
where \( \Sigma(T_1) \) and \( \Sigma(T_2) \) are the sets of splits in trees \( T_1 \) and \( T_2 \).
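
To make the symmetric difference concrete, the following sketch computes the RF distance for two small trees written as nested tuples. It is a plain-Python illustration under stated assumptions and is not the ete3-based routine used in aPhyloGeo: the helper names \texttt{leaf\_set}, \texttt{splits} and \texttt{rf\_distance}, the tuple encoding and the example trees are all hypothetical. Only non-trivial bipartitions (both sides with at least two leaves) are counted.

```python
def leaf_set(node):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def splits(tree):
    """Collect the non-trivial bipartitions induced by the internal nodes."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        other = all_leaves - side
        # Keep only splits with at least two leaves on each side.
        if len(side) > 1 and len(other) > 1:
            # Unordered pair, so that A|B and B|A are the same split.
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found

def rf_distance(tree_1, tree_2):
    """Size of the symmetric difference between the two split sets."""
    return len(splits(tree_1) ^ splits(tree_2))

# Hypothetical five-leaf trees that differ in one grouping.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

Here both trees share the split ABC|DE, while AB|CDE appears only in the first tree and AC|BDE only in the second, so the two unshared splits give the distance.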

%\autoref{lst:robinsonFoulds}.
\begin{lstlisting}[label=lst:robinsonFoulds,language=Python,caption=Python script for calculating the Robinson-Foulds distance using the ete3 package in the aPhyloGeo package]
@@ -125,18 +123,14 @@ \subsection{Robinson-Foulds Distance (RF Distance)}\label{RF}
\subsection{Normalized Robinson-Foulds Distance}\label{RFnorm}
The normalized Robinson-Foulds distance scales the RF distance to account for the size of the trees, giving a value between 0 and 1.

\begin{equation}
\text{RF}_{\text{norm}}(T_1, T_2) = \frac{| \Sigma(T_1) \Delta \Sigma(T_2) |}{| \Sigma(T_1) | + | \Sigma(T_2) |}
\end{equation}
\[ \text{RF}_{\text{norm}}(T_1, T_2) = \frac{| \Sigma(T_1) \Delta \Sigma(T_2) |}{| \Sigma(T_1) | + | \Sigma(T_2) |} \]
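
As a small worked example of the normalization, the sketch below applies the formula directly to two sets of splits. The function name \texttt{rf\_normalized} and the string labels standing in for splits are assumptions made for illustration; returning 0 when both sets are empty is also a convention chosen here, not something stated in the text.

```python
def rf_normalized(splits_1, splits_2):
    """Symmetric difference of two split sets divided by their total size."""
    total = len(splits_1) + len(splits_2)
    if total == 0:
        # Convention assumed here: two trees without internal splits are identical.
        return 0.0
    return len(splits_1 ^ splits_2) / total

# Hypothetical split sets, each split written as a label for readability.
s1 = {"AB|CDE", "ABC|DE"}
s2 = {"AC|BDE", "ABC|DE"}

score = rf_normalized(s1, s2)
print(score)
```

Two of the four splits are unshared, so the score is 0.5; identical split sets give 0 and fully disjoint sets give 1.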

\subsection{Euclidean Distance}\label{euclidean}
The Euclidean distance between two points in a multi-dimensional space is the length of the line segment connecting them (see \autoref{lst:euclideanDist}).

For points $\mathbf{p} = (p_1, \ldots, p_n)$ and $\mathbf{q} = (q_1, \ldots, q_n)$:
For points \( \mathbf{p} = (p_1, \ldots, p_n) \) and \( \mathbf{q} = (q_1, \ldots, q_n) \):

\begin{equation}
d_{\text{Euclidean}}(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
\end{equation}
\[ d_{\text{Euclidean}}(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \]
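
The formula translates almost term by term into Python. The sketch below uses only the standard library; the function name \texttt{euclidean\_distance} and the sample points are hypothetical and independent of the aPhyloGeo implementation.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of coordinates")
    return math.sqrt(sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q)))

# Hypothetical points: a 3-4-5 right triangle in the plane.
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
print(d)
```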

%\autoref{lst:euclideanDist}.
\begin{lstlisting}[label=lst:euclideanDist,language=Python,caption=Python script for calculating the Euclidean distance using the ete3 package in the aPhyloGeo package]
@@ -155,11 +149,9 @@ \subsection{Euclidean Distance}\label{euclidean}
\subsection{Least-Squares Distance}\label{LS}
The Least-Squares distance measures the discrepancy between observed and estimated values, often used in regression analysis (see \autoref{lst:LeastSquare}).

For observed values $y_i$ and estimated values $\hat{y}_i$:
For observed values \( y_i \) and estimated values \( \hat{y}_i \):

\begin{equation}
d_{\text{LS}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\end{equation}
\[ d_{\text{LS}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
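
A short numerical check of this sum of squared residuals, written as a sketch with hypothetical names and values (the function \texttt{least\_squares\_distance} and both lists are assumptions for illustration only):

```python
def least_squares_distance(observed, estimated):
    """Sum of squared differences between observed and estimated values."""
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, estimated))

# Hypothetical values: residuals are 0, -0.5 and 1.
observed = [1.0, 2.0, 3.0]
estimated = [1.0, 2.5, 2.0]

d_ls = least_squares_distance(observed, estimated)
print(d_ls)
```

The squared residuals are 0, 0.25 and 1, which sum to 1.25.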

These formulas succinctly describe the methods used to measure distances and dissimilarities in various contexts.

@@ -181,7 +173,7 @@ \subsection{Least-Squares Distance}\label{LS}

\section{Results}\label{results}

Figure \ref{fig:fig1} shows the extent and variability of the dataset for three geographic attributes, namely latitude (decimal) and longitude (decimal) at the start of sampling and depth (m) at specimen collection, as well as three environmental attributes, such as wind speed at the start of sampling (m/s) and temperature ($\degree C$) and the oxygen concentration of the water (mg/L) based on the depth at which the samples were collected.
Figure \ref{fig:fig1} shows the extent and variability of the dataset for three geographic attributes, namely latitude (decimal) and longitude (decimal) at the start of sampling and depth (m) at specimen collection, as well as three environmental attributes, namely wind speed at the start of sampling (m/s), temperature (\( \degree C \)) and the oxygen concentration of the water (mg/L) at the depth at which the samples were collected.

\begin{figure}[]
\centering
@@ -247,4 +239,4 @@ \section{Results}\label{results}

\section{Conclusion}\label{conclusion}

The objective of this study is to perform an in-depth analysis of the influence of extreme climatic variables and environmental characteristics around Iceland on Cumacea (crustaceans: Peracarida) based on phylogeographic analysis. To date, we have selected relevant attributes for our study based on data from the IceAGE project, BOLD Systems, and the study by \citep{uhlir_adding_2021} and eliminated those that were not relevant to this study as well as those that had low variance (salinity, $S^2 = 0.02146629$) or abundant missing data (>95\%). Thus, the first part consisted mainly of literature review, data collection, data pre-processing, and data analysis.
The objective of this study is to perform an in-depth analysis of the influence of extreme climatic variables and environmental characteristics around Iceland on Cumacea (crustaceans: Peracarida) based on phylogeographic analysis. To date, we have selected relevant attributes for our study based on data from the IceAGE project, BOLD Systems, and the study by \citep{uhlir_adding_2021} and eliminated those that were not relevant to this study as well as those that had low variance (salinity, \( S^2 = 0.02146629 \)) or abundant missing data (>95\%). Thus, the first part consisted mainly of literature review, data collection, data pre-processing, and data analysis.
