Update main.tex
Corrections salinity
JustGag authored Nov 10, 2024
1 parent 0262875 commit f4f236e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion papers/Gagnon_Kebe_Tahiri/main.tex
@@ -48,7 +48,7 @@ \subsection{Description of the data}
\subsection{Data preprocessing}
We used data from the article \citep{uhlir_adding_2021}, the IceAGE project, and related data from the BOLDSystem database, as described in \citep{uhlir_adding_2021}. Given the enormous variety of variables in these databases, we applied a selective reduction procedure. Variables with no variability (categorical data) were excluded from the study, for which all data were missing and were not linked to genetic sequences or spatial, environmental, and climatic variables. Out of the 495 available in the IceAGE dataset, we considered 62 specimens for which partial 16S rRNA mitochondrial gene sequences were available in the \citep{uhlir_adding_2021} article.

-Next, we calculated the variance ($S^2$) in RStudio Desktop 4.3.2 for each of the selected variables (numerical and categorical). This step aimed to eliminate variables with low variation, as they are unlikely to provide essential data for analysis. We set a variance threshold of ≤ 0.1 to exclude uninformative variables. The latter retains variables whose variability is reasonably sufficient for the analyses while rejecting those with little variation. Only water salinity was eliminated based on this criterion ($S^2_\text{Salinity} = 0.02146629 \text{practical salinity units}^2, \text{PSU}^2$). The formula (see Equation \ref{variance}) and code (\autoref{lst:variance}) used to calculate the variance of the final variables, available in the data file on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub}, are provided below:
+Next, we calculated the variance ($S^2$) in RStudio Desktop 4.3.2 for each of the selected variables (numerical and categorical). This step aimed to eliminate variables with low variation, as they are unlikely to provide essential data for analysis. We set a variance threshold of ≤ 0.1 to exclude uninformative variables. The latter retains variables whose variability is reasonably sufficient for the analyses while rejecting those with little variation. Only water salinity was eliminated based on this criterion ($S^2_\text{Salinity} = 0.02146629\,\text{practical salinity units}^2, \text{PSU}^2$). The formula (see Equation \ref{variance}) and code (\autoref{lst:variance}) used to calculate the variance of the final variables, available in the data file on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub}, are provided below:

\begin{equation}\label{variance}
S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
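For readers following the changed paragraph, the variance filter it describes can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' R code from `lst:variance`. The function name `filter_low_variance`, the column names, and the sample values are all hypothetical. Only the sample-variance formula (denominator $n-1$) and the ≤ 0.1 threshold come from the text. The sketch covers numeric columns only, since the text does not say how categorical variables were handled.

```python
from statistics import variance  # sample variance, denominator n - 1

# Threshold stated in the paragraph: variables with S^2 <= 0.1 are excluded.
VARIANCE_THRESHOLD = 0.1


def filter_low_variance(columns, threshold=VARIANCE_THRESHOLD):
    """Split numeric columns into kept and dropped by sample variance.

    `columns` maps a variable name to a list of numeric observations.
    Returns two dicts (kept, dropped), each mapping name -> S^2.
    """
    variances = {name: variance(values) for name, values in columns.items()}
    kept = {name: s2 for name, s2 in variances.items() if s2 > threshold}
    dropped = {name: s2 for name, s2 in variances.items() if s2 <= threshold}
    return kept, dropped


# Hypothetical illustrative values, NOT taken from the IceAGE dataset.
example = {
    "salinity_psu": [34.8, 34.9, 35.0, 35.1],
    "depth_m": [300.0, 850.0, 1200.0, 2500.0],
}

kept, dropped = filter_low_variance(example)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
```

With these made-up numbers, the nearly constant salinity column falls below the threshold and is dropped, while depth is retained, mirroring the outcome reported in the paragraph.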
