coralmicrobiome-hostbiodiversity

This is the repository associated with the manuscript »Host-level biodiversity shapes the dynamics and networks within the coral reef microbiome«. It contains all code for analyses and figures presented in the manuscript.

Preparation

To reproduce the analysis, you need a local installation of git (clone the repository with: git clone [email protected]:SushiLab/reef-microbiomics-paper.git) and R. Download the data from https://zenodo.org/records/14131612 into a directory named data at the same level as code. As computational resources might be limited, we provide intermediary analyses files on Zenodo, too.

Scripts

00    --> generate colour palette
01-15 --> curate data, including decontamination, merging, and rarefying
16-18 --> compute diversity metrics
19-21 --> community-based analyses: compare communities
22-23 --> community-based analyses: framework microbial exchange
24-27 --> community-based analyses: interaction networks
28-31 --> member-based analyses: interaction networks
32-35 --> member-based analyses: differential abundance analysis
36-40 --> functional prediction analyses

`00_Palettes.R`

Define colour palette (colour_palette.RDS)

`01_Merge-ASV-Tables_Host-Associated.R`

Remove reads (i) unclassified at the domain-level, (ii) belonging to the domain Eukaryota, (iii) belonging to the family Mitochondria, (iv) belonging to the order Chloroplast from the host-associated data
Merge the two ASV tables using the mergeSequenceTables() and collapseNoMismatch() functions by dada2
Rename ASVs and sample names
Generate the ASV data table (asv_dat_original_host-associated.tsv) and ASV abundance table (asv_abtab_original_host-associated.tsv)

`02_Explore-Original-Data_Host-Associated.R`

Compute and plot sequencing depth
Subsample to 1000 reads
Compare microbial communities (square root transformed Bray-Curtis distances)
Analyse sample-to-blank distances
Compare mucus vs control communities

`03_Explore-Decontam-Thresholds_Host-Associated.R`

Run decontam() with thresholds 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5 and generate decontaminated ASV tables and stats files
Explore decontam scores
Compute and plot percentage of original reads/ASVs kept after decontam() using various thresholds (=> default threshold of 0.1 works well)

`04_Explore-Decontaminated-Data_Host-Associated.R`

Subsample to 1000 reads
Analyse sample-to-blank distances for both the original and the decontaminated (at 0.1) ASV abundance tables
Compare microbial communities (square root transformed Bray-Curtis distances) after decontamination
Run PERMANOVA and PERMDISP (=> species significant)
Compare microbial communities (square root transformed Bray-Curtis distances) by taxonomic group

`05_Generate-Cleaned-Tables_Host-Associated.R`

Remove all samples that were lost during the experiment as well as all positive and negative controls
Write cleaned metadata (metadat_cleaned_host-associated.tsv), ASV data table (asv_dat_cleaned_host-associated.tsv), and ASV abundance table (asv_abtab_cleaned_host-associated.tsv; threshold = 0.1)
Compute and plot sequencing depth

`06_Merge-ASV-Tables_Free-living.R`

Remove reads (i) unclassified at the domain-level, (ii) belonging to the domain Eukaryota, (iii) belonging to the family Mitochondria, (iv) belonging to the order Chloroplast from the free-living and exuded data
Merge the two ASV tables using the mergeSequenceTables() and collapseNoMismatch() functions by dada2
Rename ASVs and sample names
Generate the ASV data table (asv_dat_original_free-living.tsv) and ASV abundance table (asv_abtab_original_free-living.tsv)

`07_Explore-Original-Data_Free-living.R`

Compute and plot sequencing depth
Subsample to 1000 reads
Compare microbial communities (square root transformed Bray-Curtis distances)

`08_Explore-Decontam-Thresholds_Free-living.R`

Run decontam() with thresholds 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5 and generate decontaminated ASV tables and stats files
Explore decontam scores
Compute and plot percentage of original reads/ASVs kept after decontam() using various thresholds (=> more stringent threshold of 0.2 works well)

`09_Explore-Decontaminated-Data_Free-living.R`

Subsample to 1000 reads
Analyse sample-to-blank distances for both the original and the decontaminated (at 0.2) ASV abundance tables
Compare microbial communities (square root transformed Bray-Curtis distances) after decontamination
Run PERMANOVA and PERMDISP (=> species significant)
Compare microbial communities (square root transformed Bray-Curtis distances) by taxonomic group

`10_Generate-Cleaned-Tables_Free-living.R`

Remove all samples that were lost during the experiment as well as all positive and negative controls
Write cleaned metadata (metadat_cleaned_free-living.tsv), ASV data table (asv_dat_cleaned_free-living.tsv), and ASV abundance table (asv_abtab_cleaned_free-living.tsv; threshold = 0.2)
Compute and plot sequencing depth

`11_Merge-ASV-Tables.R`

Merge the decontaminated (host-associated and free-living) ASV tables using the mergeSequenceTables() and collapseNoMismatch() functions by dada2
Remove singleton ASVs (present in a single sample at an abundance of 1)
Rename ASVs and sample names
Generate the ASV data table (asv_dat_original.tsv) and ASV abundance table (asv_abtab_original.tsv)

`12_Merge-Metadata_Seq-Depth.R`

Merge the two metadata tables (metadat_cleaned.tsv)
Compute and plot sequencing depth

`13_Rarefy.R`

Rarefy 50 times to 2500 reads per sample (asv_abtab_rarefied_{1,50}.tsv)
Write metadata excluding dropped samples (metadat_rarefied.tsv)

`14_Compile-Sequence-Files.R`

Load all 50 rarefied ASV abundance tables
Extract the sequence information
Write fasta files (asv_abtab_rarefied_{1,50}.fasta)

`15_Annotate-Taxonomy.R`

Assign taxonomic info to ASVs using silva_nr99_v138.1_train_set.fa
Write taxonomy file (asv_dat_taxinfo.tsv)

`16_Compute-BC-Dissimilarity.R`

Load all 50 rarefied ASV abundance tables and the associated metadata
Compute the square-root transformed Bray-Curtis distances for each of the 50 ASV tables and save as a list
Set the diagonal to NA
Reduce the list to a single distance table by taking the sum of the corresponding elements and calculating the mean by dividing by the length of the dataframe
Write the square-root transformed Bray-Curtis distance table (asv_bctab.tsv)

`17_Compute-Richness.R`

Load all 50 rarefied ASV abundance tables and the associated metadata
Compute the hill number diversity indices of each sample for each of the 50 ASV tables and save as a list
Reduce the list to a single distance table by taking the sum of the corresponding elements and calculating the mean by dividing by the length of the dataframe
Write the richness table (asv_richtab.tsv)

`18_Compute-Shared.R`

Load all 50 rarefied ASV abundance tables and the associated metadata
Compute the number of shared ASVs for each of the 50 ASV tables and save as a list
Set the diagonal to NA
Reduce the list to a single distance table by taking the sum of the corresponding elements and calculating the mean by dividing by the length of the dataframe
Write the number of shared ASVs table (asv_shared.tsv)

`19_BDiv_Compare-Communities.R`

Compare the host-associated and exuded microbial communities by species/sample type

Load the square-root transformed Bray-Curtis distance table
Perform a PCoA on the distances
Run a PERMANOVA to test for differences explained by species and PERMDISP (=> species significant)

`20_ADiv_Host-Exudates.R`

Correlate alpha-diversity changes of exuded vs host-associated microbial communities

Load the richness table
Plot exudate vs host Hill numbers by species and get correlation coefficient

`21_BDiv_Complexity_Host-Exudates.R`

Compare the host-associated and exuded microbial communities of hosts living in biodiverse vs degraded reef communities

Select samples from phase 2
For each source and species, run a PERMANOVA to test for differences explained by the ambient community complexity

`22_BDiv_Specificity-Exudation.R`

Explore host-specificity and exudation

Transform the distance table into a 3-column dataframe (keeping all data from all three phases)
Prepare the host-to-tank distance dataframe
Prepare the exudates-to-incubation control dataframe
Plot exudate to control (x-axis) vs host to tank (y-axis)
Draw an ellipse at 0.5 (half of the points lie within the ellipse)
Use data to define cutoffs for groups (mean)
Test whether the groups are significantly different (=> species significant)
Add standard deviation to each data point (as one point is the mean of three distances)

`23_BDiv_Complexity_Specificity-Exudation.R`

Explore host-specificity and exudation as a function of the complexity of the reef community

Select samples from phase 2
Transform the distance table into a 3-column dataframe
Prepare the host-to-tank distance dataframe
Prepare the exudates-to-incubation control dataframe
Compare both distances of hosts living in a biodiverse vs degraded reef community by species with p.adjust.method="holm"
Test whether the mean of the groups is significantly different as a response to treatment

`24_BDiv_Similarity-Between-Hosts.R`

Select species samples from phase 2
Extract distances from asv_bctab.tsv across treatment and source
Group by host and average distance across samples
Convert dissimilarity to similarity
Write the edge files
Compute stats on the full data comparing treatments and write the stats files

`25_BDiv_Richness-By-Host.R`

Select species samples from phase 2
Extract richness from asv_richtab.tsv across treatment and source
Group by host and average richness across samples
Write the node files
Compute stats on the full data comparing treatments and write the stats files

`26_BDiv_Similarity-Richness-Network.R`

Load edge and node files
Compute similarity ratio between biodiverse and degraded by calculating $0.5-\frac{similarity_{degraded}}{similarity_{biodiverse}+similarity_{degraded}}$ (=> negative values mean the similarity between hosts is higher in a degraded habitat, positive values in a biodiverse habitat; the higher the values the more the similarities between the two treatments differ)
Compute richness ratio between biodiverse and degraded by calculating $0.5-\frac{richness_{degraded}}{richness_{biodiverse}+richness_{degraded}}$ (=> negative values mean the richness of a host is higher in a degraded habitat, positive values in a biodiverse habitat; the higher the values the more the richness between the two treatments differ)
Initiate the network plot using the packages igraph and ggraph
Plot the network with colours denoting in which treatment the values are higher and line width/circle size indicating the magnitude of the difference

`27_BDiv_Similarity-Richness-Network_Supplementary.R`

Load edge and node files
Initiate the network plot using the packages igraph and ggraph
Plot the network with line width and colours denoting the magnitude of the compositional interactions and circle size indicating the richness

`28_ASVs_Shared-Between-Hosts.R`

Select species samples from phase 2
Extract distances from asv_shared.tsv across treatment and source
Group by host and average distance across samples
Write the edge files
Compute stats on the full data comparing treatments and write the stats files

`29_ASVs_Unique-By-Host.R`

Load all 50 rarefied ASV abundance tables and the associated metadata
Select species samples from phase 2
Group by host and compute number of unique ASVs across treatment and source using the function upset_data()
Reduce the list to a single distance table by taking the sum of the corresponding elements and calculating the mean by dividing by the length of the dataframe
Write the node files

`30_ASVs_Shared-Unique-Network.R`

Load edge and node files
Compute shared ratio between biodiverse and degraded by calculating $0.5-\frac{shared_{degraded}}{shared_{biodiverse}+shared_{degraded}}$ (=> negative values mean the number of shared ASVs between hosts is higher in a degraded habitat, positive values in a biodiverse habitat; the higher the values the more the number of shared ASVs between the two treatments differ)
Compute unique ratio between biodiverse and degraded by calculating $0.5-\frac{unique_{degraded}}{unique_{biodiverse}+unique_{degraded}}$ (=> negative values mean the number of unique ASVs of a host is higher in a degraded habitat, positive values in a biodiverse habitat; the higher the values the more the number of unique ASVs between the two treatments differ)
Initiate the network plot using the packages igraph and ggraph
Plot the network with colours denoting in which treatment the values are higher and line width/circle size indicating the magnitude of the difference

`31_ASVs_Shared-Unique-Network_Supplementary.R`

Load edge and node files
Initiate the network plot using the packages igraph and ggraph
Plot the network with line width and colours denoting the level of shared ASVs and circle size indicating the unique ASVs

`32_ASVs_DA.R`

Analyse the differentially abundant ASVs between source and treatment in two separate analyses

Select samples from phase 2
Load the modified ancombc R files
For each source separately, subset and transform the data by species
Construct TreeSummarizedExperiment object
Use ancombc2() with fixed effect $treatment_binary$ and p_adj_method="holm"
Write ANCOMBC2 output files (diffab_species.tsv)

`33_ASVs_DA_Volcano-Plot.R`

Plot the DA output

Plot log-fold changes (based on the natural log) with significant differences in red (after correcting for multiple comparisons using the Holm method)

`34_ASVs_DA_iTOL_LFC.R`

Prepare the compilated log-fold change file for iTOL

Merge the DA output
Pull the significant log-fold changes between biodiverse and degraded communities (asv_diffab_lfc.tsv)

`35_ASVs_DA_iTOL_Tree.R`

Examine the phylogeny of the differentially abundant ASVs (across host)

Align sequences through multiple sequence aligner using Muscle
Test maximum-likelihood models
Select the two superior ones and compute trees
Bootstrap 100 times

`36_FunPred_KEGG-Download.R`

Download the Pathway list, the KO list, and the KO to Pathway link files
Reformat and write files (pathway_list.tsv, ko_list.tsv, ko_pathway_link.tsv)

`picrust_rarefied_data.sh`

#! /bin/bash
#SBATCH --job-name picrust      # how the job will be named
#SBATCH --output picrust.log    # filename for the output messages
#SBATCH --error picrust.log     # filename for the error messages
#SBATCH -n 20                   # number of CPUs
#SBATCH --mem-per-cpu=5G        # memory per CPU
#SBATCH --time=14-23:00:00      # maximum time to complete (after which it is killed)

eval "$(conda shell.bash hook)"

conda activate picrust2_env

mkdir /[INSERT PATH]/data/picrust_output

for i in /[INSERT PATH]/data/picrust_input/*.fasta ; \
do j=$( echo $i | sed 's/.fasta//' | sed 's/.*picrust_input[/]//' ) ; \
picrust2_pipeline.py -s "$i" -i /[INSERT PATH]/data/picrust_input/"$j".tsv \
-o /[INSERT PATH]/data/picrust_output/"$j" -p 20 ; done

`37_FunPred_Compute-KO-per-Sample-Abundances.R`

Load all 50 generated predicted unstratified metagenome files (pred_metagenome_unstrat.tsv.gz)
Get a list of the functions (K numbers) that are present across all 50 rarefaction files
Extract these functions from each abundance file and save as a list
Reduce the list to a single abundance table by taking the sum of the corresponding elements and calculating the mean by dividing by the length of the dataframe
Write the abundance table (ko_abtab_sample.tsv)

`picrust_pathway_to_ko.sh`

#! /bin/bash
#SBATCH --job-name picrust      # how the job will be named
#SBATCH --output picrust.log    # filename for the output messages
#SBATCH --error picrust.log     # filename for the error messages
#SBATCH -n 20                   # number of CPUs
#SBATCH --mem-per-cpu=5G        # memory per CPU
#SBATCH --time=14-23:00:00      # maximum time to complete (after which it is killed)

eval "$(conda shell.bash hook)"

conda activate picrust2_env

pathway_pipeline.py -i /[INSERT PATH]/data/ko_abtab_sample.tsv \
-o /[INSERT PATH]/data/picrust_output \
--no_regroup --map /[INSERT PATH]/data/KEGG_pathway_to_ko.tsv -p 20

`38_FunPred_Pathway_Species_.R`

Load the KEGG pathway list (pathway_list_cat.csv) with hand annotated categories
Load the KEGG pathway converted abundance table (path_abun_unstrat.tsv.gz) and extract pathways associated with metabolism, but remove global and overview maps
Compute the square-root transformed Bray-Curtis distances
Perform a PCoA on the distances
Run a PERMANOVA to test for differences explained by species and PERMDISP (=> species significant) and test pairwise using pairwise.adonis2()

`39_FunPred_Compute-KO-per-ASV-Abundances.R`

Load all 50 generated predicted KO abundance files (KO_predicted.tsv.gz)
Get a list of the KOs that are present across all 50 rarefaction files
Extract these KOs from each abundance file and save as a list
Reduce the list to a single abundance table by taking the sum of the corresponding elements and calculating the mean by dividing by the length of the dataframe
Write the abundance table (ko_abtab_asv.tsv)

`40_FunPred_Upset_Sinularia.R`

Load the KEGG pathway list (pathway_list_cat.csv) with hand annotated categories
Load the ko-to-pathway link file (ko_pathway_link.tsv) and extract pathways associated with metabolism, but remove global and overview maps
Load the KO abundance table (ko_abtab_asv.tsv)
Select the ASVs of interest (two closely related members within the family Flavobacteriaceae that are differentially abundant as a response to treatment in Sinularia sp.)
Summarise the KOs to pathways
Remove pathways that are absent
Convert pathway abundance table to presence-absence and annotate
Generate upset plot

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
code		code
README.md		README.md

SushiLab/coralmicrobiome-hostbiodiversity

Folders and files

Latest commit

History

Repository files navigation

coralmicrobiome-hostbiodiversity

Preparation

Scripts

00_Palettes.R

01_Merge-ASV-Tables_Host-Associated.R

02_Explore-Original-Data_Host-Associated.R

03_Explore-Decontam-Thresholds_Host-Associated.R

04_Explore-Decontaminated-Data_Host-Associated.R

05_Generate-Cleaned-Tables_Host-Associated.R

06_Merge-ASV-Tables_Free-living.R

07_Explore-Original-Data_Free-living.R

08_Explore-Decontam-Thresholds_Free-living.R

09_Explore-Decontaminated-Data_Free-living.R

10_Generate-Cleaned-Tables_Free-living.R

11_Merge-ASV-Tables.R

12_Merge-Metadata_Seq-Depth.R

13_Rarefy.R

14_Compile-Sequence-Files.R

15_Annotate-Taxonomy.R

16_Compute-BC-Dissimilarity.R

17_Compute-Richness.R

18_Compute-Shared.R

19_BDiv_Compare-Communities.R

20_ADiv_Host-Exudates.R

21_BDiv_Complexity_Host-Exudates.R

22_BDiv_Specificity-Exudation.R

23_BDiv_Complexity_Specificity-Exudation.R

24_BDiv_Similarity-Between-Hosts.R

25_BDiv_Richness-By-Host.R

26_BDiv_Similarity-Richness-Network.R

27_BDiv_Similarity-Richness-Network_Supplementary.R

28_ASVs_Shared-Between-Hosts.R

29_ASVs_Unique-By-Host.R

30_ASVs_Shared-Unique-Network.R

31_ASVs_Shared-Unique-Network_Supplementary.R

32_ASVs_DA.R

33_ASVs_DA_Volcano-Plot.R

34_ASVs_DA_iTOL_LFC.R

35_ASVs_DA_iTOL_Tree.R

36_FunPred_KEGG-Download.R

picrust_rarefied_data.sh

37_FunPred_Compute-KO-per-Sample-Abundances.R

picrust_pathway_to_ko.sh

38_FunPred_Pathway_Species_.R

39_FunPred_Compute-KO-per-ASV-Abundances.R

40_FunPred_Upset_Sinularia.R

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages