-
+
+
Put it in the book, make it clear,
+mia framework’s waiting here.
+
OMA’s guide will light the way,
+helping you every step of the day.
+
+
![TreeSummarizedExperiment class]()
diff --git a/docs/oma_files/figure-revealjs/bioc_packages-1.png b/docs/oma_files/figure-revealjs/bioc_packages-1.png
index 8bcf49e..1b1b84b 100644
Binary files a/docs/oma_files/figure-revealjs/bioc_packages-1.png and b/docs/oma_files/figure-revealjs/bioc_packages-1.png differ
diff --git a/docs/oma_files/figure-revealjs/data_and_data_container-1.png b/docs/oma_files/figure-revealjs/data_and_data_container-1.png
index 02d7fb5..74e43d4 100644
Binary files a/docs/oma_files/figure-revealjs/data_and_data_container-1.png and b/docs/oma_files/figure-revealjs/data_and_data_container-1.png differ
diff --git a/docs/oma_files/figure-revealjs/data_container-1.png b/docs/oma_files/figure-revealjs/data_container-1.png
index ad13779..45dc29f 100644
Binary files a/docs/oma_files/figure-revealjs/data_container-1.png and b/docs/oma_files/figure-revealjs/data_container-1.png differ
diff --git a/docs/oma_files/figure-revealjs/pages_and_book-1.png b/docs/oma_files/figure-revealjs/pages_and_book-1.png
index a3f93ab..da63e04 100644
Binary files a/docs/oma_files/figure-revealjs/pages_and_book-1.png and b/docs/oma_files/figure-revealjs/pages_and_book-1.png differ
diff --git a/docs/oma_files/figure-revealjs/se_field-1.png b/docs/oma_files/figure-revealjs/se_field-1.png
index 7d697ed..7da11ca 100644
Binary files a/docs/oma_files/figure-revealjs/se_field-1.png and b/docs/oma_files/figure-revealjs/se_field-1.png differ
diff --git a/docs/oma_files/figure-revealjs/show_alpha-1.png b/docs/oma_files/figure-revealjs/show_alpha-1.png
deleted file mode 100644
index 1087f7a..0000000
Binary files a/docs/oma_files/figure-revealjs/show_alpha-1.png and /dev/null differ
diff --git a/docs/oma_files/figure-revealjs/show_prevalence-1.png b/docs/oma_files/figure-revealjs/show_prevalence-1.png
deleted file mode 100644
index 66c9244..0000000
Binary files a/docs/oma_files/figure-revealjs/show_prevalence-1.png and /dev/null differ
diff --git a/docs/oma_files/figure-revealjs/sow_pcoa-1.png b/docs/oma_files/figure-revealjs/sow_pcoa-1.png
deleted file mode 100644
index b610b85..0000000
Binary files a/docs/oma_files/figure-revealjs/sow_pcoa-1.png and /dev/null differ
diff --git a/docs/pcoa_files/figure-revealjs/pca-load-1.png b/docs/pcoa_files/figure-revealjs/pca-load-1.png
index a8c631e..38cc75c 100644
Binary files a/docs/pcoa_files/figure-revealjs/pca-load-1.png and b/docs/pcoa_files/figure-revealjs/pca-load-1.png differ
diff --git a/docs/quarto.html b/docs/quarto.html
index 797b06f..a2ee9c4 100644
--- a/docs/quarto.html
+++ b/docs/quarto.html
@@ -516,7 +516,7 @@
Example 2.3: YAML Parameters
editor: visual
smaller: true
author: Escherichia coli
-
date: 2024-11-24
+
date: 2024-11-27
---
diff --git a/docs/search.json b/docs/search.json
index 41c947e..4e1ff0e 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -35,312 +35,207 @@
"text": "RDA plot with weights\n\n\n\n\n\n\n\n\nFigure 3: RDA plot of samples coloured by patient status. The arrows indicate the percentage of variance in beta diversity explained by the patient status or cohort and the respective p-value."
},
{
- "objectID": "oma.html#outline",
- "href": "oma.html#outline",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Outline",
- "text": "Outline"
+ "objectID": "data_manipulation.html#why-data-manipulation",
+ "href": "data_manipulation.html#why-data-manipulation",
+ "title": "Data Manipulation",
+ "section": "Why data manipulation?",
+ "text": "Why data manipulation?\nRaw data might be uninformative or incompatible with a method. We want to be able to modify, polish, subset, agglomerate and transform it."
},
{
- "objectID": "oma.html#bioconductor",
- "href": "oma.html#bioconductor",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Bioconductor",
- "text": "Bioconductor\n\nCommunity-driven open-source project\n\n\nTraining programs & workshops\nConferences & community support\nBioinformatics software"
+ "objectID": "data_manipulation.html#why-so-complex",
+ "href": "data_manipulation.html#why-so-complex",
+ "title": "Data Manipulation",
+ "section": "Why so complex?",
+ "text": "Why so complex?\nTreeSE containers organise information to improve flexibility and accessibility, which comes with a bit of complexity. Focus on assays, colData and rowData."
},
{
- "objectID": "oma.html#software",
- "href": "oma.html#software",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Software",
- "text": "Software\n\n~2,300 R packages\nReview, testing, documentation\nGenomics, transcriptomics, microbiomics, …"
+ "objectID": "data_manipulation.html#example-1.1-data-import",
+ "href": "data_manipulation.html#example-1.1-data-import",
+ "title": "Data Manipulation",
+ "section": "Example 1.1: Data Import",
+ "text": "Example 1.1: Data Import\nWe work with microbiome data inside TreeSummarizedExperiment (TreeSE) containers and mia is our toolkit.\n\n# Load Tengeler2020 and store it into a TreeSE\nlibrary(mia)\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\nThe components of a TreeSE can all be seen at a glance.\n\n# Print TreeSE\ntse\n\nclass: TreeSummarizedExperiment \ndim: 151 27 \nmetadata(0):\nassays(1): counts\nrownames(151): Bacteroides Bacteroides_1 ... Parabacteroides_8\n Unidentified_Lachnospiraceae_14\nrowData names(6): Kingdom Phylum ... Family Genus\ncolnames(27): A110 A12 ... A35 A38\ncolData names(4): patient_status cohort patient_status_vs_cohort\n sample_name\nreducedDimNames(0):\nmainExpName: NULL\naltExpNames(0):\nrowLinks: a LinkDataFrame (151 rows)\nrowTree: 1 phylo tree(s) (151 leaves)\ncolLinks: NULL\ncolTree: NULL"
},
{
- "objectID": "oma.html#data-containers-form-the-foundation",
- "href": "oma.html#data-containers-form-the-foundation",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Data containers form the foundation",
- "text": "Data containers form the foundation"
+ "objectID": "data_manipulation.html#example-1.2-column-data",
+ "href": "data_manipulation.html#example-1.2-column-data",
+ "title": "Data Manipulation",
+ "section": "Example 1.2: Column data",
+ "text": "Example 1.2: Column data\nColumns represent the samples of an experiment.\n\n# Retrieve sample names\nhead(colnames(tse), 3)\n\n[1] \"A110\" \"A12\" \"A15\" \n\n\nAll information about the samples is stored in colData.\n\n# Retrieve sample data\nhead(colData(tse), 3)\n\nDataFrame with 3 rows and 4 columns\n patient_status cohort patient_status_vs_cohort sample_name\n <character> <character> <character> <character>\nA110 ADHD Cohort_1 ADHD_Cohort_1 A110\nA12 ADHD Cohort_1 ADHD_Cohort_1 A12\nA15 ADHD Cohort_1 ADHD_Cohort_1 A15\n\n\nIndividual variables about the samples can be accessed directly.\n\n# Retrieve sample variables\nhead(tse$patient_status, 3)\n\n[1] \"ADHD\" \"ADHD\" \"ADHD\""
},
{
- "objectID": "oma.html#summarizedexperiment",
- "href": "oma.html#summarizedexperiment",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "SummarizedExperiment",
- "text": "SummarizedExperiment\n\nMost common data container\nOptimized for biological data\nExtended to different purposes"
+ "objectID": "data_manipulation.html#example-1.3-row-data",
+ "href": "data_manipulation.html#example-1.3-row-data",
+ "title": "Data Manipulation",
+ "section": "Example 1.3: Row data",
+ "text": "Example 1.3: Row data\nRows represent the features of an experiment.\n\n# Retrieve feature names\nhead(rownames(tse), 3)\n\n[1] \"Bacteroides\" \"Bacteroides_1\" \"Parabacteroides\"\n\n\nAll information about the features is stored in rowData.\n\n# Retrieve feature data\nhead(rowData(tse), 3)\n\nDataFrame with 3 rows and 6 columns\n Kingdom Phylum Class Order\n <character> <character> <character> <character>\nBacteroides Bacteria Bacteroidetes Bacteroidia Bacteroidales\nBacteroides_1 Bacteria Bacteroidetes Bacteroidia Bacteroidales\nParabacteroides Bacteria Bacteroidetes Bacteroidia Bacteroidales\n Family Genus\n <character> <character>\nBacteroides Bacteroidaceae Bacteroides\nBacteroides_1 Bacteroidaceae Bacteroides\nParabacteroides Porphyromonadaceae Parabacteroides\n\n\nIndividual variables about the features can be accessed from rowData.\n\n# Retrieve feature variables\nhead(rowData(tse)$Genus, 3)\n\n[1] \"Bacteroides\" \"Bacteroides\" \"Parabacteroides\""
},
{
- "objectID": "oma.html#optimal-container-for-microbiome-data",
- "href": "oma.html#optimal-container-for-microbiome-data",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Optimal container for microbiome data?",
- "text": "Optimal container for microbiome data?"
+ "objectID": "data_manipulation.html#example-1.4-assays",
+ "href": "data_manipulation.html#example-1.4-assays",
+ "title": "Data Manipulation",
+ "section": "Example 1.4: Assays",
+ "text": "Example 1.4: Assays\nThe assays of an experiment (counts, relative abundance, etc.) can be found in assays.\n\nassays(tse)\n\nList of length 1\nnames(1): counts\n\n\nassayNames returns only the names of the assays.\n\nassayNames(tse)\n\n[1] \"counts\"\n\n\nAn individual assay can be retrieved with assay.\n\nassay(tse, \"counts\")[seq(6), seq(6)]\n\n A110 A12 A15 A19 A21 A23\nBacteroides 17722 11630 0 8806 1740 1791\nBacteroides_1 12052 0 2679 2776 540 229\nParabacteroides 0 970 0 549 145 0\nBacteroides_2 0 1911 0 5497 659 0\nAkkermansia 1143 1891 1212 584 84 700\nBacteroides_3 0 6498 0 4455 610 0"
},
{
- "objectID": "oma.html#optimal-container-for-microbiome-data-1",
- "href": "oma.html#optimal-container-for-microbiome-data-1",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Optimal container for microbiome data?",
- "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking"
+ "objectID": "data_manipulation.html#exercise-1",
+ "href": "data_manipulation.html#exercise-1",
+ "title": "Data Manipulation",
+ "section": "Exercise 1",
+ "text": "Exercise 1\n\npreliminary exploration: exercise 3.3\nassay retrieval: exercise 3.4\n\nExtra:\n\nconstructing a TreeSE object: exercise 3.1\n\nRaw data can be retrieved here."
},
{
- "objectID": "oma.html#optimal-container-for-microbiome-data-2",
- "href": "oma.html#optimal-container-for-microbiome-data-2",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Optimal container for microbiome data?",
- "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features"
+ "objectID": "data_manipulation.html#example-2.1-subsetting",
+ "href": "data_manipulation.html#example-2.1-subsetting",
+ "title": "Data Manipulation",
+ "section": "Example 2.1: Subsetting",
+ "text": "Example 2.1: Subsetting\nWe can subset features or samples of a TreeSE, but first we need to pick a variable.\n\n# Check levels of a sample variable\nunique(tse$patient_status)\n\n[1] \"ADHD\" \"Control\"\n\n\nTo subset samples, we filter columns with a conditional.\n\n# Subset by a sample variable\nsubcol_tse <- tse[ , tse$patient_status == \"ADHD\"]\ndim(subcol_tse)\n\n[1] 151 13\n\n\nWe now want to subset by our favourite Phylum.\n\n# Check levels of a feature variable\nunique(rowData(tse)$Phylum)\n\n[1] \"Bacteroidetes\" \"Verrucomicrobia\" \"Proteobacteria\" \"Firmicutes\" \n[5] \"Cyanobacteria\" \n\n\nTo subset features, we filter rows with a conditional.\n\n# Subset by a feature variable\nsubrow_tse <- tse[rowData(tse)$Phylum == \"Firmicutes\", ]\ndim(subrow_tse)\n\n[1] 97 27"
},
{
- "objectID": "oma.html#optimal-container-for-microbiome-data-3",
- "href": "oma.html#optimal-container-for-microbiome-data-3",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Optimal container for microbiome data?",
- "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types"
+ "objectID": "data_manipulation.html#example-2.2-agglomeration",
+ "href": "data_manipulation.html#example-2.2-agglomeration",
+ "title": "Data Manipulation",
+ "section": "Example 2.2: Agglomeration",
+ "text": "Example 2.2: Agglomeration\nAgglomeration condenses the assays to higher taxonomic ranks. Related taxa are combined together. We can agglomerate by different ranks.\n\n# View rank options\ntaxonomyRanks(tse)\n\n[1] \"Kingdom\" \"Phylum\" \"Class\" \"Order\" \"Family\" \"Genus\" \n\n\nWe agglomerate by Phylum and store the new experiment in the altExp slot.\n\n# Agglomerate by Phylum and store into altExp slot\naltExp(tse, \"phylum\") <- agglomerateByRank(tse, rank = \"Phylum\")\naltExp(tse, \"phylum\")\n\nclass: TreeSummarizedExperiment \ndim: 5 27 \nmetadata(1): agglomerated_by_rank\nassays(1): counts\nrownames(5): Bacteroidetes Cyanobacteria Firmicutes Proteobacteria\n Verrucomicrobia\nrowData names(6): Kingdom Phylum ... Family Genus\ncolnames(27): A110 A12 ... A35 A38\ncolData names(4): patient_status cohort patient_status_vs_cohort\n sample_name\nreducedDimNames(0):\nmainExpName: NULL\naltExpNames(0):\nrowLinks: a LinkDataFrame (5 rows)\nrowTree: 1 phylo tree(s) (151 leaves)\ncolLinks: NULL\ncolTree: NULL"
},
{
- "objectID": "oma.html#optimal-container-for-microbiome-data-4",
- "href": "oma.html#optimal-container-for-microbiome-data-4",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Optimal container for microbiome data?",
- "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types\nOptimized: for speed & memory"
+ "objectID": "data_manipulation.html#example-2.3-transformation",
+ "href": "data_manipulation.html#example-2.3-transformation",
+ "title": "Data Manipulation",
+ "section": "Example 2.3: Transformation",
+ "text": "Example 2.3: Transformation\nData can be transformed for different reasons. For example, to make samples comparable we can use relative abundance.\n\n# Transform counts to relative abundance\ntse <- transformAssay(tse,\n assay.type = \"counts\",\n method = \"relabundance\")\n\n# View sample-wise sums\nhead(colSums(assay(tse, \"relabundance\")), 3)\n\nA110 A12 A15 \n 1 1 1 \n\n\nOr, to standardise each feature to zero mean and unit variance, we can use z-scores: \\(Z = \\frac{x - \\mu}{\\sigma}\\).\n\n# Transform relative abundance to z-scores\ntse <- transformAssay(tse,\n assay.type = \"relabundance\",\n method = \"z\",\n MARGIN = \"features\")\n\n# View feature-wise standard deviations\nhead(rowSds(assay(tse, \"z\")), 3)\n\n Bacteroides Bacteroides_1 Parabacteroides \n 1 1 1"
},
{
- "objectID": "oma.html#optimal-container-for-microbiome-data-5",
- "href": "oma.html#optimal-container-for-microbiome-data-5",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Optimal container for microbiome data?",
- "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types\nOptimized: for speed & memory\nIntegrated: with other applications & frameworks"
+ "objectID": "data_manipulation.html#exercise-2",
+ "href": "data_manipulation.html#exercise-2",
+ "title": "Data Manipulation",
+ "section": "Exercise 2",
+ "text": "Exercise 2\n\nsubsetting: exercise 4.1\nagglomeration: exercise 5.1\ntransformation: exercise 4.6\n\nExtra:\n\nprevalence subsetting: exercise 4.3\nalternative experiments: exercise 5.2"
},
{
- "objectID": "oma.html#optimal-container-for-microbiome-data-6",
- "href": "oma.html#optimal-container-for-microbiome-data-6",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Optimal container for microbiome data?",
- "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types\nOptimized: for speed & memory\nIntegrated: with other applications & frameworks\n\nReduce overlapping efforts, improve interoperability, ensure sustainability."
+ "objectID": "data_manipulation.html#resources",
+ "href": "data_manipulation.html#resources",
+ "title": "Data Manipulation",
+ "section": "Resources",
+ "text": "Resources\n\nmia function reference\nOMA Section - Data Containers\nOMA Section - Subsetting\nOMA Section - Agglomeration\nOMA Section - Transformation"
},
{
- "objectID": "oma.html#treesummarizedexperiment",
- "href": "oma.html#treesummarizedexperiment",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "TreeSummarizedExperiment",
- "text": "TreeSummarizedExperiment\n\nExtension to SummarizedExperiment\nOptimal for microbiome data\nAllows us to develop modular and efficient workflows"
+ "objectID": "compositional_heatmap.html#example-1.1",
+ "href": "compositional_heatmap.html#example-1.1",
+ "title": "Compositional Heatmaps",
+ "section": "Example 1.1",
+ "text": "Example 1.1\nWe first import the packages used in this tutorial.\n\n# Import libraries\nlibrary(mia)\nlibrary(ComplexHeatmap)\n\nWe also import Tengeler2020 from the mia package and store it into a variable.\n\n# Load dataset and store it into tse\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\nNext, we transform the counts assay to relative abundance assay and store it into the TreeSE.\n\n# Transform counts to relative abundances\ntse <- transformAssay(tse, method = \"relabundance\")\n\nThen, we agglomerate the experiment to the order level, so that information is more condensed and therefore easier to visualise and interpret.\n\n# Agglomerate by order\ntse_order <- agglomerateByRank(tse, rank = \"Order\")"
},
{
- "objectID": "oma.html#microbiome-analysis-mia",
- "href": "oma.html#microbiome-analysis-mia",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "MIcrobiome Analysis (mia)",
- "text": "MIcrobiome Analysis (mia)\n\nMicrobiome data science in SummarizedExperiment ecosystem\nDistributed through several R packages\nmia package top 7.6% Bioconductor downloads"
+ "objectID": "compositional_heatmap.html#why-relative-abundances",
+ "href": "compositional_heatmap.html#why-relative-abundances",
+ "title": "Compositional Heatmaps",
+ "section": "Why relative abundances?",
+ "text": "Why relative abundances?\nMicrobiome data is compositional. Relative abundance helps us draw less biased comparisons between samples.\n\n\nShow code\n# Import packages\nlibrary(miaViz)\nlibrary(patchwork)\n\n# Plot composition by counts\ncounts_bar <- plotAbundance(tse_order, rank = \"Phylum\", use_relative = FALSE) +\n ylab(\"Counts\")\n\n# Plot composition by relative abundance\nrelab_bar <- plotAbundance(tse_order, rank = \"Phylum\", use_relative = TRUE) +\n ylab(\"Relative Abundance\")\n\n# Combine plots\n(counts_bar | relab_bar) +\n plot_layout(guides = \"collect\")\n\n\n\n\nFigure 1: Sample composition by counts (left) or relative abundance (right)."
},
{
- "objectID": "oma.html#community-driven-ecosystem-of-tools",
- "href": "oma.html#community-driven-ecosystem-of-tools",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Community-driven ecosystem of tools",
- "text": "Community-driven ecosystem of tools\n\n\n\nmia (Data analysis)\nmiaViz (Visualization)\nmiaSim (Simulation)\nmiaTime (Time series analysis)\nmiaDash (Graphical user interface)\niSEEtree (Interactive visualization)\nExpanded by independent developers"
+ "objectID": "compositional_heatmap.html#example-1.2",
+ "href": "compositional_heatmap.html#example-1.2",
+ "title": "Compositional Heatmaps",
+ "section": "Example 1.2",
+ "text": "Example 1.2\nTo reduce data skewness, we further transform the relative abundance assay with the Centered-Log Ratio (CLR), which is defined as follows:\n\\[\n\\mathrm{clr}(x) = \\log \\frac{x}{g(x)} = \\log(x) - \\log[g(x)]\n\\]\nwhere x is a feature and g(x) is the geometric mean of all features in a sample.\n\n# Transform relative abundances to clr\ntse_order <- transformAssay(tse_order,\n assay.type = \"relabundance\",\n method = \"clr\",\n pseudocount = 1,\n MARGIN = \"samples\")\n\nLastly, we get the row-wise z-scores of every feature from the clr assay to standardise abundances across samples.\n\n# Transform clr to z\ntse_order <- transformAssay(tse_order,\n assay.type = \"clr\", \n method = \"z\",\n name = \"clr_z\",\n MARGIN = \"features\")"
},
{
- "objectID": "oma.html#advantages",
- "href": "oma.html#advantages",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Advantages",
- "text": "Advantages\n\nShared data container\nScalable & optimized for large datasets\nComprehensive documentation"
+ "objectID": "compositional_heatmap.html#example-1.3",
+ "href": "compositional_heatmap.html#example-1.3",
+ "title": "Compositional Heatmaps",
+ "section": "Example 1.3",
+ "text": "Example 1.3\nFinally, we visualise the clr-z assay with ComplexHeatmap.\n\n# Visualise clr-z assay with a heatmap\nclrz_hm <- Heatmap(assay(tse_order, \"clr_z\"), name = \"clr-z\")\nclrz_hm\n\n\n\nFigure 2: Heatmap of CLR-Z assay where columns correspond to samples and rows to taxa agglomerated by order."
},
{
- "objectID": "oma.html#section-5",
- "href": "oma.html#section-5",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Load package\nlibrary(mia)\n# Load example dataset\ndata(\"peerj13075\")\ntse <- peerj13075\n\n\nclass: TreeSummarizedExperiment \ndim: 674 58 \nmetadata(0):\nassays(1): counts\nrownames(674): OTU1 OTU2 ... OTU2567 OTU2569\nrowData names(6): kingdom phylum ... family genus\ncolnames(58): ID1 ID2 ... ID57 ID58\ncolData names(5): Sample Geographical_location Gender Age Diet\nreducedDimNames(0):\nmainExpName: NULL\naltExpNames(0):\nrowLinks: NULL\nrowTree: NULL\ncolLinks: NULL\ncolTree: NULL"
+ "objectID": "compositional_heatmap.html#why-clr-z-transformation",
+ "href": "compositional_heatmap.html#why-clr-z-transformation",
+ "title": "Compositional Heatmaps",
+ "section": "Why clr-z transformation?",
+ "text": "Why clr-z transformation?\nA CLR-z transformation improves comparability in two steps:\n\nApply CLR transform to center features column-wise\nFind Z score to standardise features row-wise\n\n\n\nFigure 3: Visual comparison between counts, relative abundance, clr and clr-z assays (from left to right)."
},
{
- "objectID": "oma.html#section-6",
- "href": "oma.html#section-6",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Agglomerate to genus level\ntse <- agglomerateByRank(tse, rank = \"genus\")"
+ "objectID": "compositional_heatmap.html#exercise-1",
+ "href": "compositional_heatmap.html#exercise-1",
+ "title": "Compositional Heatmaps",
+ "section": "Exercise 1",
+ "text": "Exercise 1\n\nheatmap visualisation: exercise 9.2\n\nExtra:\n\nadvanced heatmap: exercise 9.3"
},
{
- "objectID": "oma.html#section-7",
- "href": "oma.html#section-7",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Agglomerate to genus level\ntse <- agglomerateByRank(tse, rank = \"genus\")\n\n# Add relative abundances\ntse <- transformAssay(tse, method = \"relabundance\")"
+ "objectID": "compositional_heatmap.html#resources",
+ "href": "compositional_heatmap.html#resources",
+ "title": "Compositional Heatmaps",
+ "section": "Resources",
+ "text": "Resources\n\nOMA Chapter - Community Composition\nOMA Chapter - Visualisation\nComplexHeatmap Complete Reference"
},
{
- "objectID": "oma.html#section-8",
- "href": "oma.html#section-8",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Load visualization package\nlibrary(miaViz)\n# Summarize abundance of top taxa\nplotAbundanceDensity(tse, assay.type = \"relabundance\")"
+ "objectID": "differential_abundance.html#overview",
+ "href": "differential_abundance.html#overview",
+ "title": "Differential Abundance",
+ "section": "Overview",
+ "text": "Overview\nDifferential Abundance Analysis (DAA) is used to identify taxa that are significantly more or less abundant in a condition compared to the control.\nMany methods are available including:\n\nALDEx2\nANCOMBC\nLinDA\n\nA few things to keep in mind when performing DAA:\n\nDAA software normally takes the counts assay as input, because it applies normalisation suitable for count data\nDAA results will be more reproducible if the extremely rare taxa and singletons are removed in advance\nIt is recommended to run different methods on the same data and compare the results"
},
{
- "objectID": "oma.html#section-9",
- "href": "oma.html#section-9",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Calculate alpha diversity indices\ntse <- addAlpha(tse, index = \"shannon\")"
+ "objectID": "differential_abundance.html#example-1.1-preparing-for-da",
+ "href": "differential_abundance.html#example-1.1-preparing-for-da",
+ "title": "Differential Abundance",
+ "section": "Example 1.1: Preparing for DA",
+ "text": "Example 1.1: Preparing for DA\nFirst, we import Tengeler2020 and load the DA library MicrobiomeStat.\n\nlibrary(mia)\nlibrary(MicrobiomeStat)\nlibrary(tidyverse)\n\n# Import Tengeler2020\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\n\n\nShow code\nmean_abund <- round(mean(rowMeans(assay(tse, \"counts\"))), 2)\npaste0(\"Taxa: \", nrow(tse), \", Mean abundance: \", mean_abund)\n\n\n[1] \"Taxa: 151, Mean abundance: 119.19\"\n\n\nFor DA analysis, it is preferable to reduce the dimensionality and sparsity of the data.\n\n# Agglomerate by Genus and filter by prevalence and detection\ntse_genus <- agglomerateByPrevalence(tse,\n rank = \"Genus\",\n detection = 0.001,\n prevalence = 0.1)\n\n\n\nShow code\nmean_abund_genus <- round(mean(rowMeans(assay(tse_genus, \"counts\"))), 2)\npaste0(\"Taxa: \", nrow(tse_genus), \", Mean abundance: \", mean_abund_genus)\n\n\n[1] \"Taxa: 49, Mean abundance: 355.52\""
},
{
- "objectID": "oma.html#section-10",
- "href": "oma.html#section-10",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Calculate alpha diversity indices\ntse <- addAlpha(tse, index = \"shannon\")\n\n# Load single-cell analysis package that has useful, complementary tools\nlibrary(scater)\n# Plot alpha diversity\nplotColData(tse, x = \"Geographical_location\", y = \"shannon\")"
+ "objectID": "differential_abundance.html#example-1.2-performing-da",
+ "href": "differential_abundance.html#example-1.2-performing-da",
+ "title": "Differential Abundance",
+ "section": "Example 1.2: Performing DA",
+ "text": "Example 1.2: Performing DA\nHere, we run LinDA. We first extract the counts assay and convert it into a dataframe.\n\notu.tab <- assay(tse_genus, \"counts\") |>\n as.data.frame()\n\nWe also need to select the columns of the colData that contain the independent variables we want to include in the model.\n\nmeta <- colData(tse) |>\n as.data.frame() |>\n select(patient_status, cohort)\n\nWe are ready to run LinDA, which takes the count table (otu.tab) and the sample metadata (meta). A formula for the model with the main independent variable + covariates should be defined. The other arguments are optional but good to know.\n\nres <- linda(otu.tab, meta,\n formula = \"~ patient_status + cohort\", \n feature.dat.type = \"count\")\n\n0 features are filtered!\nThe filtered data has 27 samples and 49 features will be tested!\nImputation approach is used.\nFit linear models ...\nCompleted."
},
{
- "objectID": "oma.html#section-11",
- "href": "oma.html#section-11",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Perform PCoA\ntse <- runMDS(tse, assay.type = \"relabundance\", FUN = getDissimilarity, method = \"bray\")"
+ "objectID": "differential_abundance.html#example-1.3-interpreting-results",
+ "href": "differential_abundance.html#example-1.3-interpreting-results",
+ "title": "Differential Abundance",
+ "section": "Example 1.3: Interpreting Results",
+ "text": "Example 1.3: Interpreting Results\nFinally, we select the significantly DA taxa and list them in Table 1.\n\nsignif_res <- res$output$patient_statusControl |>\n filter(reject) |>\n select(stat, padj) |>\n arrange(padj)\n\nknitr::kable(signif_res)\n\n\n\nTable 1: DA bacterial genera. If stat > 0, abundance is higher in control, otherwise it is higher in ADHD.\n\n\n\n\n\n\n\nstat\npadj\n\n\n\n\n[Ruminococcus]_gauvreauii_group\n4.891159\n0.0024419\n\n\nFaecalibacterium\n-4.694520\n0.0024419\n\n\nCatabacter\n-3.616601\n0.0236808\n\n\nErysipelatoclostridium\n3.357042\n0.0334163\n\n\nRuminococcaceae_UCG-014\n-3.224143\n0.0368033"
},
{
- "objectID": "oma.html#section-12",
- "href": "oma.html#section-12",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "",
- "text": "# Perform PCoA\ntse <- runMDS(tse, assay.type = \"relabundance\", FUN = getDissimilarity, method = \"bray\")\n# Plot PCoA\nplotReducedDim(tse, dimred = \"MDS\", colour_by = \"Geographical_location\")"
+ "objectID": "differential_abundance.html#exercise-1",
+ "href": "differential_abundance.html#exercise-1",
+ "title": "Differential Abundance",
+ "section": "Exercise 1",
+ "text": "Exercise 1\n\nDA analysis with LinDA: exercise 8.2\nDA analysis with ALDEx2: exercise 8.1\n\nExtra:\n\ncomparing DA methods: exercise 8.3"
},
{
- "objectID": "oma.html#orchestrating-microbiome-analysis-with-bioconductor",
- "href": "oma.html#orchestrating-microbiome-analysis-with-bioconductor",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Orchestrating Microbiome Analysis with Bioconductor",
- "text": "Orchestrating Microbiome Analysis with Bioconductor\n\nResources and tutorials for microbiome analysis\nCommunity-built best practices\nOpen to contributions!\n\n\n\n\n\n\n\nGo to the OMA book\n\n\nmicrobiome.github.io/OMA"
+ "objectID": "differential_abundance.html#resources",
+ "href": "differential_abundance.html#resources",
+ "title": "Differential Abundance",
+ "section": "Resources",
+ "text": "Resources\n\nOMA Chapter - Differential Abundance"
},
{
- "objectID": "oma.html#summary",
- "href": "oma.html#summary",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Summary",
- "text": "Summary\n\n\n\nData containers in Bioconductor\nmia framework for microbiome data science\nOrchestrating Microbiome Analysis (OMA) online book"
+ "objectID": "alpha_diversity.html#overview",
+ "href": "alpha_diversity.html#overview",
+ "title": "Alpha Diversity",
+ "section": "Overview",
+ "text": "Overview\nThis notebook guides you through a basic alpha diversity analysis, where you first estimate alpha diversity in terms of a few indices, plot them for the different study groups and compare the results for the different indices.\nThe following packages are needed to successfully run the examples in this notebook:\n\nlibrary(mia)\nlibrary(scater)"
},
{
- "objectID": "oma.html#thank-you-for-your-time",
- "href": "oma.html#thank-you-for-your-time",
- "title": "Orchestrating Microbiome Analysis with Bioconductor",
- "section": "Thank you for your time!",
- "text": "Thank you for your time!\n\n\n\n\n\nMoreno-Indias et al. (2021) Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions. Frontiers in Microbiology.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nhttps://microbiome.github.io/"
+ "objectID": "alpha_diversity.html#example-1.1-estimation",
+ "href": "alpha_diversity.html#example-1.1-estimation",
+ "title": "Alpha Diversity",
+ "section": "Example 1.1: Estimation",
+ "text": "Example 1.1: Estimation\nFirst of all, we import Tengeler2020 from the mia package and store it into a variable.\n\n# load dataset and store it into tse\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\nWe calculate alpha diversity in terms of coverage, Shannon, inverse Simpson and Faith indices based on the counts assay. The first three indices differ from one another in how much weight they give to rare taxa: coverage considers all taxa equally important, whereas Shannon and - even more - Simpson give more importance to abundant taxa. Unlike all others, Faith index measures the phylogenetic diversity and thus requires a phylogenetic tree (stored as rowTree in the TreeSE).\n\n# estimate the specified indices based on the counts assay\ntse <- estimateDiversity(tse,\n assay.type = \"counts\",\n index = \"shannon\")"
},
{
- "objectID": "hintikkaxo_presentation.html#study-design",
- "href": "hintikkaxo_presentation.html#study-design",
- "title": "A brief introduction to HintikkaXOData",
- "section": "Study design",
- "text": "Study design\n\nHintikkaXOData is a MultiAssayExperiment containing microbiome, metabolome and biomarker data from a study on the effects of diet and prebiotics on fatty-liver disease in rats (Hintikka et al. 2021).\n\n\nExperimentList class object of length 3:\n [1] microbiota: TreeSummarizedExperiment with 12706 rows and 40 columns\n [2] metabolites: TreeSummarizedExperiment with 38 rows and 40 columns\n [3] biomarkers: TreeSummarizedExperiment with 39 rows and 40 columns\n\n\n\n\nThe rat population (N = 40) was divided into 4 groups (n = 10), which underwent different diets over a period of 12 weeks. The 4 groups included:\n\n\n\nhigh-fat diet without prebiotics\nhigh-fat diet with prebiotics\nlow-fat diet without prebiotics\nlow-fat diet with prebiotics\n\n\n\n\n\n\n\n\nXOS+\nXOS-\n\n\n\n\nLow Fat\n10\n10\n\n\nHigh Fat\n10\n10\n\n\n\n\n\n\n\nXOS: xylooligosaccharide"
+ "objectID": "alpha_diversity.html#example-1.2-visualisation",
+ "href": "alpha_diversity.html#example-1.2-visualisation",
+ "title": "Alpha Diversity",
+ "section": "Example 1.2: Visualisation",
+ "text": "Example 1.2: Visualisation\nNext, we plot the Shannon index, with patient status on the x axis and alpha diversity on the y axis. We can also colour by cohort to check for batch effects.\n\n# Plot Shannon diversity vs patient_status\np_shannon <- plotColData(tse, \"shannon\", \"patient_status\",\n colour_by = \"cohort\", show_median = TRUE) +\n xlab(\"Patient Status\")\n\np_shannon"
},
{
- "objectID": "hintikkaxo_presentation.html#microbiota",
- "href": "hintikkaxo_presentation.html#microbiota",
- "title": "A brief introduction to HintikkaXOData",
- "section": "Microbiota",
- "text": "Microbiota\n\nMicrobiome data was obtained by 16S rRNA gene sequencing of bacterial DNA sampled from the cecum of terminated rats. Then, sequence reads were assembled into Operational Taxonomic Units (OTUs).\n\n\n\n\n\n\n\n\n\n\nRelative Abundance (%)\n\n\n\n\nBacteroidetes\n51.3\n\n\nFirmicutes\n39.3\n\n\nVerrucomicrobia\n3.8\n\n\nCyanobacteria\n2.6\n\n\nProteobacteria\n2.1\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 1"
- },
- {
- "objectID": "hintikkaxo_presentation.html#metabolites",
- "href": "hintikkaxo_presentation.html#metabolites",
- "title": "A brief introduction to HintikkaXOData",
- "section": "Metabolites",
- "text": "Metabolites\nMetabolome data was obtained from the same cecal samples by nuclear magnetic resonance (NMR) spectroscopy. There is a total of 38 features, corresponding to the different metabolites."
- },
- {
- "objectID": "hintikkaxo_presentation.html#biomarkers",
- "href": "hintikkaxo_presentation.html#biomarkers",
- "title": "A brief introduction to HintikkaXOData",
- "section": "Biomarkers",
- "text": "Biomarkers\n\nBiomarkers data were obtained by Western blot from protein homogenates of rat tissues. There is a total of 39 features, corresponding to the biomarkers/site combinations.\n\n\n\n\n\n\n\n\n\nBiomarker\nFunction\n\n\n\n\nCS\ncitrate synthase\n\n\nAST\naspartate aminotransferase\n\n\nALT\nalanine aminotransferase\n\n\nERK\nextracellular signal-regulated kinase\n\n\nIRS1\ninsulin receptor substrate 1\n\n\nAKT\nprotein kinase B\n\n\nIL1b\ninterleukin 1 beta\n\n\nHAD\n3-hydroxyacyl-CoA dehydrogenase 8\n\n\nCD45\nprotein tyrosine phosphatase\n\n\n\n\n\n\n\n\n\n\n\nName\nSite\n\n\n\n\nepi\nepididymal adipose tissue\n\n\nSCfat\nmesenteric adopose tissue\n\n\nmese\nsubcutaneous adipose tissue\n\n\nliver\nliver\n\n\ngastro\ngastrocnemius muscle\n\n\nserum\nserum"
- },
- {
- "objectID": "hintikkaxo_presentation.html#references",
- "href": "hintikkaxo_presentation.html#references",
- "title": "A brief introduction to HintikkaXOData",
- "section": "References",
- "text": "References\n\n\n\n\n\n\n\n\nHintikka, Jukka, Sanna Lensu, Elina Mäkinen, Sira Karvinen, Marjaana Honkanen, Jere Lindén, Tim Garrels, Satu Pekkala, and Leo Lahti. 2021. “Xylo-Oligosaccharides in Prevention of Hepatic Steatosis and Adipose Tissue Inflammation: Associating Taxonomic and Metabolomic Patterns in Fecal Microbiomes with Biclustering.” International Journal of Environmental Research and Public Health 18 (8): 4049. https://doi.org/10.3390/ijerph18084049."
- },
- {
- "objectID": "quarto.html#example-1.1-markdown-syntax",
- "href": "quarto.html#example-1.1-markdown-syntax",
- "title": "Reproducible Reporting with Quarto",
- "section": "Example 1.1: # Markdown Syntax",
- "text": "Example 1.1: # Markdown Syntax\nIn Markdown syntax, you can do the following and more.\n\nHeadings: # My title, ## My subtitle, ### My subsubtitle\nUnordered lists:\n\n- item 1\n- item 2\n\nOrdered lists:\n\n1. item 1\n2. item 2\n\nFont: *italic*, **bold**, `monospaced`\nLinks: [text](url)\nCross-references: @chunk-label"
- },
- {
- "objectID": "quarto.html#example-1.2-running-code",
- "href": "quarto.html#example-1.2-running-code",
- "title": "Reproducible Reporting with Quarto",
- "section": "Example 1.2: Running Code",
- "text": "Example 1.2: Running Code\nAdd code chunks with alt + cmd + i and click Render to generate a report with both text and code output.\n\n# R code\ncitation(\"mia\")\n\nTo cite package 'mia' in publications use:\n\n Borman T, Ernst F, Shetty S, Lahti L (2024). _mia: Microbiome\n analysis_. R package version 1.15.3, commit\n 161b2096a2ab2c1791eb02e2ccf3564defe39e40,\n <https://github.com/microbiome/mia>.\n\nA BibTeX entry for LaTeX users is\n\n @Manual{,\n title = {mia: Microbiome analysis},\n author = {Tuomas Borman and Felix G.M. Ernst and Sudarshan A. Shetty and Leo Lahti},\n year = {2024},\n note = {R package version 1.15.3, commit 161b2096a2ab2c1791eb02e2ccf3564defe39e40},\n url = {https://github.com/microbiome/mia},\n }"
- },
- {
- "objectID": "quarto.html#exercise-1",
- "href": "quarto.html#exercise-1",
- "title": "Reproducible Reporting with Quarto",
- "section": "Exercise 1",
- "text": "Exercise 1\n\nnew document: exercise 2.1\ncode chunks: exercise 2.2"
- },
- {
- "objectID": "quarto.html#example-2.1-knitr-options",
- "href": "quarto.html#example-2.1-knitr-options",
- "title": "Reproducible Reporting with Quarto",
- "section": "Example 2.1: knitr options",
- "text": "Example 2.1: knitr options\nYou can add options to code chunks to change their behaviour.\n\nOriginal chunk\n\n\nprint(\"I love my microbiome\")\n\n[1] \"I love my microbiome\"\n\n\n\nAfter adding #| echo: false\n\n\n\n[1] \"I love my microbiome\"\n\n\n\nAfter adding #| eval: false\n\n\nprint(\"I love my microbiome\")\n\n\nAfter adding #| code-fold: true\n\n\n\nShow code\nprint(\"I love my microbiome\")\n\n\n[1] \"I love my microbiome\"\n\n\n\nAfter adding #| include: false"
- },
- {
- "objectID": "quarto.html#example-2.2-more-about-knitr-options",
- "href": "quarto.html#example-2.2-more-about-knitr-options",
- "title": "Reproducible Reporting with Quarto",
- "section": "Example 2.2: More about knitr options",
- "text": "Example 2.2: More about knitr options\nIf you want an option to affect all chunks in a script, you can set it globally.\n\n# Turn off chunk visibility and warnings\nknitr::opts_chunk$set(echo = FALSE, warning = FALSE)\n\nYou can label figures (or tables) with #| label: fig-name and cross-reference them with @fig-name (Figure 1).\n\ndata(iris)\nboxplot(Sepal.Length ~ Species, data = iris)\n\n\n\nFigure 1: A boxplot of the sepal length distribution by species."
- },
- {
- "objectID": "quarto.html#example-2.3-yaml-parameters",
- "href": "quarto.html#example-2.3-yaml-parameters",
- "title": "Reproducible Reporting with Quarto",
- "section": "Example 2.3: YAML Parameters",
- "text": "Example 2.3: YAML Parameters\nAt the beginning of any Quarto document, there is a box delimited by ---. There you can define document metadata, such as title, author, date, output format, bibliography, citation style, theme, font size and many others.\n---\ntitle: “Around the gut in 24 hours”\nformat: html\neditor: visual\nsmaller: true\nauthor: Escherichia coli\ndate: 2024-11-24\n---"
- },
- {
- "objectID": "quarto.html#exercise-2",
- "href": "quarto.html#exercise-2",
- "title": "Reproducible Reporting with Quarto",
- "section": "Exercise 2",
- "text": "Exercise 2\n\nknitr options: exercise 2.3\nYAML parameters: exercise 2.4\n\nExtra:\n\nQuarto parameters: exercise 2.5"
- },
- {
- "objectID": "quarto.html#resources",
- "href": "quarto.html#resources",
- "title": "Reproducible Reporting with Quarto",
- "section": "Resources",
- "text": "Resources\n\nOMA Section - Quarto\nChunk Option Catalogue\nCross References\nYAML Parameter Catalogue"
- },
- {
- "objectID": "alpha_diversity.html#overview",
- "href": "alpha_diversity.html#overview",
- "title": "Alpha Diversity",
- "section": "Overview",
- "text": "Overview\nThis notebook guides you through a basic alpha diversity analysis, where you first estimate alpha diversity in terms of a few indices, plot them for the different study groups and compare the results for the different indices.\nThe following packages are needed to succesfully run the examples in this notebook:\n\nlibrary(mia)\nlibrary(scater)"
- },
- {
- "objectID": "alpha_diversity.html#example-1.1-estimation",
- "href": "alpha_diversity.html#example-1.1-estimation",
- "title": "Alpha Diversity",
- "section": "Example 1.1: Estimation",
- "text": "Example 1.1: Estimation\nFirst of all, we import Tengeler2020 from the mia package and store it into a variable.\n\n# load dataset and store it into tse\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\nWe calculate alpha diversity in terms of coverage, Shannon, inverse Simpson and Faith indices based on the counts assay. The first three indices differ from one another in how much weight they give to rare taxa: coverage considers all taxa equally important, whereas Shannon and - even more - Simpson give more importance to abundant taxa. Unlike all others, Faith index measures the phylogenetic diversity and thus requires a phylogenetic tree (stored as rowTree in the TreeSE).\n\n# estimate the specified indices based on the counts assay\ntse <- estimateDiversity(tse,\n assay.type = \"counts\",\n index = \"shannon\")"
- },
- {
- "objectID": "alpha_diversity.html#example-1.2-visualisation",
- "href": "alpha_diversity.html#example-1.2-visualisation",
- "title": "Alpha Diversity",
- "section": "Example 1.2: Visualisation",
- "text": "Example 1.2: Visualisation\nNext, we plot the four indices, with patient status on the x axis and alpha diversity on the y axis. We can also colour by cohort to check for batch effects.\n\n# Plot shannon diversity vs patient_status\np_shannon <- plotColData(tse, \"shannon\", \"patient_status\",\n colour_by = \"cohort\", show_median = TRUE) +\n xlab(\"Patient Status\")\n\np_shannon"
- },
- {
- "objectID": "alpha_diversity.html#example-1.3-comparison",
- "href": "alpha_diversity.html#example-1.3-comparison",
- "title": "Alpha Diversity",
- "section": "Example 1.3: Comparison",
- "text": "Example 1.3: Comparison\nThe three metrics for alpha diversity follow different scales, but they seem to agree when comparing the distributions of the two patient groups.\n\n\nShow code\nlibrary(patchwork)\n\n# Calculate diversity metrics\ntse <- estimateDiversity(tse, assay.type = \"counts\",\n index = c(\"coverage\", \"inverse_simpson\", \"faith\"))\n\n# Generate a plot for each metric\nplots <- lapply(c(\"coverage\", \"shannon\", \"inverse_simpson\", \"faith\"),\n plotColData, object = tse, x = \"patient_status\",\n colour_by = \"cohort\", show_median = TRUE)\n\n# Combine plots\nwrap_plots(plots) +\n plot_layout(guides = \"collect\") +\n plot_annotation(tag_levels = \"A\")"
+ "objectID": "alpha_diversity.html#example-1.3-comparison",
+ "href": "alpha_diversity.html#example-1.3-comparison",
+ "title": "Alpha Diversity",
+ "section": "Example 1.3: Comparison",
+ "text": "Example 1.3: Comparison\nThe four metrics for alpha diversity follow different scales, but they seem to agree when comparing the distributions of the two patient groups.\n\n\nShow code\nlibrary(patchwork)\n\n# Calculate diversity metrics\ntse <- estimateDiversity(tse, assay.type = \"counts\",\n index = c(\"coverage\", \"inverse_simpson\", \"faith\"))\n\n# Generate a plot for each metric\nplots <- lapply(c(\"coverage\", \"shannon\", \"inverse_simpson\", \"faith\"),\n plotColData, object = tse, x = \"patient_status\",\n colour_by = \"cohort\", show_median = TRUE)\n\n# Combine plots\nwrap_plots(plots) +\n plot_layout(guides = \"collect\") +\n plot_annotation(tag_levels = \"A\")"
},
{
"objectID": "alpha_diversity.html#exercise-1",
@@ -977,14 +872,14 @@
"href": "index.html",
"title": "Quarto Presentations",
"section": "",
- "text": "This website hosts quarto presentations about microbiome analysis and data integration with mia and other related packages. Our presentations were prepared for past courses and conferences and cover a broad range of topics in the scope of biological data science. Currently, you can find the following subjects:\n\n\n\nIntro\nBioconductor project\nOrchestrating microbiome Analysis with Bioconductor\n\n\n\n\n\nReproducible workflow with Quarto\nLearning environment\nWorkflow\n\n\n\n\n\nMGnifyR: An R package for accessing MGnify microbiome data\n\n\n\n\n\nOrchestrating microbiome multi-omics with R & Bioconductor\nData containers\nSummarizedExperiment\nTreeSummarizedExperiment\n\n\n\n\n\nHintikkaXOData\nTengeler2020\n\n\n\n\n\nData Manipulation\nIntroduction to Quarto\nAlpha Diversity\nPCoA\ndbRDA\nCompositional Heatmaps\nDifferential Abundance\nDay 2"
+ "text": "This website hosts Quarto presentations about microbiome analysis and data integration with mia and other related packages. Our presentations were prepared for past courses and conferences and cover a broad range of topics within biological data science. Currently, you can find the following subjects:\n\n\n\nIntro\nBioconductor project\nOrchestrating Microbiome Analysis with Bioconductor\n\n\n\n\n\nReproducible workflow with Quarto\nLearning environment\nWorkflow\n\n\n\n\n\nMGnifyR: An R package for accessing MGnify microbiome data\n\n\n\n\n\nOrchestrating microbiome multi-omics with R & Bioconductor\nData containers\nSummarizedExperiment\nTreeSummarizedExperiment\n\n\n\n\n\nHintikkaXOData\nTengeler2020\n\n\n\n\n\nData Manipulation\nIntroduction to Quarto\nAlpha Diversity\nPCoA\ndbRDA\nCompositional Heatmaps\nDifferential Abundance\nDay 2"
},
{
"objectID": "index.html#overview",
"href": "index.html#overview",
"title": "Quarto Presentations",
"section": "",
- "text": "This website hosts quarto presentations about microbiome analysis and data integration with mia and other related packages. Our presentations were prepared for past courses and conferences and cover a broad range of topics in the scope of biological data science. Currently, you can find the following subjects:\n\n\n\nIntro\nBioconductor project\nOrchestrating microbiome Analysis with Bioconductor\n\n\n\n\n\nReproducible workflow with Quarto\nLearning environment\nWorkflow\n\n\n\n\n\nMGnifyR: An R package for accessing MGnify microbiome data\n\n\n\n\n\nOrchestrating microbiome multi-omics with R & Bioconductor\nData containers\nSummarizedExperiment\nTreeSummarizedExperiment\n\n\n\n\n\nHintikkaXOData\nTengeler2020\n\n\n\n\n\nData Manipulation\nIntroduction to Quarto\nAlpha Diversity\nPCoA\ndbRDA\nCompositional Heatmaps\nDifferential Abundance\nDay 2"
+ "text": "This website hosts Quarto presentations about microbiome analysis and data integration with mia and other related packages. Our presentations were prepared for past courses and conferences and cover a broad range of topics within biological data science. Currently, you can find the following subjects:\n\n\n\nIntro\nBioconductor project\nOrchestrating Microbiome Analysis with Bioconductor\n\n\n\n\n\nReproducible workflow with Quarto\nLearning environment\nWorkflow\n\n\n\n\n\nMGnifyR: An R package for accessing MGnify microbiome data\n\n\n\n\n\nOrchestrating microbiome multi-omics with R & Bioconductor\nData containers\nSummarizedExperiment\nTreeSummarizedExperiment\n\n\n\n\n\nHintikkaXOData\nTengeler2020\n\n\n\n\n\nData Manipulation\nIntroduction to Quarto\nAlpha Diversity\nPCoA\ndbRDA\nCompositional Heatmaps\nDifferential Abundance\nDay 2"
},
{
"objectID": "index.html#contributions",
@@ -1533,179 +1428,228 @@
"text": "Julia packages"
},
{
- "objectID": "differential_abundance.html#overview",
- "href": "differential_abundance.html#overview",
- "title": "Differential Abundance",
- "section": "Overview",
- "text": "Overview\nDifferential Abundance (DA) analysis is used to identify taxa that are significantly more or less abundant in the condition compared to control.\nMany methods are available including:\n\nALDEx2\nANCOMBC\nLinDA\n\nA few things to keep in minds when performing DAA involve:\n\nDAA software normally takes the counts assay as input, because they apply normalisation suitable for count data\nDAA results will be more reproducible if the extremely rare taxa and singletons are removed in advance\nIt is recommended to run different methods on the same data and compare the results"
+ "objectID": "oma.html#outline",
+ "href": "oma.html#outline",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Outline",
+ "text": "Outline"
},
{
- "objectID": "differential_abundance.html#example-1.1-preparing-for-da",
- "href": "differential_abundance.html#example-1.1-preparing-for-da",
- "title": "Differential Abundance",
- "section": "Example 1.1: Preparing for DA",
- "text": "Example 1.1: Preparing for DA\nFirst, we import Tengeler2020 and load the DA library MicrobiomeStats.\n\nlibrary(mia)\nlibrary(MicrobiomeStat)\nlibrary(tidyverse)\n\n# Import Tengeler2020\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\n\n\nShow code\nmean_abund <- round(mean(rowMeans(assay(tse, \"counts\"))), 2)\npaste0(\"Taxa: \", nrow(tse), \", Mean abundance: \", mean_abund)\n\n\n[1] \"Taxa: 151, Mean abundance: 119.19\"\n\n\nFor DA analysis, it is preferable to reduce the dimensionality and sparsity of the data.\n\n# Agglomerate by Genus and filter by prevalence and detection\ntse_genus <- agglomerateByPrevalence(tse,\n rank = \"Genus\",\n detection = 0.001,\n prevalence = 0.1)\n\n\n\nShow code\nmean_abund_genus <- round(mean(rowMeans(assay(tse_genus, \"counts\"))), 2)\npaste0(\"Taxa: \", nrow(tse_genus), \", Mean abundance: \", mean_abund_genus)\n\n\n[1] \"Taxa: 49, Mean abundance: 355.52\""
+ "objectID": "oma.html#bioconductor",
+ "href": "oma.html#bioconductor",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Bioconductor",
+ "text": "Bioconductor\n\nCommunity-driven open-source project\n\n\nTraining programs & workshops\nConferences & community support\nBioinformatics software"
},
{
- "objectID": "differential_abundance.html#example-1.2-performing-da",
- "href": "differential_abundance.html#example-1.2-performing-da",
- "title": "Differential Abundance",
- "section": "Example 1.2: Performing DA",
- "text": "Example 1.2: Performing DA\nHere, we run LinDA. We first extract the counts assay and convert it into a dataframe.\n\notu.tab <- assay(tse_genus, \"counts\") |>\n as.data.frame()\n\nWe also need to select the columns of the colData which contain the independent variables you want to include in the model.\n\nmeta <- colData(tse) |>\n as.data.frame() |>\n select(patient_status, cohort)\n\nWe are ready to run LinDA, which takes the assay count (otu.tab) and the variable arrays (meta). A formula for the model with main independent variable + covariates should be defined. The other arguments are optional but good to know.\n\nres <- linda(otu.tab, meta,\n formula = \"~ patient_status + cohort\", \n feature.dat.type = \"count\")\n\n0 features are filtered!\nThe filtered data has 27 samples and 49 features will be tested!\nImputation approach is used.\nFit linear models ...\nCompleted."
+ "objectID": "oma.html#software",
+ "href": "oma.html#software",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Software",
+ "text": "Software\n\n~2,300 R packages\nReview, testing, documentation"
},
{
- "objectID": "differential_abundance.html#example-1.3-interpreting-results",
- "href": "differential_abundance.html#example-1.3-interpreting-results",
- "title": "Differential Abundance",
- "section": "Example 1.3: Interpreting Results",
- "text": "Example 1.3: Interpreting Results\nFinally, we select significantly DA taxa and list it in Table 1.\n\nsignif_res <- res$output$patient_statusControl |>\n filter(reject) |>\n select(stat, padj) |>\n arrange(padj)\n\nknitr::kable(signif_res)\n\n\n\nTable 1: DA bacterial genera. If stat > 0, abundance is higher in control, otherwise it is higher in ADHD.\n\n\n\n\n\n\n\nstat\npadj\n\n\n\n\n[Ruminococcus]_gauvreauii_group\n4.891159\n0.0024419\n\n\nFaecalibacterium\n-4.694520\n0.0024419\n\n\nCatabacter\n-3.616601\n0.0236808\n\n\nErysipelatoclostridium\n3.357042\n0.0334163\n\n\nRuminococcaceae_UCG-014\n-3.224143\n0.0368033"
+ "objectID": "oma.html#data-containers-form-the-core",
+ "href": "oma.html#data-containers-form-the-core",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Data containers form the core",
+ "text": "Data containers form the core"
},
{
- "objectID": "differential_abundance.html#exercise-1",
- "href": "differential_abundance.html#exercise-1",
- "title": "Differential Abundance",
- "section": "Exercise 1",
- "text": "Exercise 1\n\nDA analysis with LinDA: exercise 8.2\nDA analysis with ALDEx2: exercise 8.1\n\nExtra:\n\ncomparing DA methods: exercise 8.3"
+ "objectID": "oma.html#summarizedexperiment",
+ "href": "oma.html#summarizedexperiment",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "SummarizedExperiment",
+ "text": "SummarizedExperiment\n\nMost common data container\nOptimized for biological data\nExtended to different purposes"
},
{
- "objectID": "differential_abundance.html#resources",
- "href": "differential_abundance.html#resources",
- "title": "Differential Abundance",
- "section": "Resources",
- "text": "Resources\n\nOMA Chapter - Differential Abundance"
+ "objectID": "oma.html#optimal-container-for-microbiome-data",
+ "href": "oma.html#optimal-container-for-microbiome-data",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Optimal container for microbiome data?",
+ "text": "Optimal container for microbiome data?"
},
{
- "objectID": "compositional_heatmap.html#example-1.1",
- "href": "compositional_heatmap.html#example-1.1",
- "title": "Compositional Heatmaps",
- "section": "Example 1.1",
- "text": "Example 1.1\nWe first import the packages used in this tutorial.\n\n# Import libraries\nlibrary(mia)\nlibrary(ComplexHeatmap)\n\nWe also import Tengeler2020 from the mia package and store it into a variable.\n\n# Load dataset and store it into tse\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\nNext, we transform the counts assay to relative abundance assay and store it into the TreeSE.\n\n# Transform counts to relative abundances\ntse <- transformAssay(tse, method = \"relabundance\")\n\nThen, we agglomerate the experiment to the order level, so that information is more condensed and therefore easier to visualise and interpret.\n\n# Agglomerate by order\ntse_order <- agglomerateByRank(tse, rank = \"Order\")"
+ "objectID": "oma.html#optimal-container-for-microbiome-data-1",
+ "href": "oma.html#optimal-container-for-microbiome-data-1",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Optimal container for microbiome data?",
+ "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking"
},
{
- "objectID": "compositional_heatmap.html#why-relative-abundances",
- "href": "compositional_heatmap.html#why-relative-abundances",
- "title": "Compositional Heatmaps",
- "section": "Why relative abundances?",
- "text": "Why relative abundances?\nMicrobiome data is compositional. Relative abundance helps us draw less biased comparisons between samples.\n\n\nShow code\n# Import packages\nlibrary(miaViz)\nlibrary(patchwork)\n\n# Plot composition by counts\ncounts_bar <- plotAbundance(tse_order, rank = \"Phylum\", use_relative = FALSE) +\n ylab(\"Counts\")\n\n# Plot composition by relative abundance\nrelab_bar <- plotAbundance(tse_order, rank = \"Phylum\", use_relative = TRUE) +\n ylab(\"Relative Abundance\")\n\n# Combine plots\n(counts_bar | relab_bar) +\n plot_layout(guides = \"collect\")\n\n\n\n\nFigure 1: Sample composition by counts (left) or relative abundance (right)."
+ "objectID": "oma.html#optimal-container-for-microbiome-data-2",
+ "href": "oma.html#optimal-container-for-microbiome-data-2",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Optimal container for microbiome data?",
+ "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features"
},
{
- "objectID": "compositional_heatmap.html#example-1.2",
- "href": "compositional_heatmap.html#example-1.2",
- "title": "Compositional Heatmaps",
- "section": "Example 1.2",
- "text": "Example 1.2\nTo reduce data skewness, we further transform the relative abundance assay with the Centered-Log Ratio (CLR), which is defined as follows:\n\\[\nclr = log \\frac{x}{g(x)} = log(x)−log[g(x)]\n\\]\nwhere x is a feature and g(x) is the geometric mean of all features in a sample.\n\n# Transform relative abundances to clr\ntse_order <- transformAssay(tse_order,\n assay.type = \"relabundance\",\n method = \"clr\",\n pseudocount = 1,\n MARGIN = \"samples\")\n\nLastly, we get the row-wise z-scores of every feature from the clr assay to standardise abundances across samples.\n\n# Transform clr to z\ntse_order <- transformAssay(tse_order,\n assay.type = \"clr\", \n method = \"z\",\n name = \"clr_z\",\n MARGIN = \"features\")"
+ "objectID": "oma.html#optimal-container-for-microbiome-data-3",
+ "href": "oma.html#optimal-container-for-microbiome-data-3",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Optimal container for microbiome data?",
+ "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types"
},
{
- "objectID": "compositional_heatmap.html#example-1.3",
- "href": "compositional_heatmap.html#example-1.3",
- "title": "Compositional Heatmaps",
- "section": "Example 1.3",
- "text": "Example 1.3\nFinally, we visualise the clr-z assay with ComplexHeatmap.\n\n# Visualise clr-z assay with a heatmap\nclrz_hm <- Heatmap(assay(tse_order, \"clr_z\"), name = \"clr-z\")\nclrz_hm\n\n\n\nFigure 2: Heatmap of CLR-Z assay where columns correspond to samples and rows to taxa agglomerated by order."
+ "objectID": "oma.html#optimal-container-for-microbiome-data-4",
+ "href": "oma.html#optimal-container-for-microbiome-data-4",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Optimal container for microbiome data?",
+ "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types\nOptimized: for speed & memory"
},
{
- "objectID": "compositional_heatmap.html#why-clr-z-transformation",
- "href": "compositional_heatmap.html#why-clr-z-transformation",
- "title": "Compositional Heatmaps",
- "section": "Why clr-z transformation?",
- "text": "Why clr-z transformation?\nA CLR-z transformation improves comparability in two steps:\n\nApply CLR transform to center features column-wise\nFind Z score to standardise features row-wise\n\n\n\nFigure 3: Visual comparison between counts, relative abundance, clr and clr-z assays (from left to right)."
+ "objectID": "oma.html#optimal-container-for-microbiome-data-5",
+ "href": "oma.html#optimal-container-for-microbiome-data-5",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Optimal container for microbiome data?",
+ "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types\nOptimized: for speed & memory\nIntegrated: with other applications & frameworks"
},
{
- "objectID": "compositional_heatmap.html#exercise-1",
- "href": "compositional_heatmap.html#exercise-1",
- "title": "Compositional Heatmaps",
- "section": "Exercise 1",
- "text": "Exercise 1\n\nheatmap visualisation: exercise 9.2\n\nExtra:\n\nadvanced heatmap: exercise 9.3"
+ "objectID": "oma.html#optimal-container-for-microbiome-data-6",
+ "href": "oma.html#optimal-container-for-microbiome-data-6",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Optimal container for microbiome data?",
+ "text": "Optimal container for microbiome data?\n\nMultiple assays: seamless interlinking\nHierarchical data: supporting samples & features\nSide information: extended capabilities & data types\nOptimized: for speed & memory\nIntegrated: with other applications & frameworks\n\nReduce overlapping efforts, improve interoperability, ensure sustainability."
},
{
- "objectID": "compositional_heatmap.html#resources",
- "href": "compositional_heatmap.html#resources",
- "title": "Compositional Heatmaps",
- "section": "Resources",
- "text": "Resources\n\nOMA Chapter - Community Composition\nOMA Chapter - Visualisation\nComplexHeatmap Complete Reference"
+ "objectID": "oma.html#treesummarizedexperiment",
+ "href": "oma.html#treesummarizedexperiment",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "TreeSummarizedExperiment",
+ "text": "TreeSummarizedExperiment\n\nExtension to SummarizedExperiment\nOptimal for microbiome data\nLinks microbiome field to larger SummarizedExperiment family"
},
{
- "objectID": "data_manipulation.html#why-data-manipulation",
- "href": "data_manipulation.html#why-data-manipulation",
- "title": "Data Manipulation",
- "section": "Why data manipulation?",
- "text": "Why data manipulation?\nRaw data might be uninformative or incompatible with a method. We want to be able to modify, polish, subset, agglomerate and transform it."
+ "objectID": "oma.html#microbiome-analysis-mia",
+ "href": "oma.html#microbiome-analysis-mia",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "MIcrobiome Analysis (mia)",
+ "text": "MIcrobiome Analysis (mia)\n\nMicrobiome data science in SummarizedExperiment ecosystem\nDistributed through several R packages\nmia package top 7.7% Bioconductor downloads"
},
{
- "objectID": "data_manipulation.html#why-so-complex",
- "href": "data_manipulation.html#why-so-complex",
- "title": "Data Manipulation",
- "section": "Why so complex?",
- "text": "Why so complex?\nTreeSE containers organise information to improve flexibility and accessibility, which comes with a bit of complexity. Focus on assays, colData and rowData."
+ "objectID": "oma.html#community-driven-ecosystem-of-tools",
+ "href": "oma.html#community-driven-ecosystem-of-tools",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Community-driven ecosystem of tools",
+ "text": "Community-driven ecosystem of tools\n\n\n\nmia (Data analysis)\nmiaViz (Visualization)\nmiaSim (Simulation)\nmiaTime (Time series analysis)\nmiaDash (Graphical user interface)\niSEEtree (Interactive visualization)\nExpanded by independent developers"
},
{
- "objectID": "data_manipulation.html#example-1.1-data-import",
- "href": "data_manipulation.html#example-1.1-data-import",
- "title": "Data Manipulation",
- "section": "Example 1.1: Data Import",
- "text": "Example 1.1: Data Import\nWe work with microbiome data inside TreeSummarizedExperiment (TreeSE) containers and mia is our toolkit.\n\n# Load Tengeler2020 and store it into a TreeSE\nlibrary(mia)\ndata(\"Tengeler2020\", package = \"mia\")\ntse <- Tengeler2020\n\nThe components of a TreeSE can all be seen at a glance.\n\n# Print TreeSE\ntse\n\nclass: TreeSummarizedExperiment \ndim: 151 27 \nmetadata(0):\nassays(1): counts\nrownames(151): Bacteroides Bacteroides_1 ... Parabacteroides_8\n Unidentified_Lachnospiraceae_14\nrowData names(6): Kingdom Phylum ... Family Genus\ncolnames(27): A110 A12 ... A35 A38\ncolData names(4): patient_status cohort patient_status_vs_cohort\n sample_name\nreducedDimNames(0):\nmainExpName: NULL\naltExpNames(0):\nrowLinks: a LinkDataFrame (151 rows)\nrowTree: 1 phylo tree(s) (151 leaves)\ncolLinks: NULL\ncolTree: NULL"
+ "objectID": "oma.html#advantages",
+ "href": "oma.html#advantages",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Advantages",
+ "text": "Advantages\n\nShared data container\nScalable & optimized for large datasets\nComprehensive documentation\n\nAllows us to develop efficient microbiome data science workflows"
},
{
- "objectID": "data_manipulation.html#example-1.2-column-data",
- "href": "data_manipulation.html#example-1.2-column-data",
- "title": "Data Manipulation",
- "section": "Example 1.2: Column data",
- "text": "Example 1.2: Column data\nColumns represent the samples of an experiment.\n\n# Retrieve sample names\nhead(colnames(tse), 3)\n\n[1] \"A110\" \"A12\" \"A15\" \n\n\nAll information about the samples is stored in colData.\n\n# Retrieve sample data\nhead(colData(tse), 3)\n\nDataFrame with 3 rows and 4 columns\n patient_status cohort patient_status_vs_cohort sample_name\n <character> <character> <character> <character>\nA110 ADHD Cohort_1 ADHD_Cohort_1 A110\nA12 ADHD Cohort_1 ADHD_Cohort_1 A12\nA15 ADHD Cohort_1 ADHD_Cohort_1 A15\n\n\nIndividual variables about the samples can be accessed directly.\n\n# Retrieve sample variables\nhead(tse$patient_status, 3)\n\n[1] \"ADHD\" \"ADHD\" \"ADHD\""
+ "objectID": "oma.html#orchestrating-microbiome-analysis-with-bioconductor",
+ "href": "oma.html#orchestrating-microbiome-analysis-with-bioconductor",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Orchestrating Microbiome Analysis with Bioconductor",
+ "text": "Orchestrating Microbiome Analysis with Bioconductor\n\nResources and tutorials for microbiome analysis\nCommunity-built best practices\nOpen to contributions!\n\n\n\n\n\n\n\nGo to the Orchestrating Microbiome Analysis (OMA) online book\n\n\nmicrobiome.github.io/OMA"
},
{
- "objectID": "data_manipulation.html#example-1.3-row-data",
- "href": "data_manipulation.html#example-1.3-row-data",
- "title": "Data Manipulation",
- "section": "Example 1.3: Row data",
- "text": "Example 1.3: Row data\nRows represent the features of an experiment.\n\n# Retrieve feature names\nhead(rownames(tse), 3)\n\n[1] \"Bacteroides\" \"Bacteroides_1\" \"Parabacteroides\"\n\n\nAll information about the samples is stored in rowData.\n\n# Retrieve feature data\nhead(rowData(tse), 3)\n\nDataFrame with 3 rows and 6 columns\n Kingdom Phylum Class Order\n <character> <character> <character> <character>\nBacteroides Bacteria Bacteroidetes Bacteroidia Bacteroidales\nBacteroides_1 Bacteria Bacteroidetes Bacteroidia Bacteroidales\nParabacteroides Bacteria Bacteroidetes Bacteroidia Bacteroidales\n Family Genus\n <character> <character>\nBacteroides Bacteroidaceae Bacteroides\nBacteroides_1 Bacteroidaceae Bacteroides\nParabacteroides Porphyromonadaceae Parabacteroides\n\n\nIndividual variables about the samples can be accessed from rowData.\n\n# Retrieve feature variables\nhead(rowData(tse)$Genus, 3)\n\n[1] \"Bacteroides\" \"Bacteroides\" \"Parabacteroides\""
+ "objectID": "oma.html#poem-of-the-day",
+ "href": "oma.html#poem-of-the-day",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Poem of the day",
+ "text": "Poem of the day\n\n\nPut it in the book, make it clear,\nmia framework’s waiting here.\nOMA’s guide will light the way,\nhelping you every step of the day."
},
{
- "objectID": "data_manipulation.html#example-1.4-assays",
- "href": "data_manipulation.html#example-1.4-assays",
- "title": "Data Manipulation",
- "section": "Example 1.4: Assays",
- "text": "Example 1.4: Assays\nThe assays of an experiment (counts, relative abundance, etc.) can be found in assays.\n\nassays(tse)\n\nList of length 1\nnames(1): counts\n\n\nassayNames return only their names.\n\nassayNames(tse)\n\n[1] \"counts\"\n\n\nAn individual assay can be retrieved with assay.\n\nassay(tse, \"counts\")[seq(6), seq(6)]\n\n A110 A12 A15 A19 A21 A23\nBacteroides 17722 11630 0 8806 1740 1791\nBacteroides_1 12052 0 2679 2776 540 229\nParabacteroides 0 970 0 549 145 0\nBacteroides_2 0 1911 0 5497 659 0\nAkkermansia 1143 1891 1212 584 84 700\nBacteroides_3 0 6498 0 4455 610 0"
+ "objectID": "oma.html#thank-you-for-your-time",
+ "href": "oma.html#thank-you-for-your-time",
+ "title": "Orchestrating Microbiome Analysis with Bioconductor",
+ "section": "Thank you for your time!",
+ "text": "Thank you for your time!\n\n\n\n\n\nMoreno-Indias et al. (2021) Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions. Frontiers in Microbiology.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nhttps://microbiome.github.io/"
},
{
- "objectID": "data_manipulation.html#exercise-1",
- "href": "data_manipulation.html#exercise-1",
- "title": "Data Manipulation",
+ "objectID": "quarto.html#example-1.1-markdown-syntax",
+ "href": "quarto.html#example-1.1-markdown-syntax",
+ "title": "Reproducible Reporting with Quarto",
+ "section": "Example 1.1: # Markdown Syntax",
+ "text": "Example 1.1: # Markdown Syntax\nIn Markdown syntax, you can do the following and more.\n\nHeadings: # My title, ## My subtitle, ### My subsubtitle\nUnordered lists:\n\n- item 1\n- item 2\n\nOrdered lists:\n\n1. item 1\n2. item 2\n\nFont: *italic*, **bold**, `monospaced`\nLinks: [text](url)\nCross-references: @chunk-label"
+ },
+ {
+ "objectID": "quarto.html#example-1.2-running-code",
+ "href": "quarto.html#example-1.2-running-code",
+ "title": "Reproducible Reporting with Quarto",
+ "section": "Example 1.2: Running Code",
+ "text": "Example 1.2: Running Code\nAdd code chunks with alt + cmd + i and click Render to generate a report with both text and code output.\n\n# R code\ncitation(\"mia\")\n\nTo cite package 'mia' in publications use:\n\n Borman T, Ernst F, Shetty S, Lahti L (2024). _mia: Microbiome\n analysis_. R package version 1.15.3, commit\n 161b2096a2ab2c1791eb02e2ccf3564defe39e40,\n <https://github.com/microbiome/mia>.\n\nA BibTeX entry for LaTeX users is\n\n @Manual{,\n title = {mia: Microbiome analysis},\n author = {Tuomas Borman and Felix G.M. Ernst and Sudarshan A. Shetty and Leo Lahti},\n year = {2024},\n note = {R package version 1.15.3, commit 161b2096a2ab2c1791eb02e2ccf3564defe39e40},\n url = {https://github.com/microbiome/mia},\n }"
+ },
+ {
+ "objectID": "quarto.html#exercise-1",
+ "href": "quarto.html#exercise-1",
+ "title": "Reproducible Reporting with Quarto",
"section": "Exercise 1",
- "text": "Exercise 1\n\npreliminary exploration: exercise 3.3\nassay retrieval: exercise 3.4\n\nExtra:\n\nconstructing a TreeSE object: exercise 3.1\n\nRaw data can be retrieved here."
+ "text": "Exercise 1\n\nnew document: exercise 2.1\ncode chunks: exercise 2.2"
},
{
- "objectID": "data_manipulation.html#example-2.1-subsetting",
- "href": "data_manipulation.html#example-2.1-subsetting",
- "title": "Data Manipulation",
- "section": "Example 2.1: Subsetting",
- "text": "Example 2.1: Subsetting\nWe can subset features or samples of a TreeSE, but first we need to pick a variable.\n\n# Check levels of a sample variable\nunique(tse$patient_status)\n\n[1] \"ADHD\" \"Control\"\n\n\nTo subset samples, we filter columns with a conditional.\n\n# Subset by a sample variable\nsubcol_tse <- tse[ , tse$patient_status == \"ADHD\"]\ndim(subcol_tse)\n\n[1] 151 13\n\n\nWe now want to subset by our favourite Phylum.\n\n# Check levels of a feature variable\nunique(rowData(tse)$Phylum)\n\n[1] \"Bacteroidetes\" \"Verrucomicrobia\" \"Proteobacteria\" \"Firmicutes\" \n[5] \"Cyanobacteria\" \n\n\nTo subset features, we filter rows with a conditional.\n\n# Subset by a feature variable\nsubrow_tse <- tse[rowData(tse)$Phylum == \"Firmicutes\", ]\ndim(subrow_tse)\n\n[1] 97 27"
+ "objectID": "quarto.html#example-2.1-knitr-options",
+ "href": "quarto.html#example-2.1-knitr-options",
+ "title": "Reproducible Reporting with Quarto",
+ "section": "Example 2.1: knitr options",
+ "text": "Example 2.1: knitr options\nYou can add options to code chunks to change their behaviour.\n\nOriginal chunk\n\n\nprint(\"I love my microbiome\")\n\n[1] \"I love my microbiome\"\n\n\n\nAfter adding #| echo: false\n\n\n\n[1] \"I love my microbiome\"\n\n\n\nAfter adding #| eval: false\n\n\nprint(\"I love my microbiome\")\n\n\nAfter adding #| code-fold: true\n\n\n\nShow code\nprint(\"I love my microbiome\")\n\n\n[1] \"I love my microbiome\"\n\n\n\nAfter adding #| include: false"
},
{
- "objectID": "data_manipulation.html#example-2.2-agglomeration",
- "href": "data_manipulation.html#example-2.2-agglomeration",
- "title": "Data Manipulation",
- "section": "Example 2.2: Agglomeration",
- "text": "Example 2.2: Agglomeration\nAgglomeration condenses the assays to higher taxonomic ranks. Related taxa are combined together. We can agglomerate by different ranks.\n\n# View rank options\ntaxonomyRanks(tse)\n\n[1] \"Kingdom\" \"Phylum\" \"Class\" \"Order\" \"Family\" \"Genus\" \n\n\nWe agglomerate by Phylum and store the new experiment in the altExp slot.\n\n# Agglomerate by Phylum and store into altExp slot\naltExp(tse, \"phylum\") <- agglomerateByRank(tse, rank = \"Phylum\")\naltExp(tse, \"phylum\")\n\nclass: TreeSummarizedExperiment \ndim: 5 27 \nmetadata(1): agglomerated_by_rank\nassays(1): counts\nrownames(5): Bacteroidetes Cyanobacteria Firmicutes Proteobacteria\n Verrucomicrobia\nrowData names(6): Kingdom Phylum ... Family Genus\ncolnames(27): A110 A12 ... A35 A38\ncolData names(4): patient_status cohort patient_status_vs_cohort\n sample_name\nreducedDimNames(0):\nmainExpName: NULL\naltExpNames(0):\nrowLinks: a LinkDataFrame (5 rows)\nrowTree: 1 phylo tree(s) (151 leaves)\ncolLinks: NULL\ncolTree: NULL"
+ "objectID": "quarto.html#example-2.2-more-about-knitr-options",
+ "href": "quarto.html#example-2.2-more-about-knitr-options",
+ "title": "Reproducible Reporting with Quarto",
+ "section": "Example 2.2: More about knitr options",
+ "text": "Example 2.2: More about knitr options\nIf you want an option to affect all chunks in a script, you can set it globally.\n\n# Turn off chunk visibility and warnings\nknitr::opts_chunk$set(echo = FALSE, warning = FALSE)\n\nYou can label figures (or tables) with #| label: fig-name and cross-reference them with @fig-name (Figure 1).\n\ndata(iris)\nboxplot(Sepal.Length ~ Species, data = iris)\n\n\n\nFigure 1: A boxplot of the sepal length distribution by species."
},
{
- "objectID": "data_manipulation.html#example-2.3-transformation",
- "href": "data_manipulation.html#example-2.3-transformation",
- "title": "Data Manipulation",
- "section": "Example 2.3: Transformation",
- "text": "Example 2.3: Transformation\nData can be transformed for different reasons. For example, to make samples comparable we can use relative abundance.\n\n# Transform counts to relative abundance\ntse <- transformAssay(tse,\n assay.type = \"counts\",\n method = \"relabundance\")\n\n# View sample-wise sums\nhead(colSums(assay(tse, \"relabundance\")), 3)\n\nA110 A12 A15 \n 1 1 1 \n\n\nOr to standardise features to the normal distribution we can use z-scores: \\(Z = \\frac{x - \\mu}{\\sigma}\\).\n\n# Transform relative abundance to z-scores\ntse <- transformAssay(tse,\n assay.type = \"relabundance\",\n method = \"z\",\n MARGIN = \"features\")\n\n# View feature-wise standard deviations\nhead(rowSds(assay(tse, \"z\")), 3)\n\n Bacteroides Bacteroides_1 Parabacteroides \n 1 1 1"
+ "objectID": "quarto.html#example-2.3-yaml-parameters",
+ "href": "quarto.html#example-2.3-yaml-parameters",
+ "title": "Reproducible Reporting with Quarto",
+ "section": "Example 2.3: YAML Parameters",
+ "text": "Example 2.3: YAML Parameters\nAt the beginning of any Quarto document, there is a box delimited by ---. There you can define document metadata, such as title, author, date, output format, bibliography, citation style, theme, font size and many others.\n---\ntitle: “Around the gut in 24 hours”\nformat: html\neditor: visual\nsmaller: true\nauthor: Escherichia coli\ndate: 2024-11-27\n---"
},
{
- "objectID": "data_manipulation.html#exercise-2",
- "href": "data_manipulation.html#exercise-2",
- "title": "Data Manipulation",
+ "objectID": "quarto.html#exercise-2",
+ "href": "quarto.html#exercise-2",
+ "title": "Reproducible Reporting with Quarto",
"section": "Exercise 2",
- "text": "Exercise 2\n\nsubsetting: exercise 4.1\nagglomeration: exercise 5.1\ntransformation: exercise 4.6\n\nExtra:\n\nprevalence subsetting: exercise 4.3\nalternative experiments: exercise 5.2"
+ "text": "Exercise 2\n\nknitr options: exercise 2.3\nYAML parameters: exercise 2.4\n\nExtra:\n\nQuarto parameters: exercise 2.5"
},
{
- "objectID": "data_manipulation.html#resources",
- "href": "data_manipulation.html#resources",
- "title": "Data Manipulation",
+ "objectID": "quarto.html#resources",
+ "href": "quarto.html#resources",
+ "title": "Reproducible Reporting with Quarto",
"section": "Resources",
- "text": "Resources\n\nmia function reference\nOMA Section - Data Containers\nOMA Section - Subsetting\nOMA Section - Agglomeration\nOMA Section - Transformation"
+ "text": "Resources\n\nOMA Section - Quarto\nChunk Option Catalogue\nCross References\nYAML Parameter Catalogue"
+ },
+ {
+ "objectID": "hintikkaxo_presentation.html#study-design",
+ "href": "hintikkaxo_presentation.html#study-design",
+ "title": "A brief introduction to HintikkaXOData",
+ "section": "Study design",
+ "text": "Study design\n\nHintikkaXOData is a MultiAssayExperiment containing microbiome, metabolome and biomarker data from a study on the effects of diet and prebiotics on fatty-liver disease in rats (Hintikka et al. 2021).\n\n\nExperimentList class object of length 3:\n [1] microbiota: TreeSummarizedExperiment with 12706 rows and 40 columns\n [2] metabolites: TreeSummarizedExperiment with 38 rows and 40 columns\n [3] biomarkers: TreeSummarizedExperiment with 39 rows and 40 columns\n\n\n\n\nThe rat population (N = 40) was divided into 4 groups (n = 10 each), which were fed different diets over a period of 12 weeks. The 4 groups included:\n\n\n\nhigh-fat diet without prebiotics\nhigh-fat diet with prebiotics\nlow-fat diet without prebiotics\nlow-fat diet with prebiotics\n\n\n\n\n\n\n\n\nXOS+\nXOS-\n\n\n\n\nLow Fat\n10\n10\n\n\nHigh Fat\n10\n10\n\n\n\n\n\n\n\nXOS: xylooligosaccharide"
+ },
+ {
+ "objectID": "hintikkaxo_presentation.html#microbiota",
+ "href": "hintikkaxo_presentation.html#microbiota",
+ "title": "A brief introduction to HintikkaXOData",
+ "section": "Microbiota",
+ "text": "Microbiota\n\nMicrobiome data was obtained by 16S rRNA gene sequencing of bacterial DNA sampled from the cecum of terminated rats. Then, sequence reads were assembled into Operational Taxonomic Units (OTUs).\n\n\n\n\n\n\n\n\n\n\nRelative Abundance (%)\n\n\n\n\nBacteroidetes\n51.3\n\n\nFirmicutes\n39.3\n\n\nVerrucomicrobia\n3.8\n\n\nCyanobacteria\n2.6\n\n\nProteobacteria\n2.1\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 1"
+ },
+ {
+ "objectID": "hintikkaxo_presentation.html#metabolites",
+ "href": "hintikkaxo_presentation.html#metabolites",
+ "title": "A brief introduction to HintikkaXOData",
+ "section": "Metabolites",
+ "text": "Metabolites\nMetabolome data was obtained from the same cecal samples by nuclear magnetic resonance (NMR) spectroscopy. There is a total of 38 features, corresponding to the different metabolites."
+ },
+ {
+ "objectID": "hintikkaxo_presentation.html#biomarkers",
+ "href": "hintikkaxo_presentation.html#biomarkers",
+ "title": "A brief introduction to HintikkaXOData",
+ "section": "Biomarkers",
+ "text": "Biomarkers\n\nBiomarker data were obtained by Western blot from protein homogenates of rat tissues. There is a total of 39 features, corresponding to the biomarker/site combinations.\n\n\n\n\n\n\n\n\n\nBiomarker\nFunction\n\n\n\n\nCS\ncitrate synthase\n\n\nAST\naspartate aminotransferase\n\n\nALT\nalanine aminotransferase\n\n\nERK\nextracellular signal-regulated kinase\n\n\nIRS1\ninsulin receptor substrate 1\n\n\nAKT\nprotein kinase B\n\n\nIL1b\ninterleukin 1 beta\n\n\nHAD\n3-hydroxyacyl-CoA dehydrogenase 8\n\n\nCD45\nprotein tyrosine phosphatase\n\n\n\n\n\n\n\n\n\n\n\nName\nSite\n\n\n\n\nepi\nepididymal adipose tissue\n\n\nSCfat\nsubcutaneous adipose tissue\n\n\nmese\nmesenteric adipose tissue\n\n\nliver\nliver\n\n\ngastro\ngastrocnemius muscle\n\n\nserum\nserum"
+ },
+ {
+ "objectID": "hintikkaxo_presentation.html#references",
+ "href": "hintikkaxo_presentation.html#references",
+ "title": "A brief introduction to HintikkaXOData",
+ "section": "References",
+ "text": "References\n\n\n\n\n\n\n\n\nHintikka, Jukka, Sanna Lensu, Elina Mäkinen, Sira Karvinen, Marjaana Honkanen, Jere Lindén, Tim Garrels, Satu Pekkala, and Leo Lahti. 2021. “Xylo-Oligosaccharides in Prevention of Hepatic Steatosis and Adipose Tissue Inflammation: Associating Taxonomic and Metabolomic Patterns in Fecal Microbiomes with Biclustering.” International Journal of Environmental Research and Public Health 18 (8): 4049. https://doi.org/10.3390/ijerph18084049."
},
{
"objectID": "pcoa.html#why-pcoa",
diff --git a/quarto/oma.qmd b/quarto/oma.qmd
index c185a19..ccd9f60 100755
--- a/quarto/oma.qmd
+++ b/quarto/oma.qmd
@@ -30,7 +30,6 @@ format:
- ~2,300 R packages
- Review, testing, documentation
-- Genomics, transcriptomics, microbiomics, ...
```{r}
#| label: bioc_packages
@@ -115,7 +114,7 @@ p1 <- ggplot(pkgs_date, aes(x = Date, y = N, fill = Field)) +
p1
```
-## Data containers form the foundation {.smaller}
+## Data containers form the core {.smaller}
```{r}
#| label: data_container
@@ -130,7 +129,7 @@ ellipse_data <- data.frame(
y = c(2, 1, 0), # Centers of ellipses
a = c(4, 3, 2), # Widths of ellipses
b = c(3, 2, 1), # Heights of ellipses
- label = c("COMMUNITY", "PACKAGES", "DATA"), # Labels for each ellipse
+ label = c("COMMUNITY", "METHODS", "DATA CONTAINER"), # Labels for each ellipse
label_y = c(4, 1.75, 0) # Adjusted vertical positions for labels
)
@@ -306,17 +305,17 @@ _Reduce overlapping efforts, improve interoperability, ensure sustainability._
- Extension to SummarizedExperiment
- Optimal for microbiome data
-- Links microbiome field to larger SE family
+- Links microbiome field to larger SummarizedExperiment family
##
![](images/SE.png){fig-alt="SummarizedExperiment class" fig-align="center" width=10%}
-## {transition="fade" transition-speed="slow"}
+##
![](images/paste-14DB8F76.png){fig-alt="TreeSummarizedExperiment class" fig-align="center" width=10%}
-## MIcrobiome Analysis (mia) {transition="none"}
+## MIcrobiome Analysis (mia)
```{r}
#| label: mia_stats
@@ -375,133 +374,8 @@ perc <- paste0(round(which(rownames(df) == "mia") / nrow(df), 3)*100, "%")
- Scalable & optimized for large datasets
- Comprehensive documentation
-_Allows us to develop modular and efficient workflows_
+_Allows us to develop efficient microbiome data science workflows_
-## {auto-animate="true"}
-
-```r
-# Load package
-library(mia)
-# Load example dataset
-data("peerj13075")
-tse <- peerj13075
-```
-
-```{r}
-#| label: show_treese
-
-
-# Load package
-library(mia)
-# Load example dataset
-data("peerj13075")
-tse <- peerj13075
-
-tse
-```
-
-
-## {auto-animate="true"}
-
-```r
-# Agglomerate to genus level
-tse <- agglomerateByRank(tse, rank = "genus")
-```
-
-## {auto-animate="true"}
-
-```r
-# Agglomerate to genus level
-tse <- agglomerateByRank(tse, rank = "genus")
-
-# Add relative abundances
-tse <- transformAssay(tse, method = "relabundance")
-```
-
-```{r}
-#| label: show_transform
-
-# Agglomerate to genus level
-tse <- agglomerateByRank(tse, rank = "genus")
-
-# Add relative abundances
-tse <- transformAssay(tse, method = "relabundance")
-```
-
-## {auto-animate="true"}
-
-```r
-# Load visualization package
-library(miaViz)
-# Summarize abundance of top taxa
-plotAbundanceDensity(tse, assay.type = "relabundance")
-```
-
-```{r}
-#| label: show_prevalence
-
-# Load visualization package
-library(miaViz)
-# Summarize abundance of top taxa
-plotAbundanceDensity(tse, assay.type = "relabundance")
-```
-
-## {auto-animate="true"}
-
-```r
-# Calculate alpha diversity indices
-tse <- addAlpha(tse, index = "shannon")
-```
-
-## {auto-animate="true"}
-
-```r
-# Calculate alpha diversity indices
-tse <- addAlpha(tse, index = "shannon")
-
-# Load single-cell analysis package that has useful, complementary tools
-library(scater)
-# Plot alpha diversity
-plotColData(tse, x = "Geographical_location", y = "shannon")
-```
-
-```{r}
-#| label: show_alpha
-#| fig-height: 4
-
-# Calculate alpha diversity indices
-tse <- addAlpha(tse, index = "shannon")
-
-# Load single-cell analysis package that has useful, complementary tools
-library(scater)
-# Plot alpha diversity
-plotColData(tse, x = "Geographical_location", y = "shannon")
-```
-
-## {auto-animate="true"}
-
-```r
-# Perform PCoA
-tse <- runMDS(tse, assay.type = "relabundance", FUN = getDissimilarity, method = "bray")
-```
-
-## {auto-animate="true"}
-
-```r
-# Perform PCoA
-tse <- runMDS(tse, assay.type = "relabundance", FUN = getDissimilarity, method = "bray")
-# Plot PCoA
-plotReducedDim(tse, dimred = "MDS", colour_by = "Geographical_location")
-```
-
-```{r}
-#| label: show_pcoa
-
-# Perform PCoA
-tse <- runMDS(tse, assay.type = "relabundance", FUN = getDissimilarity, method = "bray")
-# Plot PCoA
-plotReducedDim(tse, dimred = "MDS", colour_by = "Geographical_location")
-```
## Orchestrating Microbiome Analysis with Bioconductor
@@ -515,7 +389,7 @@ plotReducedDim(tse, dimred = "MDS", colour_by = "Geographical_location")
[microbiome.github.io/OMA](https://microbiome.github.io/OMA/docs/devel/){preview-link="true"}
:::
-## Summary
+## Poem of the day
::: columns
::: {.column width="60%"}
@@ -525,7 +399,7 @@ _mia framework’s waiting here._
_OMA’s guide will light the way,_
-_Helping you every step of the day._
+_helping you every step of the day._
:::