gOTU analysis

The notion of “gOTU” (pronounced "go-to") is the minimal unit for community ecology studies based on shotgun metagenome or other forms of whole-genome microbiome data. It contrasts with conventional practice, in which taxonomic units such as genera or species are used. In this sense, gOTU is analogous to sOTU in 16S rRNA studies. The advantages of using gOTUs include: 1) the highest possible resolution; 2) independence from taxonomy, which is coarse and error-prone as a classification system; 3) support for phylogeny-based analyses such as Faith’s PD and UniFrac. The last advantage is enhanced by the “Web of Life” (WoL) reference phylogeny.

gOTU table generation

To generate a gOTU table, one needs a multiplexed alignment file, or a directory of per-sample alignment files. These files can be generated by aligning sequencing data against a reference genome database. We recommend using SHOGUN with the "Web of Life" database (WoL, available for download at: https://biocore.github.io/wol/). For example:

shogun align -a bowtie2 -d WoLr1 -i input.fasta -o .

Then one can run Woltka to convert the alignment file(s) into a gOTU table:

woltka gotu -i alignment.bowtie2.sam -o table.biom

The output file table.biom is a BIOM table with rows as genome IDs (gOTUs), columns as sample IDs, and cell values as counts of gOTUs in samples.

If necessary, you may convert the BIOM table into a tab-delimited file:

biom convert --to-tsv -i table.biom -o table.tsv
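To make the row/column layout described above concrete, here is a minimal Python sketch that reads such a tab-delimited file into a nested dictionary. It assumes the classic layout written by `biom convert --to-tsv` (a leading comment line, then a `#OTU ID` header row); the genome IDs, sample IDs, and counts are made up for illustration, and the function name `read_biom_tsv` is hypothetical.

```python
import csv
import io


def read_biom_tsv(handle):
    """Read a tab-delimited table into {genome_id: {sample_id: count}}."""
    samples = []
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # Assumption: the last '#' line before the data holds the sample IDs.
            samples = row[1:]
            continue
        table[row[0]] = {s: float(v) for s, v in zip(samples, row[1:])}
    return table


# Hypothetical content standing in for table.tsv.
example = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "G001\t12.0\t0.0\n"
    "G002\t3.0\t7.0\n"
)
table = read_biom_tsv(example)
print(table["G002"]["S2"])
```

For a real file, pass `open("table.tsv")` instead of the in-memory example.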

Note: Both SHOGUN and WoL are available at the Qiita server. If you are a Qiita user, the alignment file can be automatically generated and downloaded from the Qiita interface. See details.

gOTU analysis using QIIME 2

The generated BIOM table can be imported into a QIIME 2 artifact:

qiime tools import --type 'FeatureTable[Frequency]' --input-path table.biom --output-path table.qza

These intermediate steps are automated if you use the QIIME 2 plugin of Woltka.

One can then investigate the microbiome by applying classical QIIME 2 analyses to the gOTU table. For example, with the WoL reference phylogeny (direct download link: tree.qza), one can run:

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny tree.qza \
  --i-table table.qza \
  --p-sampling-depth 1000 \
  --m-metadata-file metadata.tsv \
  --output-dir .
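The `--p-sampling-depth` value determines which samples survive rarefaction: samples whose total count is below the depth are dropped. The following Python sketch shows one way to check this before running the command. The table contents are hypothetical, and `samples_retained` is an illustrative helper, not part of QIIME 2 or Woltka.

```python
def samples_retained(table, depth):
    """Return the sample IDs whose total count reaches the sampling depth.

    table: {feature_id: {sample_id: count}}
    """
    totals = {}
    for counts in table.values():
        for sample, value in counts.items():
            totals[sample] = totals.get(sample, 0) + value
    return sorted(s for s, total in totals.items() if total >= depth)


# Hypothetical gOTU counts for two samples.
table = {
    "G001": {"S1": 900, "S2": 400},
    "G002": {"S1": 300, "S2": 500},
}
kept = samples_retained(table, 1000)
print(kept)
```

Here S1 totals 1200 and S2 totals 900, so only S1 would be kept at a depth of 1000.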

Alignment ambiguity

It is quite common that one query sequence can be aligned to multiple reference genomes. In such cases, Woltka by default counts each gOTU as 1 / k, where k is the total number of matching genomes.

Alternatively, one may choose to discard all non-unique matches, by adding a flag:

woltka gotu --uniq ...
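The two counting behaviors described above can be sketched in a few lines of Python. This is an illustration of the 1 / k rule and of discarding non-unique matches, not Woltka's actual implementation; the function name `count_gotus`, the read IDs, and the genome IDs are all hypothetical.

```python
from collections import defaultdict


def count_gotus(matches, uniq=False):
    """Tally per-genome counts for one sample.

    matches: {read_id: set of genome IDs the read aligned to}
    uniq: if True, discard reads that match more than one genome.
    """
    counts = defaultdict(float)
    for genomes in matches.values():
        k = len(genomes)
        if k == 0 or (uniq and k > 1):
            continue
        for genome in genomes:
            # Each of the k matching genomes receives an equal share of the read.
            counts[genome] += 1 / k
    return dict(counts)


# Hypothetical alignments, for illustration only.
matches = {
    "read1": {"G001"},
    "read2": {"G001", "G002"},
    "read3": {"G002", "G003"},
    "read4": {"G001", "G002", "G003", "G004"},
}
default_counts = count_gotus(matches)
unique_counts = count_gotus(matches, uniq=True)
print(default_counts)
print(unique_counts)
```

With fractional counting, every read contributes a total of 1, so the counts sum to the number of aligned reads. With unique-only counting, only read1 is kept in this example.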

Custom alignment

Technically, one can use any sequence aligner and reference genome database to generate alignment files, which can then be converted into a gOTU table. We cannot vouch for the quality of the results, but we understand that you may want to do this for consistency with existing parts of your analytical pipeline. For example:

bwa mem refseq.fna input.R1.fq input.R2.fq > output.sam
blastn -db refseq_genomes -query input.fa -max_target_seqs 1 -outfmt 6 -out output.txt

However, most of these protocols map reads to nucleotide sequences (e.g., chromosomes or scaffolds) rather than to genomes. To produce gOTUs, one needs to supply Woltka with a nucleotide-to-genome mapping file (nucl2g.txt; an example is provided under taxonomy/nucl):

woltka gotu --map nucl2g.txt ...
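The role of the mapping file can be illustrated with a short Python sketch. It assumes that each line of nucl2g.txt holds a nucleotide ID and a genome ID separated by a tab; that layout, the helper names, and the identifiers below are assumptions for illustration, so check the example file under taxonomy/nucl for the real format.

```python
def load_nucl2g(lines):
    """Parse 'nucleotide<TAB>genome' lines into a dict (assumed file layout)."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        nucl, genome = line.split("\t")[:2]
        mapping[nucl] = genome
    return mapping


def to_genomes(subjects, nucl2g):
    """Translate the nucleotide IDs hit by one read into a set of genome IDs."""
    return {nucl2g[s] for s in subjects if s in nucl2g}


# Hypothetical identifiers: two scaffolds of genome G001, one of genome G002.
nucl2g = load_nucl2g([
    "NC_000001.1\tG001\n",
    "NC_000002.1\tG001\n",
    "NC_000003.1\tG002\n",
])
genomes = to_genomes(["NC_000001.1", "NC_000002.1"], nucl2g)
print(genomes)
```

A read that hits two scaffolds of the same genome collapses to a single genome here, so it counts as a unique match at the gOTU level.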