This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
The pipeline is built using Nextflow and processes data using the following steps:
- FastQC - Read quality control
- metaSPAdes - Metagenome assembly and assembly graph construction
The first two steps run only when reads are provided as input.
- Pathracer - Alignment of profile HMMs to the assembly graph
- MMseqs2 - Detection and clustering of ORFs
- Abricate - Annotation of representative sequences
- RGI - Annotation of representative sequences
- sraX - Annotation of representative sequences
- hAMRonization - Summarizing results
- Pipeline information - Report metrics generated during the workflow execution
FastQC gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences.
For further reading and documentation see the FastQC help pages.
Output files:
fastqc/
*_fastqc.html
: FastQC report containing quality metrics for your untrimmed raw fastq files.
fastqc/zips/
*_fastqc.zip
: Zip archive containing the FastQC report, tab-delimited data file and plot images.
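To check many samples without opening each HTML report, the per-module verdicts can be read straight from the zip archive. The sketch below assumes the usual FastQC archive layout, in which a `summary.txt` file holds tab-separated `status`, `module` and `file name` columns; the function name `read_fastqc_summary` is hypothetical and not part of the pipeline.

```python
import io
import zipfile


def read_fastqc_summary(zip_source):
    """Return {module name: PASS/WARN/FAIL} from a FastQC zip archive.

    Assumes the archive contains a summary.txt with tab-separated
    status, module and file name columns.
    """
    statuses = {}
    with zipfile.ZipFile(zip_source) as archive:
        # The archive holds a single folder; summary.txt sits inside it.
        summary_name = next(
            name for name in archive.namelist() if name.endswith("summary.txt")
        )
        with archive.open(summary_name) as handle:
            for raw_line in io.TextIOWrapper(handle, encoding="utf-8"):
                fields = raw_line.rstrip("\n").split("\t")
                if len(fields) >= 2:
                    statuses[fields[1]] = fields[0]
    return statuses
```

`zip_source` can be a path such as one of the `*_fastqc.zip` files under `fastqc/zips/`, or any file-like object opened in binary mode.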
metaSPAdes is a de Bruijn graph-based assembly tool for metagenomic data.
Output files:
spades/
*.assembly.gfa
: SPAdes assembly graph and scaffold paths in GFA 1.0 format
*.contigs.fa
: resulting contigs
*.scaffolds.fa
: resulting scaffolds
*.spades.log
: SPAdes log
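The assembly graph is plain text, so its size and connectivity can be inspected with a few lines of code. The following is a minimal sketch of a GFA 1.0 reader that keeps only segment (`S`) and link (`L`) records; the function name `summarize_gfa` is hypothetical, and it assumes segment sequences are written inline rather than as `*`.

```python
from collections import defaultdict


def summarize_gfa(lines):
    """Collect segment lengths and link adjacency from GFA 1.0 lines."""
    segment_lengths = {}
    links = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # S <name> <sequence> [optional tags]
            segment_lengths[fields[1]] = len(fields[2])
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            links[fields[1]].add(fields[3])
        # Header (H) and path (P) records are ignored in this sketch.
    return segment_lengths, dict(links)
```

Passing an open file handle for a `*.assembly.gfa` file works, because the function only iterates over lines.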
Pathracer is a standalone tool that aligns profile HMMs directly to the assembly graph. It reports the set of most probable paths traversed by an HMM through the whole assembly graph, regardless of whether the sequence of interest is encoded on a single contig or scattered across several edges. This significantly improves the recovery of sequences of interest, even from fragmented metagenome assemblies.
Output files:
pathracer/
*.all.edges.fa
: unique edge paths for all pHMMs in one file
*.pathracer.log
: log file
MMseqs2 is a software suite to search and cluster huge protein and nucleotide sequence sets. The extractorfs module is used to detect all open reading frames (ORFs) in all six frames. The easy-linclust module is used for clustering.
Output files:
orfs/
*.all_orfs.fasta
: all open reading frames (ORFs) on all six frames in one file
*.orfs_rep_seq.fasta
: representative sequences
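A quick way to see how strongly clustering reduced the ORF set is to compare the number of records in the two FASTA files. This is a minimal sketch that counts header lines only; the helper names `count_fasta_records` and `clustering_ratio` are hypothetical.

```python
def count_fasta_records(lines):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def clustering_ratio(all_orf_lines, representative_lines):
    """Fraction of ORFs kept as cluster representatives (0.0 if no ORFs)."""
    total = count_fasta_records(all_orf_lines)
    representatives = count_fasta_records(representative_lines)
    return representatives / total if total else 0.0
```

Both functions accept any iterable of lines, for example open handles for the `*.all_orfs.fasta` and `*.orfs_rep_seq.fasta` files.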
Abricate is used for mass screening of contigs for antimicrobial resistance or virulence genes. Please see the Abricate docs for more detailed information regarding the output files.
Output files:
abricate/
*.rep_seq.tsv
: a tab-separated output file with results
all.summary.tsv
: representative sequences
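The per-sample table can be filtered to keep only well-supported hits. The sketch below assumes the standard Abricate column headers `SEQUENCE`, `GENE`, `%COVERAGE` and `%IDENTITY`; the function name and both thresholds are hypothetical values chosen for illustration, not pipeline defaults.

```python
import csv


def filter_abricate_hits(tsv_lines, min_identity=90.0, min_coverage=80.0):
    """Return (sequence, gene) pairs for hits meeting both thresholds.

    Column names are assumed to match Abricate's standard header;
    the threshold defaults are illustrative only.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    kept = []
    for row in reader:
        identity = float(row["%IDENTITY"])
        coverage = float(row["%COVERAGE"])
        if identity >= min_identity and coverage >= min_coverage:
            kept.append((row["SEQUENCE"], row["GENE"]))
    return kept
```

If a file uses different column names, adjust the keys accordingly after checking the header line.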
RGI is used to predict resistome(s) from protein or nucleotide data based on homology and SNP models. The application uses reference data from the Comprehensive Antibiotic Resistance Database (CARD). Please see the RGI docs for more detailed information regarding the output files.
Output files:
rgi/
*.json
: JSON file with results
*.txt
: a tab-separated output file with results
*.png
: a heat map built from pre-compiled RGI main JSON files, with samples and AMR genes organized alphabetically
sraX is used to systematically detect the presence of AMR determinants and, ultimately, describe the repertoire of antibiotic resistance genes (ARGs) within a collection of genomes (the “resistome” analysis).
Output files:
rgi/
Results/
: directory containing HTML report, plots and summary files
hAMRonization is used to combine and summarize the results.
Output files:
hamronize/
*_abricate_hamronized.tsv
: a tab-separated abricate report
*_rgi_hamronized.tsv
: a tab-separated rgi report
amr_summary.tsv
: general report
amr_summary.html
: general HTML report
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
Output files:
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.csv.
- Documentation for interpretation of results in HTML format: results_description.html.
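When troubleshooting a run, a tally of task states from the trace file is often the fastest first check. This minimal sketch assumes the trace is tab-separated with a `status` column, as in the default Nextflow trace fields; a pipeline that customizes the trace fields may need a different key. The function name `count_task_statuses` is hypothetical.

```python
import csv
from collections import Counter


def count_task_statuses(trace_lines):
    """Count tasks per status (e.g. COMPLETED, FAILED) in a Nextflow trace."""
    reader = csv.DictReader(trace_lines, delimiter="\t")
    return Counter(row["status"] for row in reader)
```

The function takes any iterable of lines, so an open handle for `pipeline_info/execution_trace.txt` can be passed directly.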