Karin Lagesen edited this page Oct 15, 2018 · 11 revisions

Disclaimer

This is pre-publication software that is currently under active development. Use it at your own risk. Bug reports are welcome, but support cannot be guaranteed at this time.

The Bifrost genomic epidemiology pipeline

Pipeline for analyzing genomic read sets for public and animal health purposes.

Author: Karin Lagesen, @karinlag

Contact information: please submit an issue, and the author will get back to you.


Synopsis

This software uses the Nextflow.io workflow system to run a range of analyses for genomic epidemiology and comparative microbiology. Nextflow makes it possible to run the same pipeline on a local computer and on a cluster without changing the code.
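As a minimal sketch of that idea, the hypothetical Python helper below assembles a `nextflow run` command line. It assumes the generic Nextflow options `-c` (add a config file) and `-profile` (select a configuration profile); the profile names `local` and `slurm` and the config file name are illustrative and not taken from Bifrost, and this is not the pipeline's own run script. Only the profile argument differs between a local run and a cluster run; the `.nf` script stays the same.

```python
from typing import List, Optional


def nextflow_command(script: str, config: str, profile: Optional[str] = None) -> List[str]:
    """Build an argv list for `nextflow run` (hypothetical helper).

    script:  the Nextflow script to run, e.g. "qc_track.nf"
    config:  a run-specific config file, passed with -c
    profile: optional profile name selecting e.g. a local or cluster setup
    """
    cmd = ["nextflow", "run", script, "-c", config]
    if profile is not None:
        # Only this part changes between compute systems; the script is untouched.
        cmd += ["-profile", profile]
    return cmd


if __name__ == "__main__":
    # Illustrative names only: the same script launched with two different profiles.
    print(" ".join(nextflow_command("qc_track.nf", "my_run.config", "local")))
    print(" ".join(nextflow_command("qc_track.nf", "my_run.config", "slurm")))
```

The returned list can be passed directly to `subprocess.run` if you want to launch the pipeline from Python.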

Installation

For installation, see the installation pages. Please note: at the time of writing (August 2018), this software has only been tested on Ubuntu and on the University of Oslo Abel cluster (i.e. under Slurm).

How to run

For details on how to run the pipeline, see the Run pages. The pipeline consists of a run script that can launch several different scripts. Each script chains several tools together into one analysis, and for each script a Nextflow script, a template config file and a template profile file are provided.
For each compute system, the profile file must be adjusted so that the software the pipeline uses is available. The easiest way to do this is to create a conda environment containing the required software and enter the location of that environment in the appropriate conda config files. Once this is done, the profile file should not need further modification. For each run, the template config file should be modified to specify run-specific settings, such as input data, species, required databases and software options.
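Run-specific settings of this kind are typically written as a `params { ... }` block in a Nextflow config file. The Python sketch below renders such a block from a dictionary; it is only an illustration under that assumption. The helper name `render_params` and the keys `reads`, `species` and `outdir` are hypothetical, so check the template config shipped with each script for the parameter names it actually expects.

```python
from typing import Mapping, Union

Value = Union[str, int, float, bool]


def render_params(params: Mapping[str, Value]) -> str:
    """Render run-specific settings as a Nextflow `params { ... }` block.

    Hypothetical helper: the keys are whatever the chosen template config uses.
    """
    lines = ["params {"]
    for key, value in params.items():
        if isinstance(value, bool):
            # Check bool before int (bool is a subclass of int); config uses true/false.
            text = "true" if value else "false"
        elif isinstance(value, (int, float)):
            text = str(value)
        else:
            # Single-quoted string; escape backslashes first, then quotes.
            escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
            text = f"'{escaped}'"
        lines.append(f"    {key} = {text}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical parameter names and values for a single run.
    run_settings = {
        "reads": "/data/run01/*_R{1,2}.fastq.gz",
        "species": "Escherichia coli",
        "outdir": "results_run01",
    }
    print(render_params(run_settings), end="")
```

Generating the run config this way keeps the profile file untouched, which matches the split described above: the profile is set up once per compute system, while the config changes for every run.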

Current capabilities

The pipeline has been developed as a series of scripts, where each script has a specific input and a set of logically connected analyses. Each script comes with its own Nextflow script and a separate config file, which is used to specify inputs and software options for that specific run.

The current pipeline contains the following scripts:

  • qc_track.nf: Basic QC
    • FastQC is run on all input files, followed by MultiQC, which aggregates the results.
  • specific_gene.nf: MLST, virulence and AMR annotation
    • ARIBA is used to annotate MLST, virulence and AMR genes directly from reads. The script can run all three analyses at once, or just one or two of them.
  • asm_annot.nf: Assembly and annotation
    • This script first runs FastQC and MultiQC, then removes PhiX reads with BBDuk, trims reads with Trimmomatic, assembles them with SPAdes, polishes the assemblies with Pilon, evaluates them with QUAST and finally annotates them with Prokka.
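Because each script depends on a different set of external tools, one quick way to confirm that a conda environment is complete is to check which executables are on `PATH`. The hypothetical Python sketch below records the tools listed above for each script and reports any that are missing. The executable names (for example `bbduk.sh`, `spades.py`, `quast.py`) are assumptions based on typical Bioconda installs, not names taken from the Bifrost configs.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Tools each script relies on, in the order described above.
# Executable names are assumptions based on common Bioconda packages;
# adjust them to match your own conda environment.
SCRIPT_TOOLS: Dict[str, List[str]] = {
    "qc_track.nf": ["fastqc", "multiqc"],
    "specific_gene.nf": ["ariba"],
    "asm_annot.nf": [
        "fastqc", "multiqc",  # read QC and aggregation
        "bbduk.sh",           # PhiX removal
        "trimmomatic",        # read trimming
        "spades.py",          # assembly
        "pilon",              # assembly polishing
        "quast.py",           # assembly evaluation
        "prokka",             # annotation
    ],
}


def missing_tools(script: str,
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the executables needed by `script` that cannot be found on PATH."""
    return [tool for tool in SCRIPT_TOOLS[script] if which(tool) is None]


if __name__ == "__main__":
    # Run this inside the activated conda environment before starting a pipeline.
    for name in SCRIPT_TOOLS:
        absent = missing_tools(name)
        status = "OK" if not absent else "missing: " + ", ".join(absent)
        print(f"{name}: {status}")
```

The `which` parameter defaults to `shutil.which` but can be swapped for a stub, which makes the check easy to test without the tools installed.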

Planned features:

The following features are planned for future releases:

  • Species identification
  • SNP tree analyses, probably with both parsnp and kSNP
  • Pan-genome analysis, probably using Roary

For University of Oslo Abel users

This software is already available at the UiO Abel cluster. Please see the University of Oslo Abel pages for how to run the software on the cluster.