famosab/wrrocmetatest is a bioinformatics pipeline that runs a minimal working example (fastp + MEGAHIT) on a small metagenome data set of simulated Illumina reads from 15 microbial genomes. It is used to test the nf-prov plugin on a metagenomics workflow.
This pipeline was written as an example Nextflow pipeline for the development and testing of the nf-prov Nextflow plugin. More information can be found in the BioHackathon Europe report.
The test data set (simulated Illumina reads from 15 microbial genomes) can be downloaded from:
- https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/denbi-mg-course/read1.fq.gz
- https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/denbi-mg-course/read2.fq.gz
To run the pipeline locally on the test data, download the two FASTQ files and create the following samplesheet, testsheet.csv:
sample,fastq_1,fastq_2
test,<path/to/>read1.fq.gz,<path/to/>read2.fq.gz
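The `<path/to/>` placeholders must point at the downloaded reads. As a minimal sketch, assuming read1.fq.gz and read2.fq.gz were saved into one directory (given by a hypothetical DATA_DIR variable that defaults to the current directory), the following writes testsheet.csv with absolute paths:

```shell
#!/usr/bin/env bash
set -euo pipefail

# DATA_DIR is a hypothetical variable: the directory holding read1.fq.gz and read2.fq.gz.
# It defaults to the current working directory.
DATA_DIR="${DATA_DIR:-$PWD}"

# Resolve to an absolute path so the samplesheet works regardless of the launch directory.
DATA_DIR="$(cd "$DATA_DIR" && pwd)"

# Write the header and the single paired-end test sample.
{
  echo "sample,fastq_1,fastq_2"
  echo "test,${DATA_DIR}/read1.fq.gz,${DATA_DIR}/read2.fq.gz"
} > testsheet.csv
```

The reads themselves can be fetched beforehand with any HTTP client, for example `curl -O` on each of the two URLs listed above.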
Then add the following config file, testdata.config:
process {
    withName: FASTP {
        cpus = 8
        memory = 16.GB
    }
    withName: MEGAHIT {
        cpus = 8
        memory = 16.GB
        ext.args = { "--k-min 51 --k-max 71 --k-step 20" }
    }
}

// add the following lines to run it with nf-prov
plugins {
    id 'nf-prov@<version>' // replace <version> with the version of your local nf-prov build
}

prov {
    enabled = true
    formats {
        wrroc {
            file = "${params.outdir}/ro-crate-metadata.json"
            overwrite = true
            agent {
                name = "John Doe"
                orcid = "https://orcid.org/0000-0000-0000-0000"
            }
            license = "https://spdx.org/licenses/MIT"
            profile = "provenance_run_crate"
        }
    }
}
Then run the workflow with the following command:
nextflow run <path/to/>wrrocmetatest/main.nf -profile docker --input <path/to/>testsheet.csv --outdir results -c <path/to/>testdata.config
Depending on your resources, this test run takes around 5 minutes.
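With the config above, nf-prov writes its crate to `${params.outdir}/ro-crate-metadata.json`, i.e. results/ro-crate-metadata.json for `--outdir results`. The sketch below is a minimal sanity check, not part of the pipeline: check_crate is a hypothetical helper, it assumes python3 is available for JSON parsing, and it only confirms that the file is valid JSON with the top-level `@graph` array that RO-Crate metadata documents use, not that the crate is complete.

```shell
check_crate() {
  # Default path assumes --outdir results, matching the run command above.
  local crate="${1:-results/ro-crate-metadata.json}"

  # The file must exist and be non-empty.
  if [ ! -s "$crate" ]; then
    echo "missing or empty: $crate" >&2
    return 1
  fi

  # Parse the file with Python's standard json module and require a top-level
  # "@graph" array, where RO-Crate metadata documents list their entities.
  if ! python3 -c '
import json, sys
with open(sys.argv[1]) as fh:
    doc = json.load(fh)
sys.exit(0 if isinstance(doc, dict) and isinstance(doc.get("@graph"), list) else 1)
' "$crate" 2>/dev/null; then
    echo "not valid JSON or no @graph array: $crate" >&2
    return 1
  fi

  echo "OK: $crate"
}
```

Usage after the test run: `check_crate results/ro-crate-metadata.json`.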
To use the development version of nf-prov, clone the repository containing the current working version to your local machine; in our case this is famosab/nf-prov. Check out the relevant branch (here, workflow-run-crate) and run make install. This adds nf-prov to .nextflow/plugins/, from where it can be used with any pipeline.
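The install steps above can be wrapped in a small shell function. This is a sketch under assumptions: install_nf_prov is a hypothetical helper name, and the repository URL assumes famosab/nf-prov is hosted on GitHub (the text names the repository but not its host). All three arguments are optional and fall back to the values described above.

```shell
install_nf_prov() {
  # Assumed location of the fork; the text only names famosab/nf-prov.
  local repo="${1:-https://github.com/famosab/nf-prov.git}"
  # Branch named in the text.
  local branch="${2:-workflow-run-crate}"
  # Local checkout directory.
  local dir="${3:-nf-prov}"

  # Clone the branch directly; stop early if the clone fails.
  git clone --branch "$branch" "$repo" "$dir" || return 1

  # Build and install the plugin (per the text, into .nextflow/plugins/).
  (cd "$dir" && make install)
}
```

Calling `install_nf_prov` with no arguments uses the defaults.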
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet, samplesheet.csv, with your input data. It should look as follows:
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row represents one sample with a pair of FASTQ files (paired-end reads).
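A quick local check before launching can catch malformed rows. The sketch below is hypothetical: validate_samplesheet is not part of the pipeline, and the rules it applies (exact header, three columns per row, file names ending in .fastq.gz or .fq.gz) are assumptions drawn from the example above rather than the pipeline's actual input schema.

```shell
validate_samplesheet() {
  # Print one message per problem to stderr; return non-zero if any row is invalid.
  awk -F',' '
    NR == 1 {
      # The header must match the example samplesheet exactly.
      if ($0 != "sample,fastq_1,fastq_2") { print "unexpected header: " $0 > "/dev/stderr"; bad = 1 }
      next
    }
    # Skip blank lines.
    NF == 0 { next }
    # Every data row needs sample, fastq_1 and fastq_2.
    NF != 3 { print "line " NR ": expected 3 columns, got " NF > "/dev/stderr"; bad = 1; next }
    # Both reads of the pair must be gzipped FASTQ files.
    $2 !~ /\.(fastq|fq)\.gz$/ || $3 !~ /\.(fastq|fq)\.gz$/ {
      print "line " NR ": fastq_1 and fastq_2 must end in .fastq.gz or .fq.gz" > "/dev/stderr"; bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Usage: `validate_samplesheet samplesheet.csv && echo "samplesheet looks fine"`.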
Now, you can run the pipeline using:
nextflow run famosab/wrroc-meta-test \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>
Warning
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
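Following the warning, the parameters from the run command can be collected in a file and passed with -params-file, which accepts YAML or JSON. A minimal sketch, assuming the illustrative file name params.yaml and the docker profile:

```shell
# Write the pipeline parameters to a YAML params file (file name is illustrative).
cat > params.yaml <<'EOF'
input: samplesheet.csv
outdir: results
EOF

# Launch command (not executed here); -c can still be added for non-parameter config:
#   nextflow run famosab/wrroc-meta-test -profile docker -params-file params.yaml
```

Keeping parameters in params.yaml and everything else (resources, nf-prov settings) in a -c config file follows the separation the warning asks for.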
famosab/wrroc-meta-test was originally written by Famke Bäuerle, Tom Tubbesing, Keiler Collier, Matt Burridge, Benedikt Osterholz, Alex Sczyrba, Neil Wipat and Sandy Rogers during the BioHackathon Europe 2024.
If you would like to contribute to this pipeline, please see the contributing guidelines.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.