Skip to content

This is a Nextflow pipeline for generating sequencing reports for the SNP&Seq Technology platform, NGI Uppsala, SciLifelab Genomics.

Notifications You must be signed in to change notification settings

Molmed/seqreports

Repository files navigation

seqreports: SNP&Seq Run folder QC pipeline

This is a Nextflow pipeline for generating sequencing reports for the SNP&Seq Technology platform, NGI Uppsala, SciLifelab Genomics.

Pre-requisites

You need to:

Optional:

  • (currently mandatory: see known issues) Download the fastq-screen database by downloading fastq-screen from here, extract the archive and then run fastq_screen --get_genomes.

How to run the nextflow pipeline

Awesome, you're all set! Let's try generating reports for your favourite runfolder:

        # Using parameters supplied in a config (see below)
        nextflow run -c custom.config -profile snpseq,singularity main.nf

        # Using parameters supplied on the command line
        nextflow run -profile snpseq,singularity main.nf \
            --run_folder '/path/to/runfolder' \
            --fastqscreen_databases '/path/to/databases' \
            --checkqc_config '/path/to/checkqc.config'

Available profiles

These are the primary config profiles:

  • dev: Run locally with low memory.
  • uppmax: Uppmax slurm profile for use on the cluster at Uppmax (note: The parameter params.project must be supplied).
  • snpseq: Run locally with greater memory available than dev.
  • singularity: Enables singularity and provides container URLs.
  • test: Run the pipeline using test data

Additional profiles:

  • debug: prints out the env properties before executing processes.

Supplying a custom config file

Custom config files can contain all command line parameters, nextflow parameters, and overriding options.

For example:

resume = true
params.run_folder = '/path/to/runfolder'
params.fastqscreen_databases = '/path/to/databases'
params.checkqc_config = '/path/to/checkqc.config'
workDir = '/path/to/temporary/storage/space'

Contributing and Development

To contribute to this repo, fork it and make a pull request to the main branch.

Tests are run through GitHub Actions when pushing code to the repo. See instructions below on how to reproduce it locally.

To keep the python parts of the project nice and tidy, we enforce that code should be formatted according to black. To re-format your code with black, simply run:

black .

Running tests locally

Assuming you have installed all pre-requisites (except the fastq screen database: test data comes with a minimal version of it), you can run tests locally by following these steps:

# create virtual environment 
virtualenv -p python3.9 venv/   

# activate venv
source venv/bin/activate

# install dependencies
pip install -r requirements-dev.txt

# run tests
pytest tests/

# perform black formatter check
black --check .

Known issues:

  • Unable to download genome indicies using fastq_screen --get_genomes as wget within the container does not resolve the address correctly. Fastq Screen must be installed separately (e.g. with conda) and the genomes downloaded prior to running the workflow. The path to the databases must then be given using the params.fastqscreen_databases parameter.

About

This is a Nextflow pipeline for generating sequencing reports for the SNP&Seq Technology platform, NGI Uppsala, SciLifelab Genomics.

Resources

Stars

Watchers

Forks

Packages

No packages published