Merge pull request #3 from CenterForMedicalGeneticsGhent/dev

Release PR 1.0.0
nf-cmgg · May 22, 2023 · ef940a3
2 parents aec8029 + 8c7c2de

Showing 41 changed files with 792 additions and 399 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -17,7 +17,7 @@ concurrency:
 
 jobs:
   test:
-    name: Run pipeline with test data
+    name: Run nf-test with Nextflow version ${{ matrix.NXF_VER }}
     # Only run on push if this is the nf-core dev branch (merged PRs)
     if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx') }}"
     runs-on: ubuntu-latest
@@ -35,9 +35,16 @@ jobs:
         with:
           version: "${{ matrix.NXF_VER }}"
 
+      - name: Install nf-test
+        run: |
+          conda install -c bioconda nf-test
+
       - name: Run pipeline with test data
-        # TODO nf-core: You can customise CI pipeline run tests as required
-        # For example: adding multiple test runs with different parameters
-        # Remember that you can parallelise this by using strategy.matrix
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
+          $CONDA/bin/nf-test test --junitxml=default.xml
+
+      - name: Publish Test Report
+        uses: mikepenz/action-junit-report@v3
+        if: always() # always run even if the previous step fails
+        with:
+          report_paths: "default.xml"
diff --git a/.github/workflows/clean-up.yml b/.github/workflows/clean-up.yml
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,5 @@ results/
 testing/
 testing*
 *.pyc
+.nf-test
+null
diff --git a/.nf-core.yml b/.nf-core.yml
@@ -1,15 +1,20 @@
 repository_type: pipeline
 lint:
   files_exist:
-  - CODE_OF_CONDUCT.md
-  - assets/nf-core-nf-cmgg-wisecondorx_logo_light.png
-  - docs/images/nf-core-nf-cmgg-wisecondorx_logo_light.png
-  - docs/images/nf-core-nf-cmgg-wisecondorx_logo_dark.png
-  - .github/ISSUE_TEMPLATE/config.yml
-  - .github/workflows/awstest.yml
-  - .github/workflows/awsfulltest.yml
+    - CODE_OF_CONDUCT.md
+    - assets/nf-core-nf-cmgg-wisecondorx_logo_light.png
+    - docs/images/nf-core-nf-cmgg-wisecondorx_logo_light.png
+    - docs/images/nf-core-nf-cmgg-wisecondorx_logo_dark.png
+    - .github/ISSUE_TEMPLATE/config.yml
+    - .github/workflows/awstest.yml
+    - .github/workflows/awsfulltest.yml
+    - lib/WorkflowNf-cmgg-wisecondorx.groovy
   nextflow_config:
-  - manifest.name
-  - manifest.homePage
+    - manifest.name
+    - manifest.homePage
   multiqc_config:
-  - report_comment
+    - report_comment
+  files_unchanged:
+    - .github/ISSUE_TEMPLATE/bug_report.yml
+  schema_params: false
+  pipeline_name_conventions: false
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,14 +3,6 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v1.0dev - [date]
+## v1.0.0 - Clumsy Apprentice - [22 May 2023]
 
 Initial release of CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx, created with the [nf-core](https://nf-co.re/) template.
-
-### `Added`
-
-### `Fixed`
-
-### `Dependencies`
-
-### `Deprecated`
diff --git a/README.md b/README.md
@@ -1,5 +1,3 @@
-[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
-
 [![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A522.10.1-23aa62.svg)](https://www.nextflow.io/)
 [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
 [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
@@ -8,20 +6,10 @@
 
 ## Introduction
 
-**CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx** is a bioinformatics pipeline that ...
-
-<!-- TODO nf-core:
-   Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-   major pipeline sections and the types of output it produces. You're giving an overview to someone new
-   to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+**CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx** is a bioinformatics pipeline that creates a reference for WisecondorX
 
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-     workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples.   -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
-
-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+1. Convert the alignment files to NPZ format (`WisecondorX convert`)
+2. Create a reference from all NPZ files (`WisecondorX newref`)
 
 ## Usage
 
@@ -30,26 +18,8 @@
 > to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline)
 > with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-     Explain what rows and columns represent. For instance (please edit as appropriate):
-
-First, prepare a samplesheet with your input data that looks as follows:
-
-`samplesheet.csv`:
-
-```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-```
-
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-
--->
-
 Now, you can run the pipeline using:
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx \
    -profile <docker/singularity/.../institute> \
@@ -66,21 +36,12 @@ nextflow run CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx \
 
 CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx was originally written by nvnieuwk.
 
-We thank the following people for their extensive assistance in the development of this pipeline:
-
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
-
 ## Contributions and Support
 
 If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
 
 ## Citations
 
-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
-<!-- If you use  CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
-
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 
 This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

diff --git a/assets/methods_description_template.yml b/assets/methods_description_template.yml
@@ -3,11 +3,9 @@ description: "Suggested text and references to use when describing pipeline usag
 section_name: "CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx Methods Description"
 section_href: "https://github.com/CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx"
 plot_type: "html"
-## TODO nf-core: Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
-## You inject any metadata in the Nextflow '${workflow}' object
 data: |
   <h4>Methods</h4>
-  <p>Data was processed using CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>).</p>
+  <p>Data was processed using CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx v${workflow.manifest.version} ${doi_text}.
   <p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
   <pre><code>${workflow.commandLine}</code></pre>
   <h4>References</h4>

diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv
@@ -1,3 +1,3 @@
-sample,fastq_1,fastq_2
-SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
-SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,
+cram,crai
+https://raw.githubusercontent.com/CenterForMedicalGeneticsGhent/nf-cmgg-test-datasets/main/data/genomics/homo_sapiens/illumina/cram/test.cram,,
+https://raw.githubusercontent.com/CenterForMedicalGeneticsGhent/nf-cmgg-test-datasets/main/data/genomics/homo_sapiens/illumina/cram/test2.cram,https://raw.githubusercontent.com/CenterForMedicalGeneticsGhent/nf-cmgg-test-datasets/main/data/genomics/homo_sapiens/illumina/cram/test2.cram.crai
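
Reviewer note: the new samplesheet declares two columns (`cram,crai`), but its first data row ends in two commas, so it carries three fields. The sketch below shows how a plain CSV reader would see such a file. It is only an illustration: the file names are shortened stand-ins for the URLs above, `read_samplesheet` is a hypothetical helper, and treating an empty `crai` cell as "no index supplied" is an assumption rather than documented pipeline behaviour.

```python
import csv
import io

# Shortened stand-ins for the rows of assets/samplesheet.csv:
# the file names are kept, the URL prefix is dropped for readability.
SAMPLESHEET = (
    "cram,crai\n"
    "test.cram,,\n"
    "test2.cram,test2.cram.crai\n"
)


def read_samplesheet(text):
    """Return (rows, warnings) for a two-column cram/crai samplesheet."""
    rows, warnings = [], []
    reader = csv.DictReader(io.StringIO(text))
    # The header is line 1, so the first data row is line 2.
    for line_no, record in enumerate(reader, start=2):
        # csv.DictReader stores surplus fields in a list under the key None.
        extra = record.pop(None, None)
        if extra is not None:
            warnings.append(f"line {line_no}: {len(extra)} extra field(s)")
        # Assumption: an empty crai cell means no index was supplied.
        rows.append({"cram": record["cram"], "crai": record["crai"] or None})
    return rows, warnings


rows, warnings = read_samplesheet(SAMPLESHEET)
```

Here `warnings` flags the first data row, while both rows still parse into `cram`/`crai` pairs.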
diff --git a/assets/schema_input.json b/assets/schema_input.json
@@ -1,36 +1,20 @@
 {
     "$schema": "http://json-schema.org/draft-07/schema",
     "$id": "https://raw.githubusercontent.com/CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx/master/assets/schema_input.json",
-    "title": "CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx pipeline - params.input schema",
-    "description": "Schema for the file provided with params.input",
-    "type": "array",
-    "items": {
-        "type": "object",
-        "properties": {
-            "sample": {
-                "type": "string",
-                "pattern": "^\\S+$",
-                "errorMessage": "Sample name must be provided and cannot contain spaces"
-            },
-            "fastq_1": {
-                "type": "string",
-                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
-                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
-            },
-            "fastq_2": {
-                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
-                "anyOf": [
-                    {
-                        "type": "string",
-                        "pattern": "^\\S+\\.f(ast)?q\\.gz$"
-                    },
-                    {
-                        "type": "string",
-                        "maxLength": 0
-                    }
-                ]
-            }
+    "title": "Samplesheet validation schema",
+    "description": "Schema for the samplesheet used in this pipeline",
+    "type": "object",
+    "properties": {
+        "cram": {
+            "type": "string",
+            "pattern": "^\\S+\\.(b|cr)am$",
+            "format": "file-path"
         },
-        "required": ["sample", "fastq_1"]
-    }
+        "crai": {
+            "type": "string",
+            "pattern": "^\\S+\\.(b|cr)ai$",
+            "format": "file-path"
+        }
+    },
+    "required": ["cram"]
 }
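
Reviewer note: the two `pattern` entries accept `.bam`/`.cram` alignment files and `.bai`/`.crai` indexes, and reject any value containing whitespace. A minimal sketch of what they match, using Python's `re` module rather than whatever validator the pipeline actually runs; `validate_row` is a hypothetical helper, and skipping an empty `crai` is an assumption based on `crai` not being listed under `required`.

```python
import re

# Patterns copied from assets/schema_input.json, with the JSON escaping
# ("\\S", "\\.") reduced to plain regex escapes.
CRAM_PATTERN = re.compile(r"^\S+\.(b|cr)am$")
CRAI_PATTERN = re.compile(r"^\S+\.(b|cr)ai$")


def validate_row(row):
    """Return a list of error strings for one samplesheet row (a dict)."""
    errors = []
    cram = row.get("cram")
    if not cram:
        errors.append("cram is required")
    elif not CRAM_PATTERN.match(cram):
        errors.append(f"cram does not match pattern: {cram!r}")
    crai = row.get("crai")
    # Assumption: a missing or empty crai is allowed and simply skipped.
    if crai and not CRAI_PATTERN.match(crai):
        errors.append(f"crai does not match pattern: {crai!r}")
    return errors
```

For example, `validate_row({"cram": "test2.cram", "crai": "test2.cram.crai"})` returns an empty list, while a FASTQ path or a name containing a space is reported as an error.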
diff --git a/conf/base.config b/conf/base.config
@@ -10,7 +10,6 @@
 
 process {
 
-    // TODO nf-core: Check the defaults for all processes
     cpus   = { check_max( 1    * task.attempt, 'cpus'   ) }
     memory = { check_max( 6.GB * task.attempt, 'memory' ) }
     time   = { check_max( 4.h  * task.attempt, 'time'   ) }
@@ -19,13 +18,6 @@ process {
     maxRetries    = 1
     maxErrors     = '-1'
 
-    // Process-specific resource requirements
-    // NOTE - Please try and re-use the labels below as much as possible.
-    //        These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
-    //        If possible, it would be nice to keep the same label naming convention when
-    //        adding in your local modules too.
-    // TODO nf-core: Customise requirements for specific processes.
-    // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
     withLabel:process_single {
         cpus   = { check_max( 1                  , 'cpus'    ) }
         memory = { check_max( 6.GB * task.attempt, 'memory'  ) }
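
Reviewer note: the resource closures scale with `task.attempt` and are then passed through `check_max`. That function is not part of this diff; the sketch below assumes it clamps a request to the configured ceiling, as in the nf-core template. Both Python functions are hypothetical stand-ins and the numbers are in GB.

```python
def check_max(requested, limit):
    """Clamp a requested resource to the configured ceiling."""
    return min(requested, limit)


def memory_for_attempt(attempt, base_gb=6, max_memory_gb=6):
    """Memory request in GB for a given retry attempt.

    Mirrors `check_max( 6.GB * task.attempt, 'memory' )`; the default
    ceiling of 6 GB matches `max_memory = '6.GB'` in the test profile.
    """
    return check_max(base_gb * attempt, max_memory_gb)
```

Under the test profile's 6 GB ceiling a retry cannot request more memory than the first attempt did; with a higher ceiling (say 128 GB, an illustrative value) the second attempt would ask for 12 GB.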

diff --git a/conf/modules.config b/conf/modules.config
@@ -13,29 +13,36 @@
 process {
 
     publishDir = [
-        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
-        mode: params.publish_dir_mode,
-        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        enabled: false
     ]
 
-    withName: SAMPLESHEET_CHECK {
+    withName: WISECONDORX_NEWREF {
         publishDir = [
-            path: { "${params.outdir}/pipeline_info" },
+            enabled: true,
+            path: { "${params.outdir}" },
             mode: params.publish_dir_mode,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
     }
 
-    withName: FASTQC {
-        ext.args = '--quiet'
-    }
-
     withName: CUSTOM_DUMPSOFTWAREVERSIONS {
         publishDir = [
+            enabled: true,
             path: { "${params.outdir}/pipeline_info" },
             mode: params.publish_dir_mode,
             pattern: '*_versions.yml'
         ]
     }
 
+    withName: MULTIQC {
+        publishDir = [
+            overwrite: true,
+            path: { "${params.outdir}/multiqc_reports" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+        errorStrategy = {task.exitStatus == 143 ? 'retry' : 'ignore'}
+        ext.args      = { params.multiqc_config ? "--config $multiqc_custom_config" : "" }
+    }
+
 }
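
Reviewer note: two closures in this file decide behaviour per file and per exit code. `saveAs` returns `null` for `versions.yml`, which tells Nextflow not to publish that file, and the MULTIQC `errorStrategy` retries only on exit status 143 (128 + 15, i.e. termination by SIGTERM) and ignores every other failure. The Python functions below are a hypothetical restatement of that logic for illustration, not code the pipeline runs.

```python
def save_as(filename):
    """Mirror of the saveAs closure: None means the file is not published."""
    return None if filename == "versions.yml" else filename


def multiqc_error_strategy(exit_status):
    """Mirror of the MULTIQC errorStrategy closure.

    Exit status 143 corresponds to SIGTERM (128 + 15) and triggers a retry;
    any other failure is ignored so the run continues without the report.
    """
    return "retry" if exit_status == 143 else "ignore"
```

Because `conf/base.config` sets `maxRetries = 1`, a retried MULTIQC task gets at most one further attempt.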
diff --git a/conf/nf_test.config b/conf/nf_test.config
@@ -0,0 +1,31 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run CenterForMedicalGeneticsGhent/nf-cmgg-wisecondorx -profile test,<docker/singularity> --outdir <OUTDIR>
+
+----------------------------------------------------------------------------------------
+*/
+
+params {
+    config_profile_name        = 'Test profile'
+    config_profile_description = 'Minimal test dataset to check pipeline function'
+
+    // Limit resources so that this can run on GitHub Actions
+    max_cpus   = 2
+    max_memory = '6.GB'
+    max_time   = '6.h'
+
+    genomes_ignore = true
+
+    fasta           = params.test_data["homo_sapiens"]["genome"]["genome_fasta"]
+    fai             = params.test_data["homo_sapiens"]["genome"]["genome_fasta_fai"]
+
+    // Input data
+    input  = "${projectDir}/tests/inputs/samplesheet.csv"
+    outdir = "${params.outputDir}"
+
+}
diff --git a/conf/test.config b/conf/test.config
@@ -19,11 +19,12 @@ params {
     max_memory = '6.GB'
     max_time   = '6.h'
 
+    genomes_ignore = true
+
+    fasta           = params.test_data["homo_sapiens"]["genome"]["genome_fasta"]
+    fai             = null //params.test_data["homo_sapiens"]["genome"]["genome_fasta_fai"]
+
     // Input data
-    // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
-    input  = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'
+    input  = "${projectDir}/assets/samplesheet.csv"
 
-    // Genome references
-    genome = 'R64-1-1'
 }
diff --git a/conf/test_full.config b/conf/test_full.config
@@ -17,8 +17,6 @@ params {
     config_profile_description = 'Full test dataset to check pipeline function'
 
     // Input data for full size test
-    // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
     input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'
 
     // Genome references