Merge branch 'dev' into consensus-pr

nf-core · Dec 20, 2024 · d267dd5
2 parents 11e8c7e + 55164ab

Showing 9 changed files with 36 additions and 17 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Added`
 
+- [#798](https://github.com/nf-core/ampliseq/pull/798) - Added SILVA version 138.2 of the DADA2 taxonomy database: `silva=138.2` or `silva` as parameter to `--dada_ref_taxonomy`
 - [#804](https://github.com/nf-core/ampliseq/pull/804) - Added version 10 of Unite as parameter for `--dada_ref_taxonomy` (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
 - [#803](https://github.com/nf-core/ampliseq/pull/803) - Introduced new parameters related to `--mergepairs_strategy`. These parameters only take effect if `--mergepairs_strategy consensus` is set.
 
@@ -18,15 +19,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 | **mergepairs_consensus_minoverlap**        | The minimum number of overlapping base pairs required to merge forward and reverse reads. | 12                |
 | **mergepairs_consensus_maxmismatch**       | The maximum number of mismatches allowed within the overlapping region for merging reads. | 0                 |
 | **mergepairs_consensus_percentile_cutoff** | The percentile cutoff determining the minimum observed overlap in the dataset.            | 0.001             |
-=======
 
 ### `Changed`
 
 - [#803](https://github.com/nf-core/ampliseq/pull/803) - Changed DADA2_DENOISING: renamed `--concatenate_reads` to `--mergepairs_strategy`; added support for a new method, "consensus", selected with `--mergepairs_strategy consensus`; changed the options of `--mergepairs_strategy` from TRUE/FALSE (boolean) to ["merge", "concatenate", "consensus"].
+- [#818](https://github.com/nf-core/ampliseq/pull/818) - Allow users to skip raising the stack size when filtering VSEARCH clusters (`--raise_filter_stacksize false`).
 
 ### `Fixed`
 
 - [#800](https://github.com/nf-core/ampliseq/pull/800) - Fixed SH files for UNITE 9.0, which were missing some entries due to a bug caused by an API update in PlutoF
+- [#808](https://github.com/nf-core/ampliseq/pull/808) - Add missing library declaration in R script.
 
 ### `Dependencies`
 

diff --git a/bin/taxref_reformat_standard.sh b/bin/taxref_reformat_standard.sh
@@ -5,4 +5,4 @@
 gunzip -c *train*gz > assignTaxonomy.fna
 
 # and the file for add species, identified by containing "species" in the name, is renamed
-mv *species*gz addSpecies.fna.gz
+mv *assign*gz addSpecies.fna.gz
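
Why the glob in `taxref_reformat_standard.sh` had to change can be checked directly against the SILVA file names in `conf/ref_databases.config`. The sketch below is a minimal illustration, assuming POSIX `sh` pattern matching in a `case` statement, which is case-sensitive just like the glob passed to `mv`; `matches_glob` is a hypothetical helper, not part of the pipeline.

```shell
#!/bin/sh
# Minimal sketch: compare the old and the new glob from
# bin/taxref_reformat_standard.sh against the SILVA species-assignment
# file names used in conf/ref_databases.config (138.1 and 138.2).
# Pattern matching is case-sensitive, so "*species*" does not match "Species".

# matches_glob NAME PATTERN -> exit status 0 if NAME matches the glob PATTERN
matches_glob() {
    case "$1" in
        $2) return 0 ;;   # unquoted $2 is used as a pattern, not a literal
        *)  return 1 ;;
    esac
}

old_name="silva_species_assignment_v138.1.fa.gz"
new_name="silva_v138.2_assignSpecies.fa.gz"

for name in "$old_name" "$new_name"; do
    for pattern in '*species*gz' '*assign*gz'; do
        if matches_glob "$name" "$pattern"; then result=match; else result=no-match; fi
        echo "$name $pattern $result"
    done
done
```

Running it prints `no-match` only for `silva_v138.2_assignSpecies.fa.gz` against `*species*gz`, because the new name contains `Species` with a capital S; `*assign*gz` matches both the 138.1 and the 138.2 names. The comment above the `mv` line still describes the file as containing "species" in its name, although the glob now keys on "assign".
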
diff --git a/conf/ref_databases.config b/conf/ref_databases.config
@@ -178,11 +178,18 @@ params {
             taxlevels = "Domain,Kingdom,Phylum,Class,Order,Family,Genus,Species"
         }
         'silva' {
-            title = "Silva 138.1 prokaryotic SSU"
-            file = [ "https://zenodo.org/record/4587955/files/silva_nr99_v138.1_wSpecies_train_set.fa.gz", "https://zenodo.org/record/4587955/files/silva_species_assignment_v138.1.fa.gz" ]
+            title = "Silva 138.2 prokaryotic SSU"
+            file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
             citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
             fmtscript = "taxref_reformat_standard.sh"
-            dbversion = "SILVA v138.1 (https://zenodo.org/record/4587955)"
+            dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
+        }
+        'silva=138.2' {
+            title = "Silva 138.2 prokaryotic SSU"
+            file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
+            citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
+            fmtscript = "taxref_reformat_standard.sh"
+            dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
         }
         'silva=138' {
             title = "Silva 138.1 prokaryotic SSU"

diff --git a/docs/usage.md b/docs/usage.md
@@ -214,19 +214,19 @@ Please note the following additional requirements:
 
 Taxonomic classification of ASVs can be performed with tools DADA2, SINTAX, Kraken2 or QIIME2. Multiple taxonomic reference databases are pre-configured for those tools, but user supplied databases are also supported for some tools. Alternatively (or in addition), phylogenetic placement can be used to extract taxonomic classifications.
 
-In case multiple tools for taxonomic classification are executed in one pipeline run, only the taxonomic classification result of one tool is forwarded to downstream analysis with QIIME2. The priority is `phylogenetic placement` > `DADA2` > `SINTAX` > `Kraken2` > `QIIME2`.
+In case multiple tools for taxonomic classification are executed in one pipeline run, only the taxonomic classification result of one tool is forwarded to downstream analysis with QIIME2. The priority is `phylogenetic placement` > `DADA2` > `SINTAX` > `Kraken2` > `QIIME2`; this order is by no means a recommendation for a specific tool but a technical limitation.
 
 Default setting for taxonomic classification is DADA2 with the SILVA reference taxonomy database.
 
 Pre-configured reference taxonomy databases are:
 
 | Database key | DADA2 | SINTAX | Kraken2 | QIIME2 | Target genes                                  |
 | ------------ | ----- | ------ | ------- | ------ | --------------------------------------------- |
-| silva        | +     | -      | +       | +      | 16S rRNA                                      |
-| gtdb         | +¹    | -      | -       | -      | 16S rRNA                                      |
+| silva        | +¹    | -      | +       | +      | 16S rRNA                                      |
+| gtdb         | +²    | -      | -       | -      | 16S rRNA                                      |
 | sbdi-gtdb    | +     | -      | -       | -      | 16S rRNA                                      |
 | rdp          | +     | -      | +       | -      | 16S rRNA                                      |
-| greengenes   | -     | -      | +       | (+)²   | 16S rRNA                                      |
+| greengenes   | -     | -      | +       | (+)³   | 16S rRNA                                      |
 | greengenes2  | -     | -      | -       | +      | 16S rRNA                                      |
 | pr2          | +     | -      | -       | -      | 18S rRNA                                      |
 | unite-fungi  | +     | +      | -       | -      | eukaryotic nuclear ribosomal ITS region       |
@@ -235,9 +235,9 @@ Pre-configured reference taxonomy databases are:
 | midori2-co1  | +     | -      | -       | -      | eukaryotic Cytochrome Oxidase I (COI)         |
 | phytoref     | +     | -      | -       | -      | eukaryotic plastid 16S rRNA                   |
 | zehr-nifh    | +     | -      | -       | -      | Nitrogenase iron protein NifH                 |
-| standard     | -     | -      | +       | -      | any in genomes of archaea, bacteria, viruses³ |
+| standard     | -     | -      | +       | -      | any in genomes of archaea, bacteria, viruses⁴ |
 
-¹[`--dada_taxonomy_rc`](https://nf-co.re/ampliseq/parameters#dada_taxonomy_rc) is recommended; ²: de-replicated at 85%, only for testing purposes; ³: quality of results might vary
+¹: From SILVA version 138 on, optimized for classification of Bacteria and Archaea and not suitable for Eukaryotes; ²: [`--dada_taxonomy_rc`](https://nf-co.re/ampliseq/parameters#dada_taxonomy_rc) is recommended; ³: de-replicated at 85%, only for testing purposes; ⁴: quality of results might vary
 
 Special features of taxonomic classification tools:
 

diff --git a/modules/local/filter_clusters.nf b/modules/local/filter_clusters.nf
@@ -23,8 +23,9 @@ process FILTER_CLUSTERS {
     script:
     def prefix   = task.ext.prefix ?: "'$meta.id'"
     def clusters = "'$clusters'"
+    def ulimiter = params.raise_filter_stacksize ? "ulimit -s unlimited" : ""
     """
-    ulimit -s unlimited
+    ${ulimiter}
     echo ${clusters} | filt_clusters.py -t ${asv} -p ${prefix} -c -
 
     cat <<-END_VERSIONS > versions.yml
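
The `ulimiter` change in `FILTER_CLUSTERS` makes the stack-size line conditional on `params.raise_filter_stacksize`. A minimal shell sketch of the same decision, assuming a hypothetical `build_prelude` helper and a hypothetical `RAISE_FILTER_STACKSIZE` environment variable standing in for the Nextflow parameter:

```shell
#!/bin/sh
# Minimal sketch of the FILTER_CLUSTERS logic: emit "ulimit -s unlimited"
# only when the flag is "true", otherwise emit an empty line.
# RAISE_FILTER_STACKSIZE is a hypothetical stand-in for
# params.raise_filter_stacksize; build_prelude is a hypothetical helper.

# build_prelude FLAG -> prints the line placed before the filt_clusters.py call
build_prelude() {
    if [ "$1" = "true" ]; then
        echo "ulimit -s unlimited"
    else
        echo ""
    fi
}

# Default mirrors nextflow.config (raise_filter_stacksize = true)
RAISE_FILTER_STACKSIZE="${RAISE_FILTER_STACKSIZE:-true}"
echo "prelude: '$(build_prelude "$RAISE_FILTER_STACKSIZE")'"
```

With the default (`true`) the generated script starts with `ulimit -s unlimited`, as before; running the pipeline with `--raise_filter_stacksize false` leaves an empty line instead. As an example of where this matters (not the documented motivation of #818): a non-root user cannot raise a limit above its hard limit, so `ulimit -s unlimited` fails wherever the hard stack limit is finite.
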

diff --git a/modules/local/format_pplacetax.nf b/modules/local/format_pplacetax.nf
@@ -59,7 +59,7 @@ process FORMAT_PPLACETAX {
                 # then, check if reduced entries are identical > if yes, choose any row, if no, repeat
                 # at that step the taxonomies have same length
                 print( paste ( asvid,"enters STEP 3" ) )
-                list_taxonpath <- str_split( temp\$taxopath, ";")
+                list_taxonpath <- stringr::str_split( temp\$taxopath, ";")
                 df_taxonpath <- as.data.frame(do.call(rbind, list_taxonpath))
                 for (i in ncol(df_taxonpath):0) {
                     # choose first column and change taxon to reduced overlap

diff --git a/nextflow.config b/nextflow.config
@@ -82,6 +82,7 @@ params {
     ancom_sample_min_count                    = 1
     vsearch_cluster                           = null
     vsearch_cluster_id                        = 0.97
+    raise_filter_stacksize                    = true
     ancom                                     = false
     ancombc                                   = false
     ancombc_effect_size                       = 1
@@ -115,7 +116,7 @@ params {
     skip_report            = false
 
     // Database options
-    dada_ref_taxonomy        = "silva=138"
+    dada_ref_taxonomy        = "silva=138.2"
     dada_assign_taxlevels    = null
     dada_ref_tax_custom      = null
     dada_ref_tax_custom_sp   = null

diff --git a/nextflow_schema.json b/nextflow_schema.json
@@ -335,6 +335,13 @@
                     "description": "Pairwise Identity value used when post-clustering ASVs if `--vsearch_cluster` option is used (default: 0.97).",
                     "help_text": "Lowering or increasing this value can change the number ASVs left over after clustering."
                 },
+                "raise_filter_stacksize": {
+                    "type": "boolean",
+                    "default": true,
+                    "fa_icon": "fas fa-angle-double-up",
+                    "description": "Raise stack size when filtering VSEARCH clusters",
+                    "help_text": "Setting to true adds 'ulimit -s unlimited' to the beginning of the filt_clusters.py command."
+                },
                 "filter_ssu": {
                     "type": "string",
                     "description": "Enable SSU filtering. Comma separated list of kingdoms (domains) in Barrnap, a combination (or one) of \"bac\", \"arc\", \"mito\", and \"euk\". ASVs that have their lowest evalue in that kingdoms are kept.",
@@ -400,7 +407,7 @@
                     "type": "string",
                     "help_text": "Choose any of the supported databases, and optionally also specify the version. Database and version are separated by an equal sign (`=`, e.g. `silva=138`) . This will download the desired database, format it to produce a file that is compatible with DADA2's assignTaxonomy and another file that is compatible with DADA2's addSpecies.\n\nThe following databases are supported:\n- GTDB - Genome Taxonomy Database - 16S rRNA\n- SBDI-GTDB, a Sativa-vetted version of the GTDB 16S rRNA\n- PR2 - Protist Reference Ribosomal Database - 18S rRNA\n- RDP - Ribosomal Database Project - 16S rRNA\n- SILVA ribosomal RNA gene database project - 16S rRNA\n- UNITE - eukaryotic nuclear ribosomal ITS region - ITS\n- COIDB - eukaryotic Cytochrome Oxidase I (COI) from The Barcode of Life Data System (BOLD) - COI\n\nGenerally, using `gtdb`, `pr2`, `rdp`, `sbdi-gtdb`, `silva`, `coidb`, `unite-fungi`, or `unite-alleuk` will select the most recent supported version.\n\nPlease note that commercial/non-academic entities [require licensing](https://www.arb-silva.de/silva-license-information) for SILVA v132 database (non-default) but not from v138 on (default).",
                     "description": "Name of supported database, and optionally also version number",
-                    "default": "silva=138",
+                    "default": "silva=138.2",
                     "enum": [
                         "coidb",
                         "coidb=221216",
@@ -425,8 +432,9 @@
                         "sbdi-gtdb=R06-RS202-3",
                         "sbdi-gtdb=R06-RS202-1",
                         "silva",
-                        "silva=132",
+                        "silva=138.2",
                         "silva=138",
+                        "silva=132",
                         "unite-alleuk",
                         "unite-alleuk=10.0",
                         "unite-alleuk=9.0",

diff --git a/subworkflows/local/utils_nfcore_ampliseq_pipeline/main.nf b/subworkflows/local/utils_nfcore_ampliseq_pipeline/main.nf
@@ -233,7 +233,7 @@ def validateInputParameters() {
         "pr2","pr2=5.0.0","pr2=4.14.0","pr2=4.13.0",
         "rdp","rdp=18",
         "sbdi-gtdb","sbdi-gtdb=R09-RS220-1","sbdi-gtdb=R08-RS214-1","sbdi-gtdb=R07-RS207-1",
-        "silva","silva=138","silva=132",
+        "silva","silva=138.2","silva=138","silva=132",
         "unite-fungi","unite-fungi=10.0","unite-fungi=9.0","unite-fungi=8.3","unite-fungi=8.2",
         "unite-alleuk","unite-alleuk=10.0","unite-alleuk=9.0","unite-alleuk=8.3","unite-alleuk=8.2"
     ]
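
`validateInputParameters()` keeps its own list of accepted `--dada_ref_taxonomy` values, and this commit adds `silva=138.2` there as well as to `conf/ref_databases.config` and to the `enum` in `nextflow_schema.json`. The membership check can be sketched in shell as follows; this is an illustration rather than the pipeline's Groovy code, and `SILVA_KEYS` and `is_supported` are hypothetical names that cover only the SILVA entries.

```shell
#!/bin/sh
# Minimal sketch, not the pipeline's Groovy code: check a --dada_ref_taxonomy
# value against the SILVA keys accepted after this change. The real list in
# validateInputParameters() also contains the other databases' keys.

SILVA_KEYS="silva silva=138.2 silva=138 silva=132"

# is_supported KEY -> exit status 0 if KEY is one of SILVA_KEYS
is_supported() {
    for key in $SILVA_KEYS; do          # unquoted on purpose: split on spaces
        [ "$key" = "$1" ] && return 0
    done
    return 1
}

for candidate in silva silva=138.2 silva=138.1; do
    if is_supported "$candidate"; then
        echo "$candidate: accepted"
    else
        echo "$candidate: rejected"
    fi
done
```

It reports `silva` and `silva=138.2` as accepted and `silva=138.1` as rejected: in `conf/ref_databases.config` the 138.1 release is reached through the key `silva=138`, not `silva=138.1`.
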