Commit

Merge branch 'dev' into consensus-pr
d4straub authored Dec 20, 2024
2 parents 11e8c7e + 55164ab commit d267dd5
Showing 9 changed files with 36 additions and 17 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#798](https://github.com/nf-core/ampliseq/pull/798) - Added SILVA version 138.2 of the DADA2 taxonomy database: `silva=138.2` or `silva` as value for `--dada_ref_taxonomy`
- [#804](https://github.com/nf-core/ampliseq/pull/804) - Added version 10 of UNITE as an option for `--dada_ref_taxonomy` (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
- [#803](https://github.com/nf-core/ampliseq/pull/803) - Introduced new parameters related to `--mergepairs_strategy`. These parameters only take effect if `--mergepairs_strategy consensus` is set.

@@ -18,15 +19,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| **mergepairs_consensus_minoverlap** | The minimum number of overlapping base pairs required to merge forward and reverse reads. | 12 |
| **mergepairs_consensus_maxmismatch** | The maximum number of mismatches allowed within the overlapping region for merging reads. | 0 |
| **mergepairs_consensus_percentile_cutoff** | The percentile cutoff determining the minimum observed overlap in the dataset. | 0.001 |
=======

### `Changed`

- [#803](https://github.com/nf-core/ampliseq/pull/803) - Changed DADA2_DENOISING: renamed `--concatenate_reads` to `--mergepairs_strategy`; added support for a new method, "consensus", selected with `--mergepairs_strategy consensus`; changed the options of `--mergepairs_strategy` from TRUE/FALSE (boolean) to ["merge", "concatenate", "consensus"].
- [#818](https://github.com/nf-core/ampliseq/pull/818) - Added the option to not raise the stack size when filtering VSEARCH clusters (`--raise_filter_stacksize`).

### `Fixed`

- [#800](https://github.com/nf-core/ampliseq/pull/800) - Fixed SH files for UNITE 9.0, which were missing some entries due to a bug caused by an API update in PlutoF.
- [#808](https://github.com/nf-core/ampliseq/pull/808) - Add missing library declaration in R script.

### `Dependencies`

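The consensus-merging parameters listed above are only described in prose. As a hedged illustration, the Python sketch below shows one plausible reading of how `mergepairs_consensus_minoverlap`, `mergepairs_consensus_maxmismatch` and `mergepairs_consensus_percentile_cutoff` could gate a merge. It assumes the reverse read is already reverse-complemented and the overlap length is known. The function names are hypothetical, the quantile interpretation of the percentile cutoff is an assumption, and this is not the DADA2_DENOISING implementation.

```python
# Hypothetical sketch of a merge rule suggested by the
# mergepairs_consensus_* parameters; not the pipeline's actual code.


def overlap_mismatches(fwd: str, rev_rc: str, overlap: int) -> int:
    """Count mismatches between the last `overlap` bases of the forward read
    and the first `overlap` bases of the reverse-complemented reverse read."""
    tail = fwd[len(fwd) - overlap:]
    head = rev_rc[:overlap]
    return sum(1 for a, b in zip(tail, head) if a != b)


def can_merge(fwd: str, rev_rc: str, overlap: int,
              minoverlap: int = 12, maxmismatch: int = 0) -> bool:
    """Defaults mirror mergepairs_consensus_minoverlap and _maxmismatch."""
    if overlap < minoverlap or overlap > min(len(fwd), len(rev_rc)):
        return False
    return overlap_mismatches(fwd, rev_rc, overlap) <= maxmismatch


def min_observed_overlap(overlaps: list[int], percentile_cutoff: float = 0.001) -> int:
    """Assumed reading of the percentile cutoff: the overlap length at the
    given lower quantile of all observed overlaps (lower nearest rank)."""
    ordered = sorted(overlaps)
    return ordered[int(percentile_cutoff * (len(ordered) - 1))]


fwd, rev_rc = "ACGTACGTACGTAAAA", "ACGTAAAATTTTGGGG"
print(can_merge(fwd, rev_rc, overlap=8))                # False: 8 < 12
print(can_merge(fwd, rev_rc, overlap=8, minoverlap=8))  # True: no mismatches
```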
2 changes: 1 addition & 1 deletion bin/taxref_reformat_standard.sh
@@ -5,4 +5,4 @@
gunzip -c *train*gz > assignTaxonomy.fna

# and the file for addSpecies, identified by containing "assign" in the name, is renamed
mv *species*gz addSpecies.fna.gz
mv *assign*gz addSpecies.fna.gz
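The glob change matters because the new SILVA 138.2 species file is named `silva_v138.2_assignSpecies.fa.gz`, with a capital `S`, so the case-sensitive pattern `*species*gz` no longer matches it. The sketch below uses Python's `fnmatch.fnmatchcase` as a stand-in for bash globbing. This is an approximation, but it behaves the same for plain file names like these. It checks the patterns against the file names from `conf/ref_databases.config`.

```python
from fnmatch import fnmatchcase

# SILVA 138.2 DADA2 files from conf/ref_databases.config, plus the 138.1 species file.
NEW_FILES = [
    "silva_nr99_v138.2_toSpecies_trainset.fa.gz",
    "silva_v138.2_assignSpecies.fa.gz",
]
OLD_SPECIES_FILE = "silva_species_assignment_v138.1.fa.gz"


def matches(pattern: str, names: list[str]) -> list[str]:
    # fnmatchcase is case-sensitive, like default bash globbing.
    return [n for n in names if fnmatchcase(n, pattern)]


print(matches("*train*gz", NEW_FILES))    # ['silva_nr99_v138.2_toSpecies_trainset.fa.gz']
print(matches("*species*gz", NEW_FILES))  # [] because 'Species' is capitalized
print(matches("*assign*gz", NEW_FILES))   # ['silva_v138.2_assignSpecies.fa.gz']
print(fnmatchcase(OLD_SPECIES_FILE, "*assign*gz"))  # True
```

The last line shows that `*assign*gz` also matches the SILVA 138.1 file name from the removed config lines, so the new pattern keeps working for the older release.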
13 changes: 10 additions & 3 deletions conf/ref_databases.config
@@ -178,11 +178,18 @@ params {
taxlevels = "Domain,Kingdom,Phylum,Class,Order,Family,Genus,Species"
}
'silva' {
title = "Silva 138.1 prokaryotic SSU"
file = [ "https://zenodo.org/record/4587955/files/silva_nr99_v138.1_wSpecies_train_set.fa.gz", "https://zenodo.org/record/4587955/files/silva_species_assignment_v138.1.fa.gz" ]
title = "Silva 138.2 prokaryotic SSU"
file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
fmtscript = "taxref_reformat_standard.sh"
dbversion = "SILVA v138.1 (https://zenodo.org/record/4587955)"
dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
}
'silva=138.2' {
title = "Silva 138.2 prokaryotic SSU"
file = [ "https://zenodo.org/records/14169026/files/silva_nr99_v138.2_toSpecies_trainset.fa.gz", "https://zenodo.org/records/14169026/files/silva_v138.2_assignSpecies.fa.gz" ]
citation = "Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112."
fmtscript = "taxref_reformat_standard.sh"
dbversion = "SILVA v138.2 (https://zenodo.org/records/14169026)"
}
'silva=138' {
title = "Silva 138.1 prokaryotic SSU"
12 changes: 6 additions & 6 deletions docs/usage.md
@@ -214,19 +214,19 @@ Please note the following additional requirements:

Taxonomic classification of ASVs can be performed with the tools DADA2, SINTAX, Kraken2, or QIIME2. Multiple taxonomic reference databases are pre-configured for those tools, but user-supplied databases are also supported for some tools. Alternatively (or in addition), phylogenetic placement can be used to extract taxonomic classifications.

In case multiple tools for taxonomic classification are executed in one pipeline run, only the taxonomic classification result of one tool is forwarded to downstream analysis with QIIME2. The priority is `phylogenetic placement` > `DADA2` > `SINTAX` > `Kraken2` > `QIIME2`.
If multiple tools for taxonomic classification are executed in one pipeline run, only the taxonomic classification result of one tool is forwarded to downstream analysis with QIIME2. The priority is `phylogenetic placement` > `DADA2` > `SINTAX` > `Kraken2` > `QIIME2`; this order reflects a technical limitation and is by no means a recommendation for a specific tool.
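
The priority rule amounts to picking the first tool, in the stated order, that produced a result. Here is a minimal Python sketch, assuming results are keyed by tool name. The dictionary layout and file names are hypothetical, and the pipeline's own Nextflow code for this step is not shown here.

```python
# Priority stated in docs/usage.md; the first tool with a result wins.
PRIORITY = ["phylogenetic placement", "DADA2", "SINTAX", "Kraken2", "QIIME2"]


def select_for_downstream(results: dict[str, str]):
    """Return (tool, result) for the highest-priority tool present, else None."""
    for tool in PRIORITY:
        if tool in results:
            return tool, results[tool]
    return None


# Hypothetical result files from a run that used SINTAX and QIIME2.
print(select_for_downstream({"QIIME2": "qiime2_tax.tsv", "SINTAX": "sintax_tax.tsv"}))
# ('SINTAX', 'sintax_tax.tsv')
```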

Default setting for taxonomic classification is DADA2 with the SILVA reference taxonomy database.

Pre-configured reference taxonomy databases are:

| Database key | DADA2 | SINTAX | Kraken2 | QIIME2 | Target genes |
| ------------ | ----- | ------ | ------- | ------ | --------------------------------------------- |
| silva | + | - | + | + | 16S rRNA |
| gtdb | +¹ | - | - | - | 16S rRNA |
| silva | +¹ | - | + | + | 16S rRNA |
| gtdb | +² | - | - | - | 16S rRNA |
| sbdi-gtdb | + | - | - | - | 16S rRNA |
| rdp | + | - | + | - | 16S rRNA |
| greengenes | - | - | + | (+)² | 16S rRNA |
| greengenes | - | - | + | (+)³ | 16S rRNA |
| greengenes2 | - | - | - | + | 16S rRNA |
| pr2 | + | - | - | - | 18S rRNA |
| unite-fungi | + | + | - | - | eukaryotic nuclear ribosomal ITS region |
@@ -235,9 +235,9 @@ Pre-configured reference taxonomy databases are:
| midori2-co1 | + | - | - | - | eukaryotic Cytochrome Oxidase I (COI) |
| phytoref | + | - | - | - | eukaryotic plastid 16S rRNA |
| zehr-nifh | + | - | - | - | Nitrogenase iron protein NifH |
| standard | - | - | + | - | any in genomes of archaea, bacteria, viruses³ |
| standard | - | - | + | - | any in genomes of archaea, bacteria, viruses⁴ |

¹[`--dada_taxonomy_rc`](https://nf-co.re/ampliseq/parameters#dada_taxonomy_rc) is recommended; ²: de-replicated at 85%, only for testing purposes; ³: quality of results might vary
¹: As of Silva version 138, optimized for classification of Bacteria and Archaea and not suitable for Eukaryotes; ²: [`--dada_taxonomy_rc`](https://nf-co.re/ampliseq/parameters#dada_taxonomy_rc) is recommended; ³: de-replicated at 85%, only for testing purposes; ⁴: quality of results might vary

Special features of taxonomic classification tools:

3 changes: 2 additions & 1 deletion modules/local/filter_clusters.nf
@@ -23,8 +23,9 @@ process FILTER_CLUSTERS {
script:
def prefix = task.ext.prefix ?: "'$meta.id'"
def clusters = "'$clusters'"
def ulimiter = params.raise_filter_stacksize ? "ulimit -s unlimited" : ""
"""
ulimit -s unlimited
${ulimiter}
echo ${clusters} | filt_clusters.py -t ${asv} -p ${prefix} -c -
cat <<-END_VERSIONS > versions.yml
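The `ulimiter` change makes the `ulimit -s unlimited` line conditional on `params.raise_filter_stacksize` (default `true`). The Python sketch below mirrors that string-building logic to show what the generated script looks like with the flag on and off. `filter_clusters_script` and the example file names are hypothetical, and `shlex.quote` stands in for the manual single-quoting in the Nextflow process.

```python
import shlex


def filter_clusters_script(clusters: str, asv: str, prefix: str,
                           raise_filter_stacksize: bool = True) -> str:
    # Same idea as `ulimiter` in FILTER_CLUSTERS: an empty string when disabled.
    ulimiter = "ulimit -s unlimited" if raise_filter_stacksize else ""
    command = (
        f"echo {shlex.quote(clusters)} | "
        f"filt_clusters.py -t {shlex.quote(asv)} -p {shlex.quote(prefix)} -c -"
    )
    return "\n".join([ulimiter, command])


print(filter_clusters_script("clusters.tsv", "asv_table.tsv", "sample1"))
print(filter_clusters_script("clusters.tsv", "asv_table.tsv", "sample1",
                             raise_filter_stacksize=False))
```

With the flag disabled, the first script line is simply empty, so `filt_clusters.py` runs under the default stack limit.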
2 changes: 1 addition & 1 deletion modules/local/format_pplacetax.nf
@@ -59,7 +59,7 @@ process FORMAT_PPLACETAX {
# then, check if reduced entries are identical > if yes, choose any row, if no, repeat
# at that step the taxonomies have same length
print( paste ( asvid,"enters STEP 3" ) )
list_taxonpath <- str_split( temp\$taxopath, ";")
list_taxonpath <- stringr::str_split( temp\$taxopath, ";")
df_taxonpath <- as.data.frame(do.call(rbind, list_taxonpath))
for (i in ncol(df_taxonpath):0) {
# choose first column and change taxon to reduced overlap
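The fix qualifies `str_split` with its package namespace (`stringr::str_split`). The call then resolves even when `library(stringr)` was never loaded, as long as stringr is installed. The surrounding STEP 3 comments describe reducing conflicting taxonomies to the part they share. The Python sketch below approximates that as a longest common prefix over `;`-separated ranks. It is a hypothetical simplification, not a port of the R loop.

```python
def shared_taxonomy(taxopaths: list[str]) -> str:
    """Deepest ';'-separated taxonomy that all paths agree on."""
    split_paths = [path.split(";") for path in taxopaths]
    shared = []
    for ranks in zip(*split_paths):  # zip stops at the shortest path
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return ";".join(shared)


print(shared_taxonomy([
    "Bacteria;Proteobacteria;Gammaproteobacteria",
    "Bacteria;Proteobacteria;Alphaproteobacteria",
]))  # Bacteria;Proteobacteria
```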
3 changes: 2 additions & 1 deletion nextflow.config
@@ -82,6 +82,7 @@ params {
ancom_sample_min_count = 1
vsearch_cluster = null
vsearch_cluster_id = 0.97
raise_filter_stacksize = true
ancom = false
ancombc = false
ancombc_effect_size = 1
@@ -115,7 +116,7 @@ params {
skip_report = false

// Database options
dada_ref_taxonomy = "silva=138"
dada_ref_taxonomy = "silva=138.2"
dada_assign_taxlevels = null
dada_ref_tax_custom = null
dada_ref_tax_custom_sp = null
12 changes: 10 additions & 2 deletions nextflow_schema.json
@@ -335,6 +335,13 @@
"description": "Pairwise Identity value used when post-clustering ASVs if `--vsearch_cluster` option is used (default: 0.97).",
"help_text": "Lowering or increasing this value can change the number of ASVs left over after clustering."
},
"raise_filter_stacksize": {
"type": "boolean",
"default": true,
"fa_icon": "fas fa-angle-double-up",
"description": "Raise stack size when filtering VSEARCH clusters",
"help_text": "Setting to true adds 'ulimit -s unlimited' to the beginning of the filt_clusters.py command."
},
"filter_ssu": {
"type": "string",
"description": "Enable SSU filtering. Comma-separated list of kingdoms (domains) in Barrnap, a combination (or one) of \"bac\", \"arc\", \"mito\", and \"euk\". ASVs that have their lowest evalue in those kingdoms are kept.",
@@ -400,7 +407,7 @@
"type": "string",
"help_text": "Choose any of the supported databases, and optionally also specify the version. Database and version are separated by an equal sign (`=`, e.g. `silva=138`). This will download the desired database, format it to produce a file that is compatible with DADA2's assignTaxonomy and another file that is compatible with DADA2's addSpecies.\n\nThe following databases are supported:\n- GTDB - Genome Taxonomy Database - 16S rRNA\n- SBDI-GTDB, a Sativa-vetted version of the GTDB 16S rRNA\n- PR2 - Protist Reference Ribosomal Database - 18S rRNA\n- RDP - Ribosomal Database Project - 16S rRNA\n- SILVA ribosomal RNA gene database project - 16S rRNA\n- UNITE - eukaryotic nuclear ribosomal ITS region - ITS\n- COIDB - eukaryotic Cytochrome Oxidase I (COI) from The Barcode of Life Data System (BOLD) - COI\n\nGenerally, using `gtdb`, `pr2`, `rdp`, `sbdi-gtdb`, `silva`, `coidb`, `unite-fungi`, or `unite-alleuk` will select the most recent supported version.\n\nPlease note that commercial/non-academic entities [require licensing](https://www.arb-silva.de/silva-license-information) for the SILVA v132 database (non-default) but not from v138 on (default).",
"description": "Name of supported database, and optionally also version number",
"default": "silva=138",
"default": "silva=138.2",
"enum": [
"coidb",
"coidb=221216",
@@ -425,8 +432,9 @@
"sbdi-gtdb=R06-RS202-3",
"sbdi-gtdb=R06-RS202-1",
"silva",
"silva=132",
"silva=138.2",
"silva=138",
"silva=132",
"unite-alleuk",
"unite-alleuk=10.0",
"unite-alleuk=9.0",
2 changes: 1 addition & 1 deletion subworkflows/local/utils_nfcore_ampliseq_pipeline/main.nf
@@ -233,7 +233,7 @@ def validateInputParameters() {
"pr2","pr2=5.0.0","pr2=4.14.0","pr2=4.13.0",
"rdp","rdp=18",
"sbdi-gtdb","sbdi-gtdb=R09-RS220-1","sbdi-gtdb=R08-RS214-1","sbdi-gtdb=R07-RS207-1",
"silva","silva=138","silva=132",
"silva","silva=138.2","silva=138","silva=132",
"unite-fungi","unite-fungi=10.0","unite-fungi=9.0","unite-fungi=8.3","unite-fungi=8.2",
"unite-alleuk","unite-alleuk=10.0","unite-alleuk=9.0","unite-alleuk=8.3","unite-alleuk=8.2"
]
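The list in `validateInputParameters()` appears to act as an allow-list for `--dada_ref_taxonomy`, and it now includes `silva=138.2`. Below is a hedged Python sketch of such a check. It also splits a key into name and optional version at the equal sign, as described in the schema help text. The allow-list is only an excerpt, and the function names and error message are hypothetical.

```python
from typing import Optional

# Excerpt of the keys listed in validateInputParameters() (not the full list).
ALLOWED_DADA_REF_TAXONOMY = {
    "rdp", "rdp=18",
    "silva", "silva=138.2", "silva=138", "silva=132",
    "unite-fungi", "unite-fungi=10.0", "unite-fungi=9.0",
}


def parse_db_key(key: str) -> tuple[str, Optional[str]]:
    """Split 'name=version'; a bare name such as 'silva' has version None."""
    name, sep, version = key.partition("=")
    return name, (version if sep else None)


def validate_db_key(key: str, allowed: set[str] = ALLOWED_DADA_REF_TAXONOMY):
    # Reject anything not explicitly listed, then split it into its parts.
    if key not in allowed:
        raise ValueError(f"Unsupported --dada_ref_taxonomy value: {key!r}")
    return parse_db_key(key)


print(validate_db_key("silva=138.2"))  # ('silva', '138.2')
print(validate_db_key("silva"))        # ('silva', None)
```

Keeping the bare name `silva` in the list lets it act as an alias for the newest configured release. After this change, the config points that alias at SILVA 138.2.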
