Description of feature

As discussed in Slack, the classification step (in particular DADA2_ADDSPECIES) uses excessive amounts of RAM when many ASVs are generated, e.g. on the order of 60,000 to 100,000 ASVs. A suggested solution is to run the classification step in batches by splitting the ASV_seqs.fasta file and then merging the resulting taxonomy files. It would be great if this could be implemented in the pipeline instead of having to run it manually.
Thanks!
Chunking can be done as described in https://nextflow-io.github.io/patterns/process-per-file-chunk/, and the files can be collected and merged again afterwards. It should not be too complicated to implement. A parameter for the chunk size would be useful, so that one can split to the desired size, with a default of probably 10k or so; a rough sketch is below.
[Unfortunately, I don't currently see a time window where I could do that.]
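Not committing to a design, but a minimal DSL2 sketch of that pattern could look roughly like the following. `CLASSIFY_CHUNK`, `classify.sh` and `params.tax_batch_size` are placeholders made up for illustration, not existing pipeline modules or parameters; the real implementation would wire this into the existing DADA2_ADDSPECIES module.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical chunk-size parameter; not an existing ampliseq parameter.
params.tax_batch_size = 10000

// Placeholder process standing in for the real classification module
// (e.g. DADA2_ADDSPECIES); the classify.sh call is illustrative only.
process CLASSIFY_CHUNK {
    input:
    path asv_chunk

    output:
    path "*.tsv"

    script:
    """
    classify.sh ${asv_chunk} > taxonomy.${asv_chunk.baseName}.tsv
    """
}

workflow {
    // Split ASV_seqs.fasta into batches of tax_batch_size sequences,
    // writing each batch to its own file so the process gets a path input.
    asv_chunks = Channel
        .fromPath('ASV_seqs.fasta')
        .splitFasta(by: params.tax_batch_size, file: true)

    CLASSIFY_CHUNK(asv_chunks)

    // Merge the per-chunk taxonomy tables into one file, keeping the
    // header line from the first chunk only.
    CLASSIFY_CHUNK.out
        .collectFile(name: 'ASV_tax_species.tsv', keepHeader: true, skip: 1)
        .view()
}
```

Using `collectFile` with `keepHeader: true` means the merged taxonomy table only carries the header from the first chunk, so the split/merge round trip should be transparent to downstream steps.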