add Rasusa for subsampling ONT reads
close #13
pmenzel committed Jul 7, 2023
1 parent 315bde4 commit 8e03bd1
Showing 3 changed files with 39 additions and 2 deletions.
13 changes: 12 additions & 1 deletion README.md
@@ -9,7 +9,7 @@ See the preprint here: [Snakemake Workflows for Long-read Bacterial Genome Assem

 | read filtering | assembly | long read polishing | short read polishing | reference-based polishing |
 | --- | --- | --- | --- | --- |
-| [Filtlong](https://github.com/rrwick/Filtlong) | [Flye](https://github.com/fenderglass/Flye)<br/> [raven](https://github.com/lbcb-sci/raven)<br/> [miniasm](https://github.com/lh3/miniasm)<br/> [Unicycler](https://github.com/rrwick/Unicycler)<br/> [Canu](https://github.com/marbl/canu) | [racon](https://github.com/lbcb-sci/racon)<br/> [medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)<br/> [Polypolish](https://github.com/rrwick/Polypolish)<br/> [POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)<br/> [proovframe](https://github.com/thackl/proovframe) |
+| [Filtlong](https://github.com/rrwick/Filtlong)<br/> [Rasusa](https://github.com/mbhall88/rasusa) | [Flye](https://github.com/fenderglass/Flye)<br/> [raven](https://github.com/lbcb-sci/raven)<br/> [miniasm](https://github.com/lh3/miniasm)<br/> [Unicycler](https://github.com/rrwick/Unicycler)<br/> [Canu](https://github.com/marbl/canu) | [racon](https://github.com/lbcb-sci/racon)<br/> [medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)<br/> [Polypolish](https://github.com/rrwick/Polypolish)<br/> [POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)<br/> [proovframe](https://github.com/thackl/proovframe) |


 ## Quick start
@@ -114,6 +114,17 @@ The output is written to `fastq-ont/mysample+filtlongMB<m>,<q>,<l>,<n>.fastq`.

 When using any of the Filtlong keywords in a folder name, they must be followed by an underscore, followed by the keyword for the assembler.

+### Rasusa
+The ONT reads can be randomly subsampled prior to the assembly.
+
+The available keywords are:
+
+**`rasusaMB<m>`**
+This will subsample the ONT reads to a total of `m` megabases.
+The output is written to `fastq-ont/mysample+rasusaMB<m>.fastq`.
+
+When using the Rasusa keyword in a folder name, it must be followed by an underscore, followed by the keyword for the assembler.
+
 ### Flye

 The following keywords can be used to run the assembly with Flye:
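The `rasusaMB<m>` naming convention described in the README can be illustrated with a small Python sketch (Python because the Snakefile is Python-based). It assumes a folder name of the form `<sample>+rasusaMB<m>_<assembler keyword>` (for example the hypothetical `mysample+rasusaMB500_flye`) and assumes that Rasusa reads the `m` suffix of `--bases` as 10^6 bases. The helper `parse_rasusa_name` and the class `RasusaRequest` are hypothetical and not part of the workflow.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for "<sample>+rasusaMB<m>", optionally followed by
# "_<assembler keyword>" as required for folder names.
RASUSA_NAME = re.compile(
    r"^(?P<sample>[^+]+)\+rasusaMB(?P<mb>\d+)(?:_(?P<assembler>.+))?$"
)


class RasusaRequest(NamedTuple):
    sample: str
    megabases: int
    assembler: Optional[str]

    @property
    def bases_arg(self) -> str:
        # Value passed to `rasusa --bases`, mirroring "{wildcards.num}m" in the rule
        return f"{self.megabases}m"

    @property
    def target_bases(self) -> int:
        # Assumption: the "m" suffix means 10**6 bases
        return self.megabases * 1_000_000

    @property
    def output_fastq(self) -> str:
        # Same naming scheme as the output of the rasusaMB rule
        return f"fastq-ont/{self.sample}+rasusaMB{self.megabases}.fastq"


def parse_rasusa_name(name: str) -> RasusaRequest:
    """Split a name such as 'mysample+rasusaMB500_flye' into its parts."""
    match = RASUSA_NAME.match(name)
    if match is None:
        raise ValueError(f"no rasusaMB<m> keyword in {name!r}")
    return RasusaRequest(match["sample"], int(match["mb"]), match["assembler"])


req = parse_rasusa_name("mysample+rasusaMB500_flye")
print(req.output_fastq)  # fastq-ont/mysample+rasusaMB500.fastq
print(req.bases_arg)     # 500m
print(req.target_bases)  # 500000000
```

Keeping the sample name as everything before the first `+` matches how the Snakefile strips suffixes with `split("+")[0]` for the Illumina reads.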
19 changes: 18 additions & 1 deletion Snakefile
@@ -43,11 +43,13 @@ wildcard_constraints:
 def get_ont_fq(wildcards):
     if "filtlong" in wildcards.sample:
         return "fastq-ont/" + wildcards.sample + ".fastq"
+    elif "rasusa" in wildcards.sample:
+        return "fastq-ont/" + wildcards.sample + ".fastq"
     else:
         return glob("fastq-ont/" + wildcards.sample + ".fastq*")


-# use split("+")[0] here for removing the +filtlong... suffixes from sample names for Illumina reads
+# use split("+")[0] here for removing the +filtlong... or +rasusa... suffixes from sample names for Illumina reads
 def get_R1_fq(wildcards):
     return glob("fastq-illumina/" + wildcards.sample.split("+")[0] + "_R1.fastq*")

@@ -161,6 +163,21 @@ rule filtlong:
         """
+rule rasusaMB:
+    conda:
+        "env/conda-rasusa.yaml"
+    threads: 1
+    input:
+        fq=get_ont_fq,
+    output:
+        "fastq-ont/{sample}+rasusaMB{num}.fastq",
+    log:
+        "fastq-ont/{sample}_rasusaMB{num}_log.txt",
+    shell:
+        """
+        rasusa --bases {wildcards.num}m -i {input} -o {output} 2>{log}
+        """
 rule filtlongMB:
     threads: 1
     input:
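The `rasusaMB` rule delegates the subsampling to Rasusa. As a rough illustration of what "randomly subsample to `m` megabases" means, the following Python sketch shuffles the reads with a seeded random number generator and keeps reads until their summed length reaches the target. This is a simplified sketch of the general idea, not Rasusa's implementation; the function `subsample_to_bases` and the in-memory `(read_id, sequence)` representation are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical in-memory representation of ONT reads: (read id, sequence)
Read = Tuple[str, str]


def subsample_to_bases(
    reads: Sequence[Read], target_bases: int, seed: int = 0
) -> List[Read]:
    """Randomly keep reads until their total length reaches target_bases.

    If the input holds fewer than target_bases in total, all reads are kept.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    order = list(range(len(reads)))
    rng.shuffle(order)  # random order in which reads are considered

    chosen = []
    total = 0
    for i in order:
        if total >= target_bases:
            break
        chosen.append(i)
        total += len(reads[i][1])

    # Emit the selected reads in their original input order
    return [reads[i] for i in sorted(chosen)]


if __name__ == "__main__":
    demo = [(f"read{i}", "A" * (1000 + 10 * i)) for i in range(100)]
    sub = subsample_to_bases(demo, target_bases=20_000, seed=42)
    kept = sum(len(seq) for _, seq in sub)
    print(len(sub), kept)
```

Because the last read is added before the total is checked again, the result can exceed the target by less than one read length; with a target larger than the input, every read is returned unchanged.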
9 changes: 9 additions & 0 deletions env/conda-rasusa.yaml
@@ -0,0 +1,9 @@
name: ont-assembly-snake-rasusa
channels:
- conda-forge
- bioconda
- defaults
- anaconda
dependencies:
- bioconda::rasusa=0.7.1
