add canu assembler
close #12
pmenzel committed May 12, 2023
1 parent b434c02 commit 315bde4
Showing 3 changed files with 48 additions and 1 deletion.
7 changes: 6 additions & 1 deletion README.md
@@ -9,7 +9,7 @@ See the preprint here: [Snakemake Workflows for Long-read Bacterial Genome Assem

| read filtering | assembly | long read polishing | short read polishing | reference-based polishing |
| --- | --- | --- | --- | --- |
| [Filtlong](https://github.com/rrwick/Filtlong) | [Flye](https://github.com/fenderglass/Flye)<br/> [raven](https://github.com/lbcb-sci/raven)<br/> [miniasm](https://github.com/lh3/miniasm)<br/> [Unicycler](https://github.com/rrwick/Unicycler) | [racon](https://github.com/lbcb-sci/racon)<br/> [medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)<br/> [Polypolish](https://github.com/rrwick/Polypolish)<br/> [POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)<br/> [proovframe](https://github.com/thackl/proovframe) |
| [Filtlong](https://github.com/rrwick/Filtlong) | [Flye](https://github.com/fenderglass/Flye)<br/> [raven](https://github.com/lbcb-sci/raven)<br/> [miniasm](https://github.com/lh3/miniasm)<br/> [Unicycler](https://github.com/rrwick/Unicycler)<br/> [Canu](https://github.com/marbl/canu) | [racon](https://github.com/lbcb-sci/racon)<br/> [medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)<br/> [Polypolish](https://github.com/rrwick/Polypolish)<br/> [POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)<br/> [proovframe](https://github.com/thackl/proovframe) |


## Quick start
@@ -153,6 +153,11 @@ Default assembly. Miniasm does not do any polishing by itself.
**`unicycler`**
Unicycler does a hybrid assembly, i.e., both ONT and Illumina reads must be present in `fastq-ont` and `fastq-illumina`, respectively.

### Canu

**`canu`**
Canu needs to know the genome size (in megabases) beforehand; set it with the Snakemake option `--config genome_size=5.2` (e.g., for 5.2 Mb).
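
For orientation, here is a minimal Python sketch (not part of the commit) of how the `genome_size` value ends up on the Canu command line. It mirrors the shell line of the `canu` rule in this commit's Snakefile; the function name `canu_command` and the example sample and file names are hypothetical.

```python
def canu_command(sample, fq, threads, genome_size_mb):
    """Build the Canu call used by the `canu` rule as a plain string.

    Hypothetical helper for illustration; the Snakefile writes this
    command directly in the rule's shell block.
    """
    outdir = f"assemblies/{sample}_canu/"
    return (
        f"canu -nanopore -d {outdir} -p output useGrid=false "
        f"maxThreads={threads} genomeSize={genome_size_mb}m {fq}"
    )


# Example with made-up sample and read file names.
cmd = canu_command("sample1", "fastq-ont/sample1.fastq.gz", 10, 5.2)
print(cmd)
```

The trailing `m` is appended by the rule itself, so the config value should be a bare number such as `5.2`. The `-p output` prefix is why the rule then copies `output.contigs.fasta` to `output.fa`.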

### racon
The following keywords can be used to polish an assembly using ONT reads:

33 changes: 33 additions & 0 deletions Snakefile
@@ -26,6 +26,12 @@ if config.get("map_medaka_model", False):
table = pd.read_csv(medaka_model_file, sep="\t", lineterminator="\n", header=None)
map_medaka_model = dict(zip(table[0], table[1]))

# this option is needed by Canu
target_genome_size = None
if config.get("genome_size", False):
    target_genome_size = config["genome_size"]
    print("Target genome size = " + str(target_genome_size) + "M")
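
Review note on the hunk above: `config.get("genome_size", False)` relies on truthiness, so a value of `0` or an empty string is treated the same as a missing option. The following is a minimal Python sketch, not part of the commit, of a stricter parse. The helper name `parse_genome_size` is hypothetical, and it assumes the value should be a positive number of megabases.

```python
def parse_genome_size(config):
    """Return the genome size in Mb as a float, or None if the option is absent.

    Hypothetical helper for illustration; the commit itself reads
    config["genome_size"] directly.
    """
    raw = config.get("genome_size")
    if raw is None:
        return None
    size = float(raw)  # raises ValueError for non-numeric input such as "5.2M"
    if size <= 0:
        raise ValueError("genome_size must be a positive number of megabases")
    return size
```

With a check like this, a mistyped value would fail when the Snakefile is parsed rather than later inside the Canu run.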


wildcard_constraints:
    sample="[^_]+",
@@ -74,6 +80,14 @@ if not references_protein and [
        "Error: must provide at least one reference protein file when using proovframe"
    )

# if any desired assembly uses Canu then need to provide target genome size
if not target_genome_size and [
    string for string in sample_assemblies if "_canu" in string
]:
    quit(
        "Error: must provide target genome size when using Canu assembler, use option (e.g. for 5.2Mb): --config genome_size=5.2"
    )
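
The guard above can be exercised outside Snakemake. This is a minimal Python sketch, not part of the commit, that restates the same condition as a function returning the error message instead of calling `quit()`. The names `canu_requested` and `check_genome_size` are hypothetical, and the example assembly names are made up.

```python
def canu_requested(sample_assemblies):
    """True if any requested assembly name contains the "_canu" keyword."""
    return any("_canu" in name for name in sample_assemblies)


def check_genome_size(sample_assemblies, target_genome_size):
    """Return an error message if Canu is requested without a genome size, else None."""
    if canu_requested(sample_assemblies) and not target_genome_size:
        return (
            "Error: must provide target genome size when using Canu assembler, "
            "use option (e.g. for 5.2Mb): --config genome_size=5.2"
        )
    return None


# Made-up assembly names for illustration.
print(check_genome_size(["sample1_canu"], None))   # error message
print(check_genome_size(["sample1_canu"], 5.2))    # None
```

Returning the message keeps the check testable; the Snakefile can still pass a non-`None` result to `quit()`.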

list_outputs = expand(
    "assemblies/{sample_assembly}/output.fa", sample_assembly=sample_assemblies
)
@@ -372,6 +386,25 @@ rule ravenX:
        """
rule canu:
    conda:
        "env/conda-canu.yaml"
    threads: 10
    input:
        fqont=get_ont_fq,
    output:
        fa="assemblies/{sample}_canu/output.fa",
        link="assemblies/{sample}_canu.fa",
    log:
        "assemblies/{sample}_canu/log.txt",
    shell:
        """
        canu -nanopore -d assemblies/{wildcards.sample}_canu/ -p output useGrid=false maxThreads={threads} genomeSize={target_genome_size}m {input.fqont} >>{log} 2>&1
        cp assemblies/{wildcards.sample}_canu/output.contigs.fasta {output.fa} 2>>{log}
        ln -sr {output.fa} {output.link}
        """
# for running racon once
rule racon:
    conda:
9 changes: 9 additions & 0 deletions env/conda-canu.yaml
@@ -0,0 +1,9 @@
name: ont-assembly-snake-canu
channels:
  - conda-forge
  - bioconda
  - defaults
  - anaconda
dependencies:
  - bioconda::canu=2.2
