Basecalling and Genome Assembly

MinION basecalling and assembly

  1. Convert fast5 files to pod5 format:
pod5 convert fast5 <directory-of-fast5s> --output output_pod5s/ --one-to-one <directory-of-fast5s>
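Because `--one-to-one` writes one pod5 file per input fast5 file, a quick sanity check after conversion is to compare file counts. The following is a minimal sketch; the helper names `count_ext` and `check_one_to_one` are hypothetical and not part of the pod5 tools.

```shell
# Count files with a given extension anywhere under a directory.
count_ext() {
    find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

# Compare the number of .fast5 inputs with the number of .pod5 outputs.
# $1 = fast5 directory, $2 = pod5 output directory (names are placeholders).
check_one_to_one() {
    n_fast5=$(count_ext "$1" fast5)
    n_pod5=$(count_ext "$2" pod5)
    echo "fast5: $n_fast5  pod5: $n_pod5"
    # Succeeds (exit status 0) only when the counts match.
    [ "$n_fast5" -eq "$n_pod5" ]
}
```

Calling `check_one_to_one` with the fast5 directory and `output_pod5s/` returns a non-zero status when any file was skipped during conversion.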

2A. Dorado basecalling on laptop:

dorado download --model <model-name>
dorado basecaller --emit-fastq <model-name> <directory-of-pod5s> | gzip > output.fq.gz # TODO: use pigz for parallel compression
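One possible way to add the parallel compression mentioned above is a small wrapper that uses `pigz` when it is installed and falls back to `gzip` otherwise. This is only a sketch: the function name `compress_stream` and the default of 4 threads are assumptions, not part of the existing workflow.

```shell
# Compress stdin to stdout in gzip format.
# $1 = number of pigz threads (assumed default: 4).
compress_stream() {
    if command -v pigz >/dev/null 2>&1; then
        pigz -p "${1:-4}"
    else
        # pigz not found: fall back to single-threaded gzip.
        gzip
    fi
}

# Intended use (not executed here):
# dorado basecaller --emit-fastq <model-name> <directory-of-pod5s> | compress_stream 8 > output.fq.gz
```

pigz writes standard gzip output, so downstream tools that read `output.fq.gz` should not need any changes.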

2B. Dorado basecalling on LCC cluster using dorado.sh script:

sbatch $scripts/dorado.sh <directory-of-pod5s>
  3. Use canu.sh SLURM script for Canu assembly:
assembly=<assembly-prefix>
nano_reads=<fastq-file> # Canu expects read files (e.g. output.fq.gz), not a directory
canu -d "${assembly}_canu_run" -p "$assembly" genomeSize=45m useGrid=false gridOptionsOVS=" --time 96:00:00 --partition=CAC48M192_L --ntasks=1 --cpus-per-task=4 " minReadLength=1000 -nanopore-raw "$nano_reads"
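Since the Canu command sets `minReadLength=1000`, it can help to see how many reads pass that cutoff before starting a long assembly. The sketch below assumes a gzip-compressed FASTQ with four lines per record (as in `output.fq.gz` from step 2A); the function name `count_reads_min_len` is hypothetical.

```shell
# Print "<reads at or above the cutoff><TAB><total reads>".
# $1 = gzip-compressed FASTQ file, $2 = minimum read length.
count_reads_min_len() {
    gzip -dc "$1" | awk -v min="$2" '
        # In four-line FASTQ, the sequence is the 2nd line of each record.
        NR % 4 == 2 { total++; if (length($0) >= min) kept++ }
        END { printf "%d\t%d\n", kept, total }
    '
}

# Intended use (not executed here):
# count_reads_min_len output.fq.gz 1000
```

If only a small fraction of reads passes the cutoff, lowering `minReadLength` or re-checking the basecalling output may be worth considering before running Canu.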
  4. Rescue raw files from failed MinION runs:
./recover_reads <Representative-fast5-file> </Library/MinKNOW/data/queued_reads/complete-reads-directory> --output-directory Recovered_fast5

Determine contig lengths

  1. Run the SeqLen.pl script:
perl SeqLen.pl <genome.fasta>
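SeqLen.pl itself is not shown here. If it is unavailable, per-contig lengths can be approximated with a short awk one-liner wrapped in a function. This is a sketch that assumes a plain (uncompressed) FASTA file; the function name `contig_lengths` is hypothetical, and the output format may differ from what SeqLen.pl prints.

```shell
# Print "<contig name><TAB><length>" for every record in a FASTA file.
# $1 = FASTA file (e.g. the assembled contigs).
contig_lengths() {
    awk '
        # Header line: emit the previous record, then start a new one.
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # name = first word after ">"
            len = 0
            next
        }
        # Sequence line: add its length to the current record.
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Intended use (not executed here):
# contig_lengths <genome.fasta> | sort -k2,2nr
```

Sorting on the second column, as in the commented usage line, lists the longest contigs first.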