Bacterial genome ASSembly (BASS) ><((('>

This pipeline is meant to assemble paired-end bacterial genomes that have short reads. The pipeline uses both SPADES (slow) and SKESA (fast) to assemble your genomes. We take in a directory of genome sequences, and output directory, whether you want quality anaysis or not, and preferred kmer sizes. Check out our wiki page

Installing

This pipeline uses as conda based environment to ensure you have the appropriate dependencies. We recommend that you download and install Miniconda from https://conda.io/en/latest/miniconda.html

Example for installing Miniconda for Linux :

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
rm  Miniconda3-latest-Linux-x86_64.sh

Next to clone the repository into your current directory (http version):

git clone  https://github.gatech.edu/compgenomics2019/Team3-GenomeAssembly.git

If you need the ssh version:

git clone [email protected]:compgenomics2019/Team3-GenomeAssembly.git

Next create and activate a conda environment from the yml files provided in the lib directory.

### FOR LINUX ###
cd Team3-GenomeAssembly/
conda env create -f lib/bass_linux.yml -n your_env_name
source activate your_env_name

### FOR MAC ###
cd Team3-GenomeAssembly/
conda env create -f lib/bass_OS.yml -n your_env_name
source activate your_env_name

If you decline to create an environment with Miniconda, you will be responsible for your own dependencies for the following:

Running

Prepping your data

Your forward and reverse reads for your genomes should be in an input folder alone. See example_data/ as a reference below:

Team3-GenomeAssembly/
   example_data/
      CGT3662_1.fq.gz
      CGT3662_2.fq.gz

For Help...

Having trouble at any time running our pipeline? Feel free to try the following command within Team3-GenomeAssembly/

./pipeline.sh -h

and you will have the following printed:

Usage: sh pipeline.bash -i <input directory> -o <output directory> -[OPTIONS]
              Bacterial short reads genome assembly software. The options available are:
                        -i : Directory for genome sequences [required]
                        -o : Output directory [required]
                        -f : For fast assembly (uses skesa)
                        -q : Flag to perform quality analysis of assembly using Quast
                        -m : Flag to perform quality analysis of reads using FastQC+MultiQC
                        -k : Kmer range for spades (default=99,105,107,111)
                        -v : Flag to turn on verbose mode
                        -h : Print usage instructions

Running example_data

For an example, we can assembly our example_data/ using SPAdes with specified kmer sizes 99,105,107, and 111, run FastQC and MultiQC, and produce a quast report using the following command within the Team3-GenomeAssembly/ directory:

./pipeline.sh -i example_data/ -o example_out -k 99,105,107,111 -mq

You can view the output of this command without running it. Check out Team3-GenomeAssembly/example_output. Also feel free to run the command above an analyze the output and verify that the results are reproducible. Below is the Quast result.tsv showing an assembled genome with 8 contigs.

Assembly	CGT3662_scaffolds
# contigs (>= 0 bp)	10
# contigs (>= 1000 bp)	7
# contigs (>= 5000 bp)	7
# contigs (>= 10000 bp)	7
# contigs (>= 25000 bp)	7
# contigs (>= 50000 bp)	7
Total length (>= 0 bp)	2915377
Total length (>= 1000 bp)	2914433
Total length (>= 5000 bp)	2914433
Total length (>= 10000 bp)	2914433
Total length (>= 25000 bp)	2914433
Total length (>= 50000 bp)	2914433
# contigs	8
Largest contig	1528638
Total length	2915081
GC (%)	37.86
N50	1528638
N75	330532
L50	1
L75	3
# N's per 100 kbp	2.61

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Bacterial genome ASSembly (BASS) ><((('>

Installing

Running

Prepping your data

For Help...

Running example_data

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
example_data		example_data
example_output		example_output
lib		lib
README.md		README.md
pipeline.sh		pipeline.sh

compgenomics2019/Team3-GenomeAssembly

Folders and files

Latest commit

History

Repository files navigation

Bacterial genome ASSembly (BASS) ><((('>

Installing

Running

Prepping your data

For Help...

Running example_data

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages