Boyle-Lab/bulkPlasmidSeq

 
 

Dependencies:

All tools require Python 3; see each tool's source for more detailed requirements. I recommend installing
all of them inside the Medaka conda environment. Samtools and minimap2 are already installed with the Medaka environment.

Medaka

    Preferred installation - use a conda environment, which takes care of the medaka dependencies:

        conda create -n medaka -c conda-forge -c bioconda medaka
        
        conda activate medaka

        --------------------------------------------------------------

        virtualenv medaka --python=python3 --prompt "(medaka) "
        . medaka/bin/activate
        pip install medaka
        
    Source: https://github.com/nanoporetech/medaka
    
NanoFilt

    One of several available long-read filtering tools. NanoFilt has the advantage of letting you specify
    the maximum read length, which is useful for filtering out extra-long reads.
        
        pip install nanofilt
        pip install nanofilt --upgrade
       
        --------------------------------------------------------------
        
        conda install -c bioconda nanofilt
        
    Source: https://github.com/wdecoster/nanofilt
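
To make the two filtering criteria concrete, here is a minimal pure-Python sketch of quality and maximum-length filtering on FASTQ records. It is not NanoFilt's code and does not call NanoFilt; the function names, the Phred+33 offset, and the choice to average quality over error probabilities are assumptions made for illustration.

```python
import math


def mean_quality(quality_string, offset=33):
    """Mean Phred quality, averaged over error probabilities (assumed Phred+33)."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_fastq_records(lines, min_quality=7, max_length=None):
    """Yield (header, sequence, quality) for reads that pass both thresholds.

    `lines` is any iterable of FASTQ lines with exactly four lines per record.
    """
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _, qual = stripped[i:i + 4]
        if not seq:
            continue  # skip empty reads
        if max_length is not None and len(seq) > max_length:
            continue  # drop extra-long reads
        if mean_quality(qual) < min_quality:
            continue  # drop low-quality reads
        yield header, seq, qual
```

With `min_quality=7` and a `max_length` slightly above the largest plasmid, this keeps reads that are plausible single-plasmid reads and discards the rest.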

Samtools

Minimap2

Integrative Genomics Viewer - IGV

    Specify --igv path/to/igv.sh to take screenshots of alignments.
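
For orientation, IGV screenshots are typically automated with a batch script. The sketch below writes one using standard IGV batch commands (`new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot`, `exit`). Whether bulkPlasmidSeq generates its screenshots this way is an assumption; the function name and all paths are hypothetical.

```python
def igv_batch_script(reference_fasta, bam_path, snapshot_dir, locus, image_name):
    """Return the text of an IGV batch script that takes one screenshot."""
    commands = [
        "new",                                # start a fresh session
        f"genome {reference_fasta}",          # load the plasmid reference
        f"load {bam_path}",                   # load the alignment track
        f"snapshotDirectory {snapshot_dir}",  # where images are written
        f"goto {locus}",                      # region to display
        f"snapshot {image_name}",             # save the current view
        "exit",
    ]
    return "\n".join(commands) + "\n"
```

A script like this can be saved to a file and, to my understanding, passed to IGV with `igv.sh -b script_file`.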
    
Biopython
    
    Used for I/O and marker-based binning. 
    
    Source: https://biopython.org/wiki/Download

EMBOSS
    
    Provides Needle pairwise alignment, used for consensus alignments.
    
    Source: http://emboss.sourceforge.net/download/   
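
Needle performs Needleman-Wunsch global alignment. As a rough illustration of what that computes, the sketch below scores a global alignment of two sequences. It is a simplification and not EMBOSS code: it uses a single linear gap penalty and fixed match/mismatch scores (all assumed values), whereas Needle uses a substitution matrix and separate gap open and extend penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # previous[j] holds the best score for aligning a[:i-1] with b[:j]
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]
```

A consensus identical to its reference scores `len(reference) * match`; each substitution or indel in the consensus lowers the score.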

BulkPlasmidSeq usage examples:

For binning reads by sequences unique to each reference - biobin:

    python bulkPlasmidSeq.py biobin -i path/to/reads -r path/to/plasmids -o output_directory
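
The idea of binning by unique reference sequences can be sketched as follows. This is not the biobin implementation: the k-mer approach, the value of `k`, the tie handling, and every function name here are assumptions chosen to keep the example short.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(_COMPLEMENT)[::-1]


def kmers(sequence, k):
    """Set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def unique_markers(references, k=12):
    """Map each reference name to the k-mers found in no other reference."""
    per_ref = {name: kmers(seq.upper(), k) for name, seq in references.items()}
    counts = Counter(kmer for ks in per_ref.values() for kmer in ks)
    return {name: {kmer for kmer in ks if counts[kmer] == 1}
            for name, ks in per_ref.items()}


def bin_read(read, markers, k=12):
    """Return the reference whose markers the read hits most, or None."""
    read = read.upper()
    # Check both strands, since reads may come from either one.
    read_kmers = kmers(read, k) | kmers(reverse_complement(read), k)
    hits = {name: len(read_kmers & ms) for name, ms in markers.items()}
    best = max(hits, key=hits.get, default=None)
    if best is None or hits[best] == 0:
        return None  # read carries no unique marker
    if sum(1 for h in hits.values() if h == hits[best]) > 1:
        return None  # ambiguous between references
    return best
```

Reads that only cover sequence shared by several plasmids (a common backbone, for example) return `None` in this sketch rather than being assigned arbitrarily.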

For generating consensus sequences - medaka:
    
    python bulkPlasmidSeq.py medaka -i my_reads.fastq -r my_plasmid_genome.fa -o output_directory -t 4
    
Filter reads with NanoFilt before generating consensus sequences:

    python bulkPlasmidSeq.py medaka --filter \
        -i unfiltered_reads.fastq \
        -r my_plasmid_genome.fa \
        -q 7 \
        -o output_directory
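
When running the medaka mode over many read sets, it can help to build the command line in Python. The helper below is a hypothetical convenience wrapper, not part of bulkPlasmidSeq. It uses only the flags shown in the examples above, and it assumes that `-t` sets the thread count and that `-q` is meaningful only together with `--filter`.

```python
def build_medaka_command(reads, reference, output_dir, threads=None,
                         min_quality=None, script="bulkPlasmidSeq.py"):
    """Build the argument list for one `bulkPlasmidSeq.py medaka` run."""
    command = ["python", script, "medaka"]
    if min_quality is not None:
        # Assumption: -q applies only when --filter is given.
        command += ["--filter", "-q", str(min_quality)]
    command += ["-i", str(reads), "-r", str(reference), "-o", str(output_dir)]
    if threads is not None:
        # Assumption: -t is the number of threads.
        command += ["-t", str(threads)]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` so that a failed run raises an error instead of passing silently.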
