PAPER

Python Aligns Paired-End Reads

Python 3.5+

This is primarily for colleagues at Carnegie Institute for Science, Stanford

This is meant to be run on a system managed with SLURM !!

Install

Currently no wheel, sorry! So just clone the repository and navigate to src/ folder and run. Essentially follow these steps:

Open a terminal and log into a server (i.e. Memex)
navigate to a directory that you are willing to store this tool in
Clone repo by typing this in terminal:

git clone https://github.com/qks1lver/paper.git

This creates a folder called paper/ with all the codes inside

Now you have it!

Trinity

Navigate to paper/src/
Run something like this:

sbatch run.sh -a trinity -i pair_end_fastq_files_dir/ -o output_files_dir/ -c 12 -m 50

-i files in this directory can be .fastq or .fastq.gz

-o directory of output files from Trinity, will creat folder if non-existent

-c number of CPUs to allow Trinity to use

-m amount of RAM in G to allow Trinity to use

BWA

Navigate to paper/src/
Run something like this:

sbatch run.sh -a bwa -r reference_genome.fa -i pair_end_fastq_files_dir/ -o output_files_dir/ -c 12 -m 20

-r FASTA file reference genome

-i files in this directory can be .fastq or .fastq.gz

-o directory of output BAM files (fastq -> BWA -> SAMtools -> BAM), will creat folder if non-existent

-c number of CPUs to allow BWA to use

-m amount of RAM in G to allow BWA to use

Extract unmapped reads

Navigate to paper/src/
Run something like this:

sbatch run.sh --unmap -i mapped_read_files_dir/ -o output_files_dir/ -c 4 -m 8

-i files in this directory can be BAM or SAM

-o directory of output BAM files of unmapped reads, will creat folder if non-existent

-c number of CPUs to allow SAMtools to use (default is the number of CPUs found. On Memex, this would go to 24, which is too much and can potentially make your job wait)

-m amount of RAM in G to allow SAMtools to use (default is 4)

Unpair paired BAMs

This could come after extracting the unmapped reads, which would generate one bam files with both pairs of reads. Unpair the reads, so they can then go through Trinity.

Navigate to paper/src/
Run something like this:

sbatch run.sh --unpair -i unmapped_read_files_dir/ -o output_files_dir/ -c 4 -m 8

-i files in this directory can be BAM or SAM

-o directory of output .fastq.gz files of unpaired unmapped reads, will creat folder if non-existent

-c number of CPUs to allow SAMtools to use (default is the number of CPUs found. On Memex, this would go to 24, which is too much and can potentially make your job wait)

-m amount of RAM in G to allow SAMtools to use (default is 4)

Check jobs

To maximize parallelization, Paper submits one job for each file (or each set of paired-end fastq files). Each job uses as many processors and RAM as specified by -c and -m.

To check the progress of jobs, use SLURM's squeue, for example, my account name on Memex is jyyen:

squeue -u jyyen

For more on SLURM, check out this great list of the most used commands:

https://www.rc.fas.harvard.edu/resources/documentation/convenient-slurm-commands/

Contact

Jiun Yen ([email protected])

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PAPER

Install

Trinity

BWA

Extract unmapped reads

Unpair paired BAMs

Check jobs

Contact

About

Releases

Packages

Languages

qks1lver/paper

Folders and files

Latest commit

History

Repository files navigation

PAPER

Install

Trinity

BWA

Extract unmapped reads

Unpair paired BAMs

Check jobs

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages