Chaochih Liu edited this page May 27, 2020 · 23 revisions

Getting Started

Navigating the sections

Installation

This repository contains scripts, called handlers, that automate the process of converting raw FASTQ sequences into BAM files, and finally to a finished VCF file.

To set up sequence_handling, open a terminal and type:

git clone https://github.com/MorrellLab/sequence_handling.git

If you don't have Git installed, you can go to the repository on GitHub and select Download ZIP on the right-hand side. No GitHub account is required for downloading through either method.

To see usage information about sequence_handling, go into the sequence_handling directory:

cd sequence_handling

and run:

./sequence_handling

Computational Requirements

This repository depends heavily on GNU Parallel; most of the handlers use it to process multiple samples at once. Because of this, standard laptops, tablets, and desktop computers may not be appropriate. The handlers are best suited to supercomputers; this repository was designed with the Minnesota Supercomputing Institute (MSI) in mind.
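If you are unsure whether GNU Parallel is available on your system, a quick check along these lines may help. This is only a sketch and not part of sequence_handling: the function name check_parallel is made up for illustration, and the test relies on the first line of `parallel --version` containing "GNU parallel", which distinguishes it from other tools that install a `parallel` command.

```shell
# Hypothetical helper: report whether GNU Parallel is on the PATH.
check_parallel() {
    if ! command -v parallel >/dev/null 2>&1; then
        echo "parallel: not found on PATH"
        return 1
    fi
    # Other packages also ship a "parallel" binary, so check the version banner.
    if parallel --version 2>/dev/null | head -n 1 | grep -q 'GNU parallel'; then
        echo "parallel: GNU Parallel found at $(command -v parallel)"
    else
        echo "parallel: found, but it does not look like GNU Parallel"
        return 1
    fi
}

check_parallel || echo "Install or load GNU Parallel before running the handlers."
```

On a module-based system such as MSI, a "not found" result may simply mean the relevant module has not been loaded yet.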

Dependency Requirements

MSI's resources make extensive use of a module system, in which software is installed and maintained by MSI and users can call upon modules as needed. These handlers are designed to call upon modules whenever possible; however, some dependencies are only available through the Morrell Lab. To gain access to the Morrell Lab modules, please run the following command on the login host:

echo export MODULEPATH=/panfs/roc/groups/9/morrellp/public/Modules:'$MODULEPATH' >> ~/.bash_profile
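Running the command above more than once appends the same line to ~/.bash_profile each time. A guarded variant might look like the sketch below. The function name add_modulepath is hypothetical, and the demonstration writes to a scratch file rather than your real profile; pass ~/.bash_profile instead once you are happy with the behavior.

```shell
# Hypothetical helper: append the MODULEPATH line only if it is not already present.
add_modulepath() {
    profile="$1"
    # Single quotes keep $MODULEPATH literal so it is expanded at login, not now.
    line='export MODULEPATH=/panfs/roc/groups/9/morrellp/public/Modules:$MODULEPATH'
    touch "$profile"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$profile"; then
        echo "$line" >> "$profile"
    fi
}

# Demonstrate on a scratch file instead of the real ~/.bash_profile.
demo_profile="$(mktemp)"
add_modulepath "$demo_profile"
add_modulepath "$demo_profile"   # second call changes nothing
grep -c 'morrellp/public/Modules' "$demo_profile"   # prints 1
```

The change takes effect in new login shells, or immediately after running `source ~/.bash_profile`.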

Please check the dependencies page to see which programs are necessary for each handler.

Next Steps

Before beginning sequence_handling, make sure that your FASTQ samples have been merged (if individual samples are split across multiple files) and renamed. It is much harder to merge or rename files later in the pipeline.
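As an illustration of merging, gzipped FASTQ files can be concatenated with `cat`, because a series of gzip streams joined end to end is itself a valid gzip file. The sketch below builds two tiny stand-in files in a temporary directory and merges them; the file names (SampleA_L001_R1.fastq.gz and so on) are hypothetical, so adapt them to your own naming scheme.

```shell
# Build two tiny gzipped FASTQ files standing in for one sample split across two lanes.
# All file names here are hypothetical.
workdir="$(mktemp -d)"
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$workdir/SampleA_L001_R1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$workdir/SampleA_L002_R1.fastq.gz"

# Concatenate the gzip streams directly; no need to decompress first.
cat "$workdir/SampleA_L001_R1.fastq.gz" "$workdir/SampleA_L002_R1.fastq.gz" \
    > "$workdir/SampleA_R1.fastq.gz"

# Sanity check: each FASTQ record is four lines, so line count / 4 = read count.
lines=$(gzip -dc "$workdir/SampleA_R1.fastq.gz" | wc -l)
echo "reads in merged file: $((lines / 4))"
```

For paired-end data, merge the forward and reverse files separately and list the input files in the same order for both, so that read pairs stay in sync.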

Take a look at the recommended workflow to familiarize yourself with the goals of this repository. To begin running the pipeline, you will need to:

  1. Install or locate the correct version of each of the dependencies. (MSI users simply need to have access to all of the required modules.)
  2. Create a list of your samples using sample_list_generator.sh.
  3. Fill out a configuration file for your project.
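To give a rough idea of what a sample list looks like, the sketch below collects FASTQ paths into a text file with `find`. It is not a substitute for sample_list_generator.sh and does not reproduce that script's behavior. It assumes the list is a plain text file with one file path per line, and the directory and file names are invented for the demonstration.

```shell
# Create a scratch directory with a few placeholder files (hypothetical names).
listdir="$(mktemp -d)"
touch "$listdir/SampleA_R1.fastq.gz" "$listdir/SampleB_R1.fastq.gz" "$listdir/notes.txt"

# Collect the full path of every gzipped FASTQ file, one per line, in sorted order.
find "$listdir" -name '*.fastq.gz' | sort > "$listdir/sample_list.txt"

# Show the resulting list; notes.txt is excluded because it does not match the pattern.
cat "$listdir/sample_list.txt"
```

Checking that the number of lines in the list matches the number of samples you expect is a cheap way to catch missing or misnamed files before starting the pipeline.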

For the latest updates and to chat with our team, please join our Slack workspace: sequencehandling.slack.com.

Additional Resources

For slides from a Does[0]Compute? discussion on validating files at each step in a sequencing pipeline and some of the commands/tools you may want to use, go here.