Tutorial:

This section is a walkthrough of how to use MCSMRT with sample data from BEI mock community sequencing on PacBio. The input files were generated by running the Reads of Insert (ROI) protocol via SMRT Link, with filtering criteria of a minimum of 5 CCS passes and a minimum predicted accuracy of 90. Successful completion of the ROI protocol creates a results directory whose location depends on how PacBio was configured. A directory similar to the one produced by ROI is in the tutorial folder on mcsmrt's GitHub page. Run each of the following steps to learn how to use MCSMRT.

  1. cd to your home directory and clone the MCSMRT repository, e.g.
   $ cd ~
   $ git clone git@github.com:jpearl01/mcsmrt.git
  2. Create a new directory called "mcsmrt_tutorial" in your home directory, where all the input and output files will be stored:
   $ mkdir ~/mcsmrt_tutorial
  3. Change directory to the mcsmrt_tutorial folder:
   $ cd ~/mcsmrt_tutorial
  4. Download all tutorial data files from here into the mcsmrt_tutorial folder.

  5. Expand the archive BEI_sample_data.tar.gz:

   $ tar -xzf BEI_sample_data.tar.gz

This will create a directory called data with the same structure as a SMRT Portal/SMRT Link ROI protocol output folder. The data files in this directory come from a single cell of a PacBio sequencing run and contain 4 barcoded replicates of the BEI mock community.

  6. In the sample_key.tsv file, change the data_path column to the full (absolute) path of the newly created data folder. Do not use relative paths here.

  7. Run get_fastqs.rb to produce, in a directory called 'reads', FASTQ files with correctly modified headers (barcodelabel and CCS passes added):

   $ ruby ~/mcsmrt/get_fastqs.rb -s sample_key.tsv -o reads
  8. The last step is to run the main program, mcsmrt.rb, which produces an OTU table with taxonomies for each OTU. Execute this command (after the -d flag, change 1 to the number of threads you want to use):
   $ ruby ~/mcsmrt/mcsmrt.rb -f reads/ \
       -d 1 \
       -c ~/mcsmrt_tutorial/rdp_gold.fa \
       -t ~/mcsmrt_tutorial/16sMicrobial_ncbi_lineage_reference_database.udb \
       -l ~/mcsmrt_tutorial/16sMicrobial_ncbi_lineage.fasta \
       -p ~/mcsmrt_tutorial/primers.fasta \
       -b ~/mcsmrt/data/ncbi_clustered_table.tsv \
       -v

With the -g option, it is possible to provide a path to a FASTA-formatted file of a complete host genome, e.g. -g /path/to/human.fasta. This filters out any spurious host sequences that may have been inadvertently amplified during PCR. The other input files required to run this script are provided in the tutorial folder.
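
The sample_key.tsv edit described in the steps above can also be scripted instead of done by hand. The following is a minimal sketch, not part of MCSMRT: the helper name `set_data_path` is hypothetical, and it assumes that sample_key.tsv is tab-separated, that its first line is a header containing a column literally named `data_path`, and that every row should point to the same data folder.

```ruby
# Hypothetical helper (not part of MCSMRT): returns the TSV text with every
# value in the data_path column replaced by the absolute form of data_dir.
# Assumes a tab-separated file whose first line is a header row.
def set_data_path(tsv_text, data_dir)
  abs = File.expand_path(data_dir) # resolves relative paths and "~"
  lines = tsv_text.lines.map(&:chomp)
  raise ArgumentError, "empty sample key" if lines.empty?

  header = lines.first.split("\t", -1)
  idx = header.index("data_path")
  raise ArgumentError, "no data_path column in header" if idx.nil?

  body = lines.drop(1).map do |line|
    next line if line.strip.empty? # leave blank lines untouched
    fields = line.split("\t", -1)  # -1 keeps trailing empty fields
    fields[idx] = abs
    fields.join("\t")
  end

  ([lines.first] + body).join("\n") + "\n"
end
```

Run from inside the tutorial folder, a call such as `File.write("sample_key.tsv", set_data_path(File.read("sample_key.tsv"), "data"))` would rewrite the file in place, so keep a copy of the original first if you want to compare.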

Example output files:

After successful completion of the command that runs the mcsmrt.rb script, various output files are generated. The most important and useful of these are pre_all_reads_info.txt and post_final_results.txt. As an example, results produced by running MCSMRT on the BEI mock community data are provided in the folder called example_output_files.
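
A quick way to confirm that a run produced the two key files is to look for them by name. The sketch below is an illustration only: the helper `output_sizes` is hypothetical, and the directory to check is an assumption, so point it at wherever your run wrote its results.

```ruby
# The two key output files named in this tutorial.
KEY_OUTPUTS = %w[pre_all_reads_info.txt post_final_results.txt].freeze

# Hypothetical helper (not part of MCSMRT): maps each expected file name to
# its size in bytes, or to nil when the file is missing from dir.
def output_sizes(dir, names = KEY_OUTPUTS)
  names.map do |name|
    path = File.join(dir, name)
    [name, File.file?(path) ? File.size(path) : nil]
  end.to_h
end

# Print one status line per expected file.
def report_outputs(dir)
  output_sizes(dir).each do |name, size|
    status = size.nil? ? "MISSING" : "#{size} bytes"
    puts "#{name}: #{status}"
  end
end
```

Calling `report_outputs(".")` from the directory that holds the results prints one line per file. A missing or zero-byte file suggests the run did not finish cleanly.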