-
Notifications
You must be signed in to change notification settings - Fork 0
smithlabcode/zagros
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
______ |___ / / / __ _ __ _ _ __ ___ ___ / / / _` |/ _` | '__/ _ \/ __| / /_| (_| | (_| | | | (_) \__ \ /_____\__,_|\__, |_| \___/|___/ __/ | |___/ ******************************** * V1.1.0 * ******************************** ********************************* Copyright and License Information ******************************************************************************* Copyright (C) 2013 University of Southern California, Emad Bahrami-Samani, Philip J. Uren, Andrew D. Smith Authors: Emad Bahrami-Samani, Philip J. Uren, Andrew D. Smith This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. *********************** Building and Installing ******************************************************************************* This software package has been designed to operate in a UNIX-like environment. It has been tested on MacOS X Snow Leopard and Linux. Step 0 ------ This software package requires a functioning installation of the GNU Scientific Library (GSL). If you don't already have this installed, you will need to download and install it from http://www.gnu.org/software/gsl/ Step 1 ------ To build the binaries, type the following, where '>' is your prompt and the CWD is the root of the distribution > make all Step 2 ------ To install the binaries, type the following, where '>' is your prompt and the CWD is the root of the distribution > make install This will place the binaries in the bin directory under the package root. They can be used directly from there without any additional steps. You can add that directory to your PATH environment variable to avoid having to specify their full paths, or you can copy the binaries to another directory of your choice in your PATH *********** Basic Usage ******************************************************************************* Zagros has four modes of operations. ****************** 1.) Sequence only: In this mode, only the sequence information is used for motif discovery. The input can either be a set of sequences in fasta format or genomic regions in bed format. This set of regions/sequences corresponds to the locations of significant enrichment for reads in the experiment. In case the input is the set of sequences, you can simply run: > ./zagros input.fa If the input consists of a set of genomic regions, the set of the target genome sequences must also be provided to Zagros to extract the sequnces: > ./zagros -c path/to/chrom_directory input.bed The chromosome directory can be downloaded from UCSC genome browser website. The lastest versions can be found here: http://hgdownload.soe.ucsc.edu/downloads.html ************************** 2.) Sequence and Structure In this mode, in addition to the target sequences the secondary structure information is used as well. In this case, the secondary structure data must be first obtained and saved using the "thermo" program: > ./thermo -o input.str input.fa or > ./thermo -c path/to/chrom_directory -o input.str input.bed After this step, by providing both the target and secondary structure file to Zagros the motif discovery is performed based on both. > ./zagros -t input.str input.fa or > ./zagros -c path/to/chrom_directory -t input.str input.bed ********************************** 3.) Sequence and Diagnostic events In this mode, in addition to the target sequences the information about cross-link modification events is used as well. In this case, the diagnostic events information must be first obtained and saved using the "extractDEs" program. The input to extractDEs program is the set of mapped reads. The user must specify what technology is used for obtaining the reads (hCLIP, pCLIP or iCLIP), what mapper is used for mapping the reads, and the genomic regions of significant regions that is used for zagros as input. ExtractDEs then produces the set of diagnostic events corrsponding to the regions of interest. Zagros can interprete the mapped reads from three mappers: bowtie (native output format), novoalign (native output format) and RMAP (bed format). > ./extractDEs -m novoalign -t iCLIP -o input.des -r input.bed mapped_reads.novo Then run zagros program by inputing the diagnostic events file as one of the options. > ./zagros -d input.des input.fa or > ./zagros -c path/to/chrom_directory -d input.des input.bed ********************************************* 4.) Sequence, Structure and Diagnostic events > ./thermo -o input.str input.fa > ./extractDEs -m novoalign -t iCLIP -o input.des -r mapped_reads.nov > ./zagros -t input.str -d input.des input.fa ******************** Command Line Options ******************************************************************************* Usage: zagros [OPTIONS] <target_regions/sequences> Options: -o, -output output file name (default: stdout) -w, -width width of motifs to find (4 <= w <= 12; default: 6) -n, -number number of motifs to output (default: 1) -c, -chrom directory with chrom files (FASTA format) -t, -structure structure information file -d, -diagnostic_events diagnostic events information file -i, -diagEventsThresh down-sample diagnostic events to this many per sequence (-1 for no down-sampling; default: -1) -s, -starting-points number of starting points to try for EM search. Higher values will be slower, but more likely to find the global maximum. -v, -verbose print more run info Help options: -?, -help print this help message -about print about message ************************************** 4. Input and output formats of Zagros ******************************************************************************* Zagros takes three possible inputs for mapped reads: 1.) RMAP (BED format) [REQUIRED if using diagnostic evnts] 2.) Novoalign (Novoalign native format) [REQUIRED if using diagnostic evnts] 3.) Bowtie (Bowtie native format) [REQUIRED if using diagnostic evnts] ************************ Contacts and bug reports ******************************************************************************* Emad Bahrami-Samani [email protected] Philip J. Uren [email protected] Andrew D. Smith [email protected] If you found a bug in Piranha, we'd like to know about it. Before contacting us though, please check the following list: 1.) Are you using the latest version? The bug you found may already have been fixed. 2.) Check that your input is in the correct format and you have selected the correct options. 3.) Please reduce your input to the smallest possible size that still produces the bug; we will need your input data to reproduce the problem.
About
Zagros is an algorithm for motif discovery in CLIP-seq data
Resources
Stars
Watchers
Forks
Packages 0
No packages published