This is a set of tools I developed to do some basic but helpful things with Ion Torrent NGS data. Some of the tools described below are more general in nature, and are hopefully useful for quick jobs.
For now, here is a brief description of the tools included in this repo. Each
one has a full set of help docs, which can be accessed by passing the -h or
--help option to the script.
Current Version: v1.3.0_110515
Requirements:
Perl Modules:
LWP::Simple
XML::Twig
Sort::Versions
Data::Dump
Description:
Script to query the UCSC DAS server and pull out sequence information for a given position string. Accepts either a single position string:
<<<chrx:123435
or a file of positions in the same format, in which case it produces a batch output.
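As a rough illustration of the position-string handling described above, here is a minimal Python sketch (not the script's actual code). The DAS URL layout, the hg19 default build, and the helper names parse_position, das_url, and read_positions are assumptions made for the example. No network request is made and no chromosome-name normalization is attempted.

```python
import re
from urllib.parse import urlencode

# Assumed UCSC DAS endpoint layout; the build name is filled in per query.
DAS_BASE = "https://genome.ucsc.edu/cgi-bin/das/{build}/dna"

# Matches strings such as "chrx:123435" (single position only).
POSITION_RE = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+)$")


def parse_position(position):
    """Split 'chrx:123435' into ('chrx', 123435)."""
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError(f"Unrecognized position string: {position!r}")
    return match.group(1), int(match.group(2))


def das_url(position, build="hg19"):
    """Build a DAS 'dna' query URL covering the single requested base."""
    chrom, pos = parse_position(position)
    query = urlencode({"segment": f"{chrom}:{pos},{pos}"}, safe=":,")
    return DAS_BASE.format(build=build) + "?" + query


def read_positions(lines):
    """Yield position strings from an iterable of lines, skipping blanks."""
    for line in lines:
        if line.strip():
            yield line.strip()


if __name__ == "__main__":
    for item in read_positions(["chrx:123435", "", "chr1:100000"]):
        print(das_url(item))
```

For batch input, an open file handle can be passed to read_positions in place of the list.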
Current Version: v8.1.102218
Requirements:
- VCF Tools
- Perl Modules:
Sort::Versions
Data::Dump
Description:
Script to parse Ion Torrent specific VCF files and pull out variant data. This works with TS v4.2 and v5.0 files, whether they were generated with or without Ion Reporter.
In order to run this utility, you'll need to have the VCF Tools package
installed and available in your $PATH. There may be some other non-standard
Perl modules to install, such as Sort::Versions and Data::Dump. All can easily
be installed from CPAN as usual.
See the help documentation for this script for details on the options and functionality of this tool:
$ vcfExtractor.pl -h
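To give a feel for the kind of extraction this involves, here is a minimal Python sketch that pulls per-allele data out of VCF text lines. It is not how vcfExtractor.pl is implemented. It assumes the Ion Torrent INFO keys FAO (per-allele alternate read count) and FDP (site depth) are present, and the function names are hypothetical.

```python
def parse_info(info_field):
    """Turn 'FAO=30;FDP=200;HS' into a dict; bare flags map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def extract_variants(vcf_lines):
    """Yield one dict per ALT allele from an iterable of VCF text lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt, info_field = fields[:8]
        info = parse_info(info_field)

        # Assumed keys: FAO is comma separated per allele, FDP is per site.
        fao_raw = info.get("FAO")
        fao = fao_raw.split(",") if isinstance(fao_raw, str) else []
        fdp_raw = info.get("FDP")
        depth = int(fdp_raw) if isinstance(fdp_raw, str) and fdp_raw.isdigit() else None

        for index, allele in enumerate(alt.split(",")):
            has_count = index < len(fao) and fao[index].isdigit()
            alt_reads = int(fao[index]) if has_count else None
            vaf = None
            if alt_reads is not None and depth:
                vaf = round(100.0 * alt_reads / depth, 2)  # percent
            yield {
                "chrom": chrom, "pos": int(pos), "ref": ref, "alt": allele,
                "filter": filt, "alt_reads": alt_reads, "depth": depth,
                "vaf": vaf,
            }
```

Splitting multi-allelic records into one row per ALT allele keeps the output table flat, which is convenient for downstream filtering.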
Current Version: v7.25.031418
Requirements:
Description:
Read in an Ion Torrent BAM file and generate a read length histogram plot for the sample. This script requires the Perl Statistics::R module, as well as the most excellent ggplot2 package for R.
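The binning step behind such a histogram can be sketched in a few lines of Python. This only computes the counts; it does not reproduce the script's Statistics::R / ggplot2 plotting. The function name and the bin width of 10 are assumptions for illustration.

```python
from collections import Counter


def read_length_histogram(lengths, bin_width=10):
    """Count reads per length bin; keys are the lower edge of each bin."""
    if bin_width < 1:
        raise ValueError("bin_width must be a positive integer")
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy read lengths; with a BAM file one could instead collect
    # read.query_length for each read via the pysam library (an assumption,
    # pysam is not a requirement of the original script).
    histogram = read_length_histogram([5, 12, 18, 25, 101])
    for lower_edge, count in histogram.items():
        print(f"{lower_edge:>4}-{lower_edge + 9:<4} {'#' * count}")
```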
Current Version: v0.1.020918
Requirements:
- Konstantin's Python pyliftover library
Description:
Map coordinates from one reference assembly to another (e.g. hg18 and hg19), in order. This utility requires Konstantin's excellent Python pyliftover library, which builds on the UCSC liftOver conversion data for mapping between reference assemblies.
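A minimal sketch of a single-position lookup with pyliftover follows. The helper names make_converter and lift_position are hypothetical, and treating the input as a 1-based chr:position string is an assumption about the input format. pyliftover itself works with 0-based positions, so the sketch converts on the way in and out.

```python
def make_converter(source="hg18", target="hg19"):
    """Create a pyliftover converter (fetches the chain file on first use)."""
    # Imported lazily so lift_position can be exercised without pyliftover.
    from pyliftover import LiftOver
    return LiftOver(source, target)


def lift_position(converter, position):
    """Map a 1-based 'chr:position' string; return None if unmappable."""
    chrom, pos = position.split(":")
    # convert_coordinate expects a 0-based position and returns a list of
    # (chrom, pos, strand, score) tuples, or None for an unknown chromosome.
    hits = converter.convert_coordinate(chrom, int(pos) - 1)
    if not hits:
        return None
    new_chrom, new_pos, _strand, _score = hits[0]
    return f"{new_chrom}:{new_pos + 1}"


if __name__ == "__main__":
    converter = make_converter("hg18", "hg19")
    for item in ["chr1:1000000", "chr7:55000000"]:
        print(item, "->", lift_position(converter, item))
```

Only the first hit is reported here; positions that map to several locations would need extra handling.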
Current Version: v0.4.030118
Requirements:
vcfExtractor.pl
Description:
Combine OCA Ion Reporter blood and tumor VCFs to generate a tumor / normal comparison file. Although the output labels refer to 'blood' and 'tumor', the comparison can be generated from any two VCF files from Ion Reporter.
Current Version: 2.0_111417
Requirements:
- Python 3
- Python 3 requests library
Description:
Input a file, a comma separated list, or a single ClinVar ID and get a table of variant information derived from ClinVar using the NCBI eutils API. Filtering is not possible for now, but will be added later.
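The three input forms described above can be normalized before querying eutils, roughly as in the following Python sketch. The helper names are hypothetical, the IDs are placeholders, and using the esummary endpoint with db=clinvar is an assumption about which eutils call applies. The sketch only builds the request and does not contact NCBI.

```python
import os
from urllib.parse import urlencode

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def collect_ids(arg):
    """Accept a file path, a comma separated list, or a single ID."""
    if os.path.isfile(arg):
        with open(arg) as handle:
            raw = handle.read().replace(",", "\n").splitlines()
    else:
        raw = arg.split(",")
    return [item.strip() for item in raw if item.strip()]


def esummary_params(ids):
    """Query parameters for an esummary request against ClinVar."""
    return {"db": "clinvar", "id": ",".join(ids), "retmode": "json"}


def esummary_url(ids):
    """Full request URL, useful for logging or debugging."""
    return ESUMMARY + "?" + urlencode(esummary_params(ids), safe=",")


if __name__ == "__main__":
    ids = collect_ids("12345, 67890")  # placeholder IDs
    print(esummary_url(ids))
    # With the requests library, the call would look something like:
    #   requests.get(ESUMMARY, params=esummary_params(ids)).json()
```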
Current Version: v1.3.121318
Requirements:
Description:
Using a pathway lookup table in resources, generate a list of oncogenic
pathways for a gene or set of genes. The pathway lookup tables still need
refinement, but the hope is that this will be a good annotation tool that
can be built into other pipelines.
Current Version: v0.2.121517
Requirements:
Description:
Protein domain retrieval script. Starting with a correctly formatted HUGO gene ID, retrieve protein domain position information from EMBL in a JSON format that can be used as a lookup DB in other programs. You can pass either a comma separated string of IDs or a batch file containing a list of IDs to look up, one per line.
Current Version: v0.8.121718
Requirements:
- Perl MCE::Shared module
Description:
Script to read in a GRCh37 (hg19) coordinate in the format chr:position and
output a HUGO gene name. The input can be a comma separated list of coordinates
or a file containing a batch of coordinates to look up, one per line. This
script is written with parallel processing in mind, so batch lookups are fast.