Merge pull request #16 from rhshah/development
Adding Kinase Domain Annotation
rhshah authored Jan 10, 2018
2 parents cc9cbda + 15e1df5 commit 74c32af
Showing 89 changed files with 2,863 additions and 1,502 deletions.
2 changes: 0 additions & 2 deletions .settings/org.eclipse.core.resources.prefs

This file was deleted.

7 changes: 7 additions & 0 deletions .travis.yml
@@ -0,0 +1,7 @@
language: python
python:
- "2.7"
# command to install dependencies
install: "pip install -r requirements.txt"
# command to run tests
script: nosetests -v --with-coverage --cover-tests tests
21 changes: 18 additions & 3 deletions CHANGES.rst
@@ -1,8 +1,23 @@
v0.0.2
RS:Added funtionality to visualize SV, though not clean, but a start.
v0.0.3
RS:Added funtinality to add annotation of repeat regions, cosmic census and database of genomic variants (DGv)
RS:Added funtionality to add annotation of repeat regions, cosmic census and database of genomic variants (DGv)
v0.0.4
RS:Added logging
v0.0.5
RS:Added Funtionality to get counts of events from Cosmic Fusion Export.
v0.0.6
RS:Addition of coloredlogs and some trivial bug fixes
v1.0.3
RS:Added Functionality to use multiple tracks to annotate.
v1.0.4
Minor Bug fixes
v1.0.5
Major Bug Fix:
Did not assign intron number properly in some cases and that might give wrong annotation
v1.0.6
RS:Added funtionality to get counts of events from Cosmic Fusion Export.
v1.0.7
Updated to have no padding and proper transcript selection
v1.0.8
Removing TIMM23B for proper annotation of RET-NCOA4
v1.0.9
RS:Added functionality to report kinase domain involvement in the annotation
5 changes: 5 additions & 0 deletions Makefile
@@ -0,0 +1,5 @@
init:
pip install -r requirements.txt

test:
nosetests -v --with-coverage --cover-tests tests
134 changes: 70 additions & 64 deletions README.rst
@@ -14,6 +14,9 @@ iAnnotateSV: Annotation of structural variants detected from NGS
.. image:: https://zenodo.org/badge/18929/rhshah/iAnnotateSV.svg
:target: https://zenodo.org/badge/latestdoi/18929/rhshah/iAnnotateSV


.. image:: https://travis-ci.org/rhshah/iAnnotateSV.svg?branch=development
:target: https://travis-ci.org/rhshah/iAnnotateS

iAnnotateSV is a Python library and command-line software toolkit to annotate and
visualize structural variants detected from Next Generation DNA sequencing data. This work is largely a rewrite of a tool called dRanger_annotate, written in MATLAB by Mike Lawrence at the Broad Institute.
@@ -60,47 +63,47 @@ Else To Run:

``python path/to/iAnnotateSV.py -i svFile.txt -ofp outputfilePrefix -o /path/to/output/dir -r hg19 -d 3000 -c canonicalTranscripts.txt -u uniprot.txt -p``

::
usage: iAnnotateSV.py [options]

**usage: iAnnotateSV.py [options]**

**Annotate SV based on a specific human reference**

**optional arguments:**
Annotate SV based on a specific human reference
Annotate SV based on a specific human reference

optional arguments:
optional arguments:
-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-r hg19, --refFileVersion hg19
Which human reference file to be used, hg18,hg19 or
hg38
Which human reference file to be used, hg18,hg19 or
hg38
-rf hg19.sv.table.txt, --refFile hg19.sv.table.txt
Human reference file location to be used
-ofp test, --outputFilePrefix test
Prefix for the output file
Prefix for the output file
-o /somedir, --outputDir /somedir
Full Path to the output dir
Full Path to the output dir
-i svfile.txt, --svFile svfile.txt
Location of the structural variants file to annotate
Location of the structural variants file to annotate
-d 3000, --distance 3000
Distance used to extend the promoter region
Distance used to extend the promoter region
-a, --autoSelect Auto Select which transcript to be used[default]
-c canonicalExons.txt, --canonicalTranscripts canonicalExons.txt
Location of canonical transcript list for each gene.
Use only if you want the output for specific
transcripts for each gene.
-p, --plotSV Plot the structural variant in question[default]
Location of canonical transcript list for each gene.
Use only if you want the output for specific
transcripts for each gene.
-p, --plotSV Plot the structural variant in question
-u uniprot.txt, --uniprotFile uniprot.txt
Location of UniProt list contain information for
protein domains. Use only if you want to plot the
structural variant
Location of UniProt list contain information for
protein domains. Use only if you want to plot the
structural variant
-rr RepeatRegionFile.tsv, --repeatFile RepeatRegionFile.tsv
Location of the Repeat Region Bed File
Location of the Repeat Region Bed File
-dgv DGvFile.tsv, --dgvFile DGvFile.tsv
Location of the Database of Genomic Variants Bed File
Location of the Database of Genomic Variants Bed File
-cc CosmicConsensus.tsv, --cosmicConsensusFile CosmicConsensus.tsv
Location of the Cosmic Consensus TSV file
-cct CosmicFusionCounts.tsv, --cosmicCountsFile CosmicConsensus.tsv
Location of the Cosmic Fusion Counts TSV file
Location of the Cosmic Consensus TSV file
-cct cosmic_fusion_counts.tsv, --cosmicCountsFile cosmic_fusion_counts.tsv
Location of the Cosmic Counts TSV file
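
The example invocation above can also be assembled programmatically. The following is a minimal sketch, not part of iAnnotateSV: the function name ``build_command`` and its keyword arguments are hypothetical, and only flags that appear in the help text above are used. It builds the argument list without running anything.

```python
def build_command(sv_file, prefix, out_dir, ref_version="hg19", distance=3000,
                  canonical=None, uniprot=None, plot=False,
                  script="path/to/iAnnotateSV.py"):
    """Return an argv list for iAnnotateSV.py (hypothetical helper)."""
    # The help text lists hg18, hg19 and hg38 as the reference versions.
    if ref_version not in ("hg18", "hg19", "hg38"):
        raise ValueError("unsupported reference version: %s" % ref_version)
    cmd = [
        "python", script,
        "-i", sv_file,        # structural variants file to annotate
        "-ofp", prefix,       # prefix for the output file
        "-o", out_dir,        # full path to the output directory
        "-r", ref_version,    # human reference version
        "-d", str(distance),  # distance used to extend the promoter region
    ]
    if canonical:
        cmd += ["-c", canonical]  # canonical transcript list
    if uniprot:
        cmd += ["-u", uniprot]    # UniProt list, used for plotting
    if plot:
        cmd.append("-p")          # plot the structural variant
    return cmd
```

The returned list can be passed to ``subprocess.call`` once the paths point at real files.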


Input file format is a tab-delimited file containing:

@@ -130,9 +133,11 @@ as the header and where:
* **gene1** : Gene for the first break point,
* **transcript1** : Transcript used for the first breakpoint,
* **site1** : Explanation of the site where the first breakpoint occurs [Example=>Intron of EWSR1(+):126bp after exon 10],
* **kinasedomain1** : Whether the first breakpoint involves a kinase domain [Example=>Partial Kinase Domain Included]
* **gene2** : Gene for the second break point,
* **transcript2** : Transcript used for the second breakpoint,
* **site2** : Explanation of the site where the second breakpoint occurs [Example=>Intron of ERG(-):393bp after exon 4],
* **kinasedomain2** : Whether the second breakpoint involves a kinase domain [Example=>Partial Kinase Domain Included]
* **fusion** : Explanation of whether the event leads to a fusion or not. [Example=>Protein Fusion: in frame {EWSR1:ERG}]
* **Cosmic_Fusion_Counts** : Number of Counts for the Events from Cosmic Fusion Results
* **repName-repClass-repFamily:-site1** : Repeat Region Annotation for site 1
@@ -145,7 +150,6 @@ as the header and where:
* **DGv_Name-DGv_VarType-site1** : Database of Genomic Variants annotation for site 1
* **DGv_Name-DGv_VarType-site** : Database of Genomic Variants annotation for site 2
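
The columns described above can be consumed downstream with the standard library. The sketch below is illustrative only and makes two assumptions: that the annotated output is tab-delimited like the input, and that site strings follow the single format shown in the examples (``Intron of EWSR1(+):126bp after exon 10``). The helper names ``parse_site`` and ``rows_with_kinase_label`` are hypothetical, the sample rows are made up, and the code targets Python 3.

```python
import csv
import io
import re

# Matches only the site format shown in the README examples; "before" is an
# assumed counterpart of "after". Other formats are not covered.
SITE_PATTERN = re.compile(
    r"^(?P<region>\w+) of (?P<gene>[^()\s]+)\((?P<strand>[+-])\):"
    r"(?P<distance>\d+)bp (?P<direction>before|after) exon (?P<exon>\d+)$"
)


def parse_site(text):
    """Split a site description into fields, or return None if it does not match."""
    match = SITE_PATTERN.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["distance"] = int(fields["distance"])
    fields["exon"] = int(fields["exon"])
    return fields


def rows_with_kinase_label(handle, label):
    """Return rows whose kinasedomain1 or kinasedomain2 column equals label."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader
            if label in (row.get("kinasedomain1"), row.get("kinasedomain2"))]


if __name__ == "__main__":
    # Made-up rows for illustration; real output has more columns.
    sample = io.StringIO(
        "gene1\tsite1\tkinasedomain1\tgene2\tsite2\tkinasedomain2\n"
        "GENE_A\tIntron of GENE_A(+):126bp after exon 10\t\t"
        "GENE_B\tIntron of GENE_B(-):393bp after exon 4\t"
        "Partial Kinase Domain Included\n"
    )
    for row in rows_with_kinase_label(sample, "Partial Kinase Domain Included"):
        print(row["gene1"], row["gene2"], parse_site(row["site2"]))
```

Matching on an explicit label avoids guessing what the tool writes when no kinase domain is involved.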


:Example Plot:

.. image:: images/EWSR1-chr22_29688289_ERG-chr21_39775034_Translocation.jpg
@@ -435,41 +439,43 @@ Submodules
:show-inheritance:
- This module is the driver module, it takes user information and runs all other module to produce proper structural variant annotation

**usage: iAnnotateSV.py [options]**

**Annotate SV based on a specific human reference**

**optional arguments:**

-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-r hg19, --refFileVersion hg19
Which human reference file to be used, hg18,hg19 or
hg38
-ofp test, --outputFilePrefix test
Prefix for the output file
-o /somedir, --outputDir /somedir
Full Path to the output dir
-i svfile.txt, --svFile svfile.txt
Location of the structural variants file to annotate
-d 3000, --distance 3000
Distance used to extend the promoter region
-a, --autoSelect Auto Select which transcript to be used[default]
-c canonicalExons.txt, --canonicalTranscripts canonicalExons.txt
Location of canonical transcript list for each gene.
Use only if you want the output for specific
transcripts for each gene.
-p, --plotSV Plot the structural variant in question[default]
-u uniprot.txt, --uniprotFile uniprot.txt
Location of UniProt list contain information for
protein domains. Use only if you want to plot the
structural variant
-rr RepeatRegionFile.tsv, --repeatFile RepeatRegionFile.tsv
Location of the Repeat Region Bed File
-dgv DGvFile.tsv, --dgvFile DGvFile.tsv
Location of the Database of Genomic Variants Bed File
-cc CosmicConsensus.tsv, --cosmicConsensusFile CosmicConsensus.tsv
Location of the Cosmic Consensus TSV file
-cct CosmicFusionCounts.tsv, --cosmicCountsFile CosmicConsensus.tsv
Location of the Cosmic Fusion Counts TSV file
Here is the Usage again::

usage: iAnnotateSV.py [options]

Annotate SV based on a specific human reference

optional arguments:

-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-r hg19, --refFileVersion hg19
Which human reference file to be used, hg18,hg19 or
hg38
-ofp test, --outputFilePrefix test
Prefix for the output file
-o /somedir, --outputDir /somedir
Full Path to the output dir
-i svfile.txt, --svFile svfile.txt
Location of the structural variants file to annotate
-d 3000, --distance 3000
Distance used to extend the promoter region
-a, --autoSelect Auto Select which transcript to be used[default]
-c canonicalExons.txt, --canonicalTranscripts canonicalExons.txt
Location of canonical transcript list for each gene.
Use only if you want the output for specific
transcripts for each gene.
-p, --plotSV Plot the structural variant in question[default]
-u uniprot.txt, --uniprotFile uniprot.txt
Location of UniProt list contain information for
protein domains. Use only if you want to plot the
structural variant
-rr RepeatRegionFile.tsv, --repeatFile RepeatRegionFile.tsv
Location of the Repeat Region Bed File
-dgv DGvFile.tsv, --dgvFile DGvFile.tsv
Location of the Database of Genomic Variants Bed File
-cc CosmicConsensus.tsv, --cosmicConsensusFile CosmicConsensus.tsv
Location of the Cosmic Consensus TSV file
-cct CosmicFusionCounts.tsv, --cosmicCountsFile CosmicConsensus.tsv
Location of the Cosmic Fusion Counts TSV file
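
For readers unfamiliar with how such a help message is produced, here is a minimal ``argparse`` sketch that mirrors a subset of the options listed above. It is an assumption-laden illustration, not the parser used by iAnnotateSV: the function name ``build_parser`` is hypothetical, and defaults and required flags are deliberately left unset because the help text does not state them.

```python
import argparse


def build_parser():
    """Build a parser mirroring a few documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="iAnnotateSV.py",
        usage="%(prog)s [options]",
        description="Annotate SV based on a specific human reference",
    )
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="make lots of noise")
    parser.add_argument("-r", "--refFileVersion", metavar="hg19",
                        choices=["hg18", "hg19", "hg38"],
                        help="Which human reference file to be used")
    parser.add_argument("-ofp", "--outputFilePrefix", metavar="test",
                        help="Prefix for the output file")
    parser.add_argument("-o", "--outputDir", metavar="/somedir",
                        help="Full Path to the output dir")
    parser.add_argument("-i", "--svFile", metavar="svfile.txt",
                        help="Location of the structural variants file to annotate")
    parser.add_argument("-d", "--distance", metavar="3000", type=int,
                        help="Distance used to extend the promoter region")
    parser.add_argument("-p", "--plotSV", action="store_true",
                        help="Plot the structural variant in question")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Multi-character single-dash options such as ``-ofp`` are accepted by ``argparse`` because an exact option-string match takes priority over prefix handling.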
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/iAnnotateSV.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/modules.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: fd3e2cef76c328aa704a409c450cc6e1
config: 28894c8d3c2921d47047283801d617fa
tags: 645f666f9bcd5a90fca523b33c5a78b7