added usage description to README. added more keywords to setup. added back the de novo files and co-segregate variant file inputs.
Showing 4 changed files with 158 additions and 24 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,3 +21,121 @@ Run: | |
For detailed help/support, email Adam: | ||
|
||
[email protected] | ||
|
||
## Usage Details | ||
### Input data | ||
-m Standard .maf | ||
-f Standard .vcf | ||
-T Custom .tsv | ||
Provide variant data through at least one variant file. | ||
If your variants are spread across several files, you can input one file of each type. | ||
For the .maf and .tsv, use the custom columns to determine which columns to use. | ||
Note that a standard .maf does not include protein annotations. | ||
Use the custom column for the peptide change column. | ||
If your .vcf has VEP annotations, then CharGer should be able to parse the information. | ||
This information will be added to your variants when available. | ||
|
||
### Output | ||
-o output file | ||
-w output as HTML (flag) | ||
-k annotate input (flag) | ||
--run-url-test test url when creating links | ||
Name your output file; otherwise it will be called charger_summary.tsv. | ||
You can opt to make the output into an HTML page, instead of a readable .tsv. | ||
If you need to be assured of properly linked URLs, use the url test flag. | ||
|
||
### Access data | ||
-l ClinVar (flag) | ||
-x ExAC (flag) | ||
-E VEP (flag) | ||
-t TCGA cancer types (flag) | ||
These flags turn on the built-in accession features. | ||
For the ClinVar, ExAC, and VEP flags, if no local VEP or database is provided, then BioMine will be used to access the ReST interface. | ||
The TCGA flag allows disease determination from sample barcodes in a .maf when using a diseases file (see below). | ||
|
||
### Suppress data or overrides | ||
-O override with ClinVar description (flag) | ||
-D suppress needing disease specific (flag) | ||
You can have CharGer override its pathogenic characterization with whatever ClinVar has. | ||
Suppressing disease specific variants takes any variants in the diseases file (see below) and treats them as equally pathogenic without disease consideration. | ||
|
||
### Cross-reference data | ||
-z pathogenic variants, .vcf | ||
-e expression matrix file, .tsv | ||
-g gene list file, (format: gene\\tdisease\\tmode_of_inheritance) .txt | ||
-d diseases file, (format: gene\\tdisease\\tmode_of_inheritance) .tsv | ||
-n de novo file, standard .maf | ||
-a assumed de novo file, standard .maf | ||
-c co-segregation file, standard .maf | ||
-H HotSpot3D clusters file, .clusters | ||
-r recurrence threshold (default = 2) | ||
Variants or genes from each of these files can be used as additional known information. | ||
An expression matrix file has columns for each sample, and its rows are genes. | ||
The genes should be approved HUGO symbols. | ||
HotSpot3D clusters can be used for versions v1.x.x. | ||
The recurrence threshold will be pulled from the recurrence/weight column of the .clusters file when provided. | ||
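
The gene list and diseases files are described above as tab-separated `gene`, `disease`, `mode_of_inheritance` columns. The following is a minimal sketch of reading that layout, not CharGer's own parser; the function name, the placeholder rows, and the `dominant`/`recessive` values are assumptions made for illustration.

```python
import csv
import io


def parse_gene_disease_lines(handle):
    """Return (gene, disease, mode_of_inheritance) tuples from tab-separated text.

    Hypothetical helper: it only illustrates the three-column layout
    named in the option descriptions.
    """
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) < 3:
            raise ValueError("expected gene, disease, mode_of_inheritance")
        records.append((row[0], row[1], row[2]))
    return records


# Placeholder rows (not real annotations), standing in for a diseases file.
example = io.StringIO("GENE1\tDISEASE1\tdominant\nGENE2\tDISEASE2\trecessive\n")
records = parse_gene_disease_lines(example)
```

Keeping the rows as plain tuples makes it easy to build a gene-to-disease lookup afterwards if one is needed.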
|
||
### Local VEP | ||
--vep-script Path to VEP | ||
--vep-dir Path to VEP directory | ||
--vep-cache Path to VEP cache directory | ||
--vep-version VEP version (default = 87) | ||
--vep-output VEP output file (default = charger.vep.vcf) | ||
--grch assembly GRCh version (default = 37) | ||
--ensembl-release Ensembl release version (default = 75) | ||
--reference-fasta VEP reference fasta | ||
--fork Number of forked processes used in VEP (default = 0) | ||
This currently works with .vcf input only. | ||
Annotations are run with the VEP everything flag, so any local plugins will be used. | ||
The BioMine accession is also suppressed when using a local VEP installation. | ||
The VEP directory is not the same as would be given to VEP's --dir option. | ||
Instead it is the path to the directory with the VEP .pl file. | ||
The VEP script is the .pl file only. | ||
If not given, it will be /vep-dir/variant\_effect\_predictor.pl. | ||
The VEP cache directory is the same as would be given to VEP's --dir-cache option. | ||
If you have multiple VEP versions, then specify the version you want to use. | ||
This can be different from the Ensembl release option. | ||
VEP output is the same as would be given to VEP's -o option and should end with .vcf. | ||
The default output file will be called charger.vep.vcf. | ||
The GRCh reference genome can be set to either 37 or 38. | ||
The reference Fasta file will be determined automatically if not specified. | ||
For example, if the VEP cache is ~/.vep/, the Ensembl release is 74, and the reference assembly is 37, then the automatically constructed reference Fasta file will be ~/.vep/homo\_sapiens/74\_GRCH37/Homo\_sapiens.GRCh37.74.dna.primary\_assembly.fa.gz. | ||
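
The two defaults described in this section (the VEP script path and the reference Fasta path) can be sketched as below. This is an illustration inferred from the single example given in the text, not CharGer's actual code; the function names are hypothetical, and the path pattern is an assumption generalized from that one example.

```python
import posixpath


def default_vep_script(vep_dir):
    """Default script location when --vep-script is not given (per the text)."""
    return posixpath.join(vep_dir, "variant_effect_predictor.pl")


def default_reference_fasta(vep_cache, ensembl_release, grch):
    """Build the reference Fasta path following the README example.

    Assumption: the pattern is generalized from the one documented case
    (cache ~/.vep/, release 74, assembly 37).
    """
    release_dir = "{0}_GRCH{1}".format(ensembl_release, grch)
    fasta_name = "Homo_sapiens.GRCh{0}.{1}.dna.primary_assembly.fa.gz".format(
        grch, ensembl_release
    )
    return posixpath.join(vep_cache, "homo_sapiens", release_dir, fasta_name)


fasta_path = default_reference_fasta("~/.vep/", 74, 37)
script_path = default_vep_script("/vep-dir")
```

Note that the sketch leaves `~` unexpanded, as in the README example; a real caller would likely expand it before opening the file.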
|
||
### Local databases | ||
--exac-vcf ExAC vcf.gz | ||
--mac-clinvar-tsv ClinVar from MacArthur lab (clinvar_alleles.tsv.gz) | ||
Using local databases suppresses the BioMine accession too. | ||
These files can be downloaded from their respective sites. | ||
|
||
### Filters | ||
--rare Allele frequency threshold for rare/common (default = 1, process variant with any frequency): | ||
--vcf-any-filter Allow variants that do not pass all filters in .vcf input (flag) | ||
--mutation-types Comma delimited list of types to allow | ||
Using filters will limit the variants processed. | ||
The rare option takes variants with allele frequency less than the given value. | ||
Without the .vcf any filter flag, only variants that have passed all filters are accepted; with the flag, variants that do not pass all filters are allowed as well. | ||
If no .vcf pass filter status given, the .vcf null value will be taken as having passed. | ||
Mutation types filtering requires a comma-delimited list (no spaces) using terms from Ensembl's consequence terms. | ||
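
The filter rules in this section can be sketched as three small checks. This is a hedged illustration rather than CharGer's implementation: the function names are hypothetical, and the .vcf check follows the `--vcf-any-filter` option description ("Allow variants that do not pass all filters"), treating the .vcf null value `.` as having passed.

```python
def parse_mutation_types(option_value):
    """Split a comma-delimited (no spaces) option value into consequence terms."""
    return [term for term in option_value.split(",") if term]


def is_rare(allele_frequency, threshold):
    """True when the allele frequency is less than the given threshold."""
    return allele_frequency < threshold


def vcf_filter_ok(filter_value, allow_any=False):
    """Accept PASS or the null value '.', or anything when allow_any is set."""
    if allow_any:
        return True
    return filter_value in ("PASS", ".")


allowed_types = parse_mutation_types("missense_variant,stop_gained")
```

The example terms `missense_variant` and `stop_gained` are Ensembl consequence terms; any other filter value, such as `q10`, is rejected unless `allow_any` is set.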
|
||
### ReST batch sizes | ||
-v VEP (#variants, default/max allowed = 150) | ||
-b ClinVar summary (#variants, default/max allowed = 500) | ||
-B ClinVar searchsize (#variants, default/max allowed = 50) | ||
ReST APIs usually have limits on the amount of data sent or received. | ||
Exceeding these batch sizes would normally lead to warnings and/or IP blockage, but CharGer and BioMine try to keep batches at safe sizes. These limits were last updated in February 2017. | ||
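
Keeping requests at a safe size amounts to splitting the variant list into chunks no larger than the batch size. The sketch below shows that idea with the VEP default of 150; it is not the BioMine code, and the helper name and placeholder variant labels are assumptions.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 320 placeholder variants split with the documented VEP batch size of 150.
variants = ["variant_{0}".format(i) for i in range(320)]
sizes = [len(batch) for batch in batches(variants, 150)]
```

Here the 320 placeholder variants would go out as three requests, with the last one carrying the remainder.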
|
||
### Custom columns (0-based) | ||
-G HUGO gene symbol | ||
-X chromosome | ||
-S start position | ||
-P stop position | ||
-R reference allele | ||
-A alternate allele | ||
-s strand | ||
-M sample name | ||
-C codon | ||
-p peptide change | ||
-L variant classification | ||
-F allele frequency | ||
Use these for .tsv and/or .maf input variant files to specify columns of relevant data. | ||
CharGer makes use of genomic and protein variant annotations, so the more data made available the better your results. |
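
Since the custom column options are 0-based, the first column of a .tsv is column 0. A minimal sketch of selecting fields by 0-based index follows; the helper, the field names, and the placeholder row are hypothetical and only illustrate the indexing convention.

```python
import csv
import io


def read_variants(handle, columns):
    """Read tab-separated rows, picking fields by 0-based column index.

    columns maps a field name to its 0-based column position,
    mirroring what options such as -G, -X, -S, and -P would specify.
    """
    variants = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        variants.append({name: row[index] for name, index in columns.items()})
    return variants


# Placeholder row: gene, chromosome, start, stop, reference, alternate.
tsv = io.StringIO("GENE1\t1\t100\t100\tG\tA\n")
columns = {"gene": 0, "chromosome": 1, "start": 2, "stop": 3,
           "reference": 4, "alternate": 5}
variants = read_variants(tsv, columns)
```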
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,16 @@ | ||
#!/bin/python | ||
# CharGer - Characterization of Germline variants | ||
# author: Adam D Scott ([email protected]) & Kuan-lin Huang ([email protected]) | ||
# version: v0.1.0 - 2015*12 | ||
# version: v0.1.0 - 2016*04 | ||
|
||
import sys | ||
import getopt | ||
from charger import charger | ||
import time | ||
|
||
def parseArgs( argv ): | ||
helpText = "Usage: " | ||
helpText = "\nCharGer - v0.1.0\n\n" | ||
helpText += "Usage: " | ||
helpText += "charger <input file> [options]\n\n" | ||
helpText += "Accepted input data files:\n" | ||
helpText += " -m Standard .maf\n" | ||
|
@@ -29,37 +30,37 @@ def parseArgs( argv ): | |
helpText += " -O override with ClinVar description (flag)\n" | ||
helpText += " -D suppress needing disease specific (flag)\n" | ||
helpText += "Cross-reference data files:\n" | ||
helpText += " -z pathogenic variants .vcf\n" | ||
helpText += " -e expression matrix file .tsv\n" | ||
helpText += " -g gene list file .txt\n" | ||
helpText += " -d diseases file (format: gene\\tdisease\\tmode_of_inheritance) .tsv\n" | ||
helpText += " -n de novo file .?\n" | ||
helpText += " -a assumed de novo file .?\n" | ||
helpText += " -c co-segregation file .?\n" | ||
helpText += " -H HotSpot3D clusters file .clusters\n" | ||
helpText += " -r recurrence threshold (default = )\n" | ||
helpText += " -z pathogenic variants, .vcf\n" | ||
helpText += " -e expression matrix file, .tsv\n" | ||
helpText += " -g gene list file, (format: gene\\tdisease\\tmode_of_inheritance) .txt\n" | ||
helpText += " -d diseases file, (format: gene\\tdisease\\tmode_of_inheritance) .tsv\n" | ||
helpText += " -n de novo file, standard .maf\n" | ||
helpText += " -a assumed de novo file, standard .maf\n" | ||
helpText += " -c co-segregation file, standard .maf\n" | ||
helpText += " -H HotSpot3D clusters file, .clusters\n" | ||
helpText += " -r recurrence threshold (default = 2)\n" | ||
helpText += "Local VEP (works with .vcf input only; suppresses ReST too):\n" | ||
helpText += " --vep-script Path to VEP\n" | ||
helpText += " --vep-dir Path to VEP directory\n" | ||
helpText += " --vep-cache Path to VEP cache directory\n" | ||
helpText += " --vep-version VEP version (default = 87)\n" | ||
helpText += " --vep-output VEP output file (default = charger.vep.vcf)\n" | ||
helpText += " --grch assembly GRCh verion (default = 37)\n" | ||
helpText += " --ensembl-release Ensembl release version (default = 74)\n" | ||
helpText += " --ensembl-release Ensembl release version (default = 75)\n" | ||
helpText += " --reference-fasta VEP reference fasta\n" | ||
helpText += " --fork Number of forked processes used in VEP (default = 0) \n" | ||
helpText += "Local databases (suppresses ReST too):\n" | ||
helpText += " --exac-vcf ExAC vcf.gz\n" | ||
#helpText += " --clinvar-tsv ClinVar (.tsv.gz download)\n" | ||
#helpText += " --clinvar-vcf ClinVar (.vcf.gz download)\n" | ||
helpText += " --mac-clinvar-tsv ClinVar from MacArthur lab (clinvar_alleles.tsv.gz)\n" | ||
helpText += " --mac-clinvar-vcf ClinVar from MacArthur lab (clinvar_alleles.vcf.gz)\n" | ||
#helpText += " --mac-clinvar-vcf ClinVar from MacArthur lab (clinvar_alleles.vcf.gz)\n" | ||
helpText += "Filters:\n" | ||
helpText += " --rare Allele frequency threshold for rare/common (default = 1, process variant with any frequency):\n" | ||
helpText += " --vcf-any-filter Allow variants that do not pass all filters in .vcf input (flag)\n" | ||
helpText += " --mutation-types Comma delimited list of types to allow\n" | ||
helpText += " --mutation-types Comma delimited list (no spaces) of types to allow\n" | ||
helpText += "ReST batch sizes:\n" | ||
helpText += " -v VEP (#variants, default/max allowed = )\n" | ||
helpText += " -v VEP (#variants, default/max allowed = 150)\n" | ||
helpText += " -b ClinVar summary (#variants, default/max allowed = 500)\n" | ||
helpText += " -B ClinVar searchsize (#variants, default/max allowed = 50)\n" | ||
helpText += "Custom columns (0-based)\n" | ||
|