Skip to content

frichter/ore

Repository files navigation

ORE: Outlier-RV enrichment

ORE identifies outlier genes with more rare variants than expected by chance (and vice-versa). Paper in Bioinformatics.

Cursory use of ORE (outlier-RV enrichment) is provided here, visit the latest ORE documentation for more details. Confirm the following are installed:

Then, on the command line, install with

pip install ore

Example run

ore --vcf test.vcf.gz \
    --bed test.bed.gz \
    --output ore_results \
    --distribution normal \
    --threshold 2 3 4 \
    --max_outliers_per_id 500 \
    --af_rare 0.05 0.01 1e-3 \
    --tss_dist 5000

Variants and gene expression are specified with --vcf (line 1) and --bed (line 2), respectively. The output prefix is provided with --output (line 3). In this example, the outlier specifications --distribution (line 4), --threshold (line 5), and --max_outliers_per_id (line 6) indicate that outliers are defined using a normal distribution with a z-score more extreme than two, and samples with more than 500 outliers are excluded. Variant information is specified with --af_rare (line 7) and --tss_dist (line 8) to encode that variants are defined as rare with a intra-cohort allele frequency at varying thresholds (≤ 0.05, 0.01, and 0.001), and to only use variants within 5 kb of the TSS.

Usage, visit the latest ORE documentation for more

ore [-h] [--version] -v VCF -b BED [-o OUTPUT]
         [--outlier_output OUTLIER_OUTPUT] [--enrich_file ENRICH_FILE]
         [--extrema] [--distribution {normal,rank,custom}]
         [--threshold [THRESHOLD [THRESHOLD ...]]]
         [--max_outliers_per_id MAX_OUTLIERS_PER_ID]
         [--af_rare [AF_RARE [AF_RARE ...]]] [--af_vcf]
         [--intracohort_rare_ac INTRACOHORT_RARE_AC]
         [--af_min [AF_MIN [AF_MIN ...]]] [--gq GQ] [--dp DP]
         [--aar AAR AAR] [--tss_dist [TSS_DIST [TSS_DIST ...]]] [--upstream]
         [--downstream] [--annovar]
         [--variant_class {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA,ncRNA_exonic}]
         [--exon_class {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}]
         [--refgene] [--ensgene] [--annovar_dir ANNOVAR_DIR]
         [--humandb_dir HUMANDB_DIR] [--processes PROCESSES] [--clean_run]
Required arguments:
-v VCF, --vcf VCF
 Location of VCF file. Must be tabixed!
-b BED, --bed BED
 Gene expression file location. Must be tabixed!
Optional file locations:
-o OUTPUT, --output OUTPUT
 Output prefix (default is VCF prefix)
--outlier_output OUTLIER_OUTPUT
 Outlier filename (default is VCF prefix)
--enrich_file ENRICH_FILE
 Output file for enrichment odds ratios and p-values (default is VCF prefix)
Optional outlier arguments:
--extrema Only the most extreme value is an outlier
--distribution DISTRIBUTION
 Outlier distribution. Options: {normal,rank,custom}
--threshold THRESHOLD
 Expression threshold for defining outliers. Must be greater than 0 for normal or (0,0.5) non-inclusive with rank. Ignored with custom
--max_outliers_per_id MAX_OUTLIERS_PER_ID
 Maximum number of outliers per ID
Optional variant-related arguments:
--af_rare AF_RARE
 AF cut-off below which a variant is considered rare (space separated list e.g., 0.1 0.05)
--af_vcf Use the VCF AF field to define an allele as rare.
--intracohort_rare_ac INTRACOHORT_RARE_AC
 Allele COUNT to be used instead of intra-cohort allele frequency. (still uses af_rare for population level AF cut-off)
--af_min AF_MIN
 Lower bound on AF cut-offs for --af_rare, must be same length as --af_rare (e.g., with --af_rare 0.01 0.5 and --af_min 0 0.05 ORE will compare variants within [0,0.01] and [0.05,0.5] to other variants).
--gq GQ Minimum genotype quality each variant in each individual
--dp DP Minimum depth per variant in each individual
--aar AAR Alternate allelic ratio for heterozygous variants (provide two space-separated numbers between 0 and 1, e.g., 0.2 0.8)
--tss_dist TSS_DIST
 Variants within this distance of the TSS are considered
--upstream Only variants UPstream of TSS
--downstream Only variants DOWNstream of TSS
Optional arguments for using ANNOVAR:
--annovar Use ANNOVAR to specify allele frequencies and functional class
--variant_class
 Only variants in these classes will be considered. Options: {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA}
--exon_class Only variants with these exonic impacts will be considered. Options: {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}
--refgene Filter on RefGene function.
--ensgene Filter on ENSEMBL function.
--annovar_dir ANNOVAR_DIR
 Directory of the table_annovar.pl script
--humandb_dir HUMANDB_DIR
 Directory of ANNOVAR data (refGene, ensGene, and gnomad_genome)
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--processes PROCESSES
 Number of CPU processes
--clean_run Delete temporary files from the previous run

Felix Richter <[email protected]>