Design UI for cis-eQTL data #16

Open
grosscol opened this issue May 17, 2023 · 0 comments
Labels
enhancement New feature or request

Background

There are eQTL scan results for some of the genes and variants in the TOPMed set. Starting with the Whole Blood tissue sample, there is SuSiE fine-mapping data (488k variants) and conditional regression data (108k variants), which yield ~50k variants when collated.

Cis-eQTL Scan Data

SuSiE data columns:

  • Column descriptions:
    • phenotype_id: gene identifier
    • variant_id: genetic variant, in format {chr}_{pos}_{ref}_{alt}
    • pip: SuSiE PIP (essentially, the probability the variant is a causal one for this eQTL signal)
    • af: frequency of the alt allele
    • cs_id: Credible set ID. cs_id + phenotype_id together uniquely identify a credible set. A credible set containing more than one genetic variant will span more than one line.
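As a rough sketch of how the SuSiE rows could be prepared for display, the snippet below splits variant_id into its parts and groups rows into credible sets keyed by (phenotype_id, cs_id). The function names, the gene identifiers, and the example values are all made up for illustration; the rows are assumed to arrive as plain dicts using the column names listed above.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> Variant:
    """Split a '{chr}_{pos}_{ref}_{alt}' identifier into its parts."""
    # rsplit from the right so a contig name containing '_' stays intact.
    chrom, pos, ref, alt = variant_id.rsplit("_", 3)
    return Variant(chrom, int(pos), ref, alt)


def group_credible_sets(rows):
    """Group SuSiE rows by (phenotype_id, cs_id), the pair that identifies a credible set."""
    sets = defaultdict(list)
    for row in rows:
        sets[(row["phenotype_id"], row["cs_id"])].append(row)
    return dict(sets)


# Hypothetical rows; identifiers and values are invented for this example.
example_susie_rows = [
    {"phenotype_id": "GENE_A", "variant_id": "chr1_1000_A_G", "pip": 0.62, "af": 0.12, "cs_id": 1},
    {"phenotype_id": "GENE_A", "variant_id": "chr1_1040_C_T", "pip": 0.35, "af": 0.11, "cs_id": 1},
    {"phenotype_id": "GENE_A", "variant_id": "chr1_9000_G_A", "pip": 0.97, "af": 0.30, "cs_id": 2},
]

credible_sets = group_credible_sets(example_susie_rows)
for (gene, cs_id), members in credible_sets.items():
    print(gene, cs_id, [m["variant_id"] for m in members])
```

Grouping this way would let a table show one expandable row per credible set instead of one row per variant, though whether that suits the UI is an open design question.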

Conditional regression data columns:
Significant independent eQTL signals for each gene (generated using forward-backward linear regression)

  • Column descriptions:
    • phenotype_id: gene identifier
    • num_var: number of genetic variants tested for association with this gene's expression
    • beta_shape1: First beta distribution parameter used when computing beta-approximated p-value (see FastQTL publication [1]). When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • beta_shape2: Second beta distribution parameter used when computing beta-approximated p-value (see FastQTL publication [1]). When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • true_df: estimated true degrees of freedom (used when computing beta-approximated p-value; see FastQTL publication [1]). When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • pval_true_df: p-value calculated using true_df (used when computing beta-approximated p-value; see FastQTL publication [1]). When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • variant_id: genetic variant, in format {variant_chromosome}_{variant_position}_{variant_ref_allele}_{variant_alt_allele}
    • tss_distance: (signed) distance between the gene TSS and the genetic variant
    • ma_samples: number of samples having the minor allele
    • ma_count: minor allele count
    • af: frequency of the alt allele
    • pval_nominal: nominal p-value for association between the gene expression and genetic variant allele dosage. Note that due to underflow, some p-values may be equal to 0. When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • slope: linear regression estimated slope for the allele dosage term when modeling association between gene expression and genetic variant. The effect allele is always the alt allele (which can be inferred from the variant_id as described above), such that in the case of a significant association between gene expression and genetic variant, slope greater than 0 indicates that the alt allele favors higher expression of the gene. When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • slope_se: standard error of the estimated slope
    • pval_perm: empirical p-value for association between the gene expression and genetic variant, adjusted for multiple testing at the gene level (i.e. testing many variants against this one gene; NOT genome-wide corrected) using permutations (see FastQTL publication [1]). When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • pval_beta: p-value for association between the gene expression and genetic variant, adjusted for multiple testing at the gene level (i.e. testing many variants against this one gene; NOT genome-wide corrected) using the fitted beta distribution (see FastQTL publication [1]). Note that due to underflow, some p-values may be equal to 0. When there are multiple independent eQTL signals for a gene, this is computed during the backward step, i.e. controlling for each of the gene's other independent eQTL signals.
    • rank: rank of the association, based on the order in which the association was discovered during the forward step
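A minimal sketch of how these columns might be turned into display values and collated with the SuSiE data. It assumes that collation means matching rows on (phenotype_id, variant_id), and that a variant appears in at most one credible set per gene; neither is confirmed here. The function names and the example rows are hypothetical.

```python
def effect_direction(slope: float) -> str:
    """The effect allele is the alt allele, so slope > 0 means alt favors higher expression."""
    if slope > 0:
        return "alt allele increases expression"
    if slope < 0:
        return "alt allele decreases expression"
    return "no estimated effect"


def format_pval(pval: float) -> str:
    """Render a p-value, flagging an exact 0 as underflow rather than a true zero."""
    if pval == 0:
        return "< floating-point precision (underflow)"
    return f"{pval:.2e}"


def collate(conditional_rows, susie_rows):
    """Attach pip and cs_id to conditional rows that share phenotype_id and variant_id."""
    # Assumption: one SuSiE row per (phenotype_id, variant_id); later duplicates would win.
    susie_by_key = {(r["phenotype_id"], r["variant_id"]): r for r in susie_rows}
    merged = []
    for row in conditional_rows:
        match = susie_by_key.get((row["phenotype_id"], row["variant_id"]))
        if match is not None:
            merged.append({**row, "pip": match["pip"], "cs_id": match["cs_id"]})
    return merged


# Hypothetical rows; values are invented for this example.
conditional_example = [
    {"phenotype_id": "GENE_A", "variant_id": "chr1_1000_A_G", "slope": 0.41, "pval_nominal": 0.0, "rank": 1},
    {"phenotype_id": "GENE_A", "variant_id": "chr1_7000_T_C", "slope": -0.18, "pval_nominal": 1.5e-8, "rank": 2},
]
susie_example = [
    {"phenotype_id": "GENE_A", "variant_id": "chr1_1000_A_G", "pip": 0.62, "cs_id": 1},
]

collated = collate(conditional_example, susie_example)
for row in collated:
    print(row["variant_id"], effect_direction(row["slope"]), format_pval(row["pval_nominal"]))
```

Showing underflowed p-values as a bound rather than a literal 0 avoids implying an exact value the data does not carry.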

Design Directions

Questions

What questions should the UI answer?

  • Does this gene have any eQTLs?
  • How many eQTLs does this gene have?
  • How many eQTLs are in this region?
  • Which genes in this region have eQTL data?
  • What is the magnitude of the effect of an eQTL on a gene?
  • How far is the eQTL from the gene whose expression it affects?
  • What tissue samples have eQTLs in this gene or region?
  • What tissue sample is an eQTL from?
  • Which gene in this region has the most eQTLs?
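Several of these questions reduce to simple aggregations over the conditional regression rows. The sketch below assumes one row per independent eQTL signal and selects a region by variant position (not gene position); the function names and the example rows are hypothetical.

```python
from collections import Counter


def eqtl_counts_by_gene(rows):
    """Count eQTL rows per gene (phenotype_id)."""
    return Counter(row["phenotype_id"] for row in rows)


def rows_in_region(rows, chrom, start, end):
    """Keep rows whose variant position falls within [start, end] on chrom."""
    selected = []
    for row in rows:
        row_chrom, pos, _ref, _alt = row["variant_id"].rsplit("_", 3)
        if row_chrom == chrom and start <= int(pos) <= end:
            selected.append(row)
    return selected


def gene_with_most_eqtls(rows):
    """Return (gene, count) for the gene with the most eQTL rows, or None if empty."""
    counts = eqtl_counts_by_gene(rows)
    return counts.most_common(1)[0] if counts else None


# Hypothetical rows; values are invented for this example.
example_eqtl_rows = [
    {"phenotype_id": "GENE_A", "variant_id": "chr1_1000_A_G", "slope": 0.41, "tss_distance": -250},
    {"phenotype_id": "GENE_A", "variant_id": "chr1_5000_C_T", "slope": -0.18, "tss_distance": 3750},
    {"phenotype_id": "GENE_B", "variant_id": "chr1_1200_G_A", "slope": 0.09, "tss_distance": 120},
]

region_rows = rows_in_region(example_eqtl_rows, "chr1", 900, 1500)
print(len(region_rows))                                # eQTLs in the region
print(sorted({r["phenotype_id"] for r in region_rows}))  # genes with eQTL data in the region
print(gene_with_most_eqtls(example_eqtl_rows))          # gene with the most eQTLs overall
```

The effect magnitude and distance questions map directly onto the slope and tss_distance columns, so they need no aggregation. The tissue questions would need a tissue field that the current columns do not include.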

Table

Reusing the tabulator-tables dependency currently used to display the table of variants in a region or gene, design a new table or adapt the variant table for eQTLs.

  • Separate table?
  • Alternate table swapped for variant table?
  • Additional columns to variant table?

Figure

Visual representation of the genomic region of the gene(s) and associated eQTLs.

Completion Criteria

  • A design or sketch for the UI element is attached.
  • The data required to power the element is specified.
  • A corresponding issue is opened in Bravo API.