Gene level threshold #42

Vibha-Acharya · 2024-09-24T20:25:54Z

Dear eQTLCatalogue team,

We are using the eQTL Catalogue to discover eQTLs among our selected variants. I have several questions about the datasets and was wondering if you could provide some insight.

What is the difference between the *.cc.tsv.gz files and the *.all.tsv.gz files in the repository?

Also, we were thinking of applying a gene-level threshold similar to the GTEx pipeline, in which a genome-wide threshold is determined using FDR, and the beta distribution parameters fitted to the permutation minimum p-values are then used to determine a nominal p-value threshold for each gene.

Although p_beta and p_perm are available in the *.permutation files, the beta distribution parameters (shape_1 and shape_2, i.e. alpha and beta) are not.
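For reference, the GTEx-style procedure described above can be sketched in plain Python. This is a minimal sketch under explicit assumptions: the per-gene beta shape parameters (`shape1`, `shape2`) are hypothetical inputs, since the eQTL Catalogue permutation files do not include them; the genome-wide step uses Benjamini-Hochberg on `p_beta` as a stand-in for the Storey q-values used by GTEx; and the beta quantile is computed by bisection on a hand-written incomplete beta function instead of a library call such as `scipy.stats.beta.ppf`. All function names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued-fraction part of the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fast, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_ppf(q, a, b, iters=200):
    """Inverse Beta(a, b) CDF by bisection (the CDF is monotone on [0, 1])."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bh_threshold(p_values, fdr=0.05):
    """Largest p-value rejected by Benjamini-Hochberg at the given FDR (None if none)."""
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = None
    for k, p in enumerate(ranked, start=1):
        if p <= fdr * k / m:
            threshold = p
    return threshold


def gene_nominal_thresholds(genes, fdr=0.05):
    """genes: {gene_id: (p_beta, shape1, shape2)} -> {gene_id: nominal p-value threshold}."""
    p_t = bh_threshold([p_beta for p_beta, _, _ in genes.values()], fdr)
    if p_t is None:
        return {}
    # Each gene's threshold is its own beta quantile at the shared genome-wide cutoff p_t.
    return {gene_id: beta_ppf(p_t, s1, s2) for gene_id, (_, s1, s2) in genes.items()}
```

This also shows why the shape parameters cannot be skipped: `p_beta` is a single value of the fitted beta CDF, so on its own it does not pin down the two parameters needed to invert the distribution at `p_t`.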

Is there a way to determine a p-value threshold for each gene that could be applied to all variants?

Thank you so much,

Regards,
Vibha

VitorAguiar · 2024-10-04T20:22:16Z

I'm not affiliated with the eQTL Catalogue, but regarding your first question, @kauralasoo once told me that the "*.all.tsv.gz" files contain the full summary statistics, and the "*.cc.tsv.gz" files contain all genes that had FDR < 0.01 and at least one fine-mapped credible set.

kauralasoo · 2024-11-07T12:06:38Z

Dear Vibha,

Sorry for the long delay in responding to this. Regarding the difference between ".cc.tsv.gz" and ".all.tsv.gz", please see this section from the eQTL Catalogue 2023 paper:

Fine-mapping-based filtering of transcript-level summary statistics.
A major challenge in working with exon- and transcript-level (transcript usage, txrevise, leafcutter) associations is the large number of correlated traits being tested that result in very large summary statistics files. For example, typical summary statistics for exon and txrevise QTLs are 15–20 times larger than the corresponding files for gene expression QTLs. In addition to complicating our data release and archival procedures, these large file sizes meant that performing comprehensive colocalisation analysis against the eQTL Catalogue required the downloading and processing of >15Tb of data. To reduce the size of these files, we have now implemented fine-mapping based filtering. Briefly, we are using fine mapped credible sets to identify all independent signals at the gene level. We then filter the summary statistics files to only retain the most strongly associated molecular trait (exon, transcript, txrevise event or Leafcutter splice junction) for each signal. This filtering reduces the size of the summary statistics files for those quantification methods by ~98% while retaining the vast majority of significant associations for colocalisation purposes. Reducing the size of the univariate summary statistics files has also allowed us to export SuSiE log Bayes factors for each fine mapped signal and all tested variants [14]. As illustrated below, these log Bayes factors can be directly used in the new coloc.susie method to perform colocalisation analysis between all pairs of independent signals [13].

The ".cc.tsv.gz" files are filtered based on the fine-mapping results, and the ".all.tsv.gz" files are the unfiltered summary statistics.
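The filtering idea in the quoted section can be illustrated with a toy sketch. It assumes the credible sets have already been grouped into independent gene-level signals, each identified by a hypothetical `(gene_id, signal_id)` key and annotated with a molecular trait and its minimum p-value. The record layout and function names are illustrative only, not the eQTL Catalogue's actual pipeline or file schema.

```python
def best_trait_per_signal(signals):
    """signals: iterable of (gene_id, signal_id, molecular_trait_id, min_pvalue).

    Returns {(gene_id, signal_id): molecular_trait_id} keeping the most strongly
    associated (lowest p-value) trait for each independent signal.
    """
    best = {}
    for gene_id, signal_id, trait_id, pval in signals:
        key = (gene_id, signal_id)
        if key not in best or pval < best[key][1]:
            best[key] = (trait_id, pval)
    return {key: trait_id for key, (trait_id, _) in best.items()}


def filter_sumstats(rows, signals):
    """Keep only summary-statistics rows whose trait was selected for some signal."""
    keep = set(best_trait_per_signal(signals).values())
    return [row for row in rows if row["molecular_trait_id"] in keep]
```

Because every other correlated trait of the same gene is dropped, most rows of exon- or transcript-level files disappear, which is consistent with the large size reduction described above.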

There is currently no option to determine a per-gene p-value threshold for all variants based on the permutation results. Your best option is either to use a fixed genome-wide threshold (e.g. p < 1e-6 or similar) or to use the fine-mapped credible sets to identify multiple independent signals per gene.
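Applying a fixed genome-wide threshold to a full summary statistics file can be sketched with only the Python standard library, streaming the gzipped TSV row by row. The column names `gene_id`, `variant` and `pvalue` are assumptions about the ".all.tsv.gz" header (check the header of the file you download), and skipping empty or `NA` p-values is likewise an assumption.

```python
import csv
import gzip
from collections import defaultdict


def significant_hits(path, p_threshold=1e-6, gene_col="gene_id",
                     p_col="pvalue", variant_col="variant"):
    """Return {gene_id: [(variant, pvalue), ...]} for rows with pvalue < p_threshold."""
    hits = defaultdict(list)
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            raw = row[p_col]
            if raw in ("", "NA"):  # assumed encoding of missing p-values
                continue
            pval = float(raw)
            if pval < p_threshold:
                hits[row[gene_col]].append((row[variant_col], pval))
    return dict(hits)
```

Streaming with `csv.DictReader` keeps memory use proportional to the number of hits rather than to the file size, which matters for files of this size.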

Best wishes,
Kaur

Vibha-Acharya · 2024-11-12T21:29:03Z

Thank you so much Dr. Alasoo and Vitor :)

kauralasoo closed this as completed Nov 7, 2024