Gene level threshold #42

Closed
Vibha-Acharya opened this issue Sep 24, 2024 · 3 comments

@Vibha-Acharya

Dear eQTL Catalogue team,

We are using the eQTL Catalogue to discover eQTLs among our selected variants. I have a few questions about the datasets and was wondering if you could provide some insight.

What is the difference between the *.cc.tsv.gz files and the *.all.tsv.gz files in the repository?

Also, we were thinking of applying a gene-level threshold similar to the GTEx pipeline, where a genome-wide threshold is determined using FDR, and the beta distribution parameters fitted to the permutation minimum p-values are then used to determine a nominal p-value threshold for each gene.
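A minimal, standard-library-only sketch of that GTEx-style procedure, assuming you already have, for every gene, the beta-approximated permutation p-value (`p_beta`) and the fitted beta shape parameters. Two swaps to be aware of: the FDR step uses Benjamini-Hochberg rather than Storey's q-value (which GTEx uses), and the inverse beta CDF is computed by bisection; with SciPy, `scipy.stats.beta.ppf(p_t, shape1, shape2)` does the same job. The genome-wide cutoff `p_t` is here taken as the midpoint between the largest passing and the smallest failing `p_beta` (one reasonable choice). All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def bh_qvalues(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values with pi0 = 1)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        q[i] = running_min
    return q


def pbeta_cutoff(p_beta: Sequence[float], fdr: float = 0.05) -> float:
    """Genome-wide cutoff on p_beta: midpoint between the largest p_beta that
    passes the FDR and the smallest one that does not."""
    q = bh_qvalues(p_beta)
    passing = [p for p, qv in zip(p_beta, q) if qv <= fdr]
    failing = [p for p, qv in zip(p_beta, q) if qv > fdr]
    if not passing:
        raise ValueError("no gene passes the FDR threshold")
    if not failing:
        return max(passing)
    return (max(passing) + min(failing)) / 2


def _betacf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction.
        for aa in (m * (b - m) * x / ((a - 1 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Continued fraction converges fast below (a+1)/(a+b+2); otherwise use
    # the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1) / (a + b + 2):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_ppf(p: float, a: float, b: float, iters: int = 200) -> float:
    """Inverse of beta_cdf by bisection on [0, 1] (the CDF is monotone)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nominal_thresholds(genes: Dict[str, Tuple[float, float, float]],
                       fdr: float = 0.05) -> Dict[str, float]:
    """genes maps gene_id -> (p_beta, shape1, shape2); returns each gene's
    nominal p-value threshold, the Beta(shape1, shape2) quantile at p_t."""
    p_t = pbeta_cutoff([p for p, _, _ in genes.values()], fdr)
    return {g: beta_ppf(p_t, s1, s2) for g, (_, s1, s2) in genes.items()}
```

A variant is then treated as significant for a gene when its nominal p-value falls at or below that gene's threshold.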

Although p_beta and p_perm are available in the *.permutation files, the beta distribution parameters (shape_1 and shape_2, a.k.a. alpha and beta) are not.
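If you re-run permutations yourself, the missing shape parameters can be estimated per gene. A minimal sketch, assuming a hypothetical list holding one gene's minimum nominal p-value from each permutation (not something the released files are said to contain). It uses the method of moments as a simpler stand-in; dedicated tools such as FastQTL fit these parameters by maximum likelihood instead.

```python
import statistics
from typing import Sequence, Tuple


def fit_beta_moments(perm_min_pvals: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimate of (shape1, shape2) for a Beta distribution
    from one gene's permutation minimum p-values (hypothetical input)."""
    m = statistics.fmean(perm_min_pvals)
    v = statistics.variance(perm_min_pvals)  # sample variance (n - 1 denominator)
    # A Beta distribution with mean m must have variance below m * (1 - m).
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("moments are not consistent with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common
```

The gene's `p_beta` is then the Beta(shape1, shape2) CDF evaluated at its observed minimum nominal p-value.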

Is there a way to determine a p-value threshold for each gene that we can then apply to all of its variants?
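Once per-gene thresholds exist (however they were obtained), applying them to all variants is a simple filter. A minimal sketch; the `(gene_id, variant, pvalue)` tuple layout and the function name are hypothetical, and the inclusive `<=` comparison is a choice, not a documented convention.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str, float]  # (gene_id, variant, nominal p-value)


def significant_pairs(rows: Iterable[Pair], thresholds: Dict[str, float]) -> List[Pair]:
    """Keep every variant-gene pair whose nominal p-value is at or below its
    gene's threshold; genes without a threshold are skipped."""
    hits = []
    for gene_id, variant, pvalue in rows:
        threshold = thresholds.get(gene_id)
        if threshold is not None and pvalue <= threshold:
            hits.append((gene_id, variant, pvalue))
    return hits
```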

Thank you so much,

Regards,
Vibha

@VitorAguiar

I'm not affiliated with the eQTL Catalogue, but regarding your first question: @kauralasoo once told me that the "*.all.tsv.gz" files contain full summary statistics, while the "*.cc.tsv.gz" files contain all genes that had FDR < 0.01 and at least one fine-mapped credible set.

@kauralasoo (Member)

Dear Vibha,

Sorry for the long delay in responding to this. Regarding the difference between ".cc.tsv.gz" and ".all.tsv.gz", please see this section from the eQTL Catalogue 2023 paper:

Fine-mapping-based filtering of transcript-level summary statistics.
A major challenge in working with exon- and transcript-level (transcript usage, txrevise, leafcutter) associations is the large number of correlated traits being tested that result in very large summary statistics files. For example, typical summary statistics for exon and txrevise QTLs are 15–20 times larger than the corresponding files for gene expression QTLs. In addition to complicating our data release and archival procedures, these large file sizes meant that performing comprehensive colocalisation analysis against the eQTL Catalogue required the downloading and processing of >15Tb of data. To reduce the size of these files, we have now implemented fine-mapping based filtering. Briefly, we are using fine mapped credible sets to identify all independent signals at the gene level. We then filter the summary statistics files to only retain the most strongly associated molecular trait (exon, transcript, txrevise event or Leafcutter splice junction) for each signal. This filtering reduces the size of the summary statistics files for those quantification methods by ~98% while retaining the vast majority of significant associations for colocalisation purposes. Reducing the size of the univariate summary statistics files has also allowed us to export SuSiE log Bayes factors for each fine mapped signal and all tested variants [14]. As illustrated below, these log Bayes factors can be directly used in the new coloc.susie method to perform colocalisation analysis between all pairs of independent signals [13].
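The filtering idea in that passage can be sketched as: for each independent signal, keep only the molecular trait most strongly associated with it, then drop summary-statistic rows for every other trait. A minimal sketch, assuming rows already tagged with a gene-level `signal_id` and taking "most strongly associated" to mean the smallest p-value; the field names and functions are hypothetical, and the Catalogue's actual criteria may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple


def traits_to_keep(signal_rows: Iterable[dict]) -> Set[str]:
    """For each (gene_id, signal_id), keep the molecular trait with the
    smallest p-value. Ties are broken by trait ID (tuple comparison)."""
    best: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for row in signal_rows:
        key = (row["gene_id"], row["signal_id"])
        candidate = (float(row["pvalue"]), row["molecular_trait_id"])
        if key not in best or candidate < best[key]:
            best[key] = candidate
    return {trait for _, trait in best.values()}


def filter_sumstats(rows: Iterable[dict], keep: Set[str]) -> List[dict]:
    """Drop summary-statistic rows for traits that lead no signal."""
    return [row for row in rows if row["molecular_trait_id"] in keep]
```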

The ".cc.tsv.gz" files are filtered based on the fine-mapping results, and the ".all.tsv.gz" files contain the unfiltered summary statistics.

There is currently no option to determine a per-gene p-value threshold for all variants based on the permutation results. Your best option is either to use a fixed genome-wide threshold (e.g. p < 1e-6 or similar) or to use the fine-mapped credible sets to identify multiple independent signals per gene.
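Both options can be sketched with the standard library by streaming the gzipped TSV files. This assumes the summary statistics have a `pvalue` column and the credible-set file has `gene_id` and `cs_id` columns; those names are assumptions to check against the actual file headers, and the helpers are hypothetical, not part of any eQTL Catalogue tooling.

```python
import csv
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set


def hits_below(lines: Iterable[str], threshold: float = 1e-6,
               p_col: str = "pvalue") -> Iterator[dict]:
    """Stream tab-separated rows and yield those with p-value < threshold.
    Missing or non-numeric p-values (e.g. "NA") are skipped."""
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            p = float(row[p_col])
        except (KeyError, TypeError, ValueError):
            continue
        if p < threshold:
            yield row


def signals_per_gene(lines: Iterable[str], gene_col: str = "gene_id",
                     cs_col: str = "cs_id") -> Dict[str, int]:
    """Count distinct credible sets (independent signals) per gene."""
    sets: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        sets[row[gene_col]].add(row[cs_col])
    return {gene: len(ids) for gene, ids in sets.items()}


def open_tsv_gz(path: str):
    """Open a gzipped TSV as text so the functions above can stream it."""
    return gzip.open(path, "rt", newline="")
```

For example, `with open_tsv_gz("your_dataset.all.tsv.gz") as fh: hits = list(hits_below(fh))`, where the file name is a placeholder. Streaming avoids loading a full summary statistics file into memory.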

Best wishes,
Kaur

@Vibha-Acharya (Author)

Thank you so much, Dr. Alasoo and Vitor :)
