Inquiry on Using MTAG with Highly Correlated Traits from Overlapping Samples #206

Open
YingkaiSun opened this issue Mar 12, 2024 · 4 comments

@YingkaiSun

YingkaiSun commented Mar 12, 2024

Hello:

I am working on a project in which I intend to use MTAG to analyze traits with high genetic correlation. The samples for these traits overlap completely. I understand that MTAG is designed to increase statistical power by leveraging the genetic correlation between traits, even when the samples overlap. However, in a scenario where two traits have almost complete genetic correlation and are derived from the same sample, does applying MTAG artificially inflate the sample size? In other words, is it equivalent to counting the effect of each SNP twice?

I know this is only a theoretical scenario. To test it, I ran MTAG on two identical GWAS datasets and encountered a 'Singular matrix' error. I still wonder whether MTAG is statistically valid under such conditions, and how the MTAG model adjusts for the high genetic similarity and sample overlap to avoid overestimating statistical power.

Thank you for your time and assistance.

Best regards,
Sun Yingkai
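
One plausible reading of the 'Singular matrix' error is sketched below. It assumes, without having traced the MTAG source, that the error comes from inverting a covariance matrix of estimation errors: two identical GWAS have perfectly correlated errors, so that 2x2 matrix has rank 1. The helper names `error_covariance` and `is_invertible` are hypothetical and are not part of MTAG.

```python
import numpy as np


def error_covariance(se1, se2, r_err):
    """2x2 covariance of the estimation errors of two GWAS at one SNP.

    r_err is the correlation of the estimation errors (an illustrative
    input here; MTAG estimates this quantity from the data).
    """
    cov = r_err * se1 * se2
    return np.array([[se1 ** 2, cov], [cov, se2 ** 2]])


def is_invertible(matrix):
    """Return True if numpy can invert the matrix, False if it is singular."""
    try:
        np.linalg.inv(matrix)
        return True
    except np.linalg.LinAlgError:
        return False


# Two copies of the same GWAS: errors are perfectly correlated.
identical = error_covariance(1.0, 1.0, r_err=1.0)
# Two highly (but not perfectly) correlated sets of errors.
nearly_identical = error_covariance(1.0, 1.0, r_err=0.95)

print(is_invertible(identical))
print(is_invertible(nearly_identical))
print(np.linalg.matrix_rank(identical))
```

The point of the sketch is only that exact duplication is a degenerate input, which is a different situation from traits that are highly but not perfectly correlated.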

@paturley
Collaborator

paturley commented Mar 12, 2024 via email

In our simulations and based on the theory, having a large amount of overlap and high phenotypic/genetic correlation should not cause problems for MTAG, though the benefits of MTAG will likely be low. We did not test very extreme cases of this, though, so if you notice anything that looks funny, I would be cautious.

@YingkaiSun
Author

Thank you for your prompt response. I would like to look more closely at your point that the benefits of MTAG are likely to be low when both overlap and correlation are high. My understanding is that, in theory, MTAG's effect on statistical power is bounded by two extremes. At one extreme there is no overlap (Overlap = 0) and complete correlation (Correlation = 1), which gives the maximum power gain MTAG can provide (equal to a meta-analysis of one trait across two independent cohorts). At the other extreme there is complete overlap (Overlap = 1) but no correlation (Correlation = 0), which gives no power gain from MTAG.

In real-world applications, the situation usually falls between these two limits. My perception is that MTAG gains power from the correlation between traits, but that this gain is tempered by the degree of sample overlap. Hence, when both correlation and overlap are high, the marginal benefit of MTAG for statistical power becomes minimal. Is this a correct understanding of how MTAG operates, or am I overlooking something?

Thank you!
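
The limiting cases raised in this exchange can be checked numerically. What follows is a minimal sketch and not MTAG's implementation. It assumes the generalized-least-squares form of the MTAG estimator as I read it in the MTAG paper, treats the effect covariance `omega` and the error covariance `sigma` as known, and gives both traits the same effect variance and the same sampling variance. The names `mtag_variance` and `power_gain` are hypothetical, and the default values of `effect_var` and `se2` are arbitrary.

```python
import numpy as np


def mtag_variance(omega, sigma, t=0):
    """Sampling variance of a GLS-style MTAG estimate for trait t at one SNP.

    omega: (T, T) covariance of true SNP effects across traits.
    sigma: (T, T) covariance of the GWAS estimation errors.
    Both are treated as known here; MTAG itself estimates them.
    """
    omega = np.asarray(omega, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    w_t = omega[:, t]
    # How strongly each trait's effect loads on the effect for trait t.
    a = w_t / omega[t, t]
    # Effect variance in each trait that trait t's effect does not explain.
    resid = omega - np.outer(w_t, w_t) / omega[t, t]
    v = resid + sigma
    return 1.0 / (a @ np.linalg.solve(v, a))


def power_gain(rg, r_err, effect_var=1.0, se2=1.0):
    """Ratio of the single-trait GWAS variance to the MTAG variance for trait 1.

    rg: genetic correlation; r_err: correlation of the estimation errors.
    A ratio of 2 means the effective sample size has doubled.
    """
    omega = effect_var * np.array([[1.0, rg], [rg, 1.0]])
    sigma = se2 * np.array([[1.0, r_err], [r_err, 1.0]])
    return se2 / mtag_variance(omega, sigma)


# No overlap, complete genetic correlation: same as a two-cohort meta-analysis.
print(power_gain(rg=1.0, r_err=0.0))
# No genetic correlation and uncorrelated errors: no gain.
print(power_gain(rg=0.0, r_err=0.0))
# High genetic correlation and highly correlated errors: a gain just above 1.
print(power_gain(rg=0.95, r_err=0.9))
```

Under these assumptions the first case gives a ratio of 2 and the second a ratio of 1, while the third stays within about one percent of 1, which is consistent with the remark that the benefits are likely to be low when overlap and correlation are both high.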

@paturley
Collaborator

paturley commented Mar 12, 2024 via email

Technically, MTAG's power is maximized when the difference between the genetic correlation and the correlation of the estimation error is largest. This will happen when there is perfect sample overlap, a high genetic correlation, and a high phenotypic correlation that has the opposite sign to the genetic correlation. I know of no phenotypes where the genetic and phenotypic correlations have opposite signs, so conditional on them having the same sign, you are right that the best you can do is no overlap and a high genetic correlation (all else equal). You may be able to get a larger sample size by allowing for overlap, though, so in practical settings the best option will often be to use summary statistics with overlapping samples.

@YingkaiSun
Author

I got it. Thank you!
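
The role of the gap between the two correlations can be written out for two traits. This is a sketch under stated assumptions and not a result quoted from the MTAG documentation. It assumes the generalized-least-squares form of the MTAG estimator, the same per-SNP effect variance $\omega$ for both traits, the same sampling variance $s^2$ for both GWAS, genetic correlation $r_g$, and correlation of the estimation errors $r_\varepsilon$, all treated as known. The symbols are introduced here for illustration only.

$$
\frac{\operatorname{Var}\big(\hat\beta_{\mathrm{GWAS},1}\big)}{\operatorname{Var}\big(\hat\beta_{\mathrm{MTAG},1}\big)}
= 1 + \frac{s^2\,(r_g - r_\varepsilon)^2}{s^2\,(1 - r_\varepsilon^2) + \omega\,(1 - r_g^2)}
$$

Under these assumptions the gain vanishes when $r_g = r_\varepsilon$, equals 2 when $r_g = 1$ and $r_\varepsilon = 0$ (the two-cohort meta-analysis case), and grows further when the two correlations have opposite signs. If $r_\varepsilon$ is taken to be roughly the phenotypic correlation scaled by the fraction of overlapping samples, which is the usual LD score regression approximation and an assumption here, then full overlap makes $r_\varepsilon$ track the phenotypic correlation.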
