Inquiry on Using MTAG with Highly Correlated Traits from Overlapping Samples #206
Comments
In our simulations and based on the theory, having a large amount of
overlap and high phenotypic/genetic correlation should not cause problems
for MTAG, though the benefits of MTAG will likely be low. We did not test
very extreme cases of this though, so if you notice that anything looks
funny, I would be cautious.
On Tue, Mar 12, 2024 at 5:29 AM YingkaiSun wrote:
Hello:
I am currently working on a project where I intend to use MTAG to analyze
genetic data for traits with high genetic correlation. The samples for
these traits are completely overlapping. I understand that MTAG is designed
to enhance statistical power by leveraging the genetic correlation between
traits, even when the samples overlap. However, I have some concerns and
would appreciate your insights.
Suppose I want to use MTAG in a scenario where two traits have almost complete
genetic correlation (close to 100%) and are derived from the same sample
source, such as the UK Biobank. In this context, does applying MTAG
artificially inflate the sample size, given the complete overlap and
substantial genetic correlation between the traits? I'm curious about
whether this approach is statistically valid and how the MTAG model adjusts
for the increased genetic similarity and sample overlap to avoid potential
overestimation of statistical power.
Thank you for your time and assistance.
Best regards,
Sun Yingkai
|
Thank you for your efficient response. I would like to delve deeper into the point you made about the benefits of MTAG likely being low in scenarios with high overlap and high correlation. My understanding is that in theoretical extremes, MTAG's influence on statistical power can be seen as bounded by two limits: one where there is no overlap (Overlap = 0) and complete correlation (Correlation = 1), representing the maximum statistical power enhancement MTAG can provide (equal to a meta-analysis for one trait based on two independent cohorts), and the other extreme where there is complete overlap (Overlap = 1) but no correlation (Correlation = 0), representing no enhancement in statistical power by MTAG. In real-world applications, the situation often falls between these two limits. My perception is that MTAG enhances statistical power through the correlation between traits, but this enhancement is tempered by the degree of sample overlap. Hence, in situations where both correlation and overlap are high, the marginal benefit of MTAG on statistical power becomes minimal. Is this a correct understanding of how MTAG operates, or am I overlooking some aspects? Thank you!
|
Technically, MTAG's power is maximized when the difference between the
genetic correlation and the correlation of the estimation error is largest. This will
happen when there is perfect sample overlap, a high genetic correlation,
and a high phenotypic correlation that has the opposite sign to the genetic
correlation. I know of no phenotypes where the genetic and phenotypic
correlations have opposite signs, so conditional on them having the same
sign, you are right that the best you can do is no overlap and a high
genetic correlation (all else equal). You may be able to get a larger
sample size by allowing for overlap though, so in practical settings, the
best option will often be to use summary statistics with overlapping
samples.
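
To make the trade-off above concrete, here is a rough numerical sketch for two traits. It is not MTAG's actual code. It assumes a GLS-style variance of the form 1 / (w' A^-1 w), with w = omega_t / omega_tt and A = Omega - omega_t omega_t' / omega_tt + Sigma, where Omega is the genetic covariance of the SNP effects and Sigma is the covariance of the estimation error. The function names (`mtag_variance`, `two_trait`, `gain`) and the unit variances are illustrative assumptions only.

```python
import numpy as np


def mtag_variance(omega, sigma, t=0):
    """Variance of a GLS-style multi-trait estimate for trait t (assumed form)."""
    omega = np.asarray(omega, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Weight vector: genetic covariances with trait t, scaled by trait t's variance.
    w = omega[:, t] / omega[t, t]
    # Residual genetic covariance (not explained by trait t) plus estimation error.
    a = omega - np.outer(omega[:, t], omega[:, t]) / omega[t, t] + sigma
    return 1.0 / (w @ np.linalg.solve(a, w))


def two_trait(var, corr):
    """2x2 covariance matrix with equal variances and the given correlation."""
    cov = corr * var
    return np.array([[var, cov], [cov, var]])


def gain(r_g, r_err, h=1.0, s=1.0):
    """Ratio of single-trait GWAS variance to the multi-trait variance.

    r_g is the genetic correlation; r_err is the correlation of the
    estimation error (driven by sample overlap and phenotypic correlation).
    A value of 1 means no improvement; 2 is like doubling the sample size.
    """
    return s / mtag_variance(two_trait(h, r_g), two_trait(s, r_err))


if __name__ == "__main__":
    for r_g, r_err in [(1.0, 0.0), (0.0, 0.0), (1.0, 0.9), (1.0, -0.5)]:
        print(f"r_g={r_g:+.1f}  r_err={r_err:+.1f}  gain={gain(r_g, r_err):.3f}")
```

Under these assumptions, perfect genetic correlation with uncorrelated errors behaves like a meta-analysis of two independent cohorts (gain of 2). The gain shrinks toward 1 as the error correlation approaches the genetic correlation, and it exceeds 2 only when the error correlation has the opposite sign.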
|
I got it, thank you!
|
Hello:
I am currently working on a project where I intend to use MTAG to analyze genetic data for traits with high genetic correlation. The samples for these traits are completely overlapping. I understand that MTAG is designed to enhance statistical power by leveraging the genetic correlation between traits, even when the samples overlap. However, I am curious: in a scenario where two traits exhibit almost complete genetic correlation and are derived from the same sample source, does applying MTAG artificially inflate the sample size (that is, is it equivalent to counting the effect of each SNP twice)?
I know this is just a theoretical scenario. I tried running MTAG on two identical GWAS datasets to test this question and encountered a 'Singular matrix' error, but I wonder whether MTAG is statistically valid under such conditions, and how the MTAG model adjusts for the increased genetic similarity and sample overlap to avoid potential overestimation of statistical power.
Thank you for your time and assistance.
Best regards,
Sun Yingkai
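
For readers who hit the same 'Singular matrix' error, the sketch below shows one plausible cause using plain NumPy. It does not reproduce MTAG's internals. It only illustrates that a 2x2 covariance matrix whose two rows are identical, as would be expected when the same GWAS is supplied twice, cannot be inverted. The helper name `is_invertible` and the example matrices are hypothetical.

```python
import numpy as np


def is_invertible(matrix):
    """Return True if NumPy can invert the matrix, False if it is singular."""
    try:
        np.linalg.inv(matrix)
    except np.linalg.LinAlgError:
        # NumPy raises LinAlgError ("Singular matrix") for exactly singular input.
        return False
    return True


# Two identical datasets: the error correlation is exactly 1, so both rows match.
identical = np.array([[1.0, 1.0],
                      [1.0, 1.0]])

# Two highly but not perfectly correlated datasets: still invertible,
# although the matrix becomes increasingly ill-conditioned.
nearly_identical = np.array([[1.0, 0.99],
                             [0.99, 1.0]])

if __name__ == "__main__":
    print("identical invertible:       ", is_invertible(identical))
    print("nearly identical invertible:", is_invertible(nearly_identical))
    print("condition number (nearly):  ", np.linalg.cond(nearly_identical))
```

If this is indeed what happens, the error reflects a degenerate input (two copies of the same summary statistics) rather than a failure of the method on real pairs of highly correlated traits.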