Clarification on Interpretation of 5hmC and 5mC #384

Open
Proy321 opened this issue Feb 24, 2025 · 7 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments

@Proy321

Proy321 commented Feb 24, 2025

Hello @ArtRand

I have a query regarding the interpretation of the attached screenshot. Specifically, I would like clarification on the following points:

In the b_counts column, for the positions:
Start: 813563 | End: 813564 – Since the counts show h:1, m:2, does this indicate that one strand of DNA carries 5hmC and the other strand carries 5mC?
Start: 753496 | End: 753497 – Since the counts show h:0, m:3, does this mean that only 5mC is present on one strand of DNA and there is no 5hmC on the complementary strand?
I want to ensure that I am interpreting this correctly. Kindly let me know if my understanding is accurate or if there’s another explanation I should consider.

Thanks & Regards
Priyanka Roy

Image
@ArtRand
Contributor

ArtRand commented Feb 24, 2025

Hello @Proy321,

Start: 813563 | End: 813564 – Since the counts show h:1, m:2, does this indicate that one strand of DNA carries 5hmC and the other strand carries 5mC?

This means there was one read with a 5hmC call and 2 reads with 5mC calls. If by "strand" you mean "read" or DNA molecule, then yes. But often people use "strand" to mean the positive or negative strand with respect to the reference - this is not what that means.

Start: 753496 | End: 753497 – Since the counts show h:0, m:3, does this mean that only 5mC is present on one strand of DNA and there is no 5hmC on the complementary strand?

Same explanation as above, three reads had 5mC calls, and zero reads had 5hmC calls (but 5hmC probabilities were present).

I think you've got the right idea.
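
As a minimal sketch of the reading described above, the counts field can be split into per-modification read counts. The `h:1,m:2` string layout is taken from the values quoted in this thread; the helper name `parse_counts` is hypothetical and not part of Modkit.

```python
def parse_counts(field: str) -> dict[str, int]:
    """Parse a counts string such as 'h:1,m:2' into {'h': 1, 'm': 2}.

    Assumes comma-separated 'code:count' items (an assumption based on the
    values quoted in this thread, not a documented guarantee).
    """
    counts = {}
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty fields
        code, _, n = item.partition(":")
        counts[code] = int(n)
    return counts


b_counts = parse_counts("h:1,m:2")
for code, n in b_counts.items():
    # Each count is a number of reads (molecules), not a reference strand.
    print(f"{n} read(s) with a '{code}' call")
```

Each number is a tally of reads carrying that call at the position, so `h:1,m:2` describes three separate molecules rather than the two strands of one duplex.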

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Feb 24, 2025
@PRIYANKA-22091995

Hello @ArtRand
Thank you for your response. I have a couple of follow-up questions for further clarification.

When both h:1 and m:2 are present at a given position, does this indicate that 5mC is being converted into 5hmC? For instance, at the position Start: 813563 | End: 813564, since it represents a single position, should I interpret this as 5mC undergoing conversion to 5hmC at the same site?

In another case, at Start: 753496 | End: 753497, where h:0 and m:3, only 5mC is present. Given that there is no call for 5hmC, why is this position included in the h,m context? Should it not be reflected solely in the m context? A possible explanation for this would be helpful.

Looking forward to your insights on this.

Thanks & Regards
Priyanka Roy

@Proy321
Author

Proy321 commented Feb 27, 2025

Hello @ArtRand

It would be nice to have your inputs on the above queries.

Thanks & Regards
Priyanka Roy

@ArtRand
Contributor

ArtRand commented Feb 27, 2025

Hello @Proy321

I apologize for the delay.

When both h:1 and m:2 are present at a given position, does this indicate that 5mC is being converted into 5hmC?

I can't really say. This function in Modkit is really just a statistical test on counts; it's up to you to use these data to inform your biological question. What I would say, however, is that drawing too strong a conclusion from ~5 reads is not advisable.

Given that there is no call for 5hmC, why is this position included in the h,m context? Should it not be reflected solely in the m context? A possible explanation for this would be helpful.

The output will report on all of the modifications encountered. So this record indicates that the base modification model output 5hmC probabilities, but that none of the passing calls were for 5hmC.
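
A simplified illustration of why `h:0` can appear: if every read carries probabilities for both 5mC and 5hmC, both codes are "encountered" even when no read ends up with a 5hmC call. The threshold value, the data, and the function `count_calls` below are hypothetical and only approximate the idea; this is not Modkit's actual calling logic.

```python
from collections import Counter

PASS_THRESHOLD = 0.8  # hypothetical value, for illustration only


def count_calls(reads, threshold=PASS_THRESHOLD):
    """Count passing calls per modification code.

    Codes that appear in the model output but never win a passing call
    are still reported, with a count of zero.
    """
    mod_codes = set()
    counts = Counter()
    for probs in reads:
        # Record every modification code the model emitted probabilities for.
        mod_codes.update(code for code in probs if code != "C")
        best = max(probs, key=probs.get)
        if best != "C" and probs[best] >= threshold:
            counts[best] += 1
    return {code: counts[code] for code in sorted(mod_codes)}


# Three reads at one position; 'C' stands for unmodified cytosine.
reads = [
    {"C": 0.02, "m": 0.95, "h": 0.03},
    {"C": 0.05, "m": 0.90, "h": 0.05},
    {"C": 0.10, "m": 0.85, "h": 0.05},
]
print(count_calls(reads))  # {'h': 0, 'm': 3}
```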

@Proy321
Author

Proy321 commented Feb 28, 2025

Hello @ArtRand
Thank you so much for your response. I have a follow-up question on the same topic, and it would be nice to have your input.
Specifically, I am trying to understand how both modifications can be present at a single position rather than being assigned to distinct positions. For example, I observe both 5mC and 5hmC at positions 813563–813564. Could you please clarify how this is possible.

Image

Additionally, I would appreciate your insights on the minimum read count threshold that should be considered for making a robust conclusion regarding DMR.

Thanks & Regards

@Proy321
Author

Proy321 commented Mar 3, 2025

Hello @ArtRand

It would be nice to have your inputs on the above queries.

Thanks & Regards
Priyanka Roy

@ArtRand
Contributor

ArtRand commented Mar 5, 2025

Hello @Proy321,

Specifically, I am trying to understand how both modifications can be present at a single position rather than being assigned to distinct positions. For example, I observe both 5mC and 5hmC at positions 813563–813564. Could you please clarify how this is possible.

What this table is showing you is that you have two reads reporting 5mC at position 813563 and one read reporting 5hmC. Generally speaking, base modifications can change at a given genomic position, thus individual reads/molecules will report different base modifications. What dmr tries to do is determine if the latent generative process that describes the observations between two conditions is different.

Additionally, I would appreciate your insights on the minimum read count threshold that should be considered for making a robust conclusion regarding DMR.

For larger effect sizes (>= 60%), 10 reads is probably sufficient. The MAP-based p-value will be higher (less significant) when the coverage is low. You can find the details of the MAP-based p-value and log-likelihood ratio score in the documentation.
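
To give a rough feel for why coverage matters, here is a sketch using a plain binomial log-likelihood ratio. This is a generic textbook statistic chosen for illustration, not the MAP-based p-value or the score Modkit computes; the function names and the example counts are assumptions.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood, omitting the constant combinatorial term."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def llr_score(k_a, n_a, k_b, n_b):
    """Log-likelihood ratio: separate modification rates vs. one shared rate."""
    separate = binom_loglik(k_a, n_a, k_a / n_a) + binom_loglik(k_b, n_b, k_b / n_b)
    pooled = (k_a + k_b) / (n_a + n_b)
    shared = binom_loglik(k_a, n_a, pooled) + binom_loglik(k_b, n_b, pooled)
    return separate - shared


# Same 60% effect size (80% vs. 20% modified) at increasing coverage.
for n in (5, 10, 30):
    k_a, k_b = round(0.8 * n), round(0.2 * n)
    print(f"coverage={n:2d}  score={llr_score(k_a, n, k_b, n):.2f}")
```

With the effect size held fixed, the score grows with the number of reads, which is the same qualitative behaviour as described above: the evidence for a difference is weaker at low coverage.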
