3 Prime untemplated versus Mismatches #11
Hi, that is a very valid question, and this is the story: no enzyme described in the literature supports the addition of nucleotides other than A/U, which is what I call "canonical" additions. So those sequences will go away when … The reason they are called additions is that there is more than one mismatch at the end, and it is easier to explain that biologically as an addition than as mismatches. But it is true that this is an arbitrary decision and it will probably be wrong sometimes. More research is needed to come up with better rules for deciding this, which is actually what I am trying to do together with other researchers. I hope that helps. Cheers
Hi,
So I guess any single true mismatch in the last three base pairs won't be called as a mismatch, even if there isn't any other change at the 3 prime end. In this case I don't think it is a real mismatch, and if I use your recommended method of removing 'non-canonical' additions and applying a VAF cut-off of 0.2, then these wouldn't make it into my final set of isomiRs anyway, but I wanted to point it out as it might not be the behaviour that everyone would expect. I totally understand that it isn't always easy to come up with rules that cover every case, and this is definitely still an open research question.
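To make the filtering described in this comment concrete, here is a minimal Python sketch. It is not the isomiRs implementation: the function name `filter_isomirs`, the record keys, and the way the frequency is computed (reads of the isomiR divided by all reads of the same miRNA, applied only to mismatch-bearing isomiRs) are assumptions made for illustration.

```python
# Nucleotides treated as "canonical" additions in this sketch (A/U as stated
# in the thread; T is included as an assumption, for reads written in the
# DNA alphabet).
CANONICAL_NTS = frozenset("AUT")


def filter_isomirs(isomirs, min_fraction=0.2):
    """Filter the isomiRs of ONE miRNA (hypothetical helper).

    Each record is a dict with the keys:
      'name'     : label of the isomiR
      'addition' : nucleotides called as 3' addition ('' if none)
      'mismatch' : True if the isomiR carries an internal mismatch
      'count'    : number of reads
    """
    total = sum(iso["count"] for iso in isomirs)
    kept = []
    for iso in isomirs:
        added = iso["addition"].upper()
        # Drop additions containing anything other than A/U(T).
        if added and not set(added) <= CANONICAL_NTS:
            continue
        # Assumed frequency rule: a mismatch is kept only if the isomiR
        # reaches min_fraction of all reads of this miRNA.
        if iso["mismatch"] and (total == 0 or iso["count"] / total < min_fraction):
            continue
        kept.append(iso)
    return kept


example = [
    {"name": "reference", "addition": "", "mismatch": False, "count": 70},
    {"name": "add_A", "addition": "A", "mismatch": False, "count": 20},
    {"name": "add_GC", "addition": "GC", "mismatch": False, "count": 5},
    {"name": "rare_mismatch", "addition": "", "mismatch": True, "count": 5},
]
print([iso["name"] for iso in filter_isomirs(example)])
```

With these toy counts, the isomiR with the `GC` addition and the low-frequency mismatch are both dropped, which matches the behaviour described above: such reads would not reach the final set either way.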
Hi, thanks for looking into this. And actually, I totally agree with you. The rule is: if there are any mismatches in the last 3 positions, call it an untemplated addition. This will be wrong many times, but it is difficult to work out what really happened. It would probably be better to do this only when there are 2 mismatches, not just one. This is something we can easily implement in the mirtop project, whose output would be produced by bcbio and can be converted into the miRNA files needed by the isomiRs package. Hopefully we'll improve all of this calling a lot during the next months, when we get to compare the right data and reach the best conclusion. If you are interested in participating in that part of the project, let me know, and I would be happy to add you. Thanks for all the feedback you add here. Cheers
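The rule described in this comment, and the proposed change from one to two mismatches, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the seqbuster or mirtop code: `classify_3p_end` is a hypothetical helper, the read is assumed to be aligned to the reference without gaps and with the same length, and the sequences are toy strings rather than real miRNAs.

```python
def classify_3p_end(read, reference, window=3, min_mismatches=1):
    """Label the 3' end of a read aligned, without gaps, to its reference.

    Returns (label, positions), where positions are the 0-based mismatching
    positions found in the last `window` nucleotides.
      'addition'  : at least `min_mismatches` mismatches in the window
      'mismatch'  : some mismatches, but fewer than `min_mismatches`
      'templated' : no mismatch in the window
    """
    if len(read) != len(reference):
        raise ValueError("this sketch assumes an ungapped, equal-length alignment")
    start = max(len(read) - window, 0)
    positions = [i for i in range(start, len(read)) if read[i] != reference[i]]
    if positions and len(positions) >= min_mismatches:
        return "addition", positions
    if positions:
        return "mismatch", positions
    return "templated", positions


reference = "ACGUACGUACGU"          # toy reference, not a real miRNA
read = reference[:9] + "GGU"        # single change, 3rd base from the 3' end

# Rule as described: any mismatch in the last 3 is called an addition.
print(classify_3p_end(read, reference))
# Proposed variant: require 2 mismatches before calling an addition.
print(classify_3p_end(read, reference, min_mismatches=2))
```

Under the current rule, the single change at the third position from the 3' end is labelled an addition (the tail `read[positions[0]:]`, here `GGU`); with `min_mismatches=2` the same read is labelled a mismatch instead, which is the ambiguity raised in this issue.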
Yes it is always tricky trying to figure out the best way to call things when there could be many ways of getting to the sequence we detect. Have you also thought about integrating dbSNP annotation to identify common SNPs that might be causing mismatches? I have found in my data there are a few mismatches that coincide with SNPs but it isn't so easy to track since the annotation output doesn't have genomic coordinates. It would be a great addition but could be kind of complicated to implement, particularly for the miRNAs that can come from multiple regions of the genome. No worries at all. Thanks for all the development you do for the tools in the miRNA and smallncRNA field! I had a look at the mirtop project and I'd be happy to help contribute if there's anything I can do. I will keep an eye on the issues there and see if I can help with anything. Cheers, Marion
Yes, actually this is something that would be good to have now that mirtop is centralizing the format. The code is actually there, and ideally by the next BOSC CodeFest we can have this quite close to being a reality. I'll add this to the list of issues on GitHub!
Thanks for keeping an eye on mirtop, I am sure you can help in some way.
Cheers
(In reply to Marion's comment above, posted Apr 17, 2018, 12:11 PM.)
Hi, it's me again
I have a question about how the isomiRs/seqbuster pipeline is annotating isomiRs. For example I have these two isomiRs that have been categorised as having untemplated additions:
But I realised they could equally be categorised as having a mismatch at the 3rd base in from the three prime end. Is there a particular reason behind favouring one annotation over another?
Also, if I had changed the argument `canonicalAdd` to the default `TRUE` when importing files with `IsomirDataSeqFromFiles`, would it instead find a mismatch at that position, or would it not be separated out? Or perhaps it would depend on the allele frequency of the mismatch? Or are mismatches effectively not called in the last three positions of the read? Thanks!