Function questions #10

mshadbolt · 2018-02-23T18:37:39Z

Hi and thanks for the great isomiRs package, it is super handy.

I have a few questions/suggestions about various functions, to make sure that I'm interpreting things correctly.

IsomirDataSeqFromFiles()

1. Would you be able to explain in more detail what you mean by the uniqueMism parameter? Does it mean to only keep mismatches if they are found in one isomiR type? Or maybe you could provide an example?
2. It would be great to have an option to remove untemplated additions that contain 'N's.

isoPlot()

When you set type to all you get a side-by-side plot: the left side is labelled 'freq' and the right side is labelled 'unique'. I believe that for this plot each line represents a sample, and the position along the plane for each isomiR type indicates the percentage of that type in that sample, i.e. if you added up the points on each plane for a sample they would add to 100%. Let me know if this interpretation is correct.
I am not so clear on what the 'unique' graph represents. Does it mean that a particular isomiR is seen only in a single sample?
For the other isoPlot() types, the y-axis is labelled '# of unique sequences', but it is actually a decimal fraction or proportion, whereas a # implies a count. For example, when I set type = add, my samples sit around 0.5-0.6 for a 1-bp addition. Is this the proportion of all isomiRs for a particular sample, or the proportion of all 'add' type isomiRs?
For a simplified example, say I have a sample with 10 total isomiRs of various types. Either a value of 0.6 at 1 bp means that 6 of these have a 1-bp addition, or, if 5 of the 10 have an addition of between 1 and 3 bp, a point around 0.6 for a 1-bp addition means 0.6*5, so there are 3 isomiRs with a 1-bp addition. Which is it? And does the same logic then follow for the other single plots?
I'm also not overly sure what you mean by 'unique sequences'.
When you say the size of the points is proportional to the total counts, is that the count across all isomiRs?

I guess I am mostly confused by the way you use the term 'unique' and what it means in each context.

Thanks again for the cool package


lpantano · 2018-02-26T17:45:04Z

Hi @mshadbolt,

thanks for all the questions, they are useful and help me to fix the documentation.

1. Yes, that mainly refers to keeping mismatches only when the sequence maps to a single place in the database. So, for instance, if the sequence maps to two different miRNAs, it would be removed.

2. Thanks for this. I never came across that, but I'll add it as soon as possible for sure.
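
To make the uniqueMism answer concrete, here is a toy Python sketch of that kind of filter. It is not the isomiRs implementation: the records, the field names (`seq`, `mirna`, `has_mismatch`) and the miRNA names are all made up for the example, and it assumes that sequences without mismatches are left untouched.

```python
from collections import defaultdict

# Hypothetical alignment records: one row per (sequence, miRNA hit).
records = [
    {"seq": "TGAGGTAGTAGGTTGTATAGTT", "mirna": "miR-A", "has_mismatch": False},
    {"seq": "TGAGGTAGTAGGTTGTGTGGTT", "mirna": "miR-A", "has_mismatch": True},
    {"seq": "TGAGGTAGTAGGTTGTGTGGTT", "mirna": "miR-B", "has_mismatch": True},
    {"seq": "TGAGGTAGTAGATTGTATAGTT", "mirna": "miR-A", "has_mismatch": True},
]

def keep_unique_mismatches(rows):
    """Drop mismatch-carrying sequences that map to more than one miRNA."""
    hits = defaultdict(set)
    for row in rows:
        hits[row["seq"]].add(row["mirna"])
    return [
        row for row in rows
        # Assumption: rows without a mismatch are never filtered here.
        if not row["has_mismatch"] or len(hits[row["seq"]]) == 1
    ]

kept = keep_unique_mismatches(records)
```

In this toy input, the second sequence carries a mismatch and hits both miR-A and miR-B, so both of its rows are dropped, while the mismatched sequence that hits only miR-A is kept.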

1-isoPlot. Sure, this is always confusing. I should change the names. freq means that the number represented is the sum of the counts for each isomiR type. unique is the number of different sequences for each isomiR type. For instance, if an isomiR is an ISO5 type and has a count of 100, in the freq figure it would add 100 to the total, but in the unique figure it would add just 1. Is this better?
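
The freq/unique distinction can be sketched with a toy table in Python. This is only an illustration, not code from the package: the type labels and counts are invented, and it assumes each panel is normalized to percentages within a sample, as the question suggests.

```python
from collections import defaultdict

# Hypothetical sample: one entry per distinct sequence, as (isomiR type, read count).
sequences = [
    ("ref", 500),
    ("iso5", 100),
    ("iso3", 150),
    ("iso3", 50),
    ("add", 200),
]

def summarize(seqs):
    """Return (freq, unique): summed counts and number of distinct sequences per type."""
    freq = defaultdict(int)
    unique = defaultdict(int)
    for iso_type, count in seqs:
        freq[iso_type] += count   # 'freq' adds the read count
        unique[iso_type] += 1     # 'unique' adds 1 per distinct sequence
    return dict(freq), dict(unique)

def as_percent(totals):
    """Normalize per-type totals so they add up to 100 within the sample."""
    grand = sum(totals.values())
    return {k: 100 * v / grand for k, v in totals.items()}

freq, unique = summarize(sequences)
freq_pct = as_percent(freq)      # ref dominates: it has the most reads
unique_pct = as_percent(unique)  # ref is small: it is a single sequence
```

Here the reference sequence accounts for half of the reads in the freq view but only one of the five distinct sequences in the unique view.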

2-isoPlot. You are right, I fixed that right now. It considers only the sequences that belong to this isomiR type, so in your example it would be 0.6*5.
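
The arithmetic of that example can be written out as a small Python sketch. The numbers come from the question (10 isomiRs in total, 5 of them with an addition of 1 to 3 bp); the list of addition lengths and the helper name are hypothetical.

```python
# Hypothetical sample: 10 isomiRs in total, 5 of which are 'add' type.
total_isomirs = 10
add_lengths = [1, 1, 1, 2, 3]  # addition length in bp for each 'add' isomiR

def proportion_within_type(lengths, size):
    """Fraction of the isomiRs of this type whose addition has the given length."""
    return sum(1 for n in lengths if n == size) / len(lengths)

# The denominator is the 5 'add' isomiRs, not the 10 isomiRs in the sample.
p_1bp = proportion_within_type(add_lengths, 1)
n_1bp = p_1bp * len(add_lengths)
```

Under this reading, a point at 0.6 for a 1-bp addition corresponds to 3 of the 5 'add' isomiRs, not to 6 of the 10 isomiRs in the sample.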

3-isoPlot. It's showing the proportion, taking all the sequences in the sample as the total.

I hope this helps, and I added some changes to the man pages.

Thanks a bunch.

mshadbolt · 2018-02-28T17:42:50Z

hi @lpantano
Thanks a lot for the answers, they do clear things up a lot. I have a follow-up clarification.

1-isoPlot
So the freq is counting kind of the proportion of expression of each isomiR type for each sample, whereas the unique is counting the proportion of each type of isomiR that was detected. So the ref is low because there is only one ref per mature strand, whereas there can be multiple kinds of the other isomiR types per strand?

lpantano · 2018-03-01T18:36:38Z

Yes, that is correct.



mshadbolt · 2018-03-08T22:10:16Z

Hi again,

I just thought of another thing you might want to add to the documentation. I think it would be good to specify that for the mm tag, the position is the position within the detected sequence, not within the reference strand miRNA.

lpantano · 2018-03-16T17:15:00Z

Thanks, I added it.

lpantano added a commit that referenced this issue Mar 2, 2018

Remove sequences with N nts in addition isomiR type.

ee74234

#10 thanks to @mshadbolt
