Function questions #10

Open
mshadbolt opened this issue Feb 23, 2018 · 5 comments

Comments

@mshadbolt

Hi and thanks for the great isomiRs package, it is super handy.

I have a few questions/suggestions about various functions, to make sure that I'm interpreting things correctly.

IsomirDataSeqFromFiles()

  1. Could you explain in more detail what the uniqueMism parameter does? Does it mean that mismatches are only kept if they are found in one isomiR type? An example would help.
  2. It would be great to have an option to remove untemplated additions that contain 'N's.

isoPlot()

  1. When you set type to all you get a side-by-side plot: the left side is labelled 'freq' and the right side 'unique'. I believe that in this plot each line represents a sample, and the position along the plane for each isomiR type indicates the percentage of that type in that sample, i.e. if you added up the points on each plane for a sample they would sum to 100%. Let me know if this interpretation is correct.
    I am not so clear on what the 'unique' graph represents. Does it mean that a particular isomiR is seen in only a single sample?

  2. For the other isoPlot() types, the y-axis is labelled '# of unique sequences', but the values are actually decimal fractions (proportions), whereas '#' implies a count. For example, when I set type = add, my samples sit around 0.5-0.6 for a 1-bp addition. Is this the proportion of all isomiRs for a particular sample, or the proportion of all 'add' type isomiRs?
    As a simplified example, say I have a sample with 10 total isomiRs of various types. Does a value of 0.6 at 1 bp mean that 6 of these have a 1-bp addition? Or, if 5 of the 10 have an addition of between 1 and 3 bp and I see a point around 0.6 for a 1-bp addition, does that mean 0.6*5, so 3 isomiRs have a 1-bp addition? And does the same logic then follow for the other single plots?
    I'm also not overly sure what you mean by 'unique sequences'.

  3. When you say the size of the points is proportional to the total counts, is that the count across all isomiRs?

I guess I am mostly confused by the way you use the term 'unique' and what it means in each context.

Thanks again for the cool package

@lpantano
Owner

Hi @mshadbolt,

thanks for all the questions, they are useful and help me to fix the documentation.

  1. Yes, it mainly means that mismatches are only kept when the sequence maps to a single place in the database. So, for instance, if the sequence maps to two different miRNAs, it is removed.

  2. Thanks for this. I have never come across that, but I'll add it as soon as possible.
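To make the uniqueMism answer concrete, here is a toy Python sketch of that kind of filter. It is not the isomiRs implementation: the record layout, the `drop_ambiguous_mismatches` helper, the miRNA names and the sequences are all made up for illustration, and it assumes the filter applies only to reads that carry a mismatch.

```python
from collections import defaultdict

# Toy illustration only, not the isomiRs code.
# Each record is (read sequence, miRNA it maps to, carries a mismatch?).
def drop_ambiguous_mismatches(records):
    """Drop mismatch-carrying reads whose sequence maps to more than one miRNA."""
    hits = defaultdict(set)
    for seq, mirna, _ in records:
        hits[seq].add(mirna)
    return [
        (seq, mirna, has_mm)
        for seq, mirna, has_mm in records
        if not (has_mm and len(hits[seq]) > 1)
    ]

records = [
    ("TGAGGTAGTAGGTTGTATAGTT", "mir-A", False),  # no mismatch: kept
    ("TGAGGTAGTAGGTTGTGTAGTT", "mir-A", True),   # mismatch, maps to two miRNAs: dropped
    ("TGAGGTAGTAGGTTGTGTAGTT", "mir-B", True),   # same read, second hit: dropped
    ("TGAGGTAGTAGATTGTATAGTT", "mir-A", True),   # mismatch, single hit: kept
]
kept = drop_ambiguous_mismatches(records)
```

With this made-up input, only the reference-like read and the single-hit mismatch read survive.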

1-isoPlot. Sure, this is always confusing. I should change the names. freq means that the number shown is the sum of the counts for each isomiR type. unique is the number of different sequences for each isomiR type. For instance, if an isomiR is of type ISO5 and has a count of 100, it adds 100 to the total in the freq figure but just 1 to the total in the unique figure. Is this better?

2-isoPlot. You are right, I have fixed that now. It considers all the sequences that belong to this isomiR type, so in your example it would be 0.6*5.

3-isoPlot. It shows the proportion, taking all the sequences in the sample as the total.
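The freq/unique distinction and the 0.6*5 arithmetic can be checked on a made-up sample. The Python sketch below is only an illustration of the counting described above, not what isoPlot() runs internally; the sequence ids, type labels and counts are invented, and the sample has 10 isomiRs of which 5 are additions, as in the question.

```python
from collections import Counter

# (sequence id, isomiR type, read count, number of added nucleotides)
# All values are invented for illustration.
sample = [
    ("s1", "ref", 500, 0),
    ("s2", "iso5", 100, 0),
    ("s3", "iso5", 20, 0),
    ("s4", "iso3", 60, 0),
    ("s5", "iso3", 15, 0),
    ("s6", "add", 40, 1),
    ("s7", "add", 30, 1),
    ("s8", "add", 10, 1),
    ("s9", "add", 5, 2),
    ("s10", "add", 5, 3),
]

freq = Counter()    # sum of read counts per isomiR type
unique = Counter()  # number of different sequences per isomiR type
for _, iso_type, count, _ in sample:
    freq[iso_type] += count
    unique[iso_type] += 1

# Proportion of 'add' sequences by addition size, using only the
# sequences of that type as the total (5 here, not all 10).
add_sizes = Counter(n for _, t, _, n in sample if t == "add")
n_add = sum(add_sizes.values())
prop_by_size = {size: n / n_add for size, n in add_sizes.items()}
```

Here the iso5 sequence with 100 reads adds 100 to `freq["iso5"]` but only 1 to `unique["iso5"]`, and `prop_by_size[1]` is 3 out of 5 'add' sequences, i.e. 0.6.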

I hope this helps, and I added some changes to the man pages.

Thanks a bunch.

@mshadbolt
Author

Hi @lpantano,
Thanks a lot for the answers, they do clear things up a lot. I have a follow-up clarification.

1-isoPlot
So freq counts, roughly, the proportion of expression of each isomiR type in each sample, whereas unique counts the proportion of distinct isomiRs of each type that were detected. So ref is low because there is only one ref per mature strand, whereas there can be multiple kinds of the other isomiR types per strand?

@lpantano
Owner

lpantano commented Mar 1, 2018 via email

@mshadbolt
Author

Hi again,

I just thought of another thing you might want to add to the documentation. I think it would be good to specify that for the mm tag, the position is the position within the detected sequence, not within the reference strand miRNA.
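As a small illustration of why the distinction matters, the Python sketch below converts a position in the detected sequence into a position on the reference strand. The `read_pos_to_ref_pos` helper is hypothetical, and both the 1-based indexing and the way the start offset is expressed are assumptions made for the example, not something the package documents here.

```python
# Hypothetical helper, for illustration only.
# Assumes 1-based positions throughout.
def read_pos_to_ref_pos(read_pos, read_start_on_ref):
    """Map a position in the detected sequence to the reference strand.

    read_start_on_ref is the reference position at which the detected
    sequence begins: 1 means it starts where the reference starts,
    3 means the first two reference nucleotides are missing.
    """
    return read_pos + read_start_on_ref - 1

# A detected sequence that starts at reference position 3:
# a mismatch reported at position 8 of the read sits at
# position 10 of the reference.
shifted = read_pos_to_ref_pos(8, 3)
unshifted = read_pos_to_ref_pos(8, 1)
```

Under these assumptions the two numbering schemes only agree when the detected sequence starts exactly where the reference does.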

@lpantano
Owner

Thanks, I added it.
