Reproducing Figure7 from Avino et al paper + grouping the cophylogeny analysis as high or low #146

Open
Jigyasa3 opened this issue Mar 8, 2021 · 10 comments


@Jigyasa3

Jigyasa3 commented Mar 8, 2021

Dear @ArtPoon lab

Thank you so much for such an amazing R package and a great paper (https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.5185).
I have a few questions regarding the paper:
a) I am trying to reproduce Figure 7 in the paper, but I cannot pass the "groups" variable to the ggbiplot function:

g <- ggbiplot(p, groups=temp$Group, labels=rownames(temp), labels.size=3, var.col=rgb(0,0,0,0.4))
g <- g + scale_color_manual(name="Group", values=c('firebrick', 'cadetblue'))
g <- g + theme(legend.position='none')
print(g)  ## where is the "Group" column?

b) I am interested in performing a similar analysis on my dataset and was wondering why the following normalize function was used to normalize the different distance methods:
normalize <- function(x) { (x-min(x)) / (max(x)-min(x)) }

c) Is the data in Figure 7 ("https://github.com/PoonLab/cophylo/edit/master/data/TotalandKernelS1.csv") raw data? (i.e. generated by running each host-symbiont tree pair in Kaphi and then normalized with the above-mentioned function)

d) How were "high" and "low" cophylogeny determined for each dataset in the paper? Is there a specific cut-off, or a relative value after normalization?

Looking forward to your reply!

@ArtPoon
Contributor

ArtPoon commented Mar 9, 2021

Hi @Jigyasa3, sorry for the delayed response - it's a busy term. I am going to ping the lead author @mavino who should be able to help you with the R code and data files.

@mavino

mavino commented Mar 9, 2021

Thank you @ArtPoon,
Hi @Jigyasa3 and thank you so much for your interest.
Give me some time to answer your questions point by point, since I have not worked on this project for a long time.
I will start replying soon.
Thank you very much again...

@mavino

mavino commented Mar 9, 2021

Regarding your point a), there was a problem in the code, which I just fixed in mavino/cophylo@3409bab6.
"Group" refers to the last column of the file "TotalandKernelfig7.csv", which indicates whether each host-parasite pair has a high or low degree of cophylogeny. The points in the resulting biplot are coloured by this group.
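A minimal sketch of that layout, assuming the CSV holds numeric (already min-max normalized) distance columns followed by a final Group column. The column names, row names, and values below are invented for illustration; in practice `temp` would be read from TotalandKernelfig7.csv, e.g. with `read.csv`. Every column except the last goes into the PCA, and the last one becomes the grouping factor.

```r
# Hypothetical stand-in for TotalandKernelfig7.csv (invented names and values):
# numeric distance columns followed by a final "Group" column.
temp <- data.frame(
  kLn   = c(0.91, 0.85, 0.40, 0.35, 0.88, 0.30),
  Align = c(0.20, 0.25, 0.70, 0.80, 0.15, 0.75),
  MAST  = c(0.80, 0.75, 0.30, 0.20, 0.85, 0.25),
  Group = c("high", "high", "low", "low", "high", "low"),
  row.names = paste0("pair", 1:6)
)

# PCA on every column except the last (the non-numeric Group label)
num <- temp[, -ncol(temp)]
p <- prcomp(num, center = TRUE)

# Keep the last column as a factor so it can be used for colouring
groups <- factor(temp$Group)
```

With the ggbiplot package loaded, `groups` can then be passed as in the snippet from point a), e.g. `ggbiplot(p, groups = groups, labels = rownames(temp))`.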

@mavino

mavino commented Mar 9, 2021

Regarding your point b), we needed to make the different distances comparable because they are on different scales: some distances take values between 0 and 1, others from 0 to plus infinity. We therefore performed a min-max normalization to put them all on the same scale. I am not sure why we did not use a Z-score normalization; maybe we noticed that there were not many outliers.
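To illustrate the trade-off, here is a small comparison of the two normalizations on an invented vector of distances containing one outlier. `normalize` is the min-max function from point b); `scale()` is base R's Z-score standardization. Min-max maps the data onto [0, 1], but a single large value squeezes all the other values towards 0, which is why the presence or absence of outliers matters when choosing between the two.

```r
# Min-max normalization, as in point b)
normalize <- function(x) { (x - min(x)) / (max(x) - min(x)) }

# Invented distance values with one large outlier (9.0)
d <- c(1.2, 1.5, 1.1, 1.4, 9.0)

mm <- normalize(d)          # rescaled to [0, 1]; the outlier maps to 1
z  <- as.vector(scale(d))   # Z-scores: mean 0, standard deviation 1

# The four non-outlier values all end up between 0 and about 0.05
round(mm, 3)
```

Note that `normalize` divides by zero (returning NaN) if all values in a column are identical, so constant columns need to be dropped or handled separately.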

@mavino

mavino commented Mar 9, 2021

Regarding your point c), yes, they are raw data, which were then normalized with the min-max function.
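A sketch of that step, assuming the raw table has one numeric column per distance plus a non-numeric group label (column names and values here are invented): `normalize` is applied column by column to the numeric columns only, so each distance is rescaled independently and the label column is left untouched.

```r
# Min-max normalization, as in point b)
normalize <- function(x) { (x - min(x)) / (max(x) - min(x)) }

# Hypothetical raw distances on different scales (invented values)
raw <- data.frame(
  kLn   = c(0.91, 0.40, 0.75),
  Align = c(12, 48, 30),
  Group = c("high", "low", "high")
)

# Normalize each numeric column separately; leave the label column as-is
is_num <- vapply(raw, is.numeric, logical(1))
normed <- raw
normed[is_num] <- lapply(raw[is_num], normalize)
normed
```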

@mavino

mavino commented Mar 9, 2021

Regarding your point d), as specified in the paper, the high or low degree of cophylogeny was based solely on the authors' assessment in the paper each dataset comes from. We did not specify any further cut-off.

@Jigyasa3
Author

Hey @mavino

Thank you so much for replying and answering all my questions. I have two follow-up questions. I do not come from a statistical background, so my understanding of the kernel method discussed in the paper may be wrong; please correct me if so (sorry!).

Question 1: The paper mentions that the kernel method accounts for differences in the branch lengths of the host and symbiont (or parasite) trees, and in the number of nodes in each tree. Does that mean I can compare host-symbiont tree pairs of different sizes (numbers of nodes) and rates of evolution (as in point c) above) with each other, and draw conclusions about how much coevolution is taking place (using the kLn method)?
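For context on why normalization helps with trees of different sizes: a common way to normalize a kernel score, and an assumption here rather than a confirmed description of kLn (please check the paper for the exact definition), is cosine normalization, K_norm(T1, T2) = K(T1, T2) / sqrt(K(T1, T1) * K(T2, T2)). Larger trees tend to produce larger raw kernel values, and dividing by the self-similarities removes that overall scale so every tree scores 1 against itself. The kernel matrix below is invented, and `normalize_kernel` is a hypothetical helper.

```r
# Hypothetical helper: cosine normalization of a kernel (Gram) matrix K,
# K_norm[i, j] = K[i, j] / sqrt(K[i, i] * K[j, j])
normalize_kernel <- function(K) {
  d <- sqrt(diag(K))
  K / outer(d, d)
}

# Invented raw kernel scores for three trees; the first (largest) tree
# has the largest raw self-similarity
K <- matrix(c(50, 20, 10,
              20, 16,  6,
              10,  6,  4), nrow = 3, byrow = TRUE)

Kn <- normalize_kernel(K)
round(Kn, 3)
```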

Question 2: To examine how much coevolution is taking place, I am interested in comparing different microbial groups against the same host tree. Instead of grouping them as parasitic, symbiotic, or mutualistic (as in the studies referred to in your paper), I want to examine whether one microbe-host tree pair is coevolving more than another. So, if I use the normalized values of the different distance measures, can I say that higher values of kLn and Align mean that the microbe1-host pair is coevolving more than the microbe2-host pair?

The host tree is the same for all the microbes, but the number of nodes will vary depending on how each microbe interacts with the host.
Please let me know if my questions make sense; I can explain them in more detail.

Thanks again for all the help!

@mavino

mavino commented Mar 10, 2021

No worries, it is our pleasure to help!
I would say yes to both questions. Your rationale is correct, and this is the way I would use the distances.
Be careful about how the kLn software matches the labels of hosts and parasites.
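The exact label-matching rule used by the kLn software is not described in this thread, so the following is only a generic pre-check under that caveat. Assuming parasite tips are linked to hosts through an association table (all labels and the table itself are hypothetical), it relabels each parasite tip with its host's name and reports any tips whose host does not appear in the host tree. In practice the tip labels would come from the trees themselves (e.g. `tree$tip.label` with the ape package).

```r
# Hypothetical tip labels
host_tips     <- c("HostA", "HostB", "HostC")
parasite_tips <- c("p1", "p2", "p3", "p4")

# Hypothetical association table: which host each parasite tip infects
assoc <- data.frame(
  parasite = c("p1", "p2", "p3", "p4"),
  host     = c("HostA", "HostB", "HostC", "HostD")
)

# Relabel parasite tips with their host's name; tips missing from the
# association table become NA
relabelled <- assoc$host[match(parasite_tips, assoc$parasite)]

# Parasite tips whose host is absent from the host tree (or unassigned)
unmatched <- parasite_tips[!relabelled %in% host_tips]
unmatched
```

Here `p4` is flagged because its host, `HostD`, is not a tip of the host tree; such mismatches are worth resolving before computing any label-based score.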

@Jigyasa3
Author

Thank you so much!

@Jigyasa3
Author

One last query: I understand that the different distance methods have different requirements for the tree type. The Align() documentation says a phylo object is required, while the TripL() documentation requires both trees to be rooted. I just want to confirm: Align(), MAST(), and Sim() do not require the symbiont tree to be rooted, right?
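Whatever the answer for Align(), MAST(), and Sim() turns out to be (the thread does not confirm it), it is easy to check whether a tree is rooted before calling a function such as TripL(). This sketch uses the ape package's `read.tree`, `is.rooted`, and `root` on toy Newick trees; the outgroup "C" is chosen arbitrarily for illustration and should be replaced by a biologically meaningful outgroup.

```r
library(ape)

# Toy trees in Newick format
rooted   <- read.tree(text = "((A,B),C);")
unrooted <- read.tree(text = "(A,B,C);")

is.rooted(rooted)    # TRUE: the root node has two children
is.rooted(unrooted)  # FALSE: basal trifurcation

# Root the unrooted tree on an outgroup (arbitrarily tip "C" here);
# resolve.root = TRUE makes the new root bifurcating
rerooted <- root(unrooted, outgroup = "C", resolve.root = TRUE)
is.rooted(rerooted)
```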
