Clustering order supersedes manual groups #116

Open
micwij opened this issue Apr 13, 2023 · 10 comments

@micwij

micwij commented Apr 13, 2023

I was using tidyHeatmap to make heatmaps of metabolomics data when I noticed strange behaviour: rows escaped their manually assigned groups and ended up in the wrong group.
It is a bit tricky to explain, so here is a small example:

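```r
library(tidyverse)   # tribble(), as_factor(), group_by(), mutate(), %>%
library(tidyHeatmap) # heatmap() (masks stats::heatmap)

example <- tribble(~Compound_Name, ~Compound_Class, ~col, ~log2fc,
  "L-homoserineAA", "AA", 1,  2.93,
  "cellobioseCH",   "CH", 1,  2.09,
  "D-maltoseCH",    "CH", 1,  3.08,
  "pectinCH",       "CH", 1, -3.04,
  "raffinoseCH",    "CH", 1, -2.10)

# Compound_Name as a character vector: rows end up in the wrong group
example %>%
  group_by(Compound_Class) %>%
  heatmap(.row = Compound_Name, .col = col, .value = log2fc)

# Compound_Name as a factor: rows are grouped correctly
example2 <- example %>%
  mutate(Compound_Name = as_factor(Compound_Name))

example2 %>%
  group_by(Compound_Class) %>%
  heatmap(.row = Compound_Name, .col = col, .value = log2fc)
```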

AA stands for amino acid and CH for carbohydrate (this is not needed to understand the issue; it just provides some context). I also appended the compound class to the end of each compound name.

When the `.row` variable is a plain character vector, D-maltoseCH is swapped with L-homoserineAA and both end up in the wrong group (perhaps because of clustering by value?):
[Screenshot: heatmap of `example`]

When Compound_Name is converted to a factor, both are assigned correctly:
[Screenshot: heatmap of `example2`]

I don't know whether this is an issue in tidyHeatmap or in the underlying ComplexHeatmap package, but I think it is important to find out and fix this behaviour. Converting the `.row` variable to a factor seems to work, but I am not sure that is how this variable is most commonly used.

Let me know if something is unclear.

Sorry for this somewhat strange example. I tried to recreate it with mtcars or diamonds, but I wasn't able to reproduce this behaviour.

@stemangiola
Owner

Thanks for the heads-up. Could you check whether the behaviour occurs

  • with two columns
  • when factoring and grouping (or rather, the other way around)

@micwij
Author

micwij commented Apr 13, 2023

> Thanks for the heads up. If you could check if the behaviour occurs
>
>   • having two columns

Yes, this also occurs with two or more columns (my original data has more than 10 columns). Here is a replacement for the example above, with a second column and slightly modified values:

```r
example <- tribble(~Compound_Name, ~Compound_Class, ~col, ~log2fc,
  "L-homoserineAA", "AA", 1,  2.93,
  "cellobioseCH",   "CH", 1,  2.09,
  "D-maltoseCH",    "CH", 1,  1.08,
  "pectinCH",       "CH", 1, -3.04,
  "raffinoseCH",    "CH", 1, -2.10,
  "L-homoserineAA", "AA", 2, -2.10,
  "cellobioseCH",   "CH", 2, -3.04,
  "D-maltoseCH",    "CH", 2,  1.08,
  "pectinCH",       "CH", 2,  2.09,
  "raffinoseCH",    "CH", 2,  2.93)
```

With the modified values, it seems the issue might not stem from clustering after all. Maybe it is related to the names?

>   • factoring and grouping (rather the other way around)

I think this is what I did in example2 above, or did you mean something else? In that case the behaviour indeed does not occur. When grouping by the variable as a character vector and then converting it to a factor, the behaviour still occurs, e.g.:

```r
example %>%
  group_by(Compound_Class) %>%
  mutate(Compound_Class = as_factor(Compound_Class)) %>%
  heatmap(.row = Compound_Name, .col = col, .value = log2fc)
```

@stemangiola
Owner

Puzzling... Right now I don't have the bandwidth to debug the function; I will put it on the to-do list. If you want to give it a shot, you might be able to fix the bug quickly and become part of the tidy* family ;)

@micwij
Author

micwij commented Apr 13, 2023

Small addition: I just removed the "D-" and "L-" prefixes from "D-maltoseCH" and "L-homoserineAA", and indeed the behaviour no longer appears. I hope this helps in finding the cause.

Of course, such names are not the most common in general, but they are quite common in metabolomics, and I can imagine similar names for cell lines or strains, for example, so I think this is still worth looking into.
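Here is one hypothesis that fits these observations, although nothing in this thread confirms it. Two steps may order the rows differently:

- one sorts byte-wise, as in the C locale, where every uppercase letter sorts before every lowercase one;
- the other sorts case-insensitively, as most system locales do.

(dplyr 1.1.0, for example, changed `arrange()` to sort character vectors in the C locale by default. Whether tidyHeatmap is affected by this is not established here.) "D-maltoseCH" and "L-homoserineAA" are the only names starting with an uppercase letter, so the two orders disagree on exactly those rows. Stripping the prefixes makes the orders agree.

The base R sketch below assumes that rows follow one order and group labels follow the other. `tolower()` is only a stand-in for locale collation, so the result does not depend on the machine's locale. None of this is tidyHeatmap's actual code.

```r
# Hypothetical illustration only: this is NOT tidyHeatmap's implementation.
# Row names and their manually assigned groups, as in the example above.
compound       <- c("L-homoserineAA", "cellobioseCH", "D-maltoseCH", "pectinCH", "raffinoseCH")
compound_class <- c("AA", "CH", "CH", "CH", "CH")

# Byte-wise (C-locale) order: method = "radix" always sorts strings in the C locale,
# so uppercase "D" and "L" come before lowercase "c", "p" and "r".
c_order <- order(compound, method = "radix")

# Case-insensitive order, used here as a locale-independent stand-in
# for locale-aware collation.
ci_order <- order(tolower(compound), method = "radix")

# Assumption: heatmap rows follow one order while group labels follow the other.
rows   <- compound[ci_order]
groups <- compound_class[c_order]

# D-maltoseCH is now labelled AA and L-homoserineAA is labelled CH.
print(data.frame(row = rows, group = groups))

# Without the "D-"/"L-" prefixes every name starts lowercase, so both orders agree.
stripped     <- sub("^[DL]-", "", compound)
orders_agree <- identical(order(stripped, method = "radix"),
                          order(tolower(stripped), method = "radix"))
print(orders_agree)
```

If this hypothesis holds, any set of row names mixing leading uppercase and lowercase letters should be enough to trigger the mismatch.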

@micwij
Author

micwij commented Apr 13, 2023

> Puzzling.. Right now, I don't have the throughput to debug the function. I will put it on the to-do list. If you happen to want to give it a shot, you might be able to fix the bug in a short time and become part of the tidy* family ;)

Sure, no worries and no hurry! I might try to look into it, but I am not sure I am experienced enough to solve it. I will report here if I find anything.

@AleksZakirov

I can confirm that the issue can be worked around by converting the variable into a factor. I tried replacing all dots, spaces and dashes with underscores, thinking the problem might be related to those characters, but this made no difference. Converting to a factor works for now.

@stemangiola
Owner

Could you please send me the list of row names, in their simplest form, that fail unless they are transformed into factors? This bit puzzles me a lot.

Try to reduce them to the simplest form and the smallest number for which the error still appears; that way we might be able to identify the cause. We need to fix this.

@stemangiola
Owner

Hello all, thanks for bringing this to our attention. We will have a dedicated person for tidyomics who will also maintain tidyHeatmap.

Hopefully, this will happen soon.

@stemangiola
Owner

On it.

@stemangiola
Owner

> I can confirm that the issue can be fixed by converting the variable into a factor. I tried replacing all dots, spaces and dash characters with underscores, thinking that it could somehow be related to that, but this made no difference. But converting to factor works for now.

Just to clarify: I fixed it by converting the row names into a factor.
But I am going to fix the source problem anyway.
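Under the same unconfirmed sorting hypothesis, converting the row names to a factor may help for a simple reason. A factor is ordered by its integer level codes rather than by comparing strings, so byte-wise and locale-aware sorting give the same result.

The base R sketch below sets the levels in order of first appearance, which is what `forcats::as_factor()` does for character input in `example2`. The variable names are illustrative only.

```r
# Hypothetical illustration: ordering a factor does not depend on string collation.
compound       <- c("L-homoserineAA", "cellobioseCH", "D-maltoseCH", "pectinCH", "raffinoseCH")
compound_class <- c("AA", "CH", "CH", "CH", "CH")

# Levels in order of first appearance (the ordering forcats::as_factor() uses).
compound_f <- factor(compound, levels = unique(compound))

# Factors are sorted by their integer codes, so both sorting methods agree.
radix_order  <- order(compound_f, method = "radix")
shell_order  <- order(compound_f, method = "shell")
orders_agree <- identical(radix_order, shell_order)
print(orders_agree)

# Rows and group labels taken from the same order stay aligned.
print(data.frame(row = compound[radix_order], group = compound_class[radix_order]))
```

In the pipeline above, this corresponds to converting `Compound_Name` with `as_factor()` before calling `heatmap()`, as in `example2`.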
