Implement `group_by` and `with_groups` for `tidySingleCellExperiment` and `tidyseurat` #71

stemangiola · 2023-07-20T12:47:28Z

the issue for tidyseurat is here

stemangiola · 2023-07-29T07:38:46Z

group_split is also a new useful function

B0ydT · 2023-07-29T12:20:12Z

I'm interested in having a go at this. I've looked at the dplyr code and think I have a handle on it. Are there any implementations, especially part of tidyomics that you'd prefer I use as a starting point?

stemangiola · 2023-07-29T13:52:12Z

Hello @B0ydT, great initiative.

This is probably the transcriptomics hardest challenge (the one for tidySE is even harder).

Currently, group_by leads to a table in return, as I never approached the hard problem of making a proper API for this method.

This challenge needs a design document first, as we need to iron out all possible pitfalls of grouping an arbitrary set of columns and doing some operations on them. Some combinations may lead to valid single cell experiment objects, while others may lead to invalid objects, and therefore A tibble must be returned.

have a look to this workshop to learn more about tidySingleCellExperiment https://tidytranscriptomics-workshops.github.io/bioc2022_tidytranscriptomics/articles/tidytranscriptomics_case_study.html

Let's start from group_by, imagine how many ways a user could group_by across cell metadata and reduced dimensions, and how many ways it could operate on this object, e.g. mutate, summarize, something else.

Then pretty much you have creative freedom in proposing a design document, describing the logics of the group_by API for SingleCellExperiment

stemangiola · 2023-08-20T12:27:59Z

I'm interested in having a go at this. I've looked at the dplyr code and think I have a handle on it. Are there any implementations, especially part of tidyomics that you'd prefer I use as a starting point?

Any news? Let me know if you need help.

B0ydT · 2023-09-15T09:47:40Z

Hi @stemangiola. Apologies for the slow response, I've been in the middle of changing jobs. I certainly wasn't looking to jump on the thorniest issue of the lot; It took your comment for me to see how this could be quite a lot trickier than the dplyr implementation.

I have tried to think through the issues you've raised. From what I remember of the dplyr implementation, you define the groups when you group_by, but most error detection occurs when you attempt to perform an operation on the grouped object. If we're starting with with_groups as opposed to methods for each relevant function, I could catch those errors and provide informative feedback to the user. Otherwise, I'm very happy to help with implementation, but I'm not sure I'm the right person to work it through from first principles.

stemangiola · 2023-09-15T10:19:27Z

Hello @B0ydT, congrats for your new job!

You can start with group_split, it is useful and much easier.

A quick function for backend I found is

splitColData <- function(x, f) {
  # This is by @jma1991
  # at https://github.com/drisso/SingleCellExperiment/issues/55
  
  i <- split(seq_along(f), f)
  
  v <- vector(mode = "list", length = length(i))
  
  names(v) <- names(i)
  
  for (n in names(i)) { v[[n]] <- x[, i[[n]]] }
  
  return(v)
  
}

Of course our interface will allow any of the cell-wise info, including reduced dimensions, etc..

B0ydT · 2023-09-23T04:33:51Z

Thank you very much! That looks great. I'll get it sorted shortly.

B0ydT · 2023-12-10T05:44:27Z

Stoked to have this merged. I'll repurpose for tidyseurat and tidySE in the next couple days. I'm also pretty confident I could get a decent implementation of with_groups built on top of this. Thanks for all of your feedback, I've been learning heaps.

Just to follow up, the groups column itself is preserved just fine (if .keep = TRUE), but if I try and add a new column for logical comparisons it gets mangled. dplyr adds a column for these comparisons, i.e. a > 5

tibble(a = 1:10) |> group_by(a>5)
# A tibble: 10 × 2
# Groups:   a > 5 [2]
       a `a > 5`
   <int> <lgl>  
 1     1 FALSE  
 2     2 FALSE  
 3     3 FALSE  
 4     4 FALSE  
 5     5 FALSE  
 6     6 TRUE   
 7     7 TRUE   
 8     8 TRUE   
 9     9 TRUE   
10    10 TRUE

That said, if you're fine with it I'm fine with it, just wanted to be clear.

stemangiola · 2023-12-10T05:49:49Z

Stoked to have this merged. I'll repurpose for tidyseurat and tidySE in the next couple days.

Repurposing to tidyseurat will take 3 minutes.

Repurposing to tidySE might take 3 weeks.

For tidySE, to avoid inefficiencies (some large pseudobulk might include 1 Billion rows), specific flows for samples-only queries and features-only queries must be designed, if the query is samples and features, we should decide what to do. For example, (1) it does not make sense, or (2) use the inefficient method with message, (3)...

For tidySE look at nest, unnest, where I developed such strategies.

In fact you could get away using the nest quite optimised framework I developed. I would suggest doing sonmething like this in the backend

data |> mutate(...) |> nest(data_nested = -...) |> pull(data_nested)

B0ydT · 2023-12-10T06:25:15Z

Repurposing to tidySE might take 3 weeks.

Aha, noted. I may not have much time between now and Christmas, but I'll see what I can do.

stemangiola · 2023-12-10T07:42:24Z

Repurposing to tidySE might take 3 weeks.

Aha, noted. I may not have much time between now and Christmas, but I'll see what I can do.

On the other hand, if you manage to pull off my trick could take 5 minutes :)

william-hutchison · 2024-01-04T02:30:26Z

Hi @B0ydT, thanks for your great contribution! Please add your details to this authorship list if you would like to be included in our upcoming publication:

https://docs.google.com/spreadsheets/d/19XqhN3xAMekCJ-esAolzoWT6fttruSEermjIsrOFcoo/edit?usp=sharing

B0ydT · 2024-01-04T02:42:04Z

Thanks @william-hutchison, I will need to clear it with my supervisor but should complete it shortly.

B0ydT · 2024-01-04T03:20:10Z

@william-hutchison Have added my details, thanks again

william-hutchison · 2024-01-05T01:13:11Z

@B0ydT great, thanks for your work!

stemangiola added the enhancement New feature or request label Jul 21, 2023

stemangiola changed the title ~~Implement group_by and with_groups for tidySingleCellExperiment~~ Implement group_by and with_groups for tidySingleCellExperiment and tidyseurat Jul 23, 2023

stemangiola changed the title ~~Implement group_by and with_groups for tidySingleCellExperiment and tidyseurat~~ Implement group_by and with_groups for tidySCE and tidyseurat` Jul 28, 2023

stemangiola changed the title ~~Implement group_by and with_groups for tidySCE and tidyseurat`~~ Implement group_by and with_groups for tidySCE and tidyseurat Jul 28, 2023

stemangiola changed the title ~~Implement group_by and with_groups for tidySCE and tidyseurat~~ Implement group_by and with_groups for tidySingleCellExperiment and tidyseurat Jul 28, 2023

stemangiola assigned B0ydT Aug 15, 2023

B0ydT mentioned this issue Oct 15, 2023

group_split #97

Merged

B0ydT mentioned this issue Feb 10, 2024

Implement group_by and with_groups for tidySummarizedExperiment stemangiola/tidySummarizedExperiment#75

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `group_by` and `with_groups` for `tidySingleCellExperiment` and `tidyseurat` #71

Implement `group_by` and `with_groups` for `tidySingleCellExperiment` and `tidyseurat` #71

stemangiola commented Jul 20, 2023 •

edited

Loading

stemangiola commented Jul 29, 2023

B0ydT commented Jul 29, 2023

stemangiola commented Jul 29, 2023 •

edited

Loading

stemangiola commented Aug 20, 2023

B0ydT commented Sep 15, 2023

stemangiola commented Sep 15, 2023

B0ydT commented Sep 23, 2023

B0ydT commented Dec 10, 2023

stemangiola commented Dec 10, 2023 •

edited

Loading

B0ydT commented Dec 10, 2023

stemangiola commented Dec 10, 2023 •

edited

Loading

william-hutchison commented Jan 4, 2024 •

edited

Loading

B0ydT commented Jan 4, 2024

B0ydT commented Jan 4, 2024

william-hutchison commented Jan 5, 2024

Implement group_by and with_groups for tidySingleCellExperiment and tidyseurat #71

Implement group_by and with_groups for tidySingleCellExperiment and tidyseurat #71

Comments

stemangiola commented Jul 20, 2023 • edited Loading

stemangiola commented Jul 29, 2023

B0ydT commented Jul 29, 2023

stemangiola commented Jul 29, 2023 • edited Loading

stemangiola commented Aug 20, 2023

B0ydT commented Sep 15, 2023

stemangiola commented Sep 15, 2023

B0ydT commented Sep 23, 2023

B0ydT commented Dec 10, 2023

stemangiola commented Dec 10, 2023 • edited Loading

B0ydT commented Dec 10, 2023

stemangiola commented Dec 10, 2023 • edited Loading

william-hutchison commented Jan 4, 2024 • edited Loading

B0ydT commented Jan 4, 2024

B0ydT commented Jan 4, 2024

william-hutchison commented Jan 5, 2024

Implement `group_by` and `with_groups` for `tidySingleCellExperiment` and `tidyseurat` #71

Implement `group_by` and `with_groups` for `tidySingleCellExperiment` and `tidyseurat` #71

stemangiola commented Jul 20, 2023 •

edited

Loading

stemangiola commented Jul 29, 2023 •

edited

Loading

stemangiola commented Dec 10, 2023 •

edited

Loading

stemangiola commented Dec 10, 2023 •

edited

Loading

william-hutchison commented Jan 4, 2024 •

edited

Loading