Methods for splitting SingleCellExperiment objects #55

Open
jma1991 opened this issue Oct 2, 2020 · 5 comments

@jma1991
jma1991 commented Oct 2, 2020

Is there scope to define splitColData and splitRowData methods for the SingleCellExperiment class?

I am working with a rather large SingleCellExperiment object and I often find myself needing to split the object into a list of smaller objects for pre-processing based on either the column or row data.

This can obviously be done with the following:

# Split by column data
var <- colData(sce)$variable
sce <- lapply(var, function(x) sce[, colData(sce)$variable == x])

# Split by row data
var <- rowData(sce)$variable
sce <- lapply(var, function(x) sce[rowData(sce)$variable == x, ])

However, I've found this approach to be slower than using a for-loop with pre-allocation (e.g. similar to the code already in the splitAltExps function):

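splitColData <- function(x, f) {
  # Column indices grouped by the levels of f
  i <- split(seq_along(f), f)

  # Pre-allocate the output list, one slot per level
  v <- vector(mode = "list", length = length(i))
  names(v) <- names(i)

  # Subset the object once per level
  for (n in names(i)) {
    v[[n]] <- x[, i[[n]]]
  }

  return(v)
}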

If there is a need for these methods I can submit a pull-request? If not, it would be super helpful if you could advise what is the most robust and efficient method for splitting SCE objects. Thank you.

@LTLA
Collaborator

LTLA commented Oct 2, 2020

> However, I've found this approach to be slower than using a for-loop with pre-allocation (e.g. similar to the code already in the splitAltExps function):

Well, yes, that's because you're looping over every element of var rather than its unique levels.
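
To make the difference concrete, here is a minimal sketch that uses a plain base-R matrix as a stand-in for the SCE object. The matrix `m` and the grouping vector `grp` are made up for illustration, and the assumption is that the same indexing pattern carries over to `sce[, ...]`. Looping over `grp` itself builds one subset per cell, whereas looping over `unique(grp)` builds one subset per level.

```r
# Stand-in for an SCE: 3 features x 6 cells (hypothetical data).
m <- matrix(seq_len(18), nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
grp <- c("a", "b", "a", "c", "b", "a")

# Original pattern: one subset per *element* of grp, so groups are repeated.
per_element <- lapply(grp, function(g) m[, grp == g, drop = FALSE])

# Intended pattern: one subset per *unique level* of grp.
lv <- unique(grp)
per_level <- lapply(setNames(lv, lv), function(g) m[, grp == g, drop = FALSE])

length(per_element)  # 6
length(per_level)    # 3
```

With many cells and few groups, the first pattern repeats the same subsetting work once per cell, which would explain most of the slowdown.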

> If there is a need for these methods I can submit a pull-request?

Possibly, but this would likely go to the SummarizedExperiment repository rather than this one. Any such methods should benefit all SE subclasses; there isn't any reason that they would be useful only for SCEs.

Tagging @mtmorgan: does this functionality already exist in SE? S4Vectors::split() kind of works, but it's hard to remember that it splits by row instead of column in an SE. (Also, I just noticed SCE doesn't implement extractROWS properly: need to fix.)

@LTLA
Collaborator

LTLA commented Oct 2, 2020

bc220ca fixes the split() issue, so a hypothetical splitByRow() would be as easy as:

split(sce, rowData(sce)$variable)

@lambdamoses

Any update on this? Seurat has the SplitObject function. I'm actually asking because I'm writing a method to split a SpatialFeatureExperiment object by geometry, so that, for instance, cells in different pieces of tissue can be split into different SFE objects. I want to keep the style consistent with any existing split function in SCE and SpatialExperiment that splits by columns rather than rows.

@LTLA
Collaborator

LTLA commented Jul 18, 2024

No, it seems I clobbered my own PR (linked above) and also no one cared about it.

Perhaps consider making a PR to the SummarizedExperiment repo with something like:

# Completely untested!
setGeneric("splitByCol", function(x, f, ...) standardGeneric("splitByCol"))

setMethod("splitByCol", "SummarizedExperiment", function(x, f, ...) {
    f <- as.factor(f)
    by.levels <- split(seq_along(f), f)
    for (i in seq_along(by.levels)) {
        by.levels[[i]] <- x[, by.levels[[i]], drop=FALSE]
    }
    by.levels
})

  • Not really sure it needs to be a generic, as the [ method should already handle everything.
  • Perhaps this could be generalized to other 2D structures in the BioC-S4 ecosystem, e.g., S4Arrays, in which case it makes some sense for it to be a generic, and it may need to live in BiocGenerics.
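
As a rough illustration of the first bullet, the same loop can be written as an ordinary function that relies only on two-dimensional `[` subsetting. The name `split_by_col` and the test matrix are hypothetical, and this sketch has only been reasoned through on a base matrix; that it behaves the same way on SummarizedExperiment objects is an assumption.

```r
# Hypothetical non-generic helper: intended for any 2D object whose `[`
# method accepts x[, j, drop = FALSE] (shown here on a base matrix only).
split_by_col <- function(x, f) {
    f <- as.factor(f)
    stopifnot(length(f) == ncol(x))
    idx <- split(seq_along(f), f)   # column indices grouped by level
    lapply(idx, function(j) x[, j, drop = FALSE])
}

# Made-up data: 2 features x 6 columns, two groups.
m <- matrix(seq_len(12), nrow = 2,
            dimnames = list(c("g1", "g2"), paste0("c", 1:6)))
parts <- split_by_col(m, c("x", "y", "x", "y", "y", "x"))

names(parts)        # "x" "y"
ncol(parts[["y"]])  # 3
```

Because `as.factor()` keeps every level, a level with no matching columns would come back as a zero-column subset rather than being dropped.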

Don't have the time/will to do it myself but it seems useful enough that a PR would warrant some consideration.

@lambdamoses

I renamed the split function for SFE to splitByCol and added a generic for it in the SFE package, to avoid confusion with split(), which splits by row for SummarizedExperiment. I may do a PR to SummarizedExperiment later, but I don't have the time before the Bioc2024 conference.
