Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combineCols() errors when called via do.call() with a named list of objects to be combined #62

Open
PeteHaitch opened this issue Jan 11, 2022 · 5 comments

Comments

@PeteHaitch
Copy link
Contributor

PeteHaitch commented Jan 11, 2022

Please note that this may properly be a bug report for S4Vectors or SumarizedExperiment, so I'm tagging @hpages, but I encountered it when applying combineCols() via do.call() to a list of SingleCellExperiment objects and perhaps something needs to be fixed here too?

suppressPackageStartupMessages(library(SingleCellExperiment))
example(SingleCellExperiment, echo = FALSE)

# Helper function to create SCEs with overlapping genes and non-overlapping samples.
f <- function(y) {
  x <- sce
  rd <- rowData(x)
  rownames(x) <-  paste0(
    "Gene_",
    sort(sample(2 * nrow(sce), nrow(x), replace = FALSE)))
  colnames(x) <- paste0(y, "_", seq_len(ncol(x)))
  x
}

sce1 <- f("A")
sce2 <- f("B")
sce3 <- f("C")

# Works when list is unnamed
list_of_sce <- list(sce1, sce2, sce3)
do.call(combineCols, list_of_sce)
#> class: SingleCellExperiment 
#> dim: 349 300 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(349): Gene_3 Gene_4 ... Gene_381 Gene_393
#> rowData names(0):
#> colnames(300): A_1 A_2 ... C_99 C_100
#> colData names(0):
#> reducedDimNames(2): PCA tSNE
#> mainExpName: NULL
#> altExpNames(0):

# Errors when list is named
named_list_of_sce <- list_of_sce
names(named_list_of_sce) <- c("A", "B", "C")
do.call(combineCols, named_list_of_sce)
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'combineCols' for signature '"missing"'

I'm unsure exactly when this error is being triggered but I note that this error exists/persists lower down in the combineCols() methods stack, e.g., when applied to a DataFrame:

do.call(combineCols, lapply(list_of_sce, colData))
#> DataFrame with 300 rows and 0 columns
do.call(combineCols, lapply(named_list_of_sce, colData))
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'combineCols' for signature '"missing"'

Created on 2022-01-12 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 20.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_AU:en
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2022-01-12
#>  pandoc   2.11.4 @ /usr/lib/rstudio/bin/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version  date (UTC) lib source
#>  Biobase              * 2.54.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  BiocGenerics         * 0.40.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  bitops                 1.0-7    2021-04-24 [1] RSPM (R 4.1.0)
#>  cli                    3.1.0    2021-10-27 [1] RSPM (R 4.1.2)
#>  DelayedArray           0.20.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  digest                 0.6.29   2021-12-01 [1] RSPM (R 4.1.2)
#>  evaluate               0.14     2019-05-28 [1] RSPM (R 4.1.0)
#>  fastmap                1.1.0    2021-01-25 [1] RSPM (R 4.1.0)
#>  fs                     1.5.2    2021-12-08 [1] RSPM (R 4.1.2)
#>  GenomeInfoDb         * 1.30.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  GenomeInfoDbData       1.2.7    2021-10-28 [1] RSPM (R 4.1.1)
#>  GenomicRanges        * 1.46.1   2021-11-18 [1] RSPM (R 4.1.2)
#>  glue                   1.6.0    2021-12-17 [1] RSPM (R 4.1.0)
#>  highr                  0.9      2021-04-16 [1] RSPM (R 4.1.0)
#>  htmltools              0.5.2    2021-08-25 [1] RSPM (R 4.1.0)
#>  IRanges              * 2.28.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  knitr                  1.37     2021-12-16 [1] RSPM (R 4.1.0)
#>  lattice                0.20-45  2021-09-22 [4] CRAN (R 4.1.1)
#>  magrittr               2.0.1    2020-11-17 [1] RSPM (R 4.1.0)
#>  Matrix                 1.4-0    2021-12-08 [4] CRAN (R 4.1.2)
#>  MatrixGenerics       * 1.6.0    2021-10-26 [1] RSPM (R 4.1.2)
#>  matrixStats          * 0.61.0   2021-09-17 [1] RSPM (R 4.1.1)
#>  RCurl                  1.98-1.5 2021-09-17 [1] RSPM (R 4.1.1)
#>  reprex                 2.0.1    2021-08-05 [1] RSPM (R 4.1.0)
#>  rlang                  0.4.12   2021-10-18 [1] RSPM (R 4.1.1)
#>  rmarkdown              2.11     2021-09-14 [1] RSPM (R 4.1.1)
#>  rstudioapi             0.13     2020-11-12 [1] RSPM (R 4.1.0)
#>  S4Vectors            * 0.32.3   2021-11-21 [1] RSPM (R 4.1.2)
#>  sessioninfo            1.2.2    2021-12-06 [1] RSPM (R 4.1.2)
#>  SingleCellExperiment * 1.16.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  stringi                1.7.6    2021-11-29 [1] RSPM (R 4.1.2)
#>  stringr                1.4.0    2019-02-10 [1] RSPM (R 4.1.0)
#>  SummarizedExperiment * 1.24.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  withr                  2.4.3    2021-11-30 [1] RSPM (R 4.1.2)
#>  xfun                   0.29     2021-12-14 [1] RSPM (R 4.1.0)
#>  XVector                0.34.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  yaml                   2.2.1    2020-02-01 [1] RSPM (R 4.1.0)
#>  zlibbioc               1.40.0   2021-10-26 [1] RSPM (R 4.1.2)
#> 
#>  [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@LTLA
Copy link
Collaborator

LTLA commented Jan 18, 2022

This is because combineCols's generic has x as its first argument, so when you have a named list where no element is named x, the x is missing. Not really sure what can be done here as the error pops up in the dispatch before any method is run.

@PeteHaitch
Copy link
Contributor Author

Does the generic need to have x as its first argument or could it be like cbind() and just have ...?
I have a vague idea this makes methods dispatch tricky.

@hpages
Copy link
Contributor

hpages commented Jan 18, 2022

Note that even in base R they couldn't make their mind: cbind() dispatches on ... and c() dispatches on its first argument x.

The thing is that there are some complications when defining a generic with dispatch on .... One of them is that if you define a method for a given class, then dispatch will select that method only if all the supplied arguments have that class:

setGeneric("foo", signature="...", function(...) standardGeneric("foo"))
setMethod("foo", "character", function(...) "I'm the foo() method for character")

foo("a", 1)
# Error in standardGeneric("foo") : 
#   no method or default matching the "..." arguments in foo("a", 1)

foo(1, "a")
# Error in standardGeneric("foo") : 
#   no method or default matching the "..." arguments in foo(1, "a")

foo("a", "b")
# [1] "I'm the foo() method for character"

This is generally too strict. For example, a reasonable expectation is that if some of the supplied objects don't have the class of the first object, they'll be coerced and things will just work. Like in:

c(Rle(1:3), 21:22)
# integer-Rle of length 5 with 5 runs
#   Lengths:  1  1  1  1  1
#   Values :  1  2  3 21 22

For cbind() (and rbind()), they use a complicated mechanism to make this work:

cbind(DataFrame(c=11:14), data.frame(b=21:24))
# DataFrame with 4 rows and 2 columns
#           c         b
#   <integer> <integer>
# 1        11        21
# 2        12        22
# 3        13        23
# 4        14        24

Yes it works, but that's because additional generic cbind2() is involved and called internally by cbind() under some conditions, and also because S4Vectors defines 3 cbind2() methods in addition to the cbind() method for DataFrame objects.

All that complexity/overhead is avoided by simply dispatching on the first argument like base::c(), S4Vectors::combineRows(), and S4Vectors::combineCols() do.

Not a perfect solution because, as you've noticed, named arguments can get in the way:

c(A=Rle(1:3), B=21:22)
# $A
# integer-Rle of length 3 with 3 runs
#   Lengths: 1 1 1
#   Values : 1 2 3
# 
# $B1
# [1] 21
# 
# $B2
# [1] 22

Even though this is a BIG gotcha with c(), it's not new, and, personally I've learned to live with it by always passing the list of objects to combine thru unname() in the do.call(c, unname(objects)) construct. So it's kind of natural for me to do the same thing with combineRows() or combineCols().

If we wanted to fix this, we'd need to introduce the combineRows2() and combineCols2() generics. They would dispatch on their first 2 arguments x and y. Then make combineRows() and combineCols() ordinary functions with ... for first argument. Note that these would no longer need to be generics, just ordinary functions that would call combineRows2()/combineCols2() recursively or in a loop.

Unfortunately there's no way we can fix this for c().

@PeteHaitch
Copy link
Contributor Author

Thanks for the details, @hpages.
It seems like more trouble than it's worth to change the code.
But perhaps an addition to the documentation showing the do.call(combineCols, unname(objects)) construct would be valuable?
I'd be happy to write a PR, but where would this be best documented?

@LTLA
Copy link
Collaborator

LTLA commented Jan 19, 2022

One of them is that if you define a method for a given class, then dispatch will select that method only if all the supplied arguments have that class:

Indeed, this was a major issue with the first versions of the combineCols precursors, given that I expected to be able to run them on a mixture of SCEs, RSEs and base SEs and get a result that didn't throw away (too much) information from derived classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants