assay<- is too pedantic when the dimensions are named #51

LTLA · 2021-07-19T08:31:50Z

Identified in drisso/SingleCellExperiment#59:

mat <- matrix(rnorm(1000), ncol=20)
colnames(mat) <- LETTERS[1:20]

library(SummarizedExperiment)
se <- SummarizedExperiment(list(thingy=mat))

names(dim(mat)) <- c("Gene", "Cell")
assay(se) <- mat
## Error in `assays<-`(`*tmp*`, withDimnames = withDimnames, ..., value = `*vtmp*`) : 
##   please use 'assay(x, withDimnames=FALSE)) <- value' or 'assays(x,
##   withDimnames=FALSE)) <- value' when the dimnames on the supplied
##   assay(s) are not identical to the dimnames on SummarizedExperiment
##   object 'x'

I'll put aside the question of whether it's a good idea to rename the dim vector, because someone or something is clearly doing it, so assay<- should be robust to it. I'm guessing there's an identical() call on the old and new dimnames() that is getting tripped by the new names on the dimnames().

Session information

R version 4.1.0 Patched (2021-05-20 r80347)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/R-4-1-branch-dev/lib/libRblas.so
LAPACK: /home/luna/Software/R/R-4-1-branch-dev/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] SingleCellExperiment_1.15.0 SummarizedExperiment_1.23.0
 [3] Biobase_2.53.0              GenomicRanges_1.45.0       
 [5] GenomeInfoDb_1.29.0         IRanges_2.27.0             
 [7] S4Vectors_0.31.0            BiocGenerics_0.39.0        
 [9] MatrixGenerics_1.5.0        matrixStats_0.58.0         
[11] BiocFileCache_2.1.0         dbplyr_2.1.1               

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6             XVector_0.33.0         pillar_1.6.1          
 [4] compiler_4.1.0         zlibbioc_1.39.0        bitops_1.0-7          
 [7] tools_4.1.0            bit_4.0.4              lattice_0.20-44       
[10] RSQLite_2.2.7          memoise_2.0.0          lifecycle_1.0.0       
[13] tibble_3.1.2           pkgconfig_2.0.3        rlang_0.4.11          
[16] Matrix_1.3-3           DelayedArray_0.19.0    DBI_1.1.1             
[19] filelock_1.0.2         curl_4.3.1             fastmap_1.1.0         
[22] GenomeInfoDbData_1.2.6 withr_2.4.2            dplyr_1.0.6           
[25] httr_1.4.2             generics_0.1.0         vctrs_0.3.8           
[28] rappdirs_0.3.3         grid_4.1.0             bit64_4.0.5           
[31] tidyselect_1.1.1       glue_1.4.2             R6_2.5.0              
[34] fansi_0.4.2            purrr_0.3.4            blob_1.2.1            
[37] magrittr_2.0.1         ellipsis_0.3.2         assertthat_0.2.1      
[40] utf8_1.2.1             RCurl_1.98-1.3         cachem_1.0.5          
[43] crayon_1.4.1

The text was updated successfully, but these errors were encountered:

LiNk-NY · 2021-07-19T20:19:23Z

These look like the relevant lines. I agree that checking names on dimnames are a bit too much.

SummarizedExperiment/R/SummarizedExperiment-class.R

Lines 219 to 242 in 39fa368

    
           .assays_have_identical_dimnames <- function(assays, x_dimnames) 
        
           { 
        
               if (length(assays) == 0L) 
        
                   return(TRUE) 
        
               ## Like DelayedArray:::simplify_NULL_dimnames() but first replaces list 
        
               ## elements that are equal to character(0) with NULLs. 
        
               normalize_dimnames <- function(dn) { 
        
                   empty_idx <- which(lengths(dn) == 0L) 
        
                   if (length(empty_idx) == length(dn)) 
        
                       return(NULL) 
        
                   dn[empty_idx] <- list(NULL) 
        
                   dn 
        
               } 
        
               x_dimnames <- normalize_dimnames(x_dimnames) 
        
               ok <- vapply(seq_along(assays), 
        
                   function(i) { 
        
                       a <- getListElement(assays, i) 
        
                       a_dimnames <- normalize_dimnames(dimnames(a)) 
        
                       identical(a_dimnames, x_dimnames) 
        
                   }, 
        
                   logical(1) 
        
               ) 
        
               all(ok) 
        
           }

hpages · 2021-08-24T18:39:15Z

Addressed in SummarizedExperiment 1.23.3 (see commit 3b26fb6)

Your example @LTLA will still fail with this new version because it turns out that doing names(dim(mat)) <- c("Gene", "Cell") actually drops the dimnames of the matrix (I didn't know that and I find it very surprising) so you would need to do names(dimnames(mat)) <- c("Gene", "Cell") instead.

hpages closed this as completed Sep 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

assay<- is too pedantic when the dimensions are named #51

assay<- is too pedantic when the dimensions are named #51

LTLA commented Jul 19, 2021

LiNk-NY commented Jul 19, 2021

hpages commented Aug 24, 2021

assay<- is too pedantic when the dimensions are named #51

assay<- is too pedantic when the dimensions are named #51

Comments

LTLA commented Jul 19, 2021

LiNk-NY commented Jul 19, 2021

hpages commented Aug 24, 2021