Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

summarize doesn't work with weighted mean #38

Closed
UnkindPartition opened this issue Jun 5, 2018 · 3 comments
Closed

summarize doesn't work with weighted mean #38

UnkindPartition opened this issue Jun 5, 2018 · 3 comments
Labels

Comments

@UnkindPartition
Copy link

library(plyranges)

set.seed(2018)
n <- 10
r <- GRanges(seqnames = rep("chr1", n),
                  ranges = IRanges(start = sample(20, n, replace = T),
                                   width = sample(6,  n, replace = T))
                  )
mcols(r) <- data.frame(score = runif(n, 0, 100), condition = rep_len(c("One","Two"), n))
r
## GRanges object with 10 ranges and 2 metadata columns:
##        seqnames    ranges strand |            score condition
##           <Rle> <IRanges>  <Rle> |        <numeric>  <factor>
##    [1]     chr1       7-9      * | 26.0100793326274       One
##    [2]     chr1     10-13      * | 56.8360368022695       Two
##    [3]     chr1       2-7      * | 15.1498265331611       One
##    [4]     chr1       4-8      * | 9.49368453584611       Two
##    [5]     chr1     10-14      * |  76.191659620963       One
##    [6]     chr1      7-10      * | 53.5928548080847       Two
##    [7]     chr1     13-14      * | 32.7880936674774       One
##    [8]     chr1       3-6      * | 92.4356027971953       Two
##    [9]     chr1     20-24      * | 59.8311740672216       One
##   [10]     chr1     11-15      * | 6.46366507280618       Two
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

These work:

r %>% group_by(condition) %>% summarize(score = mean(score))
## DataFrame with 2 rows and 2 columns
##   condition            score
##    <factor>        <numeric>
## 1       One 41.9941666442901
## 2       Two 43.7643688032404
r %>% group_by(condition) %>% summarize(score = mean(width))
## DataFrame with 2 rows and 2 columns
##   condition     score
##    <factor> <numeric>
## 1       One       4.2
## 2       Two       4.4
r %>% group_by(condition) %>% reduce_ranges(score = weighted.mean(score, width))
## GRanges object with 3 ranges and 2 metadata columns:
##       seqnames    ranges strand | condition            score
##          <Rle> <IRanges>  <Rle> |  <factor>        <numeric>
##   [1]     chr1      2-14      * |       One 38.4664801647887
##   [2]     chr1     20-24      * |       One 59.8311740672216
##   [3]     chr1      3-15      * |       Two 40.5111238942482
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

... but this doesn't:

r %>% group_by(condition) %>% summarize(score = weighted.mean(score, width))
## Error in as.vector(x, mode): coercing an AtomicList object to an atomic vector is supported only for
##   objects with top-level elements of length <= 1
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: Fedora 28 (Workstation Edition)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
## 
## locale:
##  [1] LC_CTYPE=ru_RU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=ru_RU.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] knitr_1.20           rtracklayer_1.40.3   plyranges_1.1.6     
## [4] GenomicRanges_1.32.3 GenomeInfoDb_1.16.0  IRanges_2.14.10     
## [7] S4Vectors_0.18.2     BiocGenerics_0.26.0 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.17                pillar_1.2.3               
##  [3] BiocInstaller_1.30.0        compiler_3.5.0             
##  [5] git2r_0.21.0                XVector_0.20.0             
##  [7] bindr_0.1.1                 bitops_1.0-6               
##  [9] tools_3.5.0                 zlibbioc_1.26.0            
## [11] digest_0.6.15               evaluate_0.10.1            
## [13] tibble_1.4.2                memoise_1.1.0              
## [15] lattice_0.20-35             pkgconfig_2.0.1            
## [17] rlang_0.2.1                 Matrix_1.2-14              
## [19] DelayedArray_0.6.0          curl_3.2                   
## [21] yaml_2.1.19                 bindrcpp_0.2.2             
## [23] GenomeInfoDbData_1.1.0      stringr_1.3.1              
## [25] withr_2.1.2                 httr_1.3.1                 
## [27] dplyr_0.7.5                 Biostrings_2.48.0          
## [29] devtools_1.13.5             tidyselect_0.2.4           
## [31] grid_3.5.0                  glue_1.2.0                 
## [33] Biobase_2.40.0              R6_2.2.2                   
## [35] XML_3.98-1.11               BiocParallel_1.14.1        
## [37] tidyr_0.8.1                 purrr_0.2.4                
## [39] magrittr_1.5                Rsamtools_1.32.0           
## [41] matrixStats_0.53.1          GenomicAlignments_1.16.0   
## [43] assertthat_0.2.0            SummarizedExperiment_1.10.1
## [45] stringi_1.2.2               RCurl_1.95-4.10
@sa-lee
Copy link
Collaborator

sa-lee commented Jun 5, 2018

This is to do with some issues we've been talking about in #30 - I need to handle the case when there is not "fast" method for dispatching the grouped object better.

@lawremi I'm surprised there is no List method for weighted.mean?

@lawremi
Copy link
Collaborator

lawremi commented Jun 6, 2018

This is a limitation in the stats package that I will try to fix. But for now it's a good use case for the fallback to looping.

@sa-lee
Copy link
Collaborator

sa-lee commented Nov 15, 2019

This should be fixed in more recent versions. The more general issue is discussed in #74

@sa-lee sa-lee closed this as completed Nov 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants