A bug in reduce function #607

Open
wong-ziyi opened this issue Nov 15, 2024 · 1 comment

The missing() check at line 1226 of utils.R prevents addIdentificationData() from working when id is a data.frame (observed with R 4.3.2).

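> quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "mzXML$")
> identFile <- dir(system.file(package = "MSnbase", dir = "extdata"), full.name = TRUE, pattern = "dummyiTRAQ.mzid")
> id <- readMzIdData(identFile)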
> key <- "spectrumID"
> reduce(id, key)
Error in .local(x, ...) : Need a key column to reduce the data.frame

However, it works when key is passed by name (key = key), as shown below:

> reduce(id, key=key)
                             sequence spectrumID chargeState rank passThreshold          experimentalMassToCharge            calculatedMassToCharge peptideRef modNum
1 VESITARHGEVLQLRPK;IKPQAVIETLHRLTEGK     scan=1         3;3  1;2     TRUE;TRUE 645.374145507812;645.374145507812 645.037475585938;645.045837402344  Pep2;Pep3    0;0
2                       IDGQWVTHQWLKK     scan=2           3    1          TRUE                  546.958618164062                  546.963256835938       Pep1      0
3                             LVILLFR     scan=5           2    1          TRUE                  437.804016113281                  437.299652099609       Pep4      0
      isDecoy post pre   start     end  DatabaseAccess DBseqLength DatabaseSeq                                                                         DatabaseDescription
1 FALSE;FALSE  A;A R;K 170;372 186;388 ECA0984;ECA3829     231;572           ; ECA0984 DNA mismatch repair protein;ECA3829 acetolactate synthase isozyme III large subunit
2       FALSE    A   K      50      62         ECA1028         275                              ECA1028 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase
3       FALSE    L   K      22      28         ECA0510         166                                        ECA0510 putative capsular polysacharide biosynthesis transferase
  scan.number.s. acquisitionNum                      spectrumFile                          idFile MS.GF.RawScore MS.GF.DeNovoScore          MS.GF.SpecEValue
1            1;1            1;1 dummyiTRAQ.mzXML;dummyiTRAQ.mzXML dummyiTRAQ.mzid;dummyiTRAQ.mzid        -39;-39             77;77 5.527468e-05;5.527468e-05
2              2              2                  dummyiTRAQ.mzXML                 dummyiTRAQ.mzid            -30                39              9.399048e-06
3              5              5                  dummyiTRAQ.mzXML                 dummyiTRAQ.mzid            -42                 5             0.00025778305
         MS.GF.EValue modPeptideRef modName modMass modLocation subOriginalResidue subReplacementResidue subLocation
1 79.369576;79.369576         NA;NA   NA;NA   NA;NA       NA;NA              NA;NA                 NA;NA       NA;NA
2           13.466147          <NA>    <NA>    <NA>        <NA>               <NA>                  <NA>        <NA>
3           366.38422          <NA>    <NA>    <NA>        <NA>               <NA>                  <NA>        <NA>

This is because missing() tests whether an argument was supplied in the call. See the example below:

> i <- function(a, b) {
+   c(missing(a), missing(b))
+ }
> i()
[1] TRUE TRUE
> i(a = 1)
[1] FALSE  TRUE
> i(b = 2)
[1]  TRUE FALSE
> i(1, 2)
[1] FALSE FALSE
> a <- 1
> b <- 2
> i(a, b)
[1] FALSE FALSE

Therefore, the reduce() call at line 183 of functions-addIdentificationData.R causes .addDataFrameIdentificationData() to fail in every case.
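
One possible mechanism, sketched under assumptions rather than taken from the MSnbase source: if the S4 generic in effect declares a second formal argument other than key, a positional second argument binds to that formal and never reaches the method's key, so missing(key) is TRUE inside the method. The generic collapse_rows, its drop_empty argument, and the toy data frame below are hypothetical stand-ins.

```r
library(methods)

# Hypothetical generic: its second formal is `drop_empty`, not `key`.
# Dispatch is restricted to `x`, so methods may omit `drop_empty`.
setGeneric("collapse_rows", signature = "x",
           function(x, drop_empty = FALSE, ...) standardGeneric("collapse_rows"))

# Hypothetical method that adds its own `key` argument, guarded by missing().
setMethod("collapse_rows", "data.frame", function(x, key, sep = ";") {
    if (missing(key))
        stop("Need a key column to reduce the data.frame")
    key
})

df <- data.frame(id = c("a", "a", "b"), value = 1:3)

# Positional call: "id" is matched to `drop_empty`, so `key` is never supplied
# and the missing() guard raises the error.
positional <- tryCatch(collapse_rows(df, "id"),
                       error = function(e) conditionMessage(e))

# Named call: `key` travels through `...` into the method body.
named <- collapse_rows(df, key = "id")

print(positional)
print(named)
```

If this is what happens here, calling reduce(id, key = key) by name inside the package would avoid the problem without changing the missing() check itself.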

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=ja_JP.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=ja_JP.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=ja_JP.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C       

time zone: Asia/Tokyo
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reticulate_1.40.0     dplyr_1.1.4           limma_3.58.1          sva_3.50.0            BiocParallel_1.36.0   genefilter_1.84.0     mgcv_1.9-1           
 [8] nlme_3.1-166          reshape2_1.4.4        ggplot2_3.5.1         RColorBrewer_1.1-3    RforProteomics_1.40.0 MSnbase_2.28.1        ProtGenerics_1.34.0  
[15] S4Vectors_0.40.2      mzR_2.36.0            Rcpp_1.0.13-1         Biobase_2.62.0        BiocGenerics_0.48.1  

loaded via a namespace (and not attached):
 [1] DBI_1.2.3               bitops_1.0-9            RBGL_1.78.0             rlang_1.1.4             magrittr_2.0.3          clue_0.3-66             matrixStats_1.4.1      
 [8] compiler_4.3.2          RSQLite_2.3.7           png_0.1-8               vctrs_0.6.5             stringr_1.5.1           crayon_1.5.3            pkgconfig_2.0.3        
[15] fastmap_1.2.0           XVector_0.42.0          utf8_1.2.4              biocViews_1.70.0        rmarkdown_2.29          graph_1.80.0            preprocessCore_1.64.0  
[22] bit_4.5.0               xfun_0.49               zlibbioc_1.48.2         cachem_1.1.0            jsonlite_1.8.9          GenomeInfoDb_1.38.8     blob_1.2.4             
[29] parallel_4.3.2          cluster_2.1.6           R6_2.5.1                stringi_1.8.4           iterators_1.0.14        knitr_1.49              R.utils_2.12.3         
[36] IRanges_2.36.0          Matrix_1.6-5            splines_4.3.2           tidyselect_1.2.1        rstudioapi_0.17.1       yaml_2.3.10             doParallel_1.0.17      
[43] codetools_0.2-20        affy_1.80.0             RUnit_0.4.33            lattice_0.22-6          tibble_3.2.1            plyr_1.8.9              withr_3.0.2            
[50] KEGGREST_1.42.0         evaluate_1.0.1          survival_3.7-0          Biostrings_2.70.3       pillar_1.9.0            affyio_1.72.0           BiocManager_1.30.25    
[57] MatrixGenerics_1.14.0   foreach_1.5.2           MALDIquant_1.22.3       ncdf4_1.23              generics_0.1.3          RCurl_1.98-1.16         munsell_0.5.1          
[64] scales_1.3.0            xtable_1.8-4            glue_1.8.0              tools_4.3.2             mzID_1.40.0             vsn_3.70.0              locfit_1.5-9.10        
[71] annotate_1.80.0         XML_3.99-0.17           grid_4.3.2              impute_1.76.0           edgeR_4.0.16            MsCoreUtils_1.14.1      AnnotationDbi_1.64.1   
[78] colorspace_2.1-1        GenomeInfoDbData_1.2.11 cli_3.6.3               fansi_1.0.6             pcaMethods_1.94.0       gtable_0.3.6            R.methodsS3_1.8.2      
[85] digest_0.6.37           memoise_2.0.1           htmltools_0.5.8.1       R.oo_1.27.0             lifecycle_1.0.4         httr_1.4.7              statmod_1.5.0          
[92] bit64_4.5.2             MASS_7.3-60.0.1  

wong-ziyi commented Nov 15, 2024

Moreover, the reduce() call at line 183 of functions-addIdentificationData.R should be moved to after line 195, because running reduce() before filterIdentificationDataFrame() causes an error: "Error in !x[, decoy] : invalid argument type". reduce() combines multiple logical values (e.g. FALSE;TRUE) into one cell of the "isDecoy" column. Cells holding such combined values are of type character, so negating them with ! fails with the invalid argument type error.

> library("msdata")
> f <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid"
> idf <- msdata::ident(full.names = TRUE, pattern = f)
> iddf <- readMzIdData(idf)
> iddf <- reduce(iddf, key = "spectrumID")
> iddf <- filterIdentificationDataFrame(iddf, verbose = TRUE)
Starting with 5343 PSMs:
Error in !x[, decoy] : invalid argument type

The error above is caused by running reduce() before filterIdentificationDataFrame().

> library("msdata")
> f <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid"
> idf <- msdata::ident(full.names = TRUE, pattern = f)
> iddf <- readMzIdData(idf)
> iddf <- filterIdentificationDataFrame(iddf, verbose = TRUE)
Starting with 5802 PSMs:
 removed 2896 decoy hits
 removed 155 PSMs with rank > 1
 removed 41 non-proteotypic peptides
2710 PSMs left.
> iddf <- reduce(iddf, key = "spectrumID")

It works normally when reduce() is run after filterIdentificationDataFrame().
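
The type problem can be reproduced in base R without MSnbase. The sketch below uses a hypothetical toy table and a plain paste() collapse as a stand-in for what reduce() does to the isDecoy column. It is not the package's implementation.

```r
# Toy identification table (hypothetical values): the first spectrum has
# both a target hit and a decoy hit.
toy <- data.frame(spectrumID = c("scan=1", "scan=1", "scan=2"),
                  isDecoy = c(FALSE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# Collapse isDecoy per spectrumID with ";" (stand-in for reduce()).
collapsed <- vapply(split(toy$isDecoy, toy$spectrumID),
                    paste, character(1), collapse = ";")

# The collapsed column is character, so logical negation is not defined.
negation <- tryCatch(!collapsed, error = function(e) conditionMessage(e))

# On the original logical column, negation works and decoys can be dropped.
kept <- toy[!toy$isDecoy, ]

print(collapsed)
print(negation)
print(kept)
```

Filtering first keeps isDecoy logical at the point where it is negated, which is why the order filterIdentificationDataFrame() then reduce() succeeds.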
