Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r2 for hurdle models: different values between pscl and glmmTMB #612

Closed
MarcRieraDominguez opened this issue Sep 12, 2023 · 2 comments
Closed
Labels
3 investigators ❔❓ Need to look further into this issue

Comments

@MarcRieraDominguez
Copy link

MarcRieraDominguez commented Sep 12, 2023

Hi! Congratulations on the great package! I've noticed an unexpected behaviour: r2() for hurdle models is different depending on the package used. If one uses r2(), hurdle models fitted with pscl default to r2_zeroinflated(), while glmmTMB ones don't. Moreover, r2_zeroinflated() do not return the same values, perhaps due to small differences in fitted values? None of this is particularly worrisome, and possibly it is not even related to the performance package (but rather to pscl and glmmTMB) but I thought it best to report it here. Cheers!

library(glmmTMB)
#> Warning: package 'glmmTMB' was built under R version 4.2.3
#> Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
#> TMB was built with Matrix version 1.6.1
#> Current Matrix version is 1.4.1
#> Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
library(performance)
#> Warning: package 'performance' was built under R version 4.2.3
library(pscl)
#> Warning: package 'pscl' was built under R version 4.2.3
#> Classes and Methods for R developed in the
#> Political Science Computational Laboratory
#> Department of Political Science
#> Stanford University
#> Simon Jackman
#> hurdle and zeroinfl functions by Achim Zeileis


data("bioChemists", package = "pscl")
summary(bioChemists)
#>       art            fem           mar           kid5             phd       
#>  Min.   : 0.000   Men  :494   Single :309   Min.   :0.0000   Min.   :0.755  
#>  1st Qu.: 0.000   Women:421   Married:606   1st Qu.:0.0000   1st Qu.:2.260  
#>  Median : 1.000                             Median :0.0000   Median :3.150  
#>  Mean   : 1.693                             Mean   :0.4951   Mean   :3.103  
#>  3rd Qu.: 2.000                             3rd Qu.:1.0000   3rd Qu.:3.920  
#>  Max.   :19.000                             Max.   :3.0000   Max.   :4.620  
#>       ment       
#>  Min.   : 0.000  
#>  1st Qu.: 3.000  
#>  Median : 6.000  
#>  Mean   : 8.767  
#>  3rd Qu.:12.000  
#>  Max.   :77.000


list.mod <- list(
  hurdle = pscl::hurdle(art ~ fem + mar, data = bioChemists, dist = "poisson", zero.dist = "binomial", link = "logit"),
  glmmtmb = glmmTMB::glmmTMB(art ~ fem + mar, data = bioChemists, family = truncated_poisson(), ziformula = ~ fem + mar)
)


lapply(list.mod, summary)
#> $hurdle
#> 
#> Call:
#> pscl::hurdle(formula = art ~ fem + mar, data = bioChemists, dist = "poisson", 
#>     zero.dist = "binomial", link = "logit")
#> 
#> Pearson residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.1392 -1.0178 -0.3446  0.3905 10.2843 
#> 
#> Count model coefficients (truncated poisson with log link):
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.847598   0.064084  13.226  < 2e-16 ***
#> femWomen    -0.237351   0.064199  -3.697 0.000218 ***
#> marMarried   0.008846   0.066944   0.132 0.894867    
#> Zero hurdle model coefficients (binomial with logit link):
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.90794    0.15640   5.805 6.42e-09 ***
#> femWomen    -0.24195    0.14923  -1.621    0.105    
#> marMarried   0.07802    0.15636   0.499    0.618    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#> 
#> Number of iterations in BFGS optimization: 9 
#> Log-likelihood: -1670 on 6 Df
#> 
#> $glmmtmb
#>  Family: truncated_poisson  ( log )
#> Formula:          art ~ fem + mar
#> Zero inflation:       ~fem + mar
#> Data: bioChemists
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   3352.2   3381.1  -1670.1   3340.2      909 
#> 
#> 
#> Conditional model:
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.847585   0.064084  13.226  < 2e-16 ***
#> femWomen    -0.237339   0.064199  -3.697 0.000218 ***
#> marMarried   0.008862   0.066944   0.132 0.894684    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Zero-inflation model:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -0.90794    0.15640  -5.805 6.42e-09 ***
#> femWomen     0.24195    0.14923   1.621    0.105    
#> marMarried  -0.07802    0.15636  -0.499    0.618    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


cor(fitted(list.mod$hurdle), fitted(list.mod$glmmtmb))
#> [1] 1


all.equal(fitted(list.mod$hurdle), fitted(list.mod$glmmtmb))
#> [1] "names for target but not for current"  
#> [2] "Mean relative difference: 4.620395e-06"

lapply(list.mod, performance::r2)
#> Warning: mu of 2.1 is too close to zero, estimate of random effect variances may
#>   be unreliable.
#> Random effect variances not available. Returned R2 does not account for random effects.
#> $hurdle
#> # R2 for Zero-Inflated and Hurdle Regression
#>        R2: 0.028
#>   adj. R2: 0.025
#> 
#> $glmmtmb
#> # R2 for Mixed Models
#> 
#>   Conditional R2: NA
#>      Marginal R2: 0.095


lapply(list.mod, performance::r2_zeroinflated)
#> $hurdle
#> # R2 for Zero-Inflated and Hurdle Regression
#>        R2: 0.028
#>   adj. R2: 0.025
#> 
#> $glmmtmb
#> # R2 for Zero-Inflated and Hurdle Regression
#>        R2: 0.031
#>   adj. R2: 0.027


sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8   
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] pscl_1.5.5.1       performance_0.10.4 glmmTMB_1.1.7     
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.10         compiler_4.2.0      nloptr_2.0.1       
#>  [4] highr_0.9           TMB_1.9.6           tools_4.2.0        
#>  [7] boot_1.3-28         digest_0.6.29       lme4_1.1-29        
#> [10] evaluate_0.15       lifecycle_1.0.3     nlme_3.1-157       
#> [13] lattice_0.20-45     rlang_1.1.1         reprex_2.0.2       
#> [16] Matrix_1.4-1        cli_3.6.1           rstudioapi_0.13    
#> [19] yaml_2.3.5          mvtnorm_1.1-3       xfun_0.40          
#> [22] fastmap_1.1.0       coda_0.19-4         withr_2.5.0        
#> [25] stringr_1.5.0       knitr_1.39          fs_1.5.2           
#> [28] vctrs_0.6.3         grid_4.2.0          glue_1.6.2         
#> [31] survival_3.3-1      rmarkdown_2.14      multcomp_1.4-19    
#> [34] TH.data_1.1-1       minqa_1.2.4         magrittr_2.0.3     
#> [37] codetools_0.2-18    htmltools_0.5.6     emmeans_1.8.8      
#> [40] splines_4.2.0       MASS_7.3-57         insight_0.19.1.3   
#> [43] xtable_1.8-4        numDeriv_2016.8-1.1 sandwich_3.0-1     
#> [46] stringi_1.7.6       estimability_1.4.1  zoo_1.8-10

Created on 2023-09-12 with reprex v2.0.2

@strengejacke
Copy link
Member

It looks like stats::residuals(m1, type = "pearson") returns slightly different results for the two models, thus, the small differences in R2.

@strengejacke strengejacke added the 3 investigators ❔❓ Need to look further into this issue label Sep 13, 2023
@strengejacke
Copy link
Member

Closing for now, as there seems to be no issue in performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 investigators ❔❓ Need to look further into this issue
Projects
None yet
Development

No branches or pull requests

2 participants