subsampling LOO estimates with diff-est-srs-wor start #496

Open · wants to merge 115 commits into base: master
Conversation

@avehtari (Collaborator) commented Apr 10, 2024

Work in progress.

Tested with

  • family="normal" and stats %in% c("elpd", "mlpd", "gmpd", "mse", "rmse")
  • family="bernoulli" and stats %in% c("acc","pctcorr")
  • family="poisson" usings trad and latent approaches
  • tests pass expect some plot tests fail with svglite: ... Graphics API version mismatch

Notes

  • nloo matters only if validate_search = TRUE (with FALSE, there is no speed advantage)
  • in cv_varsel(), if nloo < n and the fast PSIS-LOO result is not yet available, it is computed
  • in cv_varsel(), if nloo < n, the fast PSIS-LOO result is stored in slot $summaries_fast
  • the subsampling indices are stored in slot $loo_inds
  • the actual subsampling estimation happens in get_stat() in summary_funs.R (see the sketch after this list)
  • removed some NA checking; need to recheck whether it needs to be put back
  • improved the quantiles for "mse" when the value is close to 0 (the lower quantile can no longer become negative)
  • "auc" is not supported (complicated)

Next

  • add support for incrementally increasing nloo?

tagging @n-kall

@avehtari requested a review from fweber144 on Apr 12, 2024
@avehtari (Collaborator, Author) commented Apr 15, 2024

I wanted to use R2, and as I had rewritten the summary stats anyway, I added R2 and made "mse", "rmse", and "R2" use only the normal approximation, with as much shared computation as possible.

With the added R2 support, this PR will also close #483.
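
For intuition, a standalone sketch of how "mse", "rmse", and "R2" can share computation under a normal approximation, with the RMSE and R2 standard errors derived from the MSE one via the delta method (illustrative only, not the exact code in this PR):

```
set.seed(1)
n <- 100
y <- rnorm(n)
mu <- y + rnorm(n, sd = 0.5)
wobs <- rep(1, n)                    # observation weights
e2 <- wobs * (mu - y)^2              # shared quantity: weighted squared errors
mse <- mean(e2)
mse_se <- sd(e2) / sqrt(n)           # normal approximation
rmse <- sqrt(mse)
rmse_se <- mse_se / (2 * sqrt(mse))  # delta method: d sqrt(m) / dm = 1 / (2 sqrt(m))
R2 <- 1 - mse / var(y)               # treating var(y) as fixed
R2_se <- mse_se / var(y)
```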

@n-kall (Collaborator) commented Apr 16, 2024

I can take a look

@n-kall self-assigned this on Apr 16, 2024
@avehtari requested a review from n-kall on Apr 16, 2024
@avehtari assigned avehtari and unassigned n-kall on Apr 16, 2024
Resolved review threads on: R/cv_varsel.R (5 threads), R/glmfun.R, R/misc.R
@fweber144 (Collaborator) commented Apr 21, 2024

I have added some comments, but I'm not done with the review yet.

Besides, I think the documentation needs to be updated (at least re-roxygenized, but progressr should also be mentioned), perhaps the vignettes as well, and I haven't run R CMD check (including the unit tests) yet. The NEWS file would also need to be updated.

@fweber144 mentioned this pull request on Apr 21, 2024
@fweber144 (Collaborator) commented

While working on the tests, I discovered a few bugs and fixed them. However (apart from what is still open in the comments above), we still need to think about the following:

  1. In cv_varsel.vsel(), is it correct to treat summaries_fast like the search-related arguments? Or do we need to set summaries_fast to NULL if not usable, just like rk_foldwise?
  2. In case of not-subsampled LOO, I think we can "usually" (for latent projection, this requires a well-behaving family$latent_ilink() function) expect no NAs in reference and standard submodel summaries (mu and lppd). In case of subsampled LOO, I think we can expect no NAs in reference and fast submodel summaries. So in such cases, should (unexpected) NAs in the corresponding summaries be left as-is so that downstream code "sees" them? What I have in mind is for example whether we can remove the na.rm = TRUE part from

     projpred/R/summary_funs.R, lines 305 to 306 in 7469aea:

         value <- sum(lppd - lppd_baseline, na.rm = TRUE)
         value_se <- sd(lppd - lppd_baseline, na.rm = TRUE) * sqrt(n_full)

     so that the default of na.rm = FALSE is used (see the illustration after this list).
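
The difference matters because base R propagates NAs by default; a minimal illustration:

```
lppd <- c(-1.2, NA, -0.8)
sum(lppd)                # NA: the default na.rm = FALSE propagates the NA
sum(lppd, na.rm = TRUE)  # -2: silently drops the NA
```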

For the tests, I think it would make sense to add subsampled-LOO cv_varsel() calls already in setup.R so that all post-processing functions are tested for such subsampled-LOO cv_varsel() output. When doing so, we should pay attention to cover augmented-data and latent projection cases.

Finally, while experimenting with the tests, I discovered some strange NAs in the summaries_sub (of .tabulate_stats()) after running

cvvs_nloo <- suppressWarnings(do.call(cv_varsel, c(
  list(object = refmods[[args_cvvs_i$tstsetup_ref]],
       nloo = nloo_tst),
  excl_nonargs(args_cvvs_i)
)))
(slightly adapted, namely using excl_nonargs(args_cvvs_i, nms_excl_add = "validate_search")) on tstsetup <- "rstanarm.glm.cumul.stdformul.without_wobs.with_offs.latent.default_meth.default_cvmeth.default_search_trms". This needs to be investigated.

With `validate_search = FALSE`, the search is not run again when creating
`summaries_fast`. Only the performance evaluation (including the re-projections
required for it) is run again. Hence, it would be inconsistent to treat
`summaries_fast` like the search-related arguments of `cv_varsel.refmodel()`
when calling `cv_varsel.refmodel()` from within `cv_varsel.vsel()`. Thus, a lot
of code related to `summaries_fast` can be removed, which is done here.
@avehtari (Collaborator, Author) commented

In cv_varsel.vsel(), is it correct to treat summaries_fast like the search-related arguments? Or do we need to set summaries_fast to NULL if not usable, just like rk_foldwise?

I think I have forgotten the context needed to understand this question.

In case of not-subsampled LOO, I think we can "usually" (for latent projection, this requires a well-behaving family$latent_ilink() function) expect no NAs in reference and standard submodel summaries (mu and lppd). In case of subsampled LOO, I think we can expect no NAs in reference and fast submodel summaries. So in such cases, should (unexpected) NAs in the corresponding summaries be left as-is so that downstream code "sees" them? What I have in mind is for example whether we can remove the na.rm = TRUE part from

Fine for me. NA handling was there before I started this PR, and I left most of that there.

@fweber144 (Collaborator) commented

In cv_varsel.vsel(), is it correct to treat summaries_fast like the search-related arguments? Or do we need to set summaries_fast to NULL if not usable, just like rk_foldwise?

I think I have forgotten the context needed to understand this question.

No problem, I think I figured this out and pushed commit 0dcfcf2 which resolves this.

In case of not-subsampled LOO, I think we can "usually" (for latent projection, this requires a well-behaving family$latent_ilink() function) expect no NAs in reference and standard submodel summaries (mu and lppd). In case of subsampled LOO, I think we can expect no NAs in reference and fast submodel summaries. So in such cases, should (unexpected) NAs in the corresponding summaries be left as-is so that downstream code "sees" them? What I have in mind is for example whether we can remove the na.rm = TRUE part from

Fine for me. NA handling was there before I started this PR, and I left most of that there.

Ok, I will check this in detail and will commit changes (if possible).

@fweber144 (Collaborator) commented

I just pushed commit 6151b0c which reverts changes that are unrelated to subsampled LOO-CV because I observed some unexpected changes in the test snapshots. The state before reverting these subsampling-unrelated changes is now in branch misc_from_fix-subsampling. It turned out that the unexpected snapshot changes were probably due to the changes concerning SIS and PSIS in lines

projpred/R/cv_varsel.R, lines 797 to 891 in 6151b0c:

if (nrow(log_lik_ref) > 1) {
  # Use loo::sis() if the projected draws (i.e., the draws resulting
  # from the clustering or thinning) have nonconstant weights:
  if (refdist_eval$const_wdraws_prj) {
    # Internally, loo::psis() doesn't perform the Pareto smoothing if the
    # number of draws is small (as indicated by object `no_psis_eval`, see
    # below). In projpred, this can occur, e.g., if users request a number
    # of projected draws (for performance evaluation, either after
    # clustering or thinning the reference model's posterior draws) that is
    # much smaller than the default of 400. In order to throw a customized
    # warning message (and to avoid the calculation of Pareto k-values, see
    # loo issue stan-dev/loo#227), object `no_psis_eval` indicates whether
    # loo::psis() would perform the Pareto smoothing or not (for the
    # decision rule, see loo:::n_pareto() and loo:::enough_tail_samples(),
    # keeping in mind that we have `r_eff = 1` for all observations here).
    S_for_psis_eval <- nrow(log_lik_ref)
    no_psis_eval <- ceiling(min(0.2 * S_for_psis_eval,
                                3 * sqrt(S_for_psis_eval))) < 5
    if (no_psis_eval) {
      if (getOption("projpred.warn_psis", TRUE)) {
        warning(
          "Using standard importance sampling (SIS), as the number of ",
          "draws or clusters is too small for PSIS. For improved ",
          "accuracy, increase the number of draws or clusters, or use ",
          "K-fold-CV."
        )
      }
      # Use loo::sis().
      # In principle, we could rely on loo::psis() here (because in such a
      # case, it would internally switch to SIS automatically), but using
      # loo::sis() explicitly is safer because if the loo package changes
      # its decision rule, we would get a mismatch between our customized
      # warning here and the IS method used by loo. See also loo issue
      # stan-dev/loo#227.
      importance_sampling_nm <- "sis"
    } else {
      # Use loo::psis().
      # Usually, we have a small number of projected draws here (400 by
      # default), which means that the 'loo' package will automatically
      # perform the regularization from Vehtari et al. (2024,
      # <https://jmlr.org/papers/v25/19-556.html>, appendix G).
      importance_sampling_nm <- "psis"
    }
  } else {
    if (getOption("projpred.warn_psis", TRUE)) {
      warning(
        "The projected draws used for the performance evaluation have ",
        "different (i.e., nonconstant) weights, so using standard ",
        "importance sampling (SIS) instead of Pareto-smoothed importance ",
        "sampling (PSIS). In general, PSIS is recommended over SIS."
      )
    }
    # Use loo::sis().
    importance_sampling_nm <- "sis"
  }
  importance_sampling_func <- get(importance_sampling_nm,
                                  asNamespace("loo"))
  mssgs_warns_capt <- capt_mssgs_warns(
    sub_psisloo <- importance_sampling_func(-log_lik_ref, cores = 1,
                                            r_eff = NA)
  )
  mssgs_warns_capt <- setdiff(mssgs_warns_capt, "")
  # Filter out Pareto k-value warnings (we throw a customized one instead):
  mssgs_warns_capt <- grep(
    "Some Pareto k diagnostic values are (too|slightly) high",
    mssgs_warns_capt, value = TRUE, invert = TRUE
  )
  if (length(mssgs_warns_capt) > 0) {
    warning(mssgs_warns_capt)
  }
  if (importance_sampling_nm == "psis") {
    pareto_k_eval <- loo::pareto_k_values(sub_psisloo)
    warn_pareto(
      n07 = sum(pareto_k_eval > .ps_khat_threshold(dim(psisloo)[1])), n = n,
      khat_threshold = .ps_khat_threshold(dim(sub_psisloo)[1]),
      warn_txt = paste0(
        "Some (%d / %d) Pareto k's for the reference model's PSIS-LOO ",
        "weights given ",
        ifelse(clust_used_eval,
               paste0(nclusters_pred, " clustered "),
               paste0(ndraws_pred, " posterior ")),
        "draws are > %s."
      )
    )
  }
  lw_sub <- weights(sub_psisloo)
} else {
  lw_sub <- matrix(0, nrow = nrow(log_lik_ref), ncol = ncol(log_lik_ref))
}
# Take into account that clustered draws usually have different weights:
lw_sub <- lw_sub + log(refdist_eval$wdraws_prj)
# This re-weighting requires a re-normalization (as.array() is applied to
# have stricter consistency checks, see `?sweep`):
lw_sub <- sweep(lw_sub, 2, as.array(apply(lw_sub, 2, log_sum_exp)))
(which used to read

projpred/R/cv_varsel.R, lines 819 to 902 in 0dcfcf2:

if (nrow(log_lik_ref) > 1) {
  # Take into account that clustered draws usually have different weights:
  lw_sub <- log_lik_ref + log(refdist_eval$wdraws_prj)
  # This re-weighting requires a re-normalization (as.array() is applied to
  # have stricter consistency checks, see `?sweep`):
  lw_sub <- sweep(lw_sub, 2, as.array(apply(lw_sub, 2, log_sum_exp)))
  # Internally, loo::psis() doesn't perform the Pareto smoothing if the
  # number of draws is small (as indicated by object `no_psis_eval`, see
  # below). In projpred, this can occur, e.g., if users request a number
  # of projected draws (for performance evaluation, either after
  # clustering or thinning the reference model's posterior draws) that is
  # much smaller than the default of 400. In order to throw a customized
  # warning message (and to avoid the calculation of Pareto k-values, see
  # loo issue stan-dev/loo#227), object `no_psis_eval` indicates whether
  # loo::psis() would perform the Pareto smoothing or not (for the
  # decision rule, see loo:::n_pareto() and loo:::enough_tail_samples(),
  # keeping in mind that we have `r_eff = 1` for all observations here).
  S_for_psis_eval <- nrow(log_lik_ref)
  no_psis_eval <- ceiling(min(0.2 * S_for_psis_eval,
                              3 * sqrt(S_for_psis_eval))) < 5
  if (no_psis_eval) {
    if (getOption("projpred.warn_psis", TRUE)) {
      message(
        "Using standard importance sampling (SIS) due to a small number of",
        ifelse(refit_prj,
               ifelse(!is.null(nclusters_pred),
                      " clusters",
                      " draws (from thinning)"),
               ifelse(!is.null(nclusters),
                      " clusters",
                      " draws (from thinning)"))
      )
    }
    # Use loo::sis().
    # In principle, we could rely on loo::psis() here (because in such a
    # case, it would internally switch to SIS automatically), but using
    # loo::sis() explicitly is safer because if the loo package changes
    # its decision rule, we would get a mismatch between our customized
    # warning here and the IS method used by loo. See also loo issue
    # stan-dev/loo#227.
    importance_sampling_nm <- "sis"
  } else {
    # Use loo::psis().
    # Usually, we have a small number of projected draws here (400 by
    # default), which means that the 'loo' package will automatically
    # perform the regularization from Vehtari et al. (2022,
    # <https://doi.org/10.48550/arXiv.1507.02646>, appendix G).
    importance_sampling_nm <- "psis"
  }
  importance_sampling_func <- get(importance_sampling_nm,
                                  asNamespace("loo"))
  mssgs_warns_capt <- capt_mssgs_warns(
    sub_psisloo <- importance_sampling_func(-log_lik_ref, cores = 1,
                                            r_eff = NA)
  )
  mssgs_warns_capt <- setdiff(mssgs_warns_capt, "")
  # Filter out Pareto k-value warnings (we throw a customized one instead):
  mssgs_warns_capt <- grep(
    "Some Pareto k diagnostic values are (too|slightly) high",
    mssgs_warns_capt, value = TRUE, invert = TRUE
  )
  if (length(mssgs_warns_capt) > 0) {
    warning(mssgs_warns_capt)
  }
  if (importance_sampling_nm == "psis") {
    pareto_k_eval <- loo::pareto_k_values(sub_psisloo)
    warn_pareto(
      n07 = sum(pareto_k_eval > .ps_khat_threshold(dim(psisloo)[1])), n = n,
      khat_threshold = .ps_khat_threshold(dim(sub_psisloo)[1]),
      warn_txt = paste0(
        "Some (%d / %d) Pareto k's for the reference model's PSIS-LOO ",
        "weights given ",
        ifelse(clust_used_eval,
               paste0(nclusters_pred, " clustered "),
               paste0(ndraws_pred, " posterior ")),
        "draws are > %s."
      )
    )
  }
  lw_sub <- weights(sub_psisloo)
} else {
  lw_sub <- matrix(0, nrow = nrow(log_lik_ref), ncol = ncol(log_lik_ref))
}
before reverting the subsampling-unrelated changes). I think there is a bug in the SIS / PSIS changes, which I will comment on above (I had some further questions concerning those lines a while ago). I would suggest keeping this PR restricted to subsampled LOO-CV as much as possible and creating a separate PR for the subsampling-unrelated changes (which are now in branch misc_from_fix-subsampling) after this PR has been merged. And, of course, there is still branch mixed_deltas_plot, which needs to be worked on and eventually merged as well.
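
As an aside, the `no_psis_eval` decision rule quoted above can be checked directly (with `r_eff = 1`, as noted in the code comments); loo would skip the Pareto smoothing, and projpred would thus fall back to SIS, for 20 or fewer draws or clusters:

```
no_psis <- function(S) ceiling(min(0.2 * S, 3 * sqrt(S))) < 5
sapply(c(10, 20, 21, 25, 400), no_psis)
# TRUE TRUE FALSE FALSE FALSE  (SIS is used for S <= 20)
```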

@fweber144 (Collaborator) commented

I just pushed commit 360354a which ensures that the tests run through with this PR. The changes in the test snapshots are now as expected. However, the parts

For the tests, I think it would make sense to add subsampled-LOO cv_varsel() calls already in setup.R so that all post-processing functions are tested for such subsampled-LOO cv_varsel() output. When doing so, we should pay attention to cover augmented-data and latent projection cases.

and

Finally, while experimenting with the tests, I discovered some strange NAs in the summaries_sub (of .tabulate_stats()) after running

cvvs_nloo <- suppressWarnings(do.call(cv_varsel, c(
  list(object = refmods[[args_cvvs_i$tstsetup_ref]],
       nloo = nloo_tst),
  excl_nonargs(args_cvvs_i)
)))

(slightly adapted, namely using excl_nonargs(args_cvvs_i, nms_excl_add = "validate_search")) on tstsetup <- "rstanarm.glm.cumul.stdformul.without_wobs.with_offs.latent.default_meth.default_cvmeth.default_search_trms". This needs to be investigated.

from #496 (comment) are still to-do.

This reverts commit 0bd3830.

Reason for the revert: The "fix" from that commit would have to take
`summaries_fast` into account as well. For simplicity, we will avoid the early
check for all-`NA`s in `get_stat()` in its entirety.
This ensures `all(wobs == 1)` after lines
```
wobs <- rep(wobs, y_wobs_test$wobs)
wobs <- n_full * wobs / sum(wobs)
```
(but note that `NA`s in `mu` should lead to `NA` output anyway).
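
A quick numeric check of that invariant, with made-up unit weights and multiplicities (illustrative, not projpred code):

```
wobs <- c(1, 1, 1)                 # unit observation weights
counts <- c(2, 1, 3)               # multiplicities (y_wobs_test$wobs)
wobs <- rep(wobs, counts)          # expand to length 6, still all 1
n_full <- length(wobs)
wobs <- n_full * wobs / sum(wobs)  # rescale to sum to n_full
all(wobs == 1)                     # TRUE
```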
@fweber144 (Collaborator) commented

I have now investigated the NA handling in detail, as discussed in

In case of not-subsampled LOO, I think we can "usually" (for latent projection, this requires a well-behaving family$latent_ilink() function) expect no NAs in reference and standard submodel summaries (mu and lppd). In case of subsampled LOO, I think we can expect no NAs in reference and fast submodel summaries. So in such cases, should (unexpected) NAs in the corresponding summaries be left as-is so that downstream code "sees" them? What I have in mind is for example whether we can remove the na.rm = TRUE part from

Fine for me. NA handling was there before I started this PR, and I left most of that there.

Ok, I will check this in detail and will commit changes (if possible).

above. It turned out that we can indeed leave (unexpected) NAs in the corresponding summaries as-is so that downstream code "sees" them. Hence, I pushed 5 commits (0bd3830 to 8666643) which accomplish this. At first, I tried to fix the existing early return for all-NAs in get_stat(), but I realized that this gets cumbersome and error-prone (see the message for the reverting commit 3923005).

To ensure that NAs are propagated from input to output of get_stat(), I also added NA tests for .srs_diff_est_w(), weighted.sd(), and auc() (functions which are used by get_stat() and where NA handling is not obvious).
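
For illustration, such an NA-propagation test could look like the following (a hedged sketch run in projpred's testthat context, assuming the internal helper has a `weighted.sd(x, w)` signature):

```
test_that("weighted.sd() propagates NAs in its input", {
  x <- c(1.5, NA, 2.5)
  w <- rep(1, length(x))
  expect_true(is.na(weighted.sd(x, w)))
})
```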

Concerning the tests, it would be good to add more detailed (not only NA-related) tests for .srs_diff_est_w(). There is a test in the loo package, at https://github.com/stan-dev/loo/blob/49642207e6eaf5c34edebb9e6e23ff5a14b795d9/tests/testthat/test_loo_subsampling.R#L642-L676. @avehtari: Would it be ok to copy those lines over to projpred?

@avehtari (Collaborator, Author) commented

Would it be ok to copy those lines over to projpred?

Sure

@fweber144 (Collaborator) commented

Would it be ok to copy those lines over to projpred?

Sure

Great, I'll do this asap.

@fweber144 (Collaborator) commented

A new issue I discovered:

In get_stat(), I think we need to replace at least one n_full division with an (n_full - 1) division, namely in lines

projpred/R/summary_funs.R, lines 353 to 354 in 8666643:

cov_mse_e_b <- mean(wobs * ((mu - y)^2 - mse_e) *
                      ((mu_baseline - y)^2 - mse_b)) / n_full

(this concerns MSE, RMSE, and R2). Currently,

data("df_binom", package = "projpred")
dat <- data.frame(y = df_binom$y, df_binom$x)
rfit <- rstanarm::stan_glm(y ~ X1 + X2 + X3, family = binomial(), data = dat,
                           chains = 1, iter = 500, seed = 987, refresh = 0)
devtools::load_all()
cvvs <- cv_varsel(rfit, cv_method = "kfold", validate_search = FALSE, method = "L1",
                  nclusters_pred = 2, seed = 123, latent = TRUE)
summary(cvvs, deltas = TRUE, stats = c("elpd", "mlpd", "gmpd", "mse", "rmse", "acc", "auc"), seed = 456)

gives me a non-zero mse.se and rmse.se for the reference model (note the deltas = TRUE in that last summary() call):

------
Response-scale family:

Family: binomial 
Link function: logit 

------
Latent-scale family:

Family: gaussian 
Link function: identity 

------
Formula: .y ~ X1 + X2 + X3
Observations: 100
Projection method: latent
CV method: K-fold CV with K = 5 and search not included (i.e., a full-data search only)
Search method: L1
Maximum submodel size for the search: 3
Number of projected draws in the search: 1 (from clustered projection)
Number of projected draws in the performance evaluation: 2 (from clustered projection)
Argument `refit_prj`: TRUE

Submodel performance evaluation summary (response scale) with `deltas = TRUE` and `cumulate = FALSE`:
 size ranking_fulldata cv_proportions_diag  elpd elpd.se    mlpd mlpd.se gmpd gmpd.se     mse mse.se
    0      (Intercept)                  NA -2.32    3.93 -0.0232  0.0393 0.98  0.0384  0.0137 0.0172
    1               X2                  NA  0.56    2.38  0.0056  0.0238 1.01  0.0240 -0.0019 0.0101
    2               X3                  NA  0.29    0.71  0.0029  0.0071 1.00  0.0071 -0.0021 0.0040
    3               X1                  NA -0.49    0.31 -0.0049  0.0031 1.00  0.0031  0.0013 0.0027
    rmse rmse.se   acc acc.se     auc auc.se
  0.0139  0.0173 -0.08  0.060 -0.2029 0.0738
 -0.0020  0.0105 -0.03  0.039  0.0085 0.0351
 -0.0022  0.0041  0.00  0.014  0.0150 0.0122
  0.0014  0.0028  0.00  0.000 -0.0012 0.0023

Reference model performance evaluation summary (response scale) with `deltas = TRUE`:
   elpd elpd.se    mlpd mlpd.se    gmpd gmpd.se     mse  mse.se    rmse rmse.se     acc  acc.se 
 0.0000  0.0000  0.0000  0.0000  1.0000  0.0000  0.0000  0.0025  0.0000  0.0026  0.0000  0.0000 
    auc  auc.se 
 0.0000  0.0000

The same holds for the "best" submodel (the size 1 submodel) in

summary(cvvs, deltas = TRUE, stats = c("elpd", "mlpd", "gmpd", "mse", "rmse", "acc", "auc"), seed = 456,
        baseline = "best")

which yields

------
Response-scale family:

Family: binomial 
Link function: logit 

------
Latent-scale family:

Family: gaussian 
Link function: identity 

------
Formula: .y ~ X1 + X2 + X3
Observations: 100
Projection method: latent
CV method: K-fold CV with K = 5 and search not included (i.e., a full-data search only)
Search method: L1
Maximum submodel size for the search: 3
Number of projected draws in the search: 1 (from clustered projection)
Number of projected draws in the performance evaluation: 2 (from clustered projection)
Argument `refit_prj`: TRUE

Submodel performance evaluation summary (response scale) with `deltas = TRUE` and `cumulate = FALSE`:
 size ranking_fulldata cv_proportions_diag  elpd elpd.se    mlpd mlpd.se gmpd gmpd.se      mse
    0      (Intercept)                  NA -2.88     3.4 -0.0288   0.034 0.97   0.033  0.01559
    1               X2                  NA  0.00     0.0  0.0000   0.000 1.00   0.000  0.00000
    2               X3                  NA -0.27     2.3 -0.0027   0.023 1.00   0.023 -0.00018
    3               X1                  NA -1.05     2.5 -0.0105   0.025 0.99   0.024  0.00325
 mse.se     rmse rmse.se   acc acc.se     auc auc.se
 0.0148  0.01591  0.0148 -0.05  0.056 -0.2114  0.075
 0.0022  0.00000  0.0023  0.00  0.000  0.0000  0.000
 0.0098 -0.00018  0.0101  0.03  0.036  0.0065  0.034
 0.0104  0.00336  0.0107  0.03  0.039 -0.0097  0.036

Reference model performance evaluation summary (response scale) with `deltas = TRUE`:
   elpd elpd.se    mlpd mlpd.se    gmpd gmpd.se     mse  mse.se    rmse rmse.se     acc  acc.se 
-0.5598  2.3835 -0.0056  0.0238  0.9944  0.0237  0.0019  0.0101  0.0020  0.0104  0.0300  0.0388 
    auc  auc.se 
-0.0085  0.0351

When dividing by (n_full - 1) instead of n_full in lines

projpred/R/summary_funs.R, lines 353 to 354 in 8666643:

cov_mse_e_b <- mean(wobs * ((mu - y)^2 - mse_e) *
                      ((mu_baseline - y)^2 - mse_b)) / n_full
I get (numerically) zero mse.se and rmse.se for the reference model and the "best" submodel (both with deltas = TRUE), as desired.
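
The cancellation can be checked in isolation: with identical squared errors for the submodel and the baseline, the squared SE of the difference vanishes exactly only when the covariance term uses the same (n_full - 1) denominator that sd()/var() use. A standalone sketch (not projpred code):

```
set.seed(1)
n <- 100
d <- rnorm(n)^2                                # identical squared errors for both models
v <- var(d) / n                                # variance term; var() divides by n - 1
cov_n   <- sum((d - mean(d))^2) / n / n        # covariance term with a 1/n denominator
cov_nm1 <- sum((d - mean(d))^2) / (n - 1) / n  # covariance term with a 1/(n - 1) denominator
v + v - 2 * cov_n    # > 0: spurious non-zero squared SE for a model vs. itself
v + v - 2 * cov_nm1  # 0 (up to floating-point error): exact cancellation
```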

When changing this, we should also check for similar division issues in get_stat() (in particular, concerning performance statistics other than MSE and RMSE, but also concerning other MSE and RMSE code).

@avehtari: Could you check this (and fix, if it is really incorrect)?

@avehtari (Collaborator, Author) commented

@avehtari: Could you check this (and fix, if it is really incorrect)?

Dividing by n_full - 1 makes sense, as we're using sd() for the full-LOO estimator. I don't have time to do any coding until after November 25.

@fweber144 (Collaborator) commented

@avehtari: Could you check this (and fix, if it is really incorrect)?

Dividing by n_full - 1 makes sense, as we're using sd() for the full-LOO estimator. I don't have time to do any coding until after November 25.

Now done in commit 0a6b8ba. I hope I have found all places where this change was necessary.

@fweber144 (Collaborator) commented

Would it be ok to copy those lines over to projpred?

Sure

Great, I'll do this asap.

Done in commit e8f17e8.
