Hi,

I am interested in doing variable selection on only a subset of the variables while including the rest in all submodels. This naturally leads to the `search_terms` argument of `cv_varsel()`. Assume I have a data set with 5 covariates, a treatment variable `t`, and a response `y`. I let one of the 5 covariates be a confounder, so I want to perform variable selection on the remaining 4 covariates while including `t` and the confounder `x1` in every submodel. A reproducible example of the data-generating process, the reference model construction, and the `cv_varsel()` call is below:
```r
library(brms)
library(projpred)

# Build the space of submodels to consider: every candidate contains t and the
# fixed covariates, plus one subset of the covariates being selected over.
get_search_terms <- function(idx_select, idx_fixed = NULL) {
  search_terms <- c()
  n_select <- length(idx_select)
  for (i in seq_len(n_select)) {
    # all subsets of size i, e.g. "x2+x3"; indexing via seq_len() avoids
    # combn()'s special case for a single numeric input
    search_terms <- c(search_terms, combn(seq_len(n_select), i, function(j) paste(paste0('x', idx_select[j]), collapse = '+')))
  }
  if (length(idx_fixed) != 0) {
    fixed_terms <- paste(paste0('x', idx_fixed), collapse = '+')
    search_terms <- paste(fixed_terms, search_terms, sep = '+')
    base_model <- paste('t', fixed_terms, sep = '+')
  } else {
    base_model <- 't'
  }
  search_terms <- c(base_model, paste0('t+', search_terms))
  return(search_terms)
}

set.seed(2)
N <- 100
p <- 5               # number of covariates besides t
p_conf <- 1          # number of confounders
p_rel <- p_conf + 2  # number of "relevant" covariates affecting y, excluding t
dat <- as.data.frame(matrix(rnorm(N * p), nrow = N, ncol = p))  # data frame with p covariates
names(dat) <- paste0('x', 1:p)
if (p_conf > 0) {
  t_betas <- rnorm(p_conf)  # effects of the confounders on the treatment
  # treatment is a noisy linear combination of the confounders
  dat$t <- rnorm(N, as.matrix(dat[, paste0('x', 1:p_conf)]) %*% t_betas)
} else {
  dat$t <- rnorm(N)  # no confounders
}
y_betas <- rnorm(p_rel + 1)  # effects of the treatment and the "relevant" covariates on y
# y is a noisy linear combination of the treatment and the relevant covariates
dat$y <- rnorm(N, mean = as.matrix(dat[, c('t', paste0('x', 1:p_rel))]) %*% y_betas)
formula_all <- as.formula(paste0('y~', paste(c('t', paste0('x', 1:p)), collapse = '+')))
ref_mod <- brm(formula_all, data = dat, refresh = 0)  # fit the reference model
possible_mods <- get_search_terms(idx_select = seq(p_conf + 1, p), idx_fixed = seq_len(p_conf))
# refit_prj = FALSE and validate_search = FALSE only for speed
cv_select_prj <- cv_varsel(ref_mod, method = 'forward', cv_method = 'LOO', refit_prj = FALSE, validate_search = FALSE, search_terms = possible_mods)
```
This produces the following error:

```
Error in cv_varsel.refmodel(refmodel, ...) :
  Unexpected number of rows in `solution_terms_cv_chr`. Please notify the package maintainer.
```
It seems to be related to the dimensions of `solution_terms_mat`, which uses `nterms_max` to infer the number of submodels to fit. In this case, however, `t` and `x1` enter the model in the same step, so `nterms_max` no longer tells us how many submodels lie on the search path. This issue is related to #307, and it seems that fixing it would require significant changes to the `search_forward` function. I would be happy to contribute such changes if there is interest.
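The mismatch can be seen without fitting anything by counting the terms in each candidate. The base-R sketch below rebuilds the same candidates that `get_search_terms(idx_select = 2:5, idx_fixed = 1)` produces. `n_terms()` and `path_sizes` are hypothetical names for illustration, and the assumption that the forward search visits at most one submodel per distinct candidate size (plus the intercept-only model) is a guess, not something taken from projpred's source.

```r
# Hypothetical helper (not part of projpred): number of individual terms
# in each `search_terms` candidate, e.g. "t+x1+x3" -> 3.
n_terms <- function(search_terms) {
  vapply(strsplit(search_terms, "+", fixed = TRUE), length, integer(1))
}

# Same candidates as get_search_terms(idx_select = 2:5, idx_fixed = 1):
# the base model "t+x1" plus every non-empty subset of x2..x5 added to it.
selectable <- paste0("x", 2:5)
subsets <- unlist(lapply(seq_along(selectable), function(k) {
  apply(combn(selectable, k), 2, paste, collapse = "+")
}))
candidates <- c("t+x1", paste0("t+x1+", subsets))

sizes <- n_terms(candidates)
nterms_max <- max(sizes)

# Assumption: the path can only stop at sizes some candidate has
# (0 = intercept-only model); size 1 is never reachable here.
path_sizes <- c(0L, sort(unique(sizes)))

cat("candidates:", length(candidates), "\n")
cat("submodels implied by nterms_max:", nterms_max + 1L, "\n")
cat("submodels on the path:", length(path_sizes), "\n")
```

Under that assumption, a matrix sized from `nterms_max` has one more row than there are submodels on the path, which would match the "unexpected number of rows" error.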
Best,
Sölvi
…rms in case search_terms argument is NOT NULL. Importantly, it also changes a detail in the way new solution_terms are represented, by removing all previously selected variables from them, see change in select_possible_terms_size function.
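The representation change mentioned in the commit message (dropping previously selected variables from a newly chosen term) can be illustrated in base R. `new_solution_terms()` is a hypothetical helper written for this sketch. It is not projpred's `select_possible_terms_size()`, and it only demonstrates the idea of keeping the increment at each step.

```r
# Hypothetical helper: split a chosen candidate into its terms and keep
# only those that were not selected in earlier steps.
new_solution_terms <- function(candidate, chosen) {
  terms <- strsplit(candidate, "+", fixed = TRUE)[[1]]
  setdiff(terms, chosen)
}

# Step 1: from the intercept-only model, the base candidate brings in two terms.
step1 <- new_solution_terms("t+x1", chosen = character(0))
# Step 2: a larger candidate contributes only what was not selected before.
step2 <- new_solution_terms("t+x1+x3", chosen = step1)
print(step1)  # "t" "x1"
print(step2)  # "x3"
```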