Description
Hi,
I am working on a model with highly imbalanced data, which originates from few observations of species presences (as few as 30) and a few thousand randomly selected background observations / pseudo-absences. I would like to use projpred to select the most relevant predictor variables from a pool of about 100, but I am unsure whether any of the currently implemented varsel statistics handle severe class imbalance well. The Brier (skill) score and the precision-recall AUC could be useful evaluation statistics; they were suggested by Aki Vehtari in issue #25 but, it seems, never implemented.
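For reference, the Brier score and Brier skill score mentioned above are straightforward to compute. The sketch below is language-agnostic pseudocode in Python (projpred itself is R, so this only illustrates the metrics, not their integration into varsel); the function names are my own:

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes):
    """Improvement over a reference model that always predicts the base rate.

    1 is perfect, 0 matches the reference, negative is worse than it.
    Unlike raw accuracy, the reference accounts for class imbalance.
    """
    base_rate = sum(outcomes) / len(outcomes)
    ref = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / ref
```

For example, with 30 presences among thousands of absences, a model that always predicts "absent" has a high accuracy but a Brier skill score near (or below) zero.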
Secondly, since I would also like to cross-validate the variable selection, is it possible to stratify the cross-validation procedure so that the class imbalance is comparable across folds? In the worst case, some folds would contain only (pseudo-)absences and no presences, and the procedure would fail.
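The stratification I have in mind could be sketched roughly as follows (again illustrative Python, not projpred's API; the equivalent fold indices would need to be passed to the R cross-validation routine):

```python
import random

def stratified_folds(labels, k, seed=0):
    """Assign each observation a fold index 0..k-1, stratified by class label.

    Indices of each class are shuffled and dealt round-robin across folds,
    so even a rare class (e.g. 30 presences) appears in every fold.
    """
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[i] = j % k
    return folds
```

With 10 presences and 90 absences and k = 5, every fold receives exactly 2 presences and 18 absences, whereas plain random folds could easily leave a fold with no presences at all.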
Thanks so much for your help.
Cheers
Andy