Varsel stat and stratified cross-validation for highly imbalanced data #328
As stated in #25, AUC has been implemented by #27, and the Brier score is effectively the MSE (see this comment). I'm currently not sure whether the Brier score (MSE) is actually available for the binomial family, i.e., whether an error is thrown when trying to select it. If you experience such an error, please report it with a reproducible example; it should then be easy to fix (in that case, I'll re-open this issue).
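To illustrate that equivalence (a sketch of mine, not projpred code): for a binary 0/1 outcome, the Brier score is the mean squared difference between the outcome and the predicted probability, i.e., the MSE on the probability scale. A prevalence-only reference forecast gives the baseline used for the Brier *skill* score mentioned later in this thread.

```r
# Minimal sketch (not projpred code): the Brier score for a binary outcome
# is the MSE of the predicted probabilities.
set.seed(1)
n <- 500L
p <- plogis(-2 + rnorm(n))            # predicted event probabilities
y <- rbinom(n, size = 1, prob = p)    # observed 0/1 outcomes (imbalanced)

brier <- mean((y - p)^2)              # Brier score == MSE here

# Reference score of a forecast that always predicts the observed prevalence;
# this is the baseline used for the Brier skill score.
brier_ref <- mean((y - mean(y))^2)
bss <- 1 - brier / brier_ref          # > 0 means better than the prevalence-only forecast

c(brier = brier, brier_ref = brier_ref, bss = bss)
```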
Yes, this is possible via argument … (see projpred/tests/testthat/test_varsel.R, lines 757 to 760 and lines 863 to 866, at commit 33047d6).
Thanks a lot for your answer, Frank. This helps a lot. And apologies, I should have been more explicit in my question or problem statement. I am interested in an evaluation metric that focuses on predicting the minority class well and is less sensitive to being swamped by good predictions of the majority class. Is the implemented AUC the AUROC (area under the receiver operating characteristic curve), which assumes that both classes (minority and majority) are important, or the AUPRC (area under the precision-recall curve), which focuses on classifying the minority class correctly? Or, asked differently, which of the currently implemented selection statistics would you recommend for highly imbalanced data when I am mostly interested in correctly classifying the minority class? Thanks a lot for your help.
As far as I understand the source code (lines 29 to 54 and lines 232 to 248 at commit 33047d6), the implemented AUC is the AUROC.
If it helps: projpred's `auc()` gives the same result as pROC's `auc()`:

```r
set.seed(6834)
nobs <- 100L
mu <- binomial()$linkinv(-0.42 + rnorm(nobs))
y <- rbinom(nobs, size = 1, prob = mu)
dat <- data.frame(y = y, mu = mu)
( auc_val_projpred <- projpred:::auc(cbind(y, mu, 1)) )
library(pROC)
auc_val <- auc(y ~ mu, data = dat, direction = "<", algorithm = 1)
( auc_val_pROC <- auc_val[seq_along(auc_val)] ) # Indexing only to drop attributes.
stopifnot(isTRUE(all.equal(auc_val_pROC, auc_val_projpred, tolerance = 1e-20)))
```
(On that occasion, I'm realizing that the comment in line 39 (at commit 33047d6) should probably read `# false positive weights` instead of `# true negative weights`, and that I need to check `projpred:::auc()` in case of a binomial family with > 1 trials.)
Thanks so much for looking further into this. Implementing additional evaluation metrics is likely not one of your current priorities in further developing projpred, but for cases with highly imbalanced data it would be wonderful to be able, in the future, to select metrics like the AUPRC or the F1 score, which both focus on predicting the minority class correctly and are recommended for highly imbalanced data (see, for instance, Saito & Rehmsmeier 2015, doi: 10.1371/journal.pone.0118432). I am not sure how often you and other users of projpred come across cases with highly imbalanced data, and how difficult it would be to implement additional metrics in projpred. Thank you for your help and understanding, and for developing this great project. Best, Andreas
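For readers who want to compute such metrics outside projpred in the meantime, here is a minimal base-R sketch of mine (not a projpred feature): the 0.5 classification threshold for the F1 score is an arbitrary choice, and the AUPRC is approximated by simple trapezoidal integration over a threshold grid.

```r
# Minimal sketch (not part of projpred): F1 score and an approximate AUPRC
# from 0/1 outcomes y and predicted probabilities p.
set.seed(2)
n <- 1000L
p <- plogis(-3 + 2 * rnorm(n))      # predicted probabilities (rare positives)
y <- rbinom(n, size = 1, prob = p)  # observed 0/1 outcomes

# F1 at an (arbitrary) 0.5 threshold.
pred <- as.integer(p >= 0.5)
tp <- sum(pred == 1 & y == 1)
fp <- sum(pred == 1 & y == 0)
fn <- sum(pred == 0 & y == 1)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)

# Approximate AUPRC: precision-recall pairs over a grid of thresholds,
# integrated with the trapezoidal rule.
thresholds <- sort(unique(p), decreasing = TRUE)
pr <- t(sapply(thresholds, function(thr) {
  pred <- as.integer(p >= thr)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  c(recall = tp / (tp + fn), precision = tp / max(tp + fp, 1))
}))
ord <- order(pr[, "recall"])
auprc <- sum(diff(pr[ord, "recall"]) *
             (head(pr[ord, "precision"], -1) + tail(pr[ord, "precision"], -1)) / 2)

c(f1 = f1, auprc = auprc)
```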
Yes, thank you for these helpful suggestions. I'm re-opening this issue as a feature request for these additional statistics.
Excellent. Thanks!
A short update on this feature request: I have tried to write a new function to calculate the true skill statistic, which is also known as the Peirce score or the Hanssen-Kuipers discriminant and copes well with highly imbalanced data. The basic formula is tss = tpr + tnr - 1, where tpr is the true positive rate and tnr the true negative rate. Here is a short code snippet that calculates the TSS for simulated data (sketched below).
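A minimal sketch of such a TSS calculation (the original snippet is not reproduced here; the fixed 0.5 classification threshold is an assumption of mine):

```r
# Sketch: true skill statistic (TSS = TPR + TNR - 1) for simulated binary
# data, using a fixed 0.5 threshold to turn probabilities into classes.
set.seed(6834)
nobs <- 1000L
mu <- binomial()$linkinv(-2 + rnorm(nobs))   # rare positives -> imbalanced data
y <- rbinom(nobs, size = 1, prob = mu)

pred <- as.integer(mu >= 0.5)                # hard classification at 0.5

tp <- sum(pred == 1 & y == 1)
tn <- sum(pred == 0 & y == 0)
fp <- sum(pred == 1 & y == 0)
fn <- sum(pred == 0 & y == 1)

tpr <- tp / (tp + fn)   # sensitivity / recall
tnr <- tn / (tn + fp)   # specificity
tss <- tpr + tnr - 1    # also known as Youden's J / Peirce score
tss
```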
I tried to implement this function following the logic of projpred's auc() function, but I have struggled to wrap my head around how the weights are implemented and, specifically, how the cumulative sums of weights are used to calculate the different error rates. The auc() function already calculates the tpr and fpr. If the tnr and fnr were added, a whole host of different scores could be calculated (https://en.wikipedia.org/wiki/Sensitivity_and_specificity). Any help would be appreciated. I will have another look at the auc() code next week. Andreas
Interesting, I know this as Youden's Index (also seems to be called Youden's J statistic).
Indeed, this is not straightforward to see. I recommend debugging `projpred:::auc()`, e.g., with:

```r
set.seed(6834)
nobs <- 100L
mu <- binomial()$linkinv(-0.42 + rnorm(nobs))
y <- rbinom(nobs, size = 1, prob = mu)
auc_val_projpred <- projpred:::auc(cbind(y, mu, 1))
```
Keep in mind that …
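For intuition, here is a sketch of the standard cumulative-sum construction of an ROC curve (my own illustration, not projpred's exact code): if the observations are sorted by decreasing predicted probability, the cumulative sums of the positive-class and negative-class observation weights give, after normalization, the TPR and FPR at every possible threshold; TNR and FNR then follow as 1 - FPR and 1 - TPR.

```r
# Sketch (not projpred's code): cumulative-sum construction of TPR/FPR/TNR/FNR
# for observations y (0/1), predicted probabilities mu, and observation weights w.
set.seed(6834)
nobs <- 100L
mu <- binomial()$linkinv(-0.42 + rnorm(nobs))
y <- rbinom(nobs, size = 1, prob = mu)
w <- rep(1, nobs)                         # observation weights (here: all equal)

ord <- order(mu, decreasing = TRUE)       # sweep the threshold from high to low
y_ord <- y[ord]
w_ord <- w[ord]

tp <- cumsum(w_ord * (y_ord == 1))        # cumulative true-positive weight
fp <- cumsum(w_ord * (y_ord == 0))        # cumulative false-positive weight

tpr <- tp / sum(w_ord * (y_ord == 1))     # sensitivity at each threshold
fpr <- fp / sum(w_ord * (y_ord == 0))     # 1 - specificity at each threshold
tnr <- 1 - fpr
fnr <- 1 - tpr

# Trapezoidal AUC from the cumulative sums; TSS / Youden's J maximized over thresholds:
auc_cumsum <- sum(diff(c(0, fpr)) * (c(0, head(tpr, -1)) + tpr) / 2)
tss_max <- max(tpr + tnr - 1)
c(auc = auc_cumsum, tss_max = tss_max)
```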
Thanks a lot for the suggestion, Frank. I haven't had the chance to look into this yet. I have been busy trying to get cv_varsel() to work with stratified K-fold cross-validation and a binomial response variable, as discussed above. I believe this is worth sharing in case other users encounter the same problem, and this issue seems to be the appropriate place given its title. I am using CV stratified by y to ensure that the few presences in my data are distributed evenly across folds. TL;DR: …
Here is a reproducible example, following the reprex you (Frank) used in #160, but tweaked slightly to …
Using a Bernoulli distribution would be the most appropriate, given that I am using presence-absence data.
If I instead use a binomial family, the appropriate formula specification in brms would include trials(1). Using trials(1), however, throws the error: …
Currently, the only way to make this approach work seems to be omitting trials(1) and living with the warning from brms: …
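For concreteness, the two family specifications being contrasted would look roughly like this (a sketch with a hypothetical data frame `dat` and predictor `x`, not the original reprex):

```r
library(brms)

# Hypothetical presence-absence data with a single predictor x.
set.seed(1)
dat <- data.frame(x = rnorm(200))
dat$y <- rbinom(200, size = 1, prob = plogis(-2 + dat$x))

# Bernoulli family: no trials() term needed for a 0/1 response.
fit_bern <- brm(y ~ x, data = dat, family = bernoulli())

# Binomial family with one trial per row: the number of trials is supplied
# via trials() on the left-hand side of the formula.
fit_binom <- brm(y | trials(1) ~ x, data = dat, family = binomial())
```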
Just thought I would make you aware of this "issue" and provide a working example for others interested in using stratified CV with a binomial classifier. Cheers
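As a rough illustration of the stratification step itself (my own sketch, not the original reprex), loo::kfold_split_stratified() can generate fold indices that keep the class balance roughly constant across folds:

```r
library(loo)

# Hypothetical 0/1 response with few presences (severe class imbalance).
set.seed(42)
y <- rbinom(2000, size = 1, prob = 0.02)

# Fold indices stratified by y: each of the K folds receives roughly the same
# (small) number of presences.
K <- 5
folds <- loo::kfold_split_stratified(K = K, x = y)

# Check the class balance per fold.
table(fold = folds, y = y)

# These fold indices can then be passed on to the K-fold CV machinery,
# e.g. via the folds argument of brms::kfold() for a (hypothetical) brmsfit.
```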
As Frank pointed out in #352, the bernoulli() distribution can be used by relying on brms:::get_refmodel.brmsfit() instead of init_refmodel().
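In practice this could look roughly as follows (a sketch, assuming `fit` is a hypothetical brmsfit with family = bernoulli()):

```r
library(projpred)

# `fit` is a hypothetical brmsfit with family = bernoulli().
# get_refmodel() dispatches to brms:::get_refmodel.brmsfit(), so no manual
# init_refmodel() call is needed.
refm <- get_refmodel(fit)

# K-fold cross-validated variable selection on the reference model.
cvvs <- cv_varsel(refm, cv_method = "kfold", K = 5)
```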
Yes, thank you for updating the issue here, too. And for those stumbling across the issue here, I want to point out that …
Yes, this is related to the fact that …
Stratifying K-fold CV by the response variable seems a bit odd to me; I think stratification is usually meant to be done by a predictor variable. But I haven't thought this through yet, and it might be a valid procedure. In any case, this is a question related to the loo package, not to projpred.
Hi,
I am working on a model with highly imbalanced data, which originates from few (as few as 30) observations of species presences and a few thousand randomly selected background observations/pseudo-absences. I would like to use projpred to select the most relevant predictor variables from a pool of about 100, but am unsure whether any of the currently implemented varsel statistics deal well with severe class imbalance. The Brier (skill) score and the precision-recall AUC could be useful evaluation statistics and were suggested by Aki Vehtari in issue #25, but it seems they have not been implemented.
Since I would also like to cross-validate my variable selection, I was wondering whether it is possible to somehow stratify the cross-validation procedure to ensure comparable class imbalances across folds. In the worst case, some folds will only have (pseudo-)absences and no presences, and hence fail.
Thanks so much for your help.
Cheers
Andy