
RealPolitiX/bfm-robust


bfm-robust

Companion repo for robustness specification in biomedical foundation models (BFMs)

The preprints were previously posted on SSRN and arXiv. If you use the resources in this repo, please cite the following published version.

@article{xian_robustness_2025,
    title = {Robustness tests for biomedical foundation models should tailor to specifications},
    volume = {8},
    copyright = {2025 The Author(s)},
    issn = {2398-6352},
    url = {https://www.nature.com/articles/s41746-025-01926-2},
    doi = {10.1038/s41746-025-01926-2},
    language = {en},
    urldate = {2025-11-24},
    journal = {npj Digital Medicine},
    author = {Xian, R. Patrick and Baker, Noah R. and David, Tom and Cui, Qiming and Holmgren, A. Jay and Bauer, Stefan and Sushil, Madhumita and Abbasi-Asl, Reza},
    month = aug,
    year = {2025},
    note = {Publisher: Nature Publishing Group},
    keywords = {Biomedical engineering, Health policy, Machine learning},
    pages = {557},
}

Robustness tests in existing BFMs

We searched for BFMs in several existing GitHub repositories and review papers, and directly on the internet. We selected about 50 representative BFMs (mostly published in 2023-2024) from publications and preprints, covering a broad range of biomedical domains. For each model, we then extracted the model name, developers, modality, domain, capabilities, and any robustness tests described for it. The information is gathered here. In the following, we break down the claimed robustness tests conducted for the BFMs. About a third of the models have no explicit robustness test, while a small number have been subjected to multiple tests.

[Figure: breakdown of robustness tests reported for BFMs]

  • Eval. (evaluation) on multiple datasets = evaluating robustness using existing benchmarks, where training and test data come from the same distribution.
  • Eval. on external site data = evaluating robustness using datasets from (hospital) sites not used during development, so the evaluation set may carry an unknown distribution shift.
  • Eval. on shifted data = evaluating robustness using a constructed dataset with a shifted distribution, usually along a single attribute (e.g., age or race).
  • Eval. on synthetic data = evaluating robustness using synthetically generated data.
  • None = no specified robustness tests.
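To make the "Eval. on shifted data" category more concrete, here is a minimal sketch, assuming a generic classifier and a single numeric attribute (age) used to define the shift. The names `Record`, `shifted_split`, `accuracy`, and `robustness_gap`, the age threshold, and the toy records and model are all hypothetical illustrations; they are not part of this repo or of any surveyed BFM, and real shifted-data evaluations are usually more elaborate.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Record:
    age: float                  # hypothetical attribute along which the shift is defined
    features: Tuple[float, ...] # placeholder model inputs
    label: int


def shifted_split(records: Sequence[Record], attribute: str,
                  threshold: float) -> Tuple[List[Record], List[Record]]:
    """Split along one attribute: values below threshold form the reference set,
    values at or above it form the shifted set."""
    reference = [r for r in records if getattr(r, attribute) < threshold]
    shifted = [r for r in records if getattr(r, attribute) >= threshold]
    return reference, shifted


def accuracy(model: Callable[[Tuple[float, ...]], int],
             records: Sequence[Record]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    if not records:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(model(r.features) == r.label for r in records)
    return correct / len(records)


def robustness_gap(model, records, attribute, threshold) -> float:
    """Accuracy on the reference split minus accuracy on the shifted split;
    a larger positive gap indicates worse robustness to this shift."""
    reference, shifted = shifted_split(records, attribute, threshold)
    return accuracy(model, reference) - accuracy(model, shifted)


# Toy usage (hypothetical data and model).
records = [
    Record(age=30, features=(0.9,), label=1),
    Record(age=40, features=(0.1,), label=0),
    Record(age=70, features=(0.9,), label=1),
    Record(age=80, features=(0.9,), label=0),
]


def model(x: Tuple[float, ...]) -> int:
    return int(x[0] > 0.5)


print(robustness_gap(model, records, "age", 65))  # 0.5
```

Reporting the gap between the reference and shifted splits, rather than the shifted accuracy alone, separates the effect of the shift from the model's baseline performance.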

Robustness categorization and examples

We collect a combination of theoretical and application-oriented resources on robustness. The categorization of robustness follows the one provided in the paper.

Surveys, perspectives & tutorials (general domains)

Robustness in the context of foundation models

Surveys, perspectives & tutorials (biomedical domains)

Group robustness

Instance-wise/Individual robustness

Interventional robustness

Aggregated robustness

Uncertainty awareness & Uncertainty-aware robustness

Longitudinal/Temporal robustness

Vendor/Acquisition-shift robustness

Knowledge robustness

Behavioral robustness

Pre-BFM adversarial robustness (language)

Pre-BFM adversarial robustness (vision)

Robustness evaluation & monitoring
