Implement feature_importances_ in sksurv.ensemble.RandomSurvivalForest #140

mtomaszewski95 · 2020-09-30T15:12:30Z

Implement feature_importances_ in sksurv.ensemble.RandomSurvivalForest.
Examples:
https://cran.r-project.org/web/packages/randomForestSRC/randomForestSRC.pdf
https://square.github.io/pysurvival/models/random_survival_forest.html
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364686/

sebp · 2020-10-02T20:23:05Z

Feature importances based on node/split statistics are rather flawed (see e.g. this paper). Therefore, I'm hesitant to implement this feature. Instead, you can already compute permutation-based feature importance via ELI5. It is more expensive to compute, but has better properties.

funnell · 2023-06-14T16:02:56Z

My vote would be for adding the feature, at the very least for compatibility with scikit-learn.

sebp · 2023-06-15T21:13:38Z

sklearn now has sklearn.inspection.permutation_importance (https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance), which is the much better option.
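Permutation importance shuffles one feature at a time and measures how much the model's score drops. The sketch below illustrates that idea using only NumPy, assuming the model is available as a plain predict function that returns risk scores (higher means earlier expected event). The helpers harrell_c and permutation_importance_sketch are hypothetical names for illustration, not functions from scikit-learn or scikit-survival. Harrell's concordance index is used as the score because, as far as I know, that is what scikit-survival's estimators return from score().

```python
import numpy as np


def harrell_c(event, time, risk):
    """Harrell's concordance index: the fraction of comparable pairs whose
    predicted risks are ordered consistently with the observed times.
    A pair (i, j) is comparable when sample i had an event and
    time[i] < time[j]; tied risks count as half-concordant."""
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance_sketch(predict, X, event, time, n_repeats=10, seed=0):
    """Mean drop in concordance when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = harrell_c(event, time, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between this one feature and the outcome.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - harrell_c(event, time, predict(X_perm)))
        importances[col] = np.mean(drops)
    return importances
```

With real estimators, the equivalent call would be along the lines of sklearn.inspection.permutation_importance(rsf, X_test, y_test, n_repeats=15, random_state=0), where rsf is a fitted RandomSurvivalForest and y_test the structured survival array; with scoring=None it falls back to the estimator's own score method. The cost is n_features × n_repeats extra rounds of prediction and scoring, which is why it is slower than split-based importances.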

funnell · 2023-06-16T15:09:40Z

Yes, thanks! I understand your point of view, and that there are alternative ways to compute importance.
Still, even if it's not an ideal algorithm, it would be nice to have. Some tools assume feature_importances_ is available (e.g. RFECV), and its absence might add a little friction for new scikit-survival users who are already familiar with scikit-learn. It's also a lot faster, which can be helpful during early iteration.

Thanks for the package and thanks for considering! :)

anwurl · 2024-01-17T09:02:33Z

I also have a use case where I am only interested in which features are used or not used. For that, feature importances based on node/split statistics would do the job and be quick to calculate. In contrast, calculating permutation feature importances takes much longer.
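For the "which features are used at all" question, one cheap proxy is to count how often each feature is chosen for a split across the forest. This is a split-frequency count, not a criterion-based importance, so it says nothing about how much each split improved the fit. The sketch assumes each fitted tree exposes a scikit-learn style tree_.feature array (the feature index tested at each node, negative for leaves); split_counts is a hypothetical helper.

```python
import numpy as np


def split_counts(node_features, n_features):
    """Count how often each feature is used for a split across trees.

    node_features: iterable of 1-D integer arrays, one per tree, giving the
    feature index tested at each node. Leaves carry a negative index
    (scikit-learn uses -2), matching the layout of ``tree_.feature``.
    """
    counts = np.zeros(n_features, dtype=np.intp)
    for feats in node_features:
        feats = np.asarray(feats, dtype=np.intp)
        internal = feats[feats >= 0]  # keep split nodes, drop leaves
        counts += np.bincount(internal, minlength=n_features)
    return counts
```

Assuming the trees of a fitted forest expose tree_ as in scikit-learn, a call could look like split_counts([est.tree_.feature for est in rsf.estimators_], X.shape[1]); features with a count of zero were never used for a split. It only walks the node arrays, so it is about as fast as reading off a stored attribute.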

Thanks a lot for this package and your work.

sebp · 2024-01-17T16:53:36Z

Feature importances based on split criteria have been requested in the past. Unfortunately, the way sklearn implements feature importance in its tree-growing algorithm doesn't work with the log-rank criterion used to grow survival trees. The log-rank criterion measures the quality of a split, but sklearn assumes the criterion measures the impurity of a node and computes feature importance as the decrease in impurity from a node to its children.
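To make the mismatch concrete: scikit-learn's impurity-based importance sums, for every split on a feature, a weighted drop from the parent node's impurity to its children's impurities. The log-rank statistic instead scores the split as a whole by comparing the two children's survival, so there is no per-node value to subtract. The sketch below puts both quantities side by side. It uses a textbook two-sample log-rank statistic; scikit-survival's internal criterion may differ in details (for example, taking the absolute value), and both function names are hypothetical.

```python
import numpy as np


def weighted_impurity_decrease(n_total, n_node, imp_node,
                               n_left, imp_left, n_right, imp_right):
    """Contribution of one split to impurity-based importance (MDI):
    the parent's impurity minus the sample-weighted impurities of its
    children, scaled by the fraction of all samples reaching the node."""
    return (n_node / n_total) * (
        imp_node - (n_left / n_node) * imp_left - (n_right / n_node) * imp_right
    )


def logrank_statistic(time, event, in_left):
    """Standardized two-sample log-rank statistic for a candidate split.

    Compares observed and expected event counts in the left child at each
    distinct event time; it describes the split, not a single node.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in_left = np.asarray(in_left, dtype=bool)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                    # samples still at risk at t
        n_left = (at_risk & in_left).sum()   # ... of which in the left child
        died = event & (time == t)
        d = died.sum()                       # events at t in both children
        d_left = (died & in_left).sum()      # events at t in the left child
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:  # skip n == 1, where the variance term would be 0/0
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return observed_minus_expected / np.sqrt(variance)
```

Swapping the two children only flips the sign of the log-rank statistic, which illustrates that it characterizes the pair of children rather than a node whose impurity could be decreased.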

sebp added the enhancement label Oct 6, 2020
