I've recently been looking into adding Random Forest to linfa. Since Ensemble Learning is on the roadmap anyway I think the best way to do this would be to add Bootstrap Aggregation for any classifier rather than specialising the implementation to Decision Trees. I'm not totally sure what the design of this should look like though, especially since there don't seem to be any fixed conventions for implementing classifiers in linfa.
Would general bootstrap aggregation be a useful addition? If so, I'm interested in others' opinions on how this should interface with existing and future classifiers in linfa, along with any other design considerations.
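For context, the core of bootstrap aggregation is just drawing resamples of the training rows with replacement and fitting one model per resample. A minimal, self-contained sketch of the resampling step (this assumes the rand crate with its 0.8-style API, and is not linfa's own bootstrap helper):

```rust
use rand::Rng;

/// Draw `n` row indices uniformly with replacement; each ensemble member
/// would be trained on the rows selected by one such draw.
fn bootstrap_indices<R: Rng>(n: usize, rng: &mut R) -> Vec<usize> {
    (0..n).map(|_| rng.gen_range(0..n)).collect()
}

fn main() {
    let mut rng = rand::thread_rng();
    // e.g. 10 training rows -> one bootstrap sample of 10 indices, likely with repeats
    println!("{:?}", bootstrap_indices(10, &mut rng));
}
```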
Activity
YuhanLiin commented on Feb 18, 2022
In impl_dataset.rs we already have bootstrap aggregation code that produces sub-samples from a dataset. We just need a generalized way of fitting classifiers over the subsamples. We have the trait linfa::traits::Fit, which represents the training of a model using a set of hyperparameters, and we have linfa::traits::PredictInplace, which represents prediction using a trained model. You can define a new ensemble classifier that's generic over these traits, similar to how cross_validate is defined. Its Fit impl fits its "inner" classifier/regressor multiple times over the subsamples, and its Predict impl averages/votes on predictions made across its inner models.
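Roughly, the ensemble could take the shape sketched below. The traits here (FitSub, PredictSub) are simplified stand-ins for linfa's Fit and PredictInplace, and the plain Vec-based datasets are placeholders, so the real signatures (error types, dataset generics, RNG handling) would differ; this only illustrates fitting the inner classifier once per bootstrap sub-sample and voting at prediction time.

```rust
/// Stand-in for a set of hyperparameters that can be fitted on a dataset.
trait FitSub {
    type Model: PredictSub;
    fn fit(&self, records: &[Vec<f64>], targets: &[usize]) -> Self::Model;
}

/// Stand-in for a trained model that predicts one class label per sample.
trait PredictSub {
    fn predict(&self, records: &[Vec<f64>]) -> Vec<usize>;
}

/// Bagging hyperparameters: the inner classifier's params plus the ensemble size.
struct BaggingParams<P: FitSub> {
    inner: P,
    n_models: usize,
}

/// Trained ensemble: one inner model per bootstrap sub-sample.
struct BaggingModel<M: PredictSub> {
    models: Vec<M>,
}

impl<P: FitSub> BaggingParams<P> {
    /// "Fit" step: fit the inner classifier once per bootstrap sub-sample.
    /// Here the sub-samples are passed in directly; in linfa they would come
    /// from the dataset's bootstrap code in impl_dataset.rs.
    fn fit(&self, subsamples: &[(Vec<Vec<f64>>, Vec<usize>)]) -> BaggingModel<P::Model> {
        let models = subsamples
            .iter()
            .take(self.n_models)
            .map(|(x, y)| self.inner.fit(x, y))
            .collect();
        BaggingModel { models }
    }
}

impl<M: PredictSub> BaggingModel<M> {
    /// "Predict" step: majority vote across the inner models' predictions.
    fn predict(&self, records: &[Vec<f64>]) -> Vec<usize> {
        let all: Vec<Vec<usize>> = self.models.iter().map(|m| m.predict(records)).collect();
        (0..records.len())
            .map(|i| {
                // Count how many inner models predicted each label for sample i.
                let mut counts = std::collections::HashMap::new();
                for preds in &all {
                    *counts.entry(preds[i]).or_insert(0usize) += 1;
                }
                counts
                    .into_iter()
                    .max_by_key(|&(_, c)| c)
                    .map(|(label, _)| label)
                    .unwrap_or(0)
            })
            .collect()
    }
}
```

For regression the same structure applies, with averaging of the inner models' outputs in place of the vote.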
EricTulowetzke commented on Aug 3, 2022
Here is a WIP PR for RF: #43
YuhanLiin commented on Aug 7, 2022
The ensemble learning work from that PR ended up in #66, which didn't pan out for some reason. The current work is in #229.