Extend predictors to zero-cost case #132

Open
jr2021 opened this issue Sep 13, 2022 · 4 comments
Labels: zero cost merge (Merge of zerocost with Develop into Develop_copy)

Comments

jr2021 (Collaborator) commented Sep 13, 2022

In the zerocost branch, the Ensemble class has been extended to the zero-cost case and contains a single option for its base predictor, XGBoost. The XGBoost predictor has been adapted to the zero-cost case by implementing a set_pre_computations function and by modifying the BaseTree class. Currently, the Ensemble class supports only this single predictor:

trainable_predictors = {
    "xgb": XGBoost(
        ss_type=self.ss_type, zc=self.zc, encoding_type="adjacency_one_hot", zc_only=self.zc_only
    )
}

Should all other predictors be available in the merged Ensemble class and extended to the zero-cost case?
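If they were, the trainable_predictors dict would presumably grow along these lines (a sketch only; the constructor signatures of the other predictors are assumed to mirror the XGBoost case):

trainable_predictors = {
    "xgb": XGBoost(
        ss_type=self.ss_type, zc=self.zc, encoding_type="adjacency_one_hot", zc_only=self.zc_only
    ),
    "lgb": LGBoost(
        ss_type=self.ss_type, zc=self.zc, encoding_type="adjacency_one_hot", zc_only=self.zc_only
    ),
    # ... and so on for the remaining predictors
}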

jr2021 added the zero cost merge label Sep 13, 2022
jr2021 changed the title from "Extend predictors to Zero Cost case" to "Extend predictors to zero-cost case" Sep 17, 2022
Neonkraft (Collaborator) commented

Ideally, yes. But there are about 19 predictors in the original ensemble. Let's focus for now on extending the ZC case to the tree-based predictors: LGBoost, NGBoost, and RandomForestPredictor.

The other predictors must be available, of course, but without the option of using ZC features. This also means that the get_ensemble method must be modified to return the appropriate set of predictors based on self.zc, roughly along the lines sketched below.
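Something like this, perhaps (a sketch; apart from get_ensemble and self.zc, the names and structure here are assumed, not actual repo code):

def get_ensemble(self):
    if self.zc:
        # only the predictors that have been extended to consume zero-cost features
        return {
            "xgb": XGBoost(
                ss_type=self.ss_type, zc=True, encoding_type="adjacency_one_hot", zc_only=self.zc_only
            ),
            # ... lgb, ngb, rf once they are extended
        }
    # otherwise: the full set of predictors, constructed without any ZC arguments
    return {
        "xgb": XGBoost(ss_type=self.ss_type, zc=False, encoding_type="adjacency_one_hot"),
        # ... the remaining ~18 predictors from the original ensemble
    }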

jr2021 (Collaborator, Author) commented Sep 21, 2022

Sounds good.

We found that in the zerocost branch, the XGBoost class contains three functions specific to the zero-cost case (set_pre_computations, _verify_zc_info, and _set_zc_names), which are also applicable to the other tree-based predictors.

To avoid duplicating these functions, we placed them in the BaseTree class, which is the parent class of all tree-based predictors.
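The resulting layout looks roughly like this (method bodies elided; the three function names come from the branch, while the signatures shown here are assumed for illustration):

class BaseTree(Predictor):
    # zero-cost hooks shared by all tree-based predictors
    # (XGBoost, LGBoost, NGBoost, RandomForestPredictor)

    def set_pre_computations(self, unlabeled=None, xtrain_zc_info=None,
                             xtest_zc_info=None, unlabeled_zc_info=None):
        # store pre-computed zero-cost proxy scores for later use in fit/query
        ...

    def _verify_zc_info(self, xtrain_zc_info):
        # check that the pre-computed scores contain the expected ZC proxies
        ...

    def _set_zc_names(self, zc_names):
        # record which zero-cost proxies serve as input features
        ...

class XGBoost(BaseTree):
    ...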

jr2021 (Collaborator, Author) commented Sep 21, 2022

One small remaining issue is a discrepancy between the zerocost and Develop implementations of fit in XGBoost.

In the Develop branch, it is possible for the user to load custom hyperparameters from a config file:

def fit(self, xtrain, ytrain, train_info=None, params=None, **kwargs):
    if self.hparams_from_file and self.hparams_from_file not in ['False', 'None'] \
            and os.path.exists(self.hparams_from_file):
        self.hyperparams = json.load(open(self.hparams_from_file, 'rb'))['xgb']
        print('loaded hyperparams from', self.hparams_from_file)
    elif self.hyperparams is None:
        self.hyperparams = self.default_hyperparams.copy()
    return super(XGBoost, self).fit(xtrain, ytrain, train_info, params, **kwargs)

while in the zerocost branch this is not an option:

def fit(self, xtrain, ytrain, train_info=None, params=None, **kwargs):
    if self.hyperparams is None:
        self.hyperparams = self.default_hyperparams.copy()
    return super(XGBoost, self).fit(xtrain, ytrain, train_info, params, **kwargs)

Which functionality should be adopted in the Develop_copy branch? Is this a case where the code in the zerocost branch should be taken as the more updated version?

Neonkraft (Collaborator) commented

Best to be able to read from a config file, too.
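For reference, the Develop-branch fit above would then expect a JSON file keyed by predictor name, along these lines (hyperparameter values purely illustrative):

{
    "xgb": {
        "max_depth": 6,
        "learning_rate": 0.1,
        "n_estimators": 500
    }
}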
