
Enhancement request: use xgboost as base learner #250

Open · ivan-marroquin opened this issue Apr 28, 2021 · 18 comments

@ivan-marroquin

Hi all,

I have Python 3.6.5 with xgboost 1.1.0 and ngboost 0.3.10

When I train an NGBRegressor with xgboost as the base learner, I get the following warning message:

c:\temp\python\python3.6.5\lib\site-packages\xgboost\core.py:445: UserWarning: Use subset (sliced data) of np.ndarray is not recommended because it will generate extra copies and increase memory consumption
"memory consumption")

which may be the source of the poor result shown in the left plot of the attached image.

Is it possible to use xgboost as a base learner? Please advise.

The source code is as follows:

import numpy as np
import xgboost as xgb
import ngboost
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_boston
from sklearn.metrics import median_absolute_error
from sklearn.model_selection import train_test_split
import multiprocessing
import matplotlib.pyplot as plt

if __name__ == '__main__':
    cpu_count = 2 if (multiprocessing.cpu_count() < 4) else (multiprocessing.cpu_count() - 2)

    x, y = load_boston(return_X_y=True)
    y = y.astype(np.float32)
    x = ((x - np.mean(x, axis=0)) / np.std(x, axis=0)).astype(np.float32)

    x_train, x_validation, y_train, y_validation = train_test_split(x, y, test_size=0.4, random_state=1969)

    # Using xgboost with ngboost
    learner = xgb.XGBRegressor(max_depth=6, n_estimators=300, verbosity=1, objective='reg:squarederror',
                               booster='gbtree', tree_method='exact', n_jobs=cpu_count, learning_rate=0.05,
                               gamma=0.15, reg_alpha=0.20, reg_lambda=0.50, random_state=1969)

    ngb_1 = ngboost.NGBRegressor(Dist=ngboost.distns.Normal, Score=ngboost.scores.CRPScore, Base=learner,
                                 natural_gradient=True, n_estimators=1, learning_rate=0.01, verbose=False,
                                 random_state=1969)

    ngb_1.fit(x_train, y_train, X_val=x_validation, Y_val=y_validation)
    y_preds_1 = ngb_1.predict(x_validation)
    median_abs_error_1 = median_absolute_error(y_validation, y_preds_1)

    # Using only ngboost
    learner = DecisionTreeRegressor(max_depth=6, criterion='friedman_mse', min_impurity_decrease=0, random_state=1969)

    ngb_2 = ngboost.NGBRegressor(Dist=ngboost.distns.Normal, Score=ngboost.scores.CRPScore, natural_gradient=True,
                                 n_estimators=300, learning_rate=0.01, verbose=False, random_state=1969)

    ngb_2.fit(x_train, y_train, X_val=x_validation, Y_val=y_validation)
    y_preds_2 = ngb_2.predict(x_validation)
    median_abs_error_2 = median_absolute_error(y_validation, y_preds_2)

    # Generate plot to compare results
    fig, ax = plt.subplots(nrows=1, ncols=2)

    ax[0].plot(range(0, len(y_validation)), y_validation, '-k')
    ax[0].plot(range(0, len(y_validation)), y_preds_1, '--r')
    ax[0].set_title("XGBOOST + NGBOOST: \n MedianAbsError {:.4f}".format(median_abs_error_1))

    ax[1].plot(range(0, len(y_validation)), y_validation, '-k')
    ax[1].plot(range(0, len(y_validation)), y_preds_2, '--r')
    ax[1].set_title("NGBOOST \n MedianAbsError {:.4f}".format(median_abs_error_2))

comparison_xgboost-ngboost_against_only_ngboost.zip

@avati
Collaborator

avati commented Apr 28, 2021

You would want to make at least two changes to your code:

  1. The base learner needs to be a Python constructor, so that each boosting stage gets its own model. In your case it is a pre-instantiated object that gets repurposed/refit (i.e. modified) for every subsequent boosting stage, so in effect your whole boosted model is no more expressive than a single base learner.

  2. Ideally you want your base learner xgboost to have n_estimators=1 and the NGBoost model to have n_estimators=300, and not the other way around.

This is an interesting experiment and I would love to see how it works out! Thanks for giving it a shot and sharing the results!
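For concreteness, here is a rough sketch of the second change applied to the snippet above; the hyperparameter values are simply carried over from the original post and are not a recommendation:

# Sketch: a single tree per xgboost fit, with the 300 boosting stages driven by NGBoost itself.
learner = xgb.XGBRegressor(max_depth=6, n_estimators=1, objective='reg:squarederror',
                           tree_method='exact', n_jobs=cpu_count, learning_rate=0.05,
                           random_state=1969)

ngb_1 = ngboost.NGBRegressor(Dist=ngboost.distns.Normal, Score=ngboost.scores.CRPScore,
                             Base=learner, natural_gradient=True, n_estimators=300,
                             learning_rate=0.01, verbose=False, random_state=1969)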

@ivan-marroquin
Author

Hi @avati

Thanks for your prompt answer. I made the change to the code, setting xgboost n_estimators=1 and NGBoost n_estimators=300. Unfortunately, I still get the same result.

By any chance, do you have a Python code example on how to change the xgboost model to be more like a Python constructor?

Ivan

@avati
Collaborator

avati commented Apr 28, 2021

Here's one way. Instead of:

learner = xgb.XGBRegressor(...)

do:

learner = lambda args: xgb.XGBRegressor(args)

@ivan-marroquin
Author

Hi @avati

Thanks for the suggestion. Before pursuing more work with xgboost, I tried the following code:

#_________________
from sklearn.ensemble import GradientBoostingRegressor

learner = GradientBoostingRegressor(loss='ls', learning_rate=0.05, n_estimators=1, criterion='mse',
                                    max_depth=6, min_impurity_decrease=0, random_state=1969)

ngb = ngboost.NGBRegressor(Dist=ngboost.distns.Normal, Score=ngboost.scores.CRPScore, Base=learner,
                           natural_gradient=True, n_estimators=300, learning_rate=0.01, verbose=False,
                           random_state=1969)
ngb.fit(x_train, y_train, X_val=x_validation, Y_val=y_validation)

y_preds = ngb.predict(x_validation)
#_________________

It gave a reasonable result, which could be improved by tuning the hyperparameters.

This shows the strength of NGBoost in taking learners from the scikit-learn library.

On the other hand, xgboost (although I am using its scikit-learn API) does not seem to work well with NGBoost, as you explained. Could it be that xgboost's scikit-learn API is missing something required by NGBoost?

Do you have more suggestions?

Ivan

@avati
Collaborator

avati commented May 2, 2021

The same suggestion as in my previous comment: use a learner with a 'lambda' as shown, whether it is for XGB or GBR.

@ivan-marroquin
Author

Hi @avati

Thanks for the suggestion. I tried the lambda version and got this message:

Cannot clone object '<function at 0x000001F05A98A840>' (type <class 'function'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods

I am pretty sure that I am missing something on how to implement this approach. Could you provide a more detailed code example?

Ivan

@caiquanyou

I also want to use LightGBM as a base learner and have the same issue as @ivan-marroquin. Could you provide some advice?

@ivan-marroquin
Author

Hi @caiquanyou

I think I found a way to run xgboost with ngboost (and perhaps it applies to lightgbm as well). I found this publication:
https://www.researchgate.net/publication/349528379_Reliable_Evapotranspiration_Predictions_with_a_Probabilistic_Machine_Learning_Framework

and the source code used in this publication can be found at:
https://codeocean.com/capsule/5244281/tree/v1

To make it work with xgboost, you need to set its number of estimators (along with the number of boosting iterations used in ngboost). I have xgboost 1.1.0 and ngboost 0.3.10.

I used the toy example from ngboost (adapted to work with xgboost):

import numpy as np
import ngboost
import xgboost
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import multiprocessing
import matplotlib.pyplot as plt

if __name__ == '__main__':
    cpu_count = 2 if (multiprocessing.cpu_count() < 4) else (multiprocessing.cpu_count() - 2)

    x, y = load_boston(return_X_y=True)

    mean_scaler = np.mean(x, axis=0)
    std_scaler = np.std(x, axis=0)
    x = (x - mean_scaler) / std_scaler

    x_train, x_validation, y_train, y_validation = train_test_split(x, y, test_size=0.4, random_state=1969)

    # using only ngboost
    ngb_1 = ngboost.NGBRegressor(Dist=ngboost.distns.Normal, Score=ngboost.scores.MLE,
                                 natural_gradient=True, n_estimators=300, learning_rate=0.01,
                                 verbose=False, random_state=1969)

    ngb_1.fit(x_train, y_train)
    y_preds_ngboost = ngb_1.predict(x_validation)

    # using xgboost with ngboost
    learner = xgboost.XGBRegressor(max_depth=6, n_estimators=300, verbosity=1, objective='reg:squarederror',
                                   booster='gbtree', tree_method='exact', n_jobs=cpu_count,
                                   learning_rate=0.05, gamma=0.15, reg_alpha=0.20,
                                   reg_lambda=0.50, random_state=1969)

    ngb_2 = ngboost.NGBRegressor(Dist=ngboost.distns.Normal, Score=ngboost.scores.MLE, Base=learner,
                                 natural_gradient=True, n_estimators=300, learning_rate=0.01,
                                 verbose=False, random_state=1969)

    ngb_2.fit(x_train, y_train)
    y_preds_hyboost = ngb_2.predict(x_validation)

    fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(10, 5))

    ax[0].plot(range(0, len(x_validation)), y_validation, '-k', label='validation')
    ax[0].plot(range(0, len(x_validation)), y_preds_ngboost, '--r', label='ngboost')
    ax[0].set_title("NGBOOST: validation & prediction")
    ax[0].legend()

    ax[1].plot(range(0, len(x_validation)), y_validation, '-k', label='validation')
    ax[1].plot(range(0, len(x_validation)), y_preds_hyboost, '--r', label='hyboost')
    ax[1].set_title("HYBOOST: validation & prediction")
    ax[1].legend()

    ax[2].plot(range(0, len(x_validation)), y_preds_ngboost, '-k', label='ngboost')
    ax[2].plot(range(0, len(x_validation)), y_preds_hyboost, '--r', label='hyboost')
    ax[2].set_title("NGBOOST - HYBOOST: prediction")
    ax[2].legend()

    plt.show()

Note that xgboost will raise the following warning message:
Warning (from warnings module):
File "C:\Temp\Python\Python3.6.5\lib\site-packages\xgboost\core.py", line 445
"memory consumption")
UserWarning: Use subset (sliced data) of np.ndarray is not recommended because it will generate extra copies and increase memory consumption

I don't know whether this issue may influence the quality of the result. Let me know what you find on your side.

Hope this helps,

Ivan

@thomasaarholt

thomasaarholt commented Aug 6, 2021

That warning shouldn't influence the predictions, but it will increase the RAM consumption of the computation. I'd be interested in hearing more experiences with using other packages as the Base learner.

@CDonnerer

In case it's useful, I've written a "native" xgboost version of ngboost, implemented in the xgboost scikit-learn API.
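For readers who want to try it, here is a minimal, hedged sketch of what using that package (xgboost-distribution, named later in this thread) might look like. The import path, the XGBDistribution class, and the attributes of the predict output are assumptions, so check the project's documentation for the actual interface:

# Hedged sketch; the package, class, and attribute names below are assumptions, not verified against a release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from xgboost_distribution import XGBDistribution  # assumed import path

X, y = load_boston(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=1969)

model = XGBDistribution(distribution="normal", n_estimators=300)
model.fit(X_train, y_train)

preds = model.predict(X_val)        # assumed to return the fitted distribution parameters
mean, std = preds.loc, preds.scale  # assumed attribute names for a normal distribution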

@thomasaarholt

Exciting! Looking forward to checking it out!

@alejandroschuler
Collaborator

In case it's useful, I've written a "native" xgboost version of ngboost, implemented in the xgboost scikit-learn API.

This is fantastic @CDonnerer. If you're willing, I'd love to have features like these ported into the core NGBoost library. We've had previous discussions on how to make ngboost faster and easier to develop that you would be more than welcome to contribute to.

@astrogilda

In case it's useful, I've written a "native" xgboost version of ngboost, implemented in the xgboost scikit-learn API.

Really cool library! Related question: does xgboost-distribution offer a GPU implementation like xgboost, or nah? I'm assuming the relative performance numbers are for runs on the CPU, right?

@CDonnerer

@alejandroschuler Thanks! Sure, I'll have a look at those discussions; there might be options to port those features across in a generic way.

@astrogilda No GPU support for xgboost-distribution yet; indeed, the performance numbers refer to CPU runs.

@kmedved
Contributor

kmedved commented Aug 16, 2021

@CDonnerer - just want to say that's a fantastic library you've written. I don't know how practical it would be to port the features over to NGBoost as @alejandroschuler suggested (the coding is way over my head), but if that's at all possible, as a user, that would be a great solution, rather than having forked development across two different probabilistic libraries. It would be especially helpful for adding additional distribution support in a consistent way.

@StatMixedML

@CDonnerer it seems like there is quite some overlap with XGBoostLSS, an approach I developed in 2019:

https://github.com/StatMixedML/XGBoostLSS

@ivan-marroquin
Author

@StatMixedML thanks for sharing the link of your approach!

@tkzeng

tkzeng commented Jun 11, 2022

@ivan-marroquin
I think this should work. It looks like the learning rate has an effect even when there is just one tree, and the way this interacts with NGBoost's learning rate might cause unexpected behavior.

learner = xgb.XGBRegressor(max_depth=3, n_estimators=1, learning_rate=1)
ngb_1 = ngboost.NGBRegressor(Base=learner)
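For completeness, a hedged sketch of how that configuration might slot into the earlier Boston-housing example; the same pattern may carry over to LightGBM's LGBMRegressor for the question above, although that is untested here:

import xgboost as xgb
import ngboost
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=1969)

# One tree per NGBoost stage; learning_rate=1 so only NGBoost's own learning rate shrinks the updates.
learner = xgb.XGBRegressor(max_depth=3, n_estimators=1, learning_rate=1)

ngb = ngboost.NGBRegressor(Base=learner, n_estimators=300, learning_rate=0.01)
ngb.fit(X_train, y_train)

y_pred = ngb.predict(X_val)      # point predictions
y_dist = ngb.pred_dist(X_val)    # predicted distribution (e.g. a Normal with loc and scale)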
