TreeSHAP, libxgboost, and implications for predict function #169
I don't see any additional options that we can pass to libxgboost. To be clear, the parameters we already have in `predict` seem to be all that it accepts; I'm also not seeing any reference anywhere to `pred_contribs`.
I just found the following at the XGBoost C Package docs. Looking it over, I think the parameters I saw reflect how they are named in the Python package, but according to the location I referenced they are implemented through the 'type' parameter, which is not true/false but rather 0-6. I apologize if I have this incorrect.
Here is the proper link: XGBoosterPredict
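For reference, per the libxgboost C API docs, `XGBoosterPredictFromDMatrix` takes its options as a JSON config string, and the `type` field there selects the kind of prediction. Roughly (the field names are as documented for the C API; building the string with JSON3 is just for illustration):

```julia
import JSON3

# Config accepted by XGBoosterPredictFromDMatrix. `type` values:
#   0 = normal, 1 = output margin, 2 = contributions (SHAP),
#   3 = approx. contributions, 4 = interactions,
#   5 = approx. interactions, 6 = leaf indices
config = JSON3.write(Dict(
    "type" => 2,
    "training" => false,
    "iteration_begin" => 0,   # 0/0 means use all trees
    "iteration_end" => 0,
    "strict_shape" => false,
))
```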
Ah, I was looking at the wrong one; indeed we are using the function you linked. I assume you are interested in additional values for `type`. I'll probably get to this eventually. Of course, a PR would be welcome.
Btw, a really quick and minimal-effort way of getting this working, which I would be happy to merge, is if we just added a `type` keyword argument that gets passed straight through to libxgboost.
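As a sketch of what I mean (everything here is hypothetical, not the actual source; `_predict_from_config` stands in for the existing low-level call into libxgboost):

```julia
import JSON3

# Hypothetical sketch: forward a `type` keyword into the config string that
# predict already sends to libxgboost, keeping `margin` as the old spelling
# of type = 1.
function predict_sketch(b, Xm; margin::Bool=false,
                        type::Integer=(margin ? 1 : 0))
    config = JSON3.write(Dict(
        "type" => type,
        "training" => false,
        "iteration_begin" => 0,
        "iteration_end" => 0,
        "strict_shape" => false,
    ))
    _predict_from_config(b, Xm, config)   # hypothetical low-level helper
end
```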
Perhaps adding the 'type' keyword is the best approach. It seems most flexible, particularly if more options are added in the future. I am willing to give a PR a try (it would be my first ever), but it would need to be heavily edited; my programming skills are nowhere near yours. I am most concerned about how best to handle all the return-shape configurations this creates. In R, there is a way to specify a list of parameter options. I see Julia packages that do this (Plots.jl comes to mind) but I don't know how to code it.
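(For what it's worth, Julia's keyword splatting covers the R-style list of options; a minimal, self-contained illustration with made-up option names:)

```julia
# Trailing `kwargs...` collects arbitrary keyword options, which can then
# be merged over a set of defaults, roughly what R's named lists (and
# packages like Plots.jl) do for option handling.
const DEFAULTS = Dict{String,Any}("type" => 0, "training" => false)

options(; kwargs...) =
    merge(DEFAULTS, Dict{String,Any}(String(k) => v for (k, v) in kwargs))

options(type=2, strict_shape=true)
# Dict("type" => 2, "training" => false, "strict_shape" => true)
```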
I am attempting a version of predict that allows for differing `type` values. Here are 3 lines in the current routine that I think I understand, but am not certain about; reconstructing from memory, so the exact names may differ:
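```julia
# (reconstructed from my description below; the actual variable names in
# the source may differ)
dims = reverse(unsafe_wrap(Array, oshape[], odim[]))  # shape reported by libxgboost, reversed
o = unsafe_wrap(Array, o[], tuple(dims...))           # wrap the returned C buffer as a Julia Array
length(dims) > 1 ? transpose(o) : o                   # lazy transpose in the 2-D case
```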
It seems that the `reverse` call effects a reshape when `unsafe_wrap` converts the C array to a Julia Array, and the last line effects a transpose if there is more than one dimension. I understand this in 2 dimensions (it completes the conversion from row-major to column-major). I am not familiar with how `transpose` works, or with what would happen if it were applied to a 3-dimensional array such as might come from type=4, i.e., interactions (or from type=2, i.e., contributions, in a multi: model). Any thoughts would be greatly appreciated.
These lines are merely for adapting libxgboost's internal memory format (in which it returns results) to the memory format of Julia arrays (in particular, the former is row-major and the latter is column-major). If the other prediction types return higher-dimensional arrays, the same adaptation would have to be generalized accordingly.
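To the specific question about 3 dimensions: `transpose` is only defined for vectors and matrices, so it would simply error on a 3-dimensional array. The general tool is `permutedims` with the dimension order reversed. For example:

```julia
A = reshape(1:24, 2, 3, 4)     # a 3-D array

# transpose(A)                 # would throw a MethodError: `transpose` is
                               # only defined for vectors and matrices

B = permutedims(A, (3, 2, 1))  # reverse the dimension order; in general,
                               # permutedims(A, ndims(A):-1:1)
size(B)                        # (4, 3, 2)
```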
I must not be conveying the issue correctly. Here is my understanding, and working with my data bears it out: libxgboost returns 3-dimensional arrays for types 4 and 5 ALWAYS, and for types 2 and 3 when the objective is multi:softprob/multi:softmax. The current conversion (i.e., the `transpose` step) cannot handle those. Rather than modify a function in a way that creates situations where it would fail, I think it better to leave the current `predict` as is and write a new function. I will change the function name. Since I am proposing a new function, there is no need for backward compatibility, and keeping `margin` is redundant. It will take me a bit to figure out how to roll back my fork so that the current `predict` is left untouched.
I'm a bit confused... why not just check the number of dimensions that comes back and return the appropriate array in each case? I'm not necessarily opposed to adding a new, lower-level function; that might have some advantages. However, the only thing I can think of stopping us from just returning whatever is the appropriate array here is type stability, and again, that's already pretty compromised, so I'm not sure it makes sense to try to keep it narrowed down.
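Concretely, the check could look something like this (a sketch; it assumes `ptr` is the raw float buffer returned by libxgboost and `dims` is the row-major shape it reports):

```julia
# Convert libxgboost's row-major output of any rank into a column-major
# Julia array: wrap the buffer with the dimensions reversed, then reverse
# the dimension order back with permutedims.
function from_row_major(ptr::Ptr{Cfloat}, dims)
    a = unsafe_wrap(Array, ptr, tuple(reverse(dims)...))
    ndims(a) > 1 ? permutedims(a, ndims(a):-1:1) : a
end
```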
I am looking to 'modernize' my approach and switch from partial dependence plots to Shapley plots. Shapley values are computationally demanding, so I would like to take advantage of the TreeSHAP algorithm that is built into libxgboost. This feature is accessible via the predict function by using the keyword parameter 'pred_contribs'; see the libxgboost predict options.
Although XGBoost.predict accepts keyword parameters, only a limited set is passed through to libxgboost.
As a short-term solution, I can write a personalized version that allows additional keyword parameters. I also realize that the current approach reduces the risk of breaking older code.
There are three parameters (pred_contribs, pred_interactions, and pred_leaf) that would be handy to have available. Adding them introduces complexity related to the shape of the data returned. Perhaps there is a role for a separate function, e.g. 'predict_shapley', that specifically handles these additional parameters; this would be the least likely to break any pre-written code. As a new function it would be less hassle to implement 'strict_shape=true', and users could code with it in mind. Currently multi:softmax and multi:softprob add an additional dimension and need separate coding; 'strict_shape' adds a dimension called 'group' so that all objectives return the same number of dimensions. The TreeSHAP algorithms return additional dimension(s), and, as we found with the multi: models, those arrays are row-major (the C convention) where Julia is column-major, so reshaping the 3- (or 4-) dimensional arrays gets complicated. A rough sketch of what I have in mind follows.
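Everything in this sketch is hypothetical: the name `predict_shapley`, the `interactions` keyword, and `_raw_predict` are made up for illustration, not existing XGBoost.jl API.

```julia
import JSON3

# Hypothetical SHAP-specific predict. With strict_shape = true, libxgboost
# returns a fixed rank for a given type (the extra "group" dimension is
# always present), so the row-major to column-major fixup below is uniform
# across objectives.
function predict_shapley(b, Xm; interactions::Bool=false)
    config = JSON3.write(Dict(
        "type" => interactions ? 4 : 2,   # 2 = contributions, 4 = interactions
        "training" => false,
        "iteration_begin" => 0,
        "iteration_end" => 0,
        "strict_shape" => true,
    ))
    (ptr, dims) = _raw_predict(b, Xm, config)   # hypothetical low-level call
    a = unsafe_wrap(Array, ptr, tuple(reverse(dims)...))
    ndims(a) > 1 ? permutedims(a, ndims(a):-1:1) : a
end
```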
Thank you for your consideration.