The prediction of XGBoostClassifier doesn't match the output of ShapIQ #353

Open
Linh-nk opened this issue Mar 24, 2025 · 7 comments
Labels
explainer 🔍 All issues that are linked to explainers question ❔ Further information is requested

Comments

@Linh-nk

Linh-nk commented Mar 24, 2025

The log odds of f(x) in the waterfall plot does not match the prediction I get from the XGBClassifier model's predict_proba method.

Also, should the baseline value be close to the log odds of the average of y? The log odds of E[f(X)] I get from the waterfall plot does not match the average of the ground truth. Please correct me if I'm wrong.
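To make the second question concrete, this is what I mean by the log odds of the average of y (a small sketch with hypothetical names, where y is my array of 0/1 targets):

import numpy as np

# sketch: log odds of the mean of the binary targets
p = y.mean()
log_odds_of_mean = np.log(p / (1 - p))
print(log_odds_of_mean)  # I would expect E[f(X)] in the waterfall plot to be close to this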

@mmschlk mmschlk added the question ❔ Further information is requested label Mar 25, 2025
@mmschlk
Owner

mmschlk commented Mar 25, 2025

I don't see an image/waterfall plot, so I can only speculate. But yes, you are right, the Shapley values do not match what comes out of XGBoost when you run predict_proba. Similar to SHAP, we explain the margin of the prediction and not the predicted probabilities (see this test case for a reference):

# explain with shapiq
explainer_shapiq = TreeExplainer(
    model=xgb_clf_model, max_order=1, index="SV", class_index=class_label
)
sv = explainer_shapiq.explain(x=x_explain_shapiq)
sum_of_sv = sum(sv.values)  # includes the baseline (order-0) value

# get the margin prediction of the model
prediction = xgb_clf_model.predict(x_explain_shapiq.reshape(1, -1), output_margin=True)
prediction = prediction[0, class_label]

print(prediction == sum_of_sv)
True

There is a paper arguing that margins are not a very nice object to explain models with, but up to now neither shap nor shapiq supports another means of explanation here.
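As a side note, if you want to go back from the explained margin to a probability in the binary case, here is a minimal sketch (assuming a binary XGBClassifier, a correctly extracted baseline, and hypothetical names model, sv, and x_explain):

import numpy as np

# sketch for binary classification: the explained margin is a log-odds score,
# so the logistic sigmoid of the summed Shapley values (incl. baseline) should
# reproduce predict_proba for the positive class
margin = sum(sv.values)  # baseline (order 0) + all order-1 Shapley values
probability = 1.0 / (1.0 + np.exp(-margin))

print(probability)
print(model.predict_proba(x_explain.reshape(1, -1))[0, 1])  # should (nearly) match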

Does this answer your question?

@mmschlk mmschlk self-assigned this Mar 25, 2025
@mmschlk mmschlk added the explainer 🔍 All issues that are linked to explainers label Mar 25, 2025
@Linh-nk
Author

Linh-nk commented Mar 25, 2025

Thank you for the prompt response! However, I'm still not getting the same output from shapiq and XGBClassifier. Here is the code that I used

import shapiq

explainer_shapiq = shapiq.TreeExplainer(
    model=model, max_order=1, index="SV"
)
sv = explainer_shapiq.explain(x=X.iloc[0].to_numpy())
sum_of_sv = sv.get_n_order_values(1).sum()
base_value = sv.baseline_value

prediction = model.predict(X.iloc[0].to_numpy().reshape(1, -1), output_margin=True)

print(f'prediction: {prediction}')
print(f'shapiq prediction: {base_value + sum_of_sv}')

where model is an XGBClassifier model. And here is the output I got

prediction: [0.91437876]
shapiq prediction: 2.201628711526583

@Linh-nk
Author

Linh-nk commented Mar 25, 2025

When I use the shap package, I do get the correct prediction:

import shap

explainer_shap = shap.TreeExplainer(model=model)
sv = explainer_shap(X.iloc[0].to_numpy().reshape(1, -1))

prediction = model.predict(X.iloc[0].to_numpy().reshape(1, -1), output_margin=True)

print(f'prediction: {prediction}')
print(f'shap prediction: {sv.values.sum() + sv.base_values}')

prediction: [0.91437876]
shap prediction: [0.9143772]

At first glance, the Shapley values output by shap and shapiq seem to match, but their baseline values do not.

@mmschlk
Owner

mmschlk commented Mar 25, 2025

Ah okay, just to make sure: First, are you certain you are explaining the correct class in the shapiq case?
Omitting class_index in TreeExplainer might default to a different class than the one you are getting predictions for. Can you check your case for all class indices?

explainer_shapiq = shapiq.TreeExplainer(
    model=model, max_order=1, index="SV", class_index=class_index
)

Second, what happens if you do not run:

sum_of_sv = sv.get_n_order_values(1).sum()
base_value = sv.baseline_value

but

sum_of_sv = sum(sv.values)

since with shapiq v1.2.3 we made sure that min_order is set to 0 for TreeExplainer, which should include the baseline value inside the sv.values array.
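As a quick sanity check (just a sketch, assuming shapiq >= 1.2.3), the two ways of summing should then agree:

import numpy as np

# with min_order=0, the order-0 (baseline) term is part of sv.values,
# so summing sv.values should equal baseline_value plus the order-1 values
total_from_values = sum(sv.values)
total_from_orders = sv.baseline_value + sv.get_n_order_values(1).sum()
print(np.isclose(total_from_values, total_from_orders))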

Third, is it possible to create a minimal reproducible example where shap recovers the prediction and shapiq does not? Because in my example above they are the same.

Best
Max

@Linh-nk
Author

Linh-nk commented Mar 25, 2025

Hi Max,

I'm running the same example as yours, but for the binary case:

import copy

import numpy as np
import xgboost
from sklearn.datasets import make_classification

def background_clf_dataset() -> tuple[np.ndarray, np.ndarray]:
    """Return a simple background dataset."""
    X, y = make_classification(
        n_samples=100,
        n_features=10,
        random_state=42,
        n_classes=2,  # binary here
        n_informative=5,
        n_repeated=0,
        n_redundant=0,
    )
    return copy.deepcopy(X), copy.deepcopy(y)

def xgb_clf_model(background_clf_dataset):
    """Return a simple xgboost classification model."""

    X, y = background_clf_dataset
    model = xgboost.XGBClassifier(random_state=42, n_estimators=3)
    model.fit(X, y)
    return model

background_clf_dataset = background_clf_dataset()
xgb_clf_model = xgb_clf_model(background_clf_dataset)
background_clf_data, y = background_clf_dataset

explanation_instance = 1
class_label = 1

# the following code is used to get the shap values from the SHAP implementation
import shap
model_copy = copy.deepcopy(xgb_clf_model)
explainer_shap = shap.TreeExplainer(model=model_copy)
baseline_shap = float(explainer_shap.expected_value)

x_explain_shap = copy.deepcopy(background_clf_data[explanation_instance].reshape(1, -1))
sv_shap_all_classes = explainer_shap.shap_values(x_explain_shap)
sv_shap = sv_shap_all_classes  # binary case: shap_values already returns shape (1, n_features)

# compute with shapiq
import shapiq
explainer_shapiq = shapiq.TreeExplainer(
    model=xgb_clf_model, max_order=1, index="SV", class_index=class_label
)
x_explain_shapiq = copy.deepcopy(background_clf_data[explanation_instance])
sv_shapiq = explainer_shapiq.explain(x=x_explain_shapiq)
sv_shapiq_values = sv_shapiq.get_n_order_values(1)
baseline_shapiq = sv_shapiq.baseline_value

prediction = xgb_clf_model.predict(x_explain_shapiq.reshape(1, -1), output_margin=True)

# compare the two explanations and the margin prediction
print(f'baseline_shap: {baseline_shap}')
print(f'SHAP values: {sv_shap}')
print(f'baseline_shapiq: {baseline_shapiq}')
print(f'SHAPIQ values: {sv_shapiq_values}')
print(f'baseline_shap + SHAP values: {baseline_shap + sv_shap.sum()}')
print(f'baseline_shapiq + SHAPIQ values: {baseline_shapiq + sv_shapiq_values.sum()}')
print(f'prediction: {prediction}')

And here is the output that I got

baseline_shap: 0.0
SHAP values: [[ 0.21501832  0.01152     0.10970242  0.12228977  0.09716809  0.02027433
   0.8166808   0.         -0.01836885 -0.22206578]]
baseline_shapiq: 0.4894029824013125
SHAPIQ values: [ 0.21501832  0.01152     0.10970243  0.12228977  0.09716809  0.02027433
  0.81668073  0.         -0.01836885 -0.22206577]
baseline_shap + SHAP values: 1.1522190570831299
baseline_shapiq + SHAPIQ values: 1.6416220384585056
prediction: [1.1416221]

@mmschlk
Owner

mmschlk commented Mar 26, 2025

Ah okay, I will take a closer look at this! However, the Shapley values are the same for shap and shapiq, which is at least the most important part. It might be that the baseline_value is not properly extracted from the xgboost model; I just remember that this was actually not that easy to get right. Thank you for pointing this out!
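If you want to double-check on your side, here is a hypothetical sketch (not part of shapiq) for reading the base_score that xgboost itself stores, which is what any extracted baseline should ultimately correspond to:

import json

# sketch: read the intrinsic base_score from the booster's JSON config
# (for a logistic objective this is stored in probability space,
# so the margin baseline would be its log-odds)
config = json.loads(xgb_clf_model.get_booster().save_config())
base_score = float(config["learner"]["learner_model_param"]["base_score"])
print(base_score)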

Could you let me know which xgboost and shapiq versions you are using?

@Linh-nk
Author

Linh-nk commented Mar 26, 2025

It's xgboost 2.1.4 and shapiq 1.2.3. Thank you!
