Adding plot_shapley_projection function to the plot_evaluation_metrics.py file. #632

Open · wants to merge 5 commits into master

Conversation

Yh-Cherif

The original issue for this Pull Request is this one.

This Pull Request adds the plot_shapley_projection function, which corresponds to the first objective of the issue (1. Shapley Projection Plot).

Test plan.

To test this code, I tried using it on multiple datasets. The following example uses this dataset.

The following code:

[screenshot: example code]

produces the following result:

[screenshot: resulting plot]

Description

Generates a 2-dimensional scatter plot of the Shapley values of the data and model used in the explainer. This plot gives the user a visual interpretation of the impact of variables on predictions. The only additional library needed is UMAP (the umap-learn package).
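The projection step described above could look roughly like this minimal sketch (`project_contributions_2d` is a hypothetical helper name, not the PR's actual code; it assumes the umap-learn package is installed):

```python
import numpy as np


def project_contributions_2d(contributions, random_state=None):
    """Reduce an (n_samples, n_features) matrix of Shapley values to 2D.

    `contributions` is anything convertible to a 2D NumPy array,
    e.g. a pandas DataFrame of per-feature contribution values.
    """
    import umap  # external dependency: the umap-learn package

    reducer = umap.UMAP(n_components=2, random_state=random_state)
    return reducer.fit_transform(np.asarray(contributions))
```

The 2D embedding returned here is what gets drawn as the scatter plot, with one point per observation.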

Type of change

New feature (non-breaking change which adds functionality)

Test Configuration:

  • OS: Windows
  • Python version: 3.11.9
  • Shapash version: 2.7.9

@guillaume-vignal
Collaborator

Thank you so much @Yh-Cherif for this great contribution 🙌
You've achieved a really nice and clean visualization — well done!

I’ve reviewed your work, and here are a few suggestions to help align it more closely with the rest of the library’s design and ensure maximum usability and flexibility:

  • Axes Titles and Values: Since the data is projected into a 2D space (e.g., via UMAP), the axes themselves don’t carry interpretable meaning. For clarity, it might be best to remove the axis titles and tick values, as they could be misleading.

  • Color Bar and Plot Titles: It would be great to make the color bar title (e.g., predictions, targets, errors) and the overall plot title configurable parameters of the function. That way, users can adapt the visualization to suit different use cases.

  • Function Parameters: In plot_evaluation_metrics.py, the functions should be kept as generic as possible. Rather than passing the entire explainer object, it would be better to pass only the specific data needed for the visualization — but with names that reflect their role in the plot. For example, instead of y_pred or contributions, use something like values_to_project (for the 2D projection) and color_values (for the color scale). This helps clarify their purpose, keeps the function flexible regardless of whether it’s classification or regression, and makes it easier to reuse in other contexts.

  • UMAP Dependency: Since UMAP is an external dependency, we’ll need to add it explicitly to the pyproject.toml file so that it's properly tracked and installed.

  • Function Naming: Could you please rename the function from plot_shapley_projection to plot_contributions_projection? This will make it clearer that it can be used with different types of contribution values, not just Shapley (e.g., LIME, etc.).
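Taken together, these suggestions could yield a function shaped roughly like the sketch below (a sketch only, with assumed names; Matplotlib is used here for brevity, and umap-learn would also need to be listed under the dependencies in pyproject.toml):

```python
def plot_contributions_projection(
    values_to_project,
    color_values,
    title="Contributions projection",
    colorbar_title="value",
    random_state=None,
):
    """2D UMAP projection of contribution values, colored by `color_values`."""
    import numpy as np
    import umap                      # external dependency: umap-learn
    import matplotlib.pyplot as plt

    embedding = umap.UMAP(n_components=2, random_state=random_state).fit_transform(
        np.asarray(values_to_project)
    )
    fig, ax = plt.subplots()
    points = ax.scatter(
        embedding[:, 0], embedding[:, 1], c=np.asarray(color_values), s=10
    )
    # The projected axes carry no interpretable meaning, so hide ticks and titles.
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title(title)
    fig.colorbar(points, label=colorbar_title)
    return fig
```

With generic names like values_to_project and color_values, the same function serves regression and classification alike.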


Next steps:

Once these changes are in place, the next step would be to integrate the function into shapash/explainer/smart_plotter.py, so that users can call it directly from the explainer like:

xpl.contributions_projection_plot()
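Such a wrapper would mostly forward data the explainer already holds. A minimal sketch of the delegation (all names are assumptions, and a stub stands in for the real plotting function):

```python
# Stand-in for the real plot_contributions_projection: it just echoes the
# arguments it receives, so the delegation below can be demonstrated.
def plot_contributions_projection(values_to_project, color_value, random_state=None):
    return {
        "values_to_project": values_to_project,
        "color_value": color_value,
        "random_state": random_state,
    }


class SmartExplainerSketch:
    """Hypothetical slice of a SmartExplainer holding the plot's inputs."""

    def __init__(self, contributions, y_pred):
        self.contributions = contributions
        self.y_pred = y_pred

    def contributions_projection_plot(self, random_state=None):
        # The wrapper only forwards explainer-held data; the standalone
        # function stays available for advanced or customized usage.
        return plot_contributions_projection(
            values_to_project=self.contributions,
            color_value=self.y_pred,
            random_state=random_state,
        )


xpl = SmartExplainerSketch(contributions=[[0.1, -0.2]], y_pred=[1.0])
result = xpl.contributions_projection_plot(random_state=100)
```

The wrapper adds no plotting logic of its own, which keeps the standalone function the single source of truth.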

Eventually, the goal would be to include this visualization in the web app, allowing users to interactively click on each point and inspect the local contributions. That would be incredibly helpful for exploring and understanding individual predictions 👏

Thanks again for your great work and all the time you're putting into this — we really appreciate it!

@Yh-Cherif
Author

Yh-Cherif commented Apr 20, 2025

Hello @guillaume-vignal, thanks for your review.

I've taken all of your suggestions into account and updated the files accordingly.

I've also run simulation tests for the classification and regression cases to ensure that the function is flexible:

  • For regression I used this dataset and passed it to the function.

[screenshot: regression example code]

Here is the output:

[screenshot: regression output plot]

  • For the classification case:

[screenshot: classification example code]

Here is the output:

[screenshot: classification output plot]

I've added "title" and "colorbar_title" plot options to let the user customize the output a little more, as requested.

I hope these changes are what you expected. If not, please let me know and I'll change them.

Thanks again for your patience!

@guillaume-vignal
Collaborator

Thanks again for the great work on this feature—it’s a really valuable addition to the library!

The natural next step in this evolution would be to integrate the function into shapash/explainer/smart_plotter.py, so it can be called directly from the explainer like:

xpl.contributions_projection_plot()

This method would simply act as a wrapper around the existing plot_contributions_projection function. It would significantly improve usability by allowing users to access the projection plot with minimal setup.

Embedding it into the explainer would also allow us to handle the logic internally—for example, adapting automatically to regression or classification cases, and selecting color_value based on predictions, targets, or prediction errors, depending on the context or a user-specified argument.

Of course, the current standalone function would still be available for advanced or customized usage:

plot_contributions_projection(
    values_to_project=xpl.contributions,
    color_value=xpl.y_pred,
    random_state=100
)

But exposing it directly through the explainer would really streamline the experience and make it much more user-friendly.

@Yh-Cherif
Author

Hello Guillaume. I've added the plot method to the explainer object as you can see.

  • Regression case:

[screenshot: regression code and output]

  • Classification case:

[screenshot: classification code and output]

PS: Note that I've truncated the code screenshot since it's the same as before.

@guillaume-vignal
Collaborator

guillaume-vignal commented Apr 22, 2025

Thanks for the quick update—the integration is going in a great direction and having the method available directly on the explainer definitely improves usability.

That said, one part of the original suggestion is missing: the ability to choose how the points are colored via a color_value parameter, with options like "prediction", "target", or "error". This makes the plot more informative and adapts it to different analysis contexts (e.g., spotting outliers via prediction error).
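The dispatch on color_value could be as simple as the following sketch (`resolve_color_values` is a hypothetical helper name, not Shapash's actual API):

```python
import numpy as np


def resolve_color_values(color_value, y_pred, y_target):
    """Map a color_value keyword to the array driving the color scale."""
    if color_value == "prediction":
        return np.asarray(y_pred)
    if color_value == "target":
        return np.asarray(y_target)
    if color_value == "error":
        # e.g. to spot outliers via large prediction errors
        return np.asarray(y_pred) - np.asarray(y_target)
    raise ValueError(f"Unknown color_value: {color_value!r}")
```

The explainer method could call such a helper internally and also derive the plot and colorbar titles from the chosen option.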

Also, regarding the use of **kwargs: I don’t think it’s ideal here. Having clearly defined parameters makes the method much more user-friendly and easier to understand—especially for people less familiar with the internal implementation. With explicit arguments, users can quickly see what can be customized and benefit from autocompletion and documentation. **kwargs tends to hide those options and can make things harder to grasp.

@Yh-Cherif
Author

Hello @guillaume-vignal, thanks again for your review. I've incorporated your suggestions into the explainer method, and here is an example of the changes:

  • I've incorporated the predictions/targets/errors option for both classification and regression (the option must be passed via the 'color_value' argument):
    [screenshot: color_value example]

Note that title and colorbar title are both filled automatically.

  • Made sure that each function's keyword arguments are compatible with auto-completion:
    [screenshot: auto-completion example]

  • I've also corrected the 'example' part of the function's docstring (which still showed an example using the explainer).

@guillaume-vignal guillaume-vignal self-requested a review May 6, 2025 08:09
Collaborator

@guillaume-vignal guillaume-vignal May 6, 2025


Hello @Yh-Cherif,

First of all, thank you for the great work so far — it's really appreciated!

Can you please put the method in shapash/explainer/smart_plotter.py, like the other methods? Sorry I wasn't clear enough in my explanation. Then we'll be able to call it this way:

xpl.plot.contributions_projection_plot()

Currently, the plot_contributions_projection method is called twice, but this can be streamlined into a single call. To achieve this, you can store the values to be projected in a variable, values_to_project, to improve efficiency and readability.

Additionally, it would be beneficial to extend this method to support two additional parameters, similar to those in the contribution_plot method:

selection (list, optional):
A list of indices, representing a subset of the input DataFrame to plot. If not provided, the entire dataset will be used.

label (integer or string, default = -1):
The label to select a specific subset of the DataFrame. If the label is given as a string, ensure it can be converted to an integer to select the corresponding dataframe object.

This enhancement will make the method more versatile, allowing users to choose specific data points and labels for more focused visualizations.
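The two suggested parameters could be handled by a small helper along these lines (`subset_for_plot` is a hypothetical name; it only illustrates the selection and label semantics described above):

```python
import pandas as pd


def subset_for_plot(df, selection=None, label=-1):
    """Apply the suggested `selection` and `label` parameters.

    `selection` is a list of row index labels to keep (None keeps every
    row), and `label` is normalized to an int so that a string such as
    "1" can pick the matching per-class contribution object.
    """
    label = int(label)  # string labels must be convertible to int
    if selection is not None:
        df = df.loc[selection]
    return df, label


# Example: keep only two rows of a three-row frame, with a string label.
df = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 11, 12])
subset, label = subset_for_plot(df, selection=[10, 12], label="1")
```

Defaulting `selection` to None and `label` to -1 mirrors the behavior described for contribution_plot, so omitting both plots the full dataset.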
