This app demonstrates how to use the SHAP library to explain machine learning models, using the popular Streamlit framework for the application frontend. The application is deployed and can be accessed at https://shap-app.streamlit.app/.
SHAP (SHapley Additive exPlanations) is a unified measure of feature importance that originates from game theory. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
In the context of machine learning, SHAP values provide a measure for the contribution of each feature to the prediction for individual samples. They can help to interpret the output of any machine learning model. Essentially, Shapley values answer the question: What is the relative contribution of each feature value to the prediction?
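That question has a precise answer. For a set N of n features and a value function v(S) giving the model's output when only the features in S are known, feature i receives φ_i = Σ over S ⊆ N∖{i} of |S|!(n−|S|−1)!/n! · [v(S ∪ {i}) − v(S)], which is its weighted average marginal contribution over all coalitions. The sketch below evaluates that formula by brute force in plain Python. It is only an illustration, not how the SHAP library computes values, and the value function `toy_value` and its feature names are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(features: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> List[float]:
    """Exact Shapley values by enumerating every coalition (O(2^n) evaluations)."""
    n = len(features)
    result = []
    for feature in features:
        others = [f for f in features if f != feature]
        phi = 0.0
        for size in range(n):
            # Weight shared by every coalition of this size that excludes `feature`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of `feature` when it joins coalition s.
                phi += weight * (value(s | {feature}) - value(s))
        result.append(phi)
    return result


def toy_value(known: FrozenSet[str]) -> float:
    """Hypothetical model output when only the features in `known` are available."""
    v = 10.0                      # baseline output when nothing is known
    if "income" in known:
        v += 3.0
    if "age" in known:
        v += 2.0
    if {"income", "age"} <= known:
        v += 4.0                  # interaction shared by income and age
    return v


features = ["income", "age", "zip"]
phis = shapley_values(features, toy_value)
print([round(p, 6) for p in phis])  # → [5.0, 4.0, 0.0]
```

The values sum to toy_value(all) − toy_value(∅) = 19 − 10 = 9 (the efficiency property behind SHAP's "additive" explanations), the interaction bonus is split evenly between income and age, and zip, which never changes the output, gets zero. The exponential cost of this enumeration is why practical explainers rely on model-specific shortcuts or sampling.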
The package can be installed using pip:
pip install shap-app
To work from source, clone the repository:
git clone git@github.com:RodrigoGonzalez/streamlit-shap-app.git
This project was created using Poetry. To install Poetry, run the following:
curl -sSL https://install.python-poetry.org | python -
On macOS, you can also install Poetry using Homebrew:
brew install poetry
Verify Installation: You can verify the installation by running:
poetry --version
To install the dependencies, run the following command:
make local
This generates a virtual environment and installs the dependencies listed in the pyproject.toml file.
I have also included a setup.py file for those who prefer to use pip. To install the dependencies, run the following command:
pip install -r pip/requirements.txt
To run the app, type:
shap-app
To see all options (currently limited to running the application), type:
shap-app --help
A Vital Component for Ensuring Transparency and Trustworthiness
In a world increasingly driven by automated decision-making, the capacity to comprehend and articulate the underlying mechanisms of machine learning models is paramount. This understanding, referred to as model interpretability, enables critical insight into the actions and justifications of algorithmic systems that profoundly impact human lives.
Interpretability plays a vital role in enhancing understanding and communication in three areas:
- Model Debugging (Analytical Evaluation):
  - What caused this model's error?
  - What adjustments are necessary to improve the model's performance?
- Human-AI Collaboration (Mutual Understanding):
  - How can users interpret and trust the model's decisions?
- Regulatory Compliance (Legal Assurance):
  - Does the model adhere to statutory mandates and ethical guidelines?
Within the model training and deployment pipeline, interpretability is instrumental during the "diagnosis" phase of the model lifecycle. AI explainability describes the model's predictions in human-intelligible terms, offering insight into model behavior at two levels:
- Global explanations: e.g., which variables shape the overall behavior of a loan allocation model?
- Local explanations: e.g., what led to the approval or denial of a specific customer's loan application?
Examining model explanations for subgroups of data points is also invaluable, for example when assessing the fairness of predictions for specific demographic groups.
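For a linear model f(x) = b + Σ w_j x_j with features assumed independent, the SHAP value of feature j for a sample x has the closed form w_j (x_j − E[x_j]), with the expectation taken over background data. The minimal sketch below uses that closed form to show a local explanation (per-sample values), a global explanation (mean absolute SHAP value per feature), and the same global summary computed per subgroup. The loan-scoring weights, feature names, group labels and rows are all hypothetical, and the rows double as the background data.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical linear loan-scoring model: score = BIAS + sum(w_j * x_j).
WEIGHTS: Dict[str, float] = {"income": 0.5, "debt": -0.8, "age": 0.1}
BIAS = 1.0


def predict(row: Dict) -> float:
    return BIAS + sum(w * row[f] for f, w in WEIGHTS.items())


def linear_shap(rows: List[Dict]) -> List[Dict[str, float]]:
    """Local explanations: phi_j(x) = w_j * (x_j - mean of x_j), assuming independent features."""
    means = {f: mean(r[f] for r in rows) for f in WEIGHTS}
    return [{f: w * (r[f] - means[f]) for f, w in WEIGHTS.items()} for r in rows]


def global_importance(shap_rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Global explanation: mean absolute SHAP value of each feature."""
    return {f: mean(abs(s[f]) for s in shap_rows) for f in WEIGHTS}


def subgroup_importance(rows: List[Dict], shap_rows: List[Dict[str, float]],
                        key: str) -> Dict[str, Dict[str, float]]:
    """Global importance recomputed separately for each value of row[key]."""
    groups: Dict[str, List[Dict[str, float]]] = {}
    for row, s in zip(rows, shap_rows):
        groups.setdefault(row[key], []).append(s)
    return {g: global_importance(members) for g, members in groups.items()}


rows = [
    {"income": 6.0, "debt": 1.0, "age": 30, "group": "A"},
    {"income": 5.0, "debt": 2.0, "age": 35, "group": "A"},
    {"income": 2.0, "debt": 5.0, "age": 40, "group": "B"},
    {"income": 3.0, "debt": 4.0, "age": 55, "group": "B"},
]
shap_rows = linear_shap(rows)
base = mean(predict(r) for r in rows)  # expected prediction over the data

print(shap_rows[0])                                   # local: applicant 0
print(global_importance(shap_rows))                   # global: all applicants
print(subgroup_importance(rows, shap_rows, "group"))  # per-group comparison
```

Each row's SHAP values add up to its prediction minus `base`, which is the local accuracy property SHAP guarantees. For nonlinear models the SHAP library's explainers estimate these values rather than using this closed form.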
The interpretability component leverages the SHAP (SHapley Additive exPlanations) package, a robust tool for analyzing model behavior that provides insights into feature importance and into each feature's contribution to individual predictions.
Use interpretability to:
- Assess the reliability of AI system predictions by identifying the factors that drive them.
- Plan model debugging by first understanding how the model works and distinguishing legitimate relationships from misleading associations.
- Detect potential biases by checking whether predictions rely on sensitive or highly correlated attributes.
- Build user confidence through local explanations of individual decision outcomes.
- Conduct regulatory audits to validate models and monitor the impact of model decisions on human interests.
The nuanced task of model interpretation extends beyond mere technical necessity; it fosters transparency, accountability, and trust in AI systems. Embracing interpretability ensures that decisions derived from artificial intelligence are not only proficient but principled, aligning with both legal obligations and ethical values.
In the dynamic field of machine learning, understanding and explaining model predictions is vital for drawing actionable insights from them. This project focuses on Shapley values, a concept from game theory that can be used to interpret complex models.
The primary goal of this project is to provide an intuitive introduction to Shapley values and to using the SHAP library. Shapley values provide a robust understanding of how each feature individually contributes to a prediction, making complex models easier to understand.
Streamlit is utilized to create an interactive interface for visualizing SHAP (SHapley Additive exPlanations) prediction explanations, making the technical concepts easier to comprehend.
The project also highlights the real-world utility of prediction explanations, demonstrating that they are not merely a theoretical concept but a valuable tool for informed decision-making. It also demonstrates SHAP's ability to provide a consistent feature importance measure across different models and its versatility in handling diverse datasets.
- Python: The project is implemented in Python, a popular language for data science due to its readability and vast ecosystem of scientific libraries. https://www.python.org/
- SHAP: SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. https://shap.readthedocs.io/en/latest/index.html
- Streamlit: Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In this project, Streamlit is used to create an interactive web application to visualize the SHAP values. https://streamlit.io/
- Pandas: Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. https://pandas.pydata.org/
- NumPy: NumPy is a library for the Python programming language, adding support for large, multidimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. https://numpy.org/
- Scikit-learn: Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms. https://scikit-learn.org/stable/
- Matplotlib: Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications. https://matplotlib.org/
The project follows a structured approach starting from data exploration, data cleaning, feature engineering, model building, and finally model explanation using SHAP values. The codebase is modular and follows good software engineering practices.
My motivation in writing this app was to explore Streamlit for building web applications and SHAP for understanding decision-making within models. The following section outlines key takeaways from working with these tools.
Streamlit is an excellent open source library for creating web applications that showcase machine learning and data science projects. It's easy to use, the documentation is excellent, and it integrates well with the open source libraries used in this project. However, it may not be the best choice for scalable or enterprise-level applications. Streamlit lacks some of the more advanced customization options available in other web development frameworks, but my biggest concerns about using it beyond small projects and prototypes are that state management can be challenging and that performance can become an issue with very large datasets or highly complex applications. I also found testing Streamlit apps challenging, since Streamlit is not a typical Python library.
Overall, I think Streamlit is a great tool to have at your disposal, and the problem it solves, getting something up and running quickly, is what it excels at.
In this project, I used the SHAP (SHapley Additive exPlanations) library to interpret complex machine learning models.
The experience with SHAP in the project revealed several advantages. The interpretability it provided turned previously black-box models into useful explanations, making it easy to understand the relative contributions of each feature. Its compatibility with various machine learning models and good integration with Streamlit allowed for interactive visualizations. Moreover, SHAP's ability to reveal the influence of each feature through easy-to-generate plots is especially useful for explaining predictions to non-technical stakeholders.
However, the implementation was not without challenges. SHAP's computational intensity, especially with larger datasets and complex models, required careful optimization. While SHAP values are insightful, interpreting them can still be challenging, especially for non-technical audiences. The visualizations, although informative, can become overwhelming when dealing with a large number of features, but feature selection techniques and careful design can keep them focused and the user experience engaging.
Using the SHAP package was overwhelmingly positive, with the pros far outweighing the cons. The library can clearly help bridge the gap between machine learning experts and other stakeholders. Any challenges in using the package can be addressed through careful planning and a thorough understanding of the dataset.
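One widely used way to reduce the cost of Shapley values is permutation sampling (Štrumbelj and Kononenko): average each feature's marginal contribution over randomly shuffled feature orderings instead of enumerating every coalition. With 20 features, exact enumeration touches 2^20 (about a million) coalitions, while 200 orderings need 200 × 21 = 4,200 evaluations. The minimal sketch below pairs that estimator with a simple top-k filter for keeping plots readable. It is a plain-Python illustration, not this app's implementation, and `toy_value`, the feature names and the interaction term are hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence


def sampled_shapley(features: Sequence[str],
                    value: Callable[[FrozenSet[str]], float],
                    n_permutations: int = 200,
                    seed: int = 0) -> Dict[str, float]:
    """Monte Carlo Shapley estimate from random feature orderings."""
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    order = list(features)
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        prev = value(coalition)
        for f in order:
            # Add features one at a time; credit each with the change it causes.
            coalition = coalition | {f}
            cur = value(coalition)
            totals[f] += cur - prev
            prev = cur
    return {f: t / n_permutations for f, t in totals.items()}


def top_k(importance: Dict[str, float], k: int) -> List[str]:
    """Names of the k features with the largest absolute attribution."""
    return sorted(importance, key=lambda f: abs(importance[f]), reverse=True)[:k]


# Hypothetical 20-feature model: additive weights plus one interaction.
WEIGHTS = {f"f{i}": float(i) for i in range(20)}


def toy_value(known: FrozenSet[str]) -> float:
    v = sum(WEIGHTS[f] for f in known)
    if {"f0", "f1"} <= known:
        v += 6.0  # bonus only when f0 and f1 are both known
    return v


estimate = sampled_shapley(list(WEIGHTS), toy_value, n_permutations=200, seed=1)
print(top_k(estimate, 3))  # → ['f19', 'f18', 'f17']
```

Because the marginal contributions along each ordering telescope to v(all) − v(∅), the estimates still add up exactly; only the split of the f0/f1 interaction is noisy (the exact values are 3 and 4), and that noise shrinks as `n_permutations` grows. The same sample-count versus accuracy trade-off applies in spirit to SHAP's sampling-based explainers, and a filter like `top_k` is one simple way to limit a plot to the features that matter most.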
Many of the ideas implemented in this repository were first detailed in the following blog posts, papers, and tutorials:
- A Unified Approach to Interpreting Model Predictions
- Consistent Individualized Feature Attribution for Tree Ensembles
- Explainable AI for Trees: From Local Explanations to Global Understanding
- Fairness-aware Explainable AI: A Decision-Making Perspective
- Interpretable Machine Learning: Definitions, Methods, and Applications
- SHAP-Sp: A Data-efficient Algorithm for Model Interpretation
- On the Robustness of Interpretability Methods
- Towards Accurate Model Interpretability by Training Interpretable Models
- Understanding Black-box Predictions via Influence Functions
- From Local Explanations to Global Understanding with Explainable AI for Trees
- GitHub - slundberg/shap
- Interpretable Machine Learning with SHAP
- Understanding SHAP Values
- Kaggle - Machine Learning Explainability
- SHAP Values Explained Exactly How You Wished Someone Explained to You
- Interpreting complex models with SHAP values
- Shapley Values Wikipedia Page
- The app is currently only compatible with Python 3.10+
- Full documentation is not yet available
- Does not support user-defined datasets and packages yet.
Issues and pull requests are welcome.
All code in this repository is released under the MIT License.